Foreign Country Unicode, Interesting.

Started by June 26, 2018 04:21 PM
16 comments, last by L. Spiro 6 years, 2 months ago

While working with a graphical user interface, I came across an example of Unicode.  The text for my control appears to be Asian in nature.  It makes me wonder: how is Unicode used in foreign countries?  Are there actual pictograms, etc. on the keyboard?  Maybe they're using some heavier-duty software?  My guess is that English input is converted/completed into a list of possible foreign characters to display.

 

Josheir  

As far as inputting Asian characters (there are more than could ever fit on a keyboard of reasonable size), there are software IMEs (Input Method Editors):  https://en.wikipedia.org/wiki/Input_method

They work sort of the same as autocomplete in a programming language editor.

When using an IME, the user enters Latin characters such as "wei", and the IME intercepts them and replaces them with the matching glyph. When multiple glyphs are possible, the IME prompts the user to choose among them.
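
As a rough illustration of that intercept-and-replace step, here is a minimal Python sketch. The candidate table and the function names (`lookup`, `commit`) are made up for this example; a real IME ships a large dictionary and far smarter matching.

```python
# Hypothetical candidate table: typed letters -> possible characters.
# The entries are a tiny illustrative sample, not a real IME dictionary.
CANDIDATES = {
    "wei": ["为", "位", "未", "喂"],
    "ma": ["吗", "妈", "马"],
}


def lookup(typed):
    """Return the candidate list for the typed letters.

    Unknown input falls back to the letters themselves.
    """
    return CANDIDATES.get(typed, [typed])


def commit(typed, choice=0):
    """Replace the typed letters with the candidate at index `choice`."""
    return lookup(typed)[choice]


if __name__ == "__main__":
    print(lookup("wei"))     # the list the IME would offer
    print(commit("wei", 1))  # the user picks the second candidate
```

The default `choice=0` mirrors what most IMEs do when you just confirm without browsing the list: the first candidate is taken.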

Google Translate has a fairly good implementation. Open their site, select a language like Japanese or Chinese, use the dropdown to select how you want to enter the text, then type away.  Even if you don't speak the language, type some words you know and you'll quickly run across glyphs represented by letter combinations to see it in action. The words you type may be nonsense but you can watch how an IME functions.

[Image: screenshot accompanying the Google Translate example]

Most versions of Windows also have IMEs built in.  You can install other languages and if they have one, their IME will be available.  Many languages also have downloadable handwriting recognition support - mostly useful if you have a stylus, but you can also scribble with a mouse.

Since you haven't specified which 'Asian' language you are interested in, let me add my native language :).

For Thai, there is no IME. The user types what they see on the keyboard, and it appears on the screen directly, just like Latin characters. One caveat is that each key has two characters on it, one with Shift and one without. On a Latin layout, most keys carry two characters where one is the capitalized form of the other (a/A). In Thai, those two characters have no relationship whatsoever (e.g. ด/โ share the same key).

And ... the Thai keyboard layout does not contain Arabic number keys. We need separate numeric keys for that, or we switch back to English just to key in numbers. The reason is that we have our own number characters (๑๒๓๔๕๖๗๘๙๐).

Also, there is no conversion. What you type is what you get, just like a typewriter. For example, ดัน is functionally ด + ะ + น, but it is written as ด + ั + น. If you type ดะน, it won't get converted to ดัน. This is also how the string is stored.
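
To make the "stored as typed" point concrete, here is a small Python sketch that lists the code points of both spellings and checks that Unicode normalization leaves them distinct. The variable and function names are only illustrative.

```python
import unicodedata

# Written with escapes so the exact code points are unambiguous.
written = "\u0e14\u0e31\u0e19"  # ด + ั + น  (the normal spelling)
typed = "\u0e14\u0e30\u0e19"    # ด + ะ + น  (the "functional" spelling)


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Normalization does not turn one spelling into the other:
# the two strings stay different sequences of code points.
same_after_nfc = (
    unicodedata.normalize("NFC", typed) == unicodedata.normalize("NFC", written)
)

if __name__ == "__main__":
    for label, text in (("written", written), ("typed", typed)):
        print(label, describe(text))
    print("equal after NFC:", same_after_nfc)
```

Both strings are three code points long; only the middle one differs (U+0E31 versus U+0E30).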

Although there is some glyph substitution going on, it is considered display functionality (e.g. OpenType shaping).

To summarize, input works like an English/Latin layout, and the string is kept as is. If you're working on some kind of software keyboard or IME, be sure not to auto-capitalize (it would be very annoying for us Thai).

PS. Input is easy for us; display is much more interesting. Most game engines on the market do not display Thai properly (including the 10+ engines/frameworks I've checked ... Unity and Unreal included).

http://9tawan.net/en/

The era is great... :) 

Josheir

 

You can see the Unicode combining characters that mr_tawan talked about in the Unicode character tables, where they are drawn as glyphs with a dotted circle in the center.  For example:  http://www.fileformat.info/info/unicode/char/0e31/index.htm

Like he mentioned, the actual string in RAM includes these combining characters immediately after the character they modify and it's up to the software to support them when rendering them to the screen/printer/etc.
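
A renderer therefore has to group each base character with the marks that follow it before drawing. Below is a deliberately simplified Python sketch of that grouping, keyed on the Unicode general category. The function name `clusters` is made up here, and real text engines follow the fuller grapheme-cluster rules of Unicode Standard Annex #29 rather than this shortcut.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: a character counts as a mark if its general category
    is Mn (nonspacing mark) or Me (enclosing mark).
    """
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch) in ("Mn", "Me"):
            groups[-1] += ch  # attach the mark to the previous base
        else:
            groups.append(ch)  # start a new cluster
    return groups


if __name__ == "__main__":
    word = "\u0e14\u0e31\u0e19"  # ด + ั + น
    print(len(word), "code points,", len(clusters(word)), "clusters")
```

For the Thai word from earlier in the thread, the string holds three code points but only two clusters, because U+0E31 attaches to the consonant before it.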

Side trivia:  Some users online have exploited font rendering engines' combining character support to make usernames with unorthodox combining marks that look completely corrupted or even begin rendering outside of the area they're supposed to.
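
If you accept usernames and want to guard against that, one possible approach (an assumption on my part, not something described in this thread) is to cap how many combining marks may follow a single base character. A minimal Python sketch, with a hypothetical `limit_marks` helper and an arbitrary default limit:

```python
import unicodedata


def limit_marks(text, max_marks=2):
    """Drop combining marks beyond `max_marks` in a row.

    The limit of 2 is an arbitrary assumption. Some scripts legitimately
    stack marks (Thai can carry a vowel sign plus a tone mark), so a
    stricter limit could damage real text.
    """
    out = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # skip the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    noisy = "a" + "\u0301" * 5
    print(len(noisy), "->", len(limit_marks(noisy)))
```

Ordinary text such as the Thai examples above passes through unchanged, since no base character there carries more than two marks.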

Since I lived and worked in Thailand, France, Japan, the UK, and the USA…

If the native alphabet is small enough, then they try to print its characters directly on the keys.  This was my keyboard in Thailand:
[Image: Thai keyboard]

 

In “weird” places (France and friends) they may swap letters around.  This was my keyboard in France (notice the Q etc.)
[Image: French keyboard]

UK’ians are also weird.
I used to use this in the UK but I had to get it replaced because of that damned short-as-hell Shift key:
[Image: UK keyboard]

Notice that other symbols are moved around as well, not just letters.

 

All of these work like normal keyboards, except you press different buttons to get the same result. Thai represents 2 special cases though: A larger alphabet (44 consonants and 15 vowel symbols) and 2 alphabets (English and Thai).

All keyboards for scripts without Roman roots keep the standard English alphabet and American layout alongside the native alphabet.  You switch modes to type in one or the other, usually via Shift-Tilde.

 

In Japan, this was my keyboard:
[Image: Japanese keyboard]

Hiragana is written next to the English characters, which implicitly align with Katakana characters since they are 1-for-1 (Katakana characters are just a different way to draw Hiragana characters).
Surrounding the space bar are keys to select input methods and alphabets.

For the most part, you completely ignore the Hiragana characters.  You can enter a special mode to type them directly just as with Thai, but that means relearning to type so no one does this.  Instead it basically boils down to typing in English directly, or typing in Japanese phonetically and letting that get turned into Japanese based on which alphabet you have active.

If I type “ku”, if I am in Hiragana it becomes く, if I am in Katakana it becomes ク.
If I then hit the space bar I get options for KU.
[Image: IME pop-up listing the options for "ku"]

Now I can select which Kanji I want or select the Hiragana form or (a little lower) the Katakana form, etc.
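
The first step of that flow, turning typed letters into kana depending on the active mode, can be sketched in a few lines of Python. The table below is a tiny hypothetical sample and the names are invented for the example. It leans on one fact about Unicode: each Katakana letter sits 0x60 code points above its Hiragana counterpart.

```python
# Tiny, hypothetical romaji table; a real IME covers every syllable.
ROMAJI_TO_HIRAGANA = {
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
}

# Katakana U+30A1..U+30F6 mirrors Hiragana U+3041..U+3096, shifted by 0x60.
KATAKANA_OFFSET = 0x60


def to_katakana(hiragana):
    """Shift every Hiragana letter to its Katakana counterpart."""
    return "".join(
        chr(ord(ch) + KATAKANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in hiragana
    )


def convert(typed, mode="hiragana"):
    """Convert typed letters to kana in the active mode.

    Input that is not in the table is returned unchanged.
    """
    kana = ROMAJI_TO_HIRAGANA.get(typed, typed)
    return to_katakana(kana) if mode == "katakana" else kana


if __name__ == "__main__":
    print(convert("ku"), convert("ku", mode="katakana"))
```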

The IME pop-up that you see there learns which Kanji you use most often and puts them at the top.
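
How a given IME learns is not public knowledge as far as I know, but the observable behavior can be approximated by counting selections and sorting on that count. A minimal Python sketch under that assumption; the class name, the method names, and the sample Kanji list are all hypothetical:

```python
from collections import Counter


class CandidateRanker:
    """Order candidates by how often the user has picked them."""

    def __init__(self, candidates):
        # reading -> candidates in their default dictionary order
        self.candidates = dict(candidates)
        self.usage = Counter()

    def options(self, reading):
        """Return candidates, most frequently selected first.

        sorted() is stable, so unused candidates keep the default order.
        """
        base = self.candidates.get(reading, [])
        return sorted(base, key=lambda c: -self.usage[(reading, c)])

    def select(self, reading, choice):
        """Record that the user picked `choice` for `reading`."""
        self.usage[(reading, choice)] += 1
        return choice


if __name__ == "__main__":
    ranker = CandidateRanker({"ku": ["区", "句", "九"]})
    ranker.select("ku", "九")
    print(ranker.options("ku"))
```

Counting per (reading, character) pair rather than per character keeps a Kanji that is popular under one reading from jumping to the top under another.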


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

1 hour ago, L. Spiro said:

This was my keyboard in France (notice the Q etc.)

Good ole AZERTY...

The letters aren't even as bad as the . key requiring shift, as do all the numbers.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Speaking of keyboards, the Italian layout has the backquote and tilde characters, among others, in the ALT-number keyboard attic: ideal for programmers.

Omae Wa Mou Shindeiru

This topic is closed to new replies.
