by Michael S. Kaplan, published on 2006/02/20 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/20/535260.aspx
The other day, Shou-Ching Schilling (LAM) asked me via email:
Hi, you have helped me with many keyboards questions before, so I thought you might know the answer to this one or know who else to contact.
I am doing some testing on Arabic keyboards. Sometimes the layout for ( ) and < > (or anything that have left and right version) are in the same order as in English in the label and sometimes they are not. Are there some rules or a different set of resources I can refer to?
It all starts because there is the idea of mirroring explained in UAX #9 (The Bidirectional Algorithm). Section 6, entitled Mirroring, goes as follows:
The mirrored property is important to ensure that the correct character codes are used for the desired semantic. This is of particular importance where the name of a character does not indicate the intended semantic, such as with U+0028 "(" LEFT PARENTHESIS. While the name indicates that it is a left parenthesis, the character really expresses an open parenthesis — the leading character in a parenthetical phrase, not the trailing one.
Note that in some contexts, some of the characters that have the mirrored property are sometimes not rendered with mirrored glyphs. A higher level protocol can limit mirroring action (rule L4) to a subset of those with the mirroring property. See also Section 4.3 Higher-Level Protocols. Except in such cases, mirroring must be done by an application of rule L4, to ensure that the correct character code is used to express the intended semantic of the character.
Implementing rule L4 calls for mirrored glyphs. These glyphs may not be exact graphical mirror images. For example, clearly an italic parenthesis is not an exact mirror image of another: "(" vs ")". Instead, mirror glyphs are those acceptable as mirrors within the normal parameters of the font in which they are represented.
In implementation, sometimes pairs of characters are acceptable mirrors for one another: for example, U+0028 "(" LEFT PARENTHESIS and U+0029 ")" RIGHT PARENTHESIS or U+22E0 "⋠" DOES NOT PRECEDE OR EQUAL and U+22E1 "⋡" DOES NOT SUCCEED OR EQUAL. Other characters such as U+2231 "∱" CLOCKWISE INTEGRAL do not have corresponding characters that can be used for acceptable mirrors. The informative Bidi Mirroring data file [Data], lists the paired characters with acceptable mirror glyphs. A comment in the file indicates where the pairs are "best fit": they should be acceptable in rendering, although ideally the mirrored glyphs may have somewhat different shapes.
Ok, so basically what it means is the glyphs will be expected to "flip" under some circumstances.
And they even give you a BidiMirroring.txt file in the Unicode Character Database, a simple data file you can use for the mirrorings.
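To give a feel for how simple that data file is, here is a minimal sketch in Python. It assumes the file's line format is `code point; mirrored code point # comment`, uses a small inline excerpt rather than the full file, and assumes the embedding level has already been resolved somewhere else. The names `parse_mirroring` and `glyph_for` are made up for illustration.

```python
import unicodedata

# A short excerpt in the BidiMirroring.txt line format:
#   <code point>; <mirrored code point> # <comment>
SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
005B; 005D # LEFT SQUARE BRACKET
005D; 005B # RIGHT SQUARE BRACKET
"""

def parse_mirroring(data):
    """Build a {character: mirror character} dict from BidiMirroring-style text."""
    pairs = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        source, target = (field.strip() for field in line.split(";"))
        pairs[chr(int(source, 16))] = chr(int(target, 16))
    return pairs

def glyph_for(ch, level, pairs):
    """Rule L4 in miniature: mirror only at an odd (right-to-left) level,
    and only for characters that have the mirrored property.

    Assumption: 'level' was already computed by a real bidi implementation.
    A mirrored character with no pair in the table is returned unchanged;
    a real renderer would need a mirrored glyph from the font instead.
    """
    if level % 2 == 1 and unicodedata.mirrored(ch):
        return pairs.get(ch, ch)
    return ch

pairs = parse_mirroring(SAMPLE)
print(glyph_for("(", 0, pairs), glyph_for("(", 1, pairs))
```

The lookup itself is the easy part. Everything interesting is hidden inside that `level` argument.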
Seems easy enough, right?
Well, if you just said 'yes' then you probably have not thought too much about the consequences of characters that will simply flip depending on the context of what is around them.
As one learns growing up, peer pressure of that sort is seldom easy.
But okay, I'll take you at your word.
Now let's add keyboards to the mix.
(many of the examples I give below use the Hebrew keyboard, but the same basic issues come up with the Arabic, Persian (a.k.a. Farsi), and Urdu keyboards, ignoring the multilingual nature of the Hebrew layout with its uppercase English)
If you go up to the Windows Keyboard Layouts site to look at the Hebrew keyboard, you will get a small dynamic layout that can be used to display the following five available "shift" states:
Note first how the square brackets [ and ] (U+005b and U+005d) actually seem to flip on the keyboard layout depending on whether you are in the English or the Hebrew "mode" of the layout?
Well, remember that what is displayed will entirely depend on the context of what is around it, and then try to type a word like שלום surrounded by parentheses.
Incidentally, that is the word for hello, goodbye, and peace in Hebrew. By the time you play with this for a bit you will not know what is coming or going on this keyboard and you will want to be left alone in peace. Which makes it a great word for our current purposes.
If you are using a US keyboard you would type A K U O to get the word (just to save some of the experimentation). So now, armed with all of this knowledge, try to type the following in Notepad:
שלום (שלום) שלום
Then for giggles, flip the reading order and see what it looks like. Then start over with the other reading order and try to type it again.
And this is an easy word since you get a completely Hebrew context around the parentheses when you are done. Imagine what would happen if you had to type something that ended with a parenthesis....
It is amazing how little what is painted on the faces of a Hebrew keyboard layout has to do with what appears to be typed. While this may be something that a native typist in a language can understand, it is clearly learned behavior as there is no way on earth to consider any of this to be intuitive.
Especially considering the fact that the parentheses ( and ) (U+0028 and U+0029) are only on the "English" shift states of this keyboard so you don't even get the behavior approaching intuitive that some might argue the brackets have.
Imagine if that Ultimate Keyboard were more than fictional -- if it is hard typing certain types of punctuation with a static keyboard, imagine how much harder it would be with one whose key faces are constantly changing -- especially at the end of text. I hope that it would stay stable, for that reason....
This of course indirectly answers some related non-fictional questions around the OSK (On-Screen Keyboard) and the Tablet PC Soft Keyboard. You probably would not want to try to make them change whether the mirrored or unmirrored glyph should appear, based on what would be about to be typed.
Perhaps you disagree. Hey, no worries, people often disagree with me.
So how do you determine which glyph to show, since you are so sure I am mistaken? :-)
You will find that you are basically re-implementing the Unicode Bidirectional Algorithm so you can tell what level you are currently at. For any text, any time a cursor is inserted somewhere, any time you are using a Hebrew, Arabic, Persian, or Urdu keyboard.
Oh wait -- what about when you are typing parentheses from the French keyboard in the middle of Arabic text? I guess you had better make this happen at all times, for all keyboard layouts.
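To make the problem concrete, here is a deliberately naive Python sketch of just one small piece: guessing which way a single neutral character (like a parenthesis) will resolve, by looking at the nearest strong character on each side and falling back to the paragraph direction. It is an assumption-laden simplification, not the Bidirectional Algorithm: it ignores explicit embeddings, numbers, the other weak types, and the paired-bracket rule that later versions of UAX #9 added. All of the function names are made up for illustration.

```python
import unicodedata

SHALOM = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word from above, in typing order

def strong_direction(ch):
    """Return 'L' or 'R' for a strong character, or None for anything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None

def nearest_strong(chars, fallback):
    """First strong direction found in chars, else the paragraph direction."""
    for ch in chars:
        direction = strong_direction(ch)
        if direction is not None:
            return direction
    return fallback

def neutral_direction(text, index, paragraph_dir):
    """Simplified neutral resolution: if both neighbors agree, follow them;
    otherwise take the paragraph direction."""
    before = nearest_strong(reversed(text[:index]), paragraph_dir)
    after = nearest_strong(text[index + 1:], paragraph_dir)
    return before if before == after else paragraph_dir

def shows_mirrored(text, index, paragraph_dir):
    """True if the character at index would be drawn with its mirrored glyph."""
    is_mirrorable = bool(unicodedata.mirrored(text[index]))
    return is_mirrorable and neutral_direction(text, index, paragraph_dir) == "R"

half_typed = SHALOM + " ("
finished = SHALOM + " (" + SHALOM + ") " + SHALOM
paren = len(SHALOM) + 1  # index of the opening parenthesis in both strings

print(shows_mirrored(half_typed, paren, "L"))
print(shows_mirrored(finished, paren, "L"))
print(shows_mirrored(half_typed, paren, "R"))
```

Under these assumptions, the same U+0028 is unmirrored when it is the last thing typed in a left-to-right paragraph, mirrored once Hebrew text follows it, and mirrored right away in a right-to-left paragraph. A keyboard that tried to show the "right" glyph would have to redo this on every keystroke and every cursor move, and it would still be guessing about text that has not been typed yet.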
How comfortable would you be trying to create such a system?
And then ask yourself how comfortable people would be trying to type in such a situation, again?
I am going to go lie down for a bit, my head is hurting and I need some שלום.
This post brought to you by "﴾" (U+fd3e, a.k.a. ORNATE LEFT PARENTHESIS)
(As you may have guessed, the ornate parentheses are not mirrored. If you ever meet me in person feel free to ask me why!)
My new virtual hebrew keyboard on 8 Aug 2010 6:23 AM:
I've created a new mikledet (Hebrew keyboard) at http://hebrew-keyboard.com
Nice one with multiple skins...
Michael S. Kaplan on 8 Aug 2010 7:32 AM:
Are you saying it solves this problem? If so, how does it do it?