by Michael S. Kaplan, published on 2012/02/09 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/02/09/10265712.aspx
Today's blog starts with an inspirational video (warning: fairly safe for work other than some language) to set the stage:
Now we laugh -- and especially fans and regular readers of this blog will laugh -- at the idea that a video game meant to be so amazingly and intensely realistic could get something as basic as the language wrong.
Keep in mind that both Bing and Google do with "Pakistan Language" -- for those fans of Bing out there!
Anyway, regular readers may remember how back in December I blogged The evolving Story of Locale Support, part 12: Logic dictates that we keep a sense of proportion about the RATIO, telling the story of how in Windows 8 some work was being put into using RATIO (U+2236) rather than COLON (U+003A) in some of the time values, since the placement of the lower dot in the RATIO is much more like the way you would expect to see it in digital clocks and such.
Since the thing you always thought was a COLON kind of wasn't!
I had some people ask me offline whether the locale data would be changing too, so that LOCALE_STIMEFORMAT, LOCALE_SSHORTTIME, and LOCALE_SDURATION would change for all our locales.
No way! is what I told them -- who would want to change all the code to accept it as a new "expected" LOCALE_STIME character, or change every single time and duration format to have a bunch of literals in them, plus change the data either way?
Nasty.
But there is another good reason, one that you can find in the Unicode Character Database UnicodeData.txt info for the two characters:
003A;COLON;Po;0;CS;;;;;N;;;;;
2236;RATIO;Sm;0;ON;;;;;N;;;;;
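For those following along at home, here is a minimal Python sketch of how those two records break down. It assumes only the standard UnicodeData.txt layout (semicolon-separated fields, with the code point, name, General_Category, and Bidi_Class at positions 0, 1, 2, and 4); the `parse_record` helper and the `FIELDS` table are names made up for this illustration.

```python
# Positions of the fields of interest in a UnicodeData.txt record.
# (Field 3, skipped here, is the canonical combining class.)
FIELDS = {"code_point": 0, "name": 1, "general_category": 2, "bidi_class": 4}


def parse_record(line):
    """Split one UnicodeData.txt record into the fields named in FIELDS."""
    parts = line.split(";")
    return {key: parts[index] for key, index in FIELDS.items()}


colon = parse_record("003A;COLON;Po;0;CS;;;;;N;;;;;")
ratio = parse_record("2236;RATIO;Sm;0;ON;;;;;N;;;;;")

for record in (colon, ratio):
    print(
        "U+" + record["code_point"],
        record["name"],
        record["general_category"],
        record["bidi_class"],
    )
```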
These two Unicode characters have more differences than the slightly-better-put-together bottom on the RATIO.
One (the COLON) is GeneralCategory==Po (Punctuation, Other) and BidiClass==CS (Common Number Separator).
So saying 9:01 is saying "this is one number".
The other (RATIO) is GeneralCategory==Sm (Symbol, Math) and BidiClass==ON (Other Neutrals).
So saying 9∶01 is saying "this is two numbers with a non-specific neutral character between them".
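To make the "one number" versus "two numbers" distinction concrete, here is a rough Python sketch of just one rule of the Unicode Bidirectional Algorithm (rule W4 in UAX #9), which turns a single common separator sitting between two numbers of the same type into that number type. It is deliberately simplified and is not a full bidi implementation: it ignores every other rule, and the function name `resolve_common_separators` is invented for this example.

```python
import unicodedata


def resolve_common_separators(text):
    """Apply a simplified form of UAX #9 rule W4 to the CS class only.

    A single CS between two numbers of the same type (EN or AN) takes on
    that type. Everything else keeps its original Bidi_Class.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    resolved = list(classes)
    for i in range(1, len(classes) - 1):
        before, after = classes[i - 1], classes[i + 1]
        if classes[i] == "CS" and before == after and before in ("EN", "AN"):
            resolved[i] = before
    return resolved


colon_time = "9\u003A01"  # 9:01 with COLON
ratio_time = "9\u222601"  # 9∶01 with RATIO

print(resolve_common_separators(colon_time))  # the COLON is absorbed into the number
print(resolve_common_separators(ratio_time))  # the RATIO stays a neutral between numbers
```

With the COLON, every position ends up as a European number, so the whole time value travels as one unit. With the RATIO, a neutral is left in the middle, and its direction gets decided by whatever surrounds it.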
So, try to imagine what the latter means for Bidi scenarios!
It is to my mind mildly ironic that perhaps the most appealing visual representation of something can so easily be considered the most terrible technical way to get the job done. :-)
The scenarios are made "artificially" easier by the fact that the "time" value should always be Left-to-Right, but even still it can be really involved to figure it out!
Our "toolkit" today is made up of a few key Unicode control characters:
I'll enumerate the relevant cases here, using the one and only Notepad, looking at four strings in each scenario:
The exact Unicode characters being used are shown below to make it easier for those who want to follow along at home.
Here we go! :-)
Scenario #1: Left to right reading order, no digit substitution:
All four strings look good.
Scenario #2: Right to Left reading order, no digit substitution:
Strings 1-3 look good, string 4 is backward.
Scenario #3: Left to right reading order, context digit substitution:
All four strings once again look good.
Scenario #4: Right to left reading order, context digit substitution:
Strings 1 and 2 look good, String 3 is half-substituted and String 4 looks substituted and backward.
Scenario #5: Left to right reading order, national (always) digit substitution:
All four strings look okay.
Scenario #6: Right to left reading order, always (national) digit substitution:
Strings 1-3 all look substituted and good, but String 4 looks substituted yet backward.
Perhaps this explains yet another reason to not update the locale data from using the COLON as in previous versions.
Because when you use the colon, all four strings look good in all six scenarios!
Armed with the knowledge that only the first two strings will work consistently in all cases, someone trying for better aesthetics can do quite well; someone not as well armed will sometimes do a good job and sometimes embarrass themselves.
Put the user in that same situation, and I doubt they will all be enchanted by the aesthetics -- they'll get pretty mad at us for mucking with their locale's data!
Now of course there are other solutions involving LRO and other placements of such Unicode control characters, but you get the point. And now you know to keep a sense of proportion about the RATIO, if you need to....
Simon Buchan on 9 Feb 2012 1:53 PM:
Would perhaps using a specialized "time font" where Windows displays time be more semantically preferable?
Oh man, I knew Call of Duty screwed up the Arabic (they use isolated glyphs for all the signs you see in game, and I think it's LTR as well - probably due to them rendering them in Flash with the crappy old TextField [the new TLF text is pretty damn good at Unicode]), but I didn't know they got the damn *language* wrong!
Michael S. Kaplan on 9 Feb 2012 6:48 PM:
If it had been up to me, I'd have just updated the font! But Si only knows that's just one of the reasons I'd never work out in his group....