The pros and cons of context, aka CONTEXT is a four-letter-word

by Michael S. Kaplan, published on 2008/04/21 09:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/21/8413627.aspx


Regular readers should keep in mind that all I said in "The End?" still applies; the allusion to the X-Files continues for people who understand such references....

It was just hours ago that I pointed out in "Windows doesn't let you choose the pinch hitter in digit substitution cases" that there are even more problems with digit substitution than I had mentioned before.

Let's review:

Could it get worse?

Of course it can!

Let's look at this exciting case in Vista....

First we'll take a machine with an Arabic user locale and make sure that its digit substitution is set to context:

Now with context definition:

the shape depends on the previous text in the same output

the meaning seems pretty clear. In fact you can even see it from the sample -- in an LTR context we'll get the regular Arabic digits1, and in an RTL context we'll get Hindi digits2.
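To make that definition concrete, here is a minimal Python sketch of the rule as the quoted text describes it: a digit stays 0-9 after LTR text and becomes one of ٠١٢٣٤٥٦٧٨٩ after RTL text. This is a simplified model, not how Windows or Uniscribe actually implements it. The function name `substitute_digits` and the `default_rtl` parameter (what to do before any strong character has been seen) are assumptions made up for illustration, as is treating every strong RTL character the same way.

```python
import unicodedata

# Maps 0-9 to U+0660..U+0669 (the digits the post calls "Hindi digits").
TO_HINDI_DIGITS = str.maketrans("0123456789", "٠١٢٣٤٥٦٧٨٩")


def substitute_digits(text, default_rtl=False):
    """Simplified model of the "context" setting (hypothetical helper).

    Each digit takes its shape from the most recent strong directional
    character before it in the same string. default_rtl is an assumed
    knob for digits that appear before any strong character.
    """
    rtl = default_rtl
    out = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":              # strong left-to-right (e.g. Latin letters)
            rtl = False
        elif bidi in ("R", "AL"):    # strong right-to-left (e.g. Arabic letters)
            rtl = True
        # translate() leaves anything that is not 0-9 untouched
        out.append(ch.translate(TO_HINDI_DIGITS) if rtl else ch)
    return "".join(out)


print(substitute_digits("C:\\2008"))    # C:\2008
print(substitute_digits("مجلد 2008"))   # مجلد ٢٠٠٨
```

The drive letter in the first call is what keeps the digits unchanged; the Arabic word in the second call is what switches them.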

Looks like it only cares about the user locale setting here for the purposes of the sample, right?

Now file paths, which start with drive letters, tend to start off with a rather LTR context, thus it is no surprise that we see Arabic digits:

and also in the breadcrumb bar view:

And that is kind of expected.

But then when we switch to an Arabic user interface language, while the full path looks the same:

The breadcrumb bar view is another story:

What happened?

Suddenly my "of course there is no difference in English" claim is refuted somewhat -- it is the correct and expected result, but for the wrong reason.

Because the Breadcrumb bar is looking at each chunk3 as if the context has been reset, and thus a folder that does not start with an LTR character is given RTL context, even if the overall path looks at things differently.

Also, it is not the user locale that drives the "context" setting; it is the user interface language, despite the fact that the setting itself is tied to the user locale.

And if you have any setting other than context, then it is just the user locale. The implication is therefore that the context setting can have two very different levels of requirements:

  1. The context underlying the whole string is expected to be an RTL one, and
  2. The context of the beginning of the string itself is also expected to be an RTL one.

Another fun experiment might be to look at whether that Arabic/Farsi limitation applies to the user interface language requirement the same way that it applies to the user locale setting. And on the assumption that Regional and Language Options is not synthesizing the user locale dependency for the sample and that USER controls are affected, it might be good to understand why Explorer and the Shell seem to be working under this additional level, and what rules it is working under.

I'll give away the answer in case you haven't spotted it -- if the control itself is an RTL one, then it gets the RTL rules. For whatever reason, Explorer in this case has rules for what kind of controls to create that are tied to the user interface language. It is unclear whether it uses the Uniscribe-like LANG_FARSI/LANG_ARABIC rules or the NLS rules, though. Hopefully it is the latter, which would be more correct even if also more confusing and less consistent (since even former Uniscribe owners often consider the Arabic/Farsi limitations to be bugs!).
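The observed behavior can be modeled with a small Python sketch, under two assumptions taken from the description above: the control's own direction seeds the context, and the breadcrumb bar resets that context for every chunk. The names `render`, `address_bar`, `breadcrumb_bar` and `control_is_rtl` are hypothetical, and this is only a guess at a model that reproduces what is on screen, not Explorer's actual logic.

```python
import unicodedata

# Maps 0-9 to U+0660..U+0669.
TO_HINDI_DIGITS = str.maketrans("0123456789", "٠١٢٣٤٥٦٧٨٩")


def render(text, control_is_rtl):
    """Context substitution where the control's direction seeds the context."""
    rtl = control_is_rtl
    out = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":              # strong LTR character resets to LTR
            rtl = False
        elif bidi in ("R", "AL"):    # strong RTL character switches to RTL
            rtl = True
        out.append(ch.translate(TO_HINDI_DIGITS) if rtl else ch)
    return "".join(out)


def address_bar(path, control_is_rtl):
    # The full path is one string, so the drive letter sets an LTR context.
    return render(path, control_is_rtl)


def breadcrumb_bar(path, control_is_rtl):
    # Each chunk is rendered on its own, so the context starts over every time.
    return [render(chunk, control_is_rtl) for chunk in path.split("\\")]


path = "C:\\Users\\2008"
print(address_bar(path, control_is_rtl=True))      # C:\Users\2008
print(breadcrumb_bar(path, control_is_rtl=False))  # ['C:', 'Users', '2008']
print(breadcrumb_bar(path, control_is_rtl=True))   # ['C:', 'Users', '٢٠٠٨']
```

Only the last call changes the digits: the control is RTL and the chunk "2008" has no leading LTR character to override it, which is the combination of the two requirements listed earlier.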

Ironically, no one has been fast to fix these problems, in large part due to the fact that the context substitution setting is not widely used in these other places where it is broken. So there is not a huge scenario to make something work better that most people won't want anyway? :-)

No matter what, the whole area is still incredibly confusing and not-very-well-documented!

 

1 - Although they are called by some Arabic numerals, they are not really used for Arabic (which uses different digits)
2 - Although they are called by some Hindi digits4, they are not used for Hindi5
3 - I guess these items could be called something else, I don't know the terminology; if the technology were called a breadloaf bar, I'd call them slices!
4 - Note that ٠١٢٣٤٥٦٧٨٩ are actually (according to their names) "Arabic-Indic digits" in Unicode, despite the fact that they are actually known to some as Hindi digits
5 - Hindi itself uses what some call Devanagari digits, which are technically Indic (though they are not to be confused with the Arabic-Indic digits, which are actually Hindi digits)

 

This blog brought to you by ۶ (U+06f6, aka EXTENDED ARABIC-INDIC DIGIT SIX, which is also not an Arabic-Indic digit!)



