Where is the locale? "It's Invariant." In where?

by Michael S. Kaplan, published on 2004/12/08 02:39 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/08/278170.aspx


Old joke, updated for Windows XP:

Q -- Where is the locale?
A -- It's invariant.
Q -- Where is variant?
A -- It's ten miles south of Communicado, and five miles east of Cognito.

The invariant locale is pretty weird. Let's take a look at its interesting characteristics.

So why is it there?

Well, it all comes back to collation. Like everything else that is worthwhile in life. :-)

In collation for Windows, there is a default table that gives the ordering of every code point in Unicode. As I noted in my article about how Microsoft does not use the UCA, the default table has been around for a long time, adding code points from version to version as more languages have become supported by Windows. And the thing about the default table is that it supports every language that can co-exist without conflicts -- like English and Greek and Arabic and German (not a complete list!). It is not that they have the same sort -- they don't. It is that nothing in their sorts conflicts with the others on the list (because they either do not share characters at all, or they do not sort any of the shared characters differently). They all have non-conflicting rules for the following characters:

A

a

Å

o

Ö

Z

α

β

γ

ب

ح

د

Now you can contrast the way that Swedish would look at the same characters (the differences, marked in red in the original, are the positions of "Å" and "Ö"):

A

a

o

Z

Å

Ö

α

β

γ

ب

ح

د

So obviously, Swedish cannot be handled in the default table: in Swedish, "Å" and "Ö" are both considered to be separate letters that sort after "Z". They are not treated as an "A" and an "O" with accessories (diacritics), like they are in English, or as something that sorts after but near to "A" and "O", like in German. So Swedish is not handled by the default table the way those other languages are.
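The default-table-plus-exceptions idea can be sketched as a toy model in Python. Everything in it is an assumption made for illustration: the weights are invented (they are not real Windows sort weights), and `DEFAULT_TABLE`, `SWEDISH_EXCEPTIONS`, and `sort_key` are hypothetical names, not part of any Windows API. The weights are chosen only to reproduce the two orderings listed above.

```python
# Toy model: invented weights, NOT real Windows collation data.
# Each character maps to (primary, diacritic, case) weights.
DEFAULT_TABLE = {
    "A": (10, 0, 0),
    "a": (10, 0, 1),
    "Å": (10, 1, 0),   # treated as an A plus a diacritic
    "o": (20, 0, 1),
    "Ö": (20, 1, 0),   # treated as an O plus a diacritic
    "Z": (30, 0, 0),
    "α": (40, 0, 0),
    "β": (50, 0, 0),
    "γ": (60, 0, 0),
    "ب": (70, 0, 0),
    "ح": (80, 0, 0),
    "د": (90, 0, 0),
}

# Hypothetical Swedish exceptions: Å and Ö become separate letters after Z.
SWEDISH_EXCEPTIONS = {
    "Å": (31, 0, 0),
    "Ö": (32, 0, 0),
}


def sort_key(text, exceptions=None):
    """Build a key that compares all primary weights, then diacritics, then case."""
    table = {**DEFAULT_TABLE, **(exceptions or {})}
    weights = [table[ch] for ch in text]
    return tuple(zip(*weights))


chars = list(reversed(list(DEFAULT_TABLE)))  # start from a scrambled order

default_order = sorted(chars, key=sort_key)
swedish_order = sorted(chars, key=lambda s: sort_key(s, SWEDISH_EXCEPTIONS))

print(" ".join(default_order))
print(" ".join(swedish_order))
```

In this sketch, a language that needs no exceptions simply uses the table unchanged (the `exceptions=None` case), which is the sense in which English, Greek, Arabic, and German can share it.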

(For fuller and cooler examples of this sort of thing, see Appendix D from the first edition of Developing International Software for Windows 95 and Windows NT. Though not attributed, Cathy Wissink did those tables, and that was how I got to be "impressed in advance" about her work at Microsoft, even though I had no idea who she was and would not find out for another half decade.)

Now often people would get into trouble trying to use LOCALE_USER_DEFAULT or LOCALE_SYSTEM_DEFAULT for sorts that were not supposed to change. Either of those, however, would change any time a setting was changed by the user. And that would cause bugs in people's code. On the NLS team, we would recommend that people use MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT), not because we were trying to be provincial (after all, German or Arabic or Russian or Greek or many other languages would have been fine), but to force an unchanging result on sorts that were not supposed to change on the whim of the user's settings.
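One way such a bug can show up (a hypothetical case, chosen for illustration) is a list that is sorted and saved under one user setting, then binary-searched after the setting changes. The sketch below is a toy model in Python with invented weights; `DEFAULT_WEIGHTS`, `SWEDISH_WEIGHTS`, `make_key`, and `contains` are made-up names, not Windows APIs.

```python
from bisect import bisect_left

# Toy primary weights, invented for illustration (not real Windows data).
DEFAULT_WEIGHTS = {"A": 10, "Å": 11, "O": 20, "Ö": 21, "Z": 30}
SWEDISH_WEIGHTS = {"A": 10, "O": 20, "Z": 30, "Å": 31, "Ö": 32}


def make_key(weights):
    """Return a sort-key function for the given weight table."""
    return lambda text: tuple(weights[ch] for ch in text)


def contains(sorted_items, item, key):
    """Binary search; only correct if sorted_items was sorted with the same key."""
    keys = [key(x) for x in sorted_items]
    i = bisect_left(keys, key(item))
    return i < len(keys) and sorted_items[i] == item


items = ["Z", "Ö", "A", "Å", "O"]

# Sorted while the user's setting was Swedish, searched after it changed.
saved = sorted(items, key=make_key(SWEDISH_WEIGHTS))
found_after_change = contains(saved, "Å", make_key(DEFAULT_WEIGHTS))

# Sorted and searched with one fixed, unchanging table.
stable = sorted(items, key=make_key(DEFAULT_WEIGHTS))
found_with_fixed = contains(stable, "Å", make_key(DEFAULT_WEIGHTS))

print(found_after_change, found_with_fixed)  # prints: False True
```

The item is in both lists, but the search only finds it when the sort and the lookup agree on the ordering, which is what a fixed locale guarantees.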

Of course, people would often look at using 0x0409 (also known as US English) as Microsoft just being a provincial US corporation. So rather than fight that perception, since the only real goal anyway was to "use the default table" for sorting, a new locale was added. One that would not change, would not vary. It would be.... INVARIANT. And thus, LOCALE_INVARIANT was born in Windows XP, the 136th locale added to Windows.

Not really such a bad thing to do, since that's all that the folks on the NLS team were trying to do anyway, right?

The same thing exists in the .NET Framework, with its static member CultureInfo.InvariantCulture. It is just as weird in all of its locale fields for dates and numbers and such. But it gives consistent results for sorting, since those use the default table.


# Adrian Burton on 6 Jan 2005 4:24 PM:

What locale should one nominate when making a keyboard layout (in MSKLC) that contains elements from the Combining Diacritical Marks block?

Error message reads:

-----quote----
WARNING: The character ̣ (U+0323) exists in the entry for VK_OEM_PERIOD, ShiftState 'Ctrl' of the layout table and is not in the default system code page (1252) of the English (United States) language you specified. This may cause compatibility problems in non-Unicode applications.
--------end-------------

Is this a use for the Invariant Language?

# Michael Kaplan on 6 Jan 2005 7:19 PM:

Heh heh heh.... no, it isn't. But that's pretty funny, in any case! :-)

