Why do LCIDs skip around so much?

by Michael S. Kaplan, published on 2005/03/02 01:35 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/02/383458.aspx

I have people ask me with alarming frequency why LCIDs jump around, so that for example en-US is 0x0409 and en-GB is 0x0809, and so forth. Why wouldn't it be 0x0409, 0x0509, 0x0609, etc.?

To answer the question, let's look at the diagram of LANGID contents, found in winnt.h:

//  A language ID is a 16 bit value which is the combination of a
//  primary language ID and a secondary language ID.  The bits are
//  allocated as follows:
//       +-----------------------+-------------------------+
//       |     Sublanguage ID    |   Primary Language ID   |
//       +-----------------------+-------------------------+
//        15                   10 9                       0   bit
//  Language ID creation/extraction macros:
//    MAKELANGID    - construct language id from a primary language id and
//                    a sublanguage id.
//    PRIMARYLANGID - extract primary language id from a language id.
//    SUBLANGID     - extract sublanguage id from a language id.

If you look at the makeup, the primary language ID is 10 bits, and the sublanguage ID is 6. Therefore, the "count by 0x0400" is there because that is what happens when you start counting in the eleventh bit (2^10 = 0x0400).
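To make the "count by 0x0400" concrete, here is a minimal sketch in C. The helper names (make_langid, primary_langid, sub_langid, print_english_langids) are local stand-ins invented for this example so that it builds without windows.h; they follow the bit layout in the comment above and are not the winnt.h macros themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for MAKELANGID / PRIMARYLANGID / SUBLANGID,
   written from the documented layout: bits 0-9 primary, bits 10-15 sublanguage. */
static uint16_t make_langid(uint16_t primary, uint16_t sub)
{
    assert(primary <= 0x3FF && sub <= 0x3F);  /* 10 bits and 6 bits */
    return (uint16_t)((sub << 10) | primary);
}

static uint16_t primary_langid(uint16_t langid)
{
    return (uint16_t)(langid & 0x3FF);
}

static uint16_t sub_langid(uint16_t langid)
{
    return (uint16_t)(langid >> 10);
}

/* Prints the IDs for primary language 0x09 (English) with sublanguages 1..3.
   Each step in the sublanguage adds 0x0400 to the combined value. */
static void print_english_langids(void)
{
    for (uint16_t sub = 1; sub <= 3; ++sub) {
        uint16_t id = make_langid(0x09, sub);
        printf("sublanguage %u -> 0x%04X (decimal %u)\n",
               (unsigned)sub, (unsigned)id, (unsigned)id);
    }
}
```

Calling print_english_langids() walks through 0x0409, 0x0809, and 0x0C09, which are 1033, 2057, and 3081 in decimal.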

Now there is another important consequence of this, as more and more LCIDs are added. Soon we will run out of primary language IDs that fit in eight bits, and suddenly there will be LCID values of 0x0500, 0x0501, 0x0502, and so on. At that point, all of the people who do not use the macros to parse through LCIDs (preferring to chop these 16-bit values into two equal 8-bit pieces) will see a low byte of 0x01 and assume that 0x0501 is a locale based on LANG_ARABIC. Yikes!
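A small sketch of how the two parsing styles diverge, assuming the hypothetical future value 0x0501 mentioned above. The function names are made up for this illustration; only the 10/6 bit split and the value 0x01 for LANG_ARABIC come from winnt.h.

```c
#include <assert.h>
#include <stdint.h>

/* Parsing by the documented layout: low 10 bits, high 6 bits. */
static uint16_t primary_by_layout(uint16_t langid) { return (uint16_t)(langid & 0x3FF); }
static uint16_t sub_by_layout(uint16_t langid)     { return (uint16_t)(langid >> 10); }

/* The shortcut seen in the wild: two equal 8-bit halves. */
static uint16_t primary_by_byte(uint16_t langid)   { return (uint16_t)(langid & 0xFF); }
static uint16_t sub_by_byte(uint16_t langid)       { return (uint16_t)(langid >> 8); }

static void demo_future_langid(void)
{
    /* Today both styles agree on the primary language of 0x0409... */
    assert(primary_by_layout(0x0409) == 0x09);
    assert(primary_by_byte(0x0409) == 0x09);
    /* ...though the "sublanguage" byte is already 0x04 rather than 1. */
    assert(sub_by_layout(0x0409) == 1);
    assert(sub_by_byte(0x0409) == 0x04);

    /* For the hypothetical 0x0501 the primary language itself comes out wrong. */
    assert(primary_by_layout(0x0501) == 0x101);  /* a new primary language */
    assert(primary_by_byte(0x0501) == 0x01);     /* collides with LANG_ARABIC */
    assert(sub_by_layout(0x0501) == 1);
}
```

The byte split only works as long as bits 8 and 9 of the primary language ID are zero, which is exactly the assumption that stops holding once primary language IDs pass 0xFF.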

Unfortunately, I have seen a bunch of code over the years that does this.

If you look back to the post I did about Lions and tigers and bearsELKs, Oh my! then you'll see that at the rate we are adding them, it will not take too much longer to hit this issue. If you have written or are the owner of such code, you have been warned. Fix your code today!

Now some people actually like to look at these numbers in decimal form (Raymond Chen talked about this last year when he asked What are these directories called 0409 and 1033?). The split seems to be:

HEXADECIMAL: Windows, Windows CE


Since I have actually done work at one time or another for all of these groups, I got pretty good at knowing both of them (and doing fast conversions for unfamiliar ones). The hexadecimal LCIDs feel more natural to me, for what it's worth. Though some may think this is due to a bias now that my main home is Windows, for me it is just easier to parse through what the LCIDs are this way....

I'll talk more about LCIDs another time. There are a fascinating number of oddities with them....


This post brought to you by "Λ" (U+039b, GREEK CAPITAL LETTER LAMDA)

# Jonathan on 2 Mar 2005 6:16 AM:

Hex vs. Decimal - many people look at the decimal, completely missing the point of bitfields. I once had to explain MSI Custom Action Attributes to someone, where the doc basically says:
256 - A
512 - B
768 - AB
1024 - C
And he seriously asked me "but you can get 1024 by adding 768+256, so is it AB+A or C?"


# Michael Kaplan on 2 Mar 2005 9:00 AM:

Excellent point. And now that I think about it, that is the one thing that is harder to do in decimal than in hex -- understanding bitfields....
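A short sketch of the bitfield point from the comments above, using the placeholder labels A, B, and C from Jonathan's example (they are not real attribute names, and the helper functions are invented for this illustration). Flags are combined with bitwise OR rather than addition, so setting A twice changes nothing, and 1024 can only mean C.

```c
#include <assert.h>
#include <stdio.h>

enum {
    FLAG_A = 0x0100,  /* 256  */
    FLAG_B = 0x0200,  /* 512  */
    FLAG_C = 0x0400   /* 1024 */
};

/* Nonzero if the given flag bit is set in value. */
static int has_flag(unsigned value, unsigned flag)
{
    return (value & flag) != 0;
}

/* Prints a value in decimal and hex, followed by the flags it contains. */
static void describe(unsigned value)
{
    printf("%4u = 0x%04X:%s%s%s\n", value, value,
           has_flag(value, FLAG_A) ? " A" : "",
           has_flag(value, FLAG_B) ? " B" : "",
           has_flag(value, FLAG_C) ? " C" : "");
}

static void demo_flags(void)
{
    assert((FLAG_A | FLAG_B) == 768);             /* 0x0300 is A and B */
    assert(((FLAG_A | FLAG_B) | FLAG_A) == 768);  /* OR-ing A again is a no-op */
    assert(has_flag(1024, FLAG_C));               /* 0x0400 is C... */
    assert(!has_flag(1024, FLAG_A));              /* ...and contains no A */
    assert(!has_flag(1024, FLAG_B));              /* ...and no B */
}
```

In hex the answer is visible at a glance: 0x0300 has two bits set and 0x0400 has one, while 768 and 1024 give no such hint.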

referenced by

2010/09/30 Office skipped version "13" (makes you wonder what they will do in 2013!)

2007/07/04 I think they might kind of get it now

2006/08/21 Decimal vs. hexadecimal LCIDs, backcompat, and being weird

2006/08/16 Walking off the end of the eighth bit

2005/05/17 The weird, weird world of the SUBLANGID
