LOCALE_SABBREVLANGNAME is so not an ISO-639 code

by Michael S. Kaplan, published on 2005/02/17 04:55 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/17/375235.aspx


Sometimes it does not matter how often you say something; people will simply not hear what you are saying, process it, or learn a darn thing from it.

One such time is when they are dead. Clearly if someone has passed on, they are not going to be of much use when it comes to learning new things. The appearance schedule of most people slows down after they die. Though I guess if you read the tabloids you might find Elvis has a busy schedule these days....

Another such time is when they are asleep. It is rude to shake someone and then ask them if they are awake, for obvious reasons. And there are some who may even say more when they think you are not awake enough to hear...

A third such time, and the main subject of this post, is when you explain that the LOCALE_SABBREVLANGNAME used by GetLocaleInfo is not intended to be an ISO standard.

It is amazing how often people have trouble with such a simple idea.

You might think the fact that so many codes do not match would be a telling factor here. But these people just assume that is due to a bug and that Microsoft just got the code wrong.

You might also assume that the Locale Information topic in MSDN and the Platform SDK would be a clue (emphasis in RED is mine):

LOCALE_SABBREVLANGNAME

Abbreviated name of the language. In most cases it is created by taking the two-letter language abbreviation from the ISO Standard 639 and adding a third letter, as appropriate, to indicate the sublanguage.

but again that does not convince these people (since they never read the documentation anyway).

Or perhaps you may think that since the LOCALE_SABBREVLANGNAME values had been around for at least five years before the three-letter ISO 639 codes described in ISO 639-2 were published, it is obvious that they would not be the same. Microsoft is not psychic, after all! But most people have no sense of history.

And (probably frustrated at this point and getting a glimpse of how sometimes our entire day feels!) you may note that "sublanguage" in LCID terms is really talking about a SUBLANGID which is actually the "regional" part of the LANGID. And if there is one thing that ISO-639 codes do not handle that the LOCALE_SABBREVLANGNAME does, it is regional differences. But while you are frantically drawing out how ESP, ESM, ESN, ESG, ESC, ESA, ESD, ESV, ESO, ESR, ESS, ESF, ESL, ESY, ESZ, ESB, ESE, ESH, ESI, and ESU are all ways of looking at different locales with a PRIMARYLANGID of LANG_SPANISH, most of these people just start nodding off.
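For anyone still awake, here is a minimal sketch of that LANGID split in Python. It mirrors the bit layout of the MAKELANGID, PRIMARYLANGID, and SUBLANGID macros (low 10 bits for the primary language, high 6 bits for the sublanguage). The function names are illustrative, and the three LCID pairings are recalled from the Windows locale tables rather than taken from this post, so treat them as assumptions.

```python
# Bit layout of a LANGID: low 10 bits = primary language,
# high 6 bits = sublanguage (the "regional" part).
LANG_SPANISH = 0x0A


def make_langid(primary, sub):
    """Combine a primary language ID and a sublanguage ID."""
    return (sub << 10) | primary


def primary_langid(langid):
    """Extract the primary language ID (low 10 bits)."""
    return langid & 0x3FF


def sub_langid(langid):
    """Extract the sublanguage ID (high 6 bits)."""
    return langid >> 10


# Assumed pairings for three of the Spanish abbreviations listed above.
SPANISH = {"ESP": 0x040A, "ESM": 0x080A, "ESN": 0x0C0A}

for abbrev, lcid in SPANISH.items():
    langid = lcid & 0xFFFF  # the LANGID is the low word of the LCID
    print(abbrev, hex(primary_langid(langid)), sub_langid(langid))
```

Every entry shares the same primary language ID and differs only in the sublanguage bits, which is exactly the distinction that third letter is carrying.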

But if they are still awake at this point, they may be feeling a little foolish for not recognizing this apparently fundamental truth about LOCALE_SABBREVLANGNAME. So they start trying to pick holes. They realize that it was kind of our fault (not theirs) since it was really the name that threw them off -- this tag is another way of looking at a LANGID, not at just a language.

"Nolo contendere," I have to say1.

And then, realizing that they might be on a roll here, they yank out that Platform SDK page again:

Abbreviated name of the language. In most cases it is created by taking the two-letter language abbreviation from the ISO Standard 639 and adding a third letter, as appropriate, to indicate the sublanguage.

"What's with that 'in most cases' stuff?", they wonder aloud.

Well, the following locales differ from those rules as of Windows Server 2003's release:

LCID       SABBREVLANGNAME  SISO639LANGNAME   SENGLANGUAGE          SENGREGION                 
0x0404           CHT             zh            Chinese               Taiwan
0x0411           JPN             ja            Japanese              Japan
0x0414           NOR             nb            Norwegian (Bokmål)    Norway
0x0421           IND             id            Indonesian            Indonesia
0x044b           KAN             kn            Kannada               India
0x044e           MAR             mr            Marathi               India
0x0450           MON             mn            Mongolian             Mongolia
0x0457           KNK             kok           Konkani               India
0x045a           SYR             syr           Syriac                Syria
0x0465           DIV             div           Divehi                Maldives
0x0804           CHS             zh            Chinese               People's Republic of China
0x0814           NON             nn            Norwegian (Nynorsk)   Norway

Often this has to do with the Language Bar (the UI that shows the available selected keyboards). This thing takes the LOCALE_SABBREVLANGNAME and chops off that extra third letter and puts the first two letters in a nice blue square.

Sometimes it is because, if the remaining two letters always came from ISO 639, there would be no way to tell Korean (ko) apart from Konkani (kok) once you chopped off that extra letter (Konkani has no two-letter ISO code).

And sometimes there are no two-letter codes so we have to use the three-letter codes. But they do mostly match, at least (so far!).... :-)

Other times it may have seemed more important for Norwegian to have one consistent two-letter keyboard code, whether the user has selected Bokmål or Nynorsk -- without forcing people to take one or the other.

And still other times it can just be the fact that when looking for the best three-letter abbreviation that is sometimes chopped into a two-letter abbreviation, the answer comes up a little different. Since it is already promising to not match ISO-639-1 100% of the time, why not be aesthetically pleasing? :-)
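To make the chopping concrete, here is a small Python sketch that runs the "two-letter ISO code plus a third letter" rule against the exceptions table above. The pairs are copied from that table; the function names are made up for illustration and are not how the Language Bar is actually implemented.

```python
# (SABBREVLANGNAME, SISO639LANGNAME) pairs copied from the table above.
TABLE = [
    ("CHT", "zh"), ("JPN", "ja"), ("NOR", "nb"), ("IND", "id"),
    ("KAN", "kn"), ("MAR", "mr"), ("MON", "mn"), ("KNK", "kok"),
    ("SYR", "syr"), ("DIV", "div"), ("CHS", "zh"), ("NON", "nn"),
]


def language_bar_code(abbrev):
    """Drop the third letter, as the Language Bar is described as doing."""
    return abbrev[:2]


def follows_iso_rule(abbrev, iso):
    """True when abbrev is a two-letter ISO 639 code plus one extra letter."""
    return len(iso) == 2 and abbrev[:2].lower() == iso


# Collect every table row that breaks the documented "most cases" rule.
mismatches = [
    (abbrev, language_bar_code(abbrev), iso)
    for abbrev, iso in TABLE
    if not follows_iso_rule(abbrev, iso)
]

for abbrev, chopped, iso in mismatches:
    print(f"{abbrev} -> {chopped} (ISO 639: {iso})")
```

Every row in the table fails the rule, as it should, and both Norwegian entries chop down to the same two letters, which is the consistent keyboard code described above.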

But the kicker is of course that the ISO name is still available. Whether you use GetLocaleInfo with LOCALE_SISO639LANGNAME or the .NET CultureInfo class's TwoLetterISOLanguageName or ThreeLetterISOLanguageName, you can get that ISO name if you need it. And you can also get that Windows name, too (if you wanted to get that keyboard code string, maybe!).
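A rough sketch of asking for both names from Python, assuming ctypes is used to call the Unicode entry point GetLocaleInfoW in kernel32. The helper name get_locale_info is hypothetical, the LCTYPE values are the ones I recall from winnls.h, and the helper simply returns None anywhere other than Windows.

```python
import ctypes
import sys

# LCTYPE values as recalled from winnls.h (treat as assumptions).
LOCALE_SABBREVLANGNAME = 0x0003
LOCALE_SISO639LANGNAME = 0x0059


def get_locale_info(lcid, lctype):
    """Return the requested locale string, or None if it is unavailable."""
    if sys.platform != "win32":
        return None  # GetLocaleInfoW only exists on Windows
    kernel32 = ctypes.windll.kernel32
    # With a zero-length buffer the call reports the required size,
    # in characters, including the terminating null.
    size = kernel32.GetLocaleInfoW(lcid, lctype, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    if kernel32.GetLocaleInfoW(lcid, lctype, buf, size) == 0:
        return None
    return buf.value


if __name__ == "__main__":
    lcid = 0x0414  # Norwegian (Bokmål), from the table above
    print(get_locale_info(lcid, LOCALE_SABBREVLANGNAME))
    print(get_locale_info(lcid, LOCALE_SISO639LANGNAME))
```

Going by the table above, the first call would be expected to give the Windows name and the second the ISO name, though later versions of Windows may not return exactly what Windows Server 2003 did.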


1 - Being up to year fifteen of Law & Order has really helped my Latin skills!

This post brought to you by "Σ" (U+03a3, a.k.a. GREEK CAPITAL LETTER SIGMA)


# Michael Kaplan on 19 Feb 2005 11:43 PM:

I talked to Cathy on Friday and she was surprised that no one had commented about the SABBREVLANGNAME thing, especially considering how often she gets questions about it.

I told her that those who might have responded were too embarrassed to admit that they were one of the people having trouble with the concept.

I actually think it was just a yawner of a topic -- important to post so we can all point to it the next time someone asks, but not the sort of thing that is useful otherwise. :-)

