LOCALE_SABBREVLANGNAME is more than just an ISO-639 code

by Michael S. Kaplan, published on 2006/09/27 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/27/773341.aspx

It was over a year ago that I posted about how LOCALE_SABBREVLANGNAME is so not an ISO-639 code.

But perhaps the title of this post covers the situation a little more accurately, or at least a little more clearly....

The rules are simple enough (though perhaps more complex than I laid out originally), so I will just lay them out, here and now. These are actually descriptive, not prescriptive, which is to say that I am describing how a bunch of decisions ended up being made. I am not describing some mystical set of rules in a data handbook or anything. :-)

RULE #1: If you take every single one of these three letter codes, then the first two letters of the code must uniquely represent the language across the full set of locales. Thus EN must be English, AR must be Arabic, and so on. This is true for every locale that uses one of these individual languages, so that the Language Bar can have a two letter code to use.

RULE #2: If a language has multiple locales (e.g. the aforementioned EN and AR), then usually the ISO-639 TWO LETTER CODE will be used for the first two letters of LOCALE_SABBREVLANGNAME, with the third letter chosen to uniquely identify the locale. The exception to this is when the uniqueness rule in #1 is not met, in which case a change will be made to make them unique.

RULE #3: If a language has only one locale, such as Japanese or Korean, then usually the ISO-639 THREE LETTER CODE will be used for LOCALE_SABBREVLANGNAME. The exception to this is once again when the unique two letter rule in #1 is not met, in which case a change will be made to make it unique.
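Rule #1 is the one the other two defer to, so it is the one worth checking mechanically. What follows is only a sketch of such a check, not anything that ships in Windows: the function name `rule1_violations` is made up for illustration, and the table is a tiny hand-picked sample of abbreviated names rather than the full locale data.

```python
from collections import defaultdict

# Small illustrative sample of (language, LOCALE_SABBREVLANGNAME) pairs.
# This is NOT the full Windows table, just enough to exercise the check.
SAMPLE = [
    ("English", "ENU"),
    ("English", "ENG"),
    ("Arabic", "ARA"),
    ("Arabic", "ARE"),
    ("Japanese", "JPN"),
    ("Korean", "KOR"),
]


def rule1_violations(pairs):
    """Check Rule #1 on (language, three-letter code) pairs.

    Returns two sorted lists:
      - two-letter prefixes that are shared by more than one language
      - languages whose locales do not all share one two-letter prefix
    Both lists are empty when Rule #1 holds for the given pairs.
    """
    langs_by_prefix = defaultdict(set)
    prefixes_by_lang = defaultdict(set)
    for language, code in pairs:
        prefix = code[:2].upper()
        langs_by_prefix[prefix].add(language)
        prefixes_by_lang[language].add(prefix)

    shared = sorted(p for p, langs in langs_by_prefix.items() if len(langs) > 1)
    split = sorted(l for l, ps in prefixes_by_lang.items() if len(ps) > 1)
    return shared, split


if __name__ == "__main__":
    print(rule1_violations(SAMPLE))
```

On the sample above both lists come back empty. Add two invented languages whose codes start with the same two letters and the shared prefix shows up in the first list, which is the situation where rules #2 and #3 say "a change will be made."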

Thus, to give the example of a new locale in Vista, Uighur (PRC).... Uighur's two letter ISO 639 code is ug, and its three letter ISO 639 code is uig. Since the Uighur language is not used for any other locales, and further since UIG and its first two letters UI do not conflict with any other language or locale, the three letter ISO 639 code is used here.
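The Uighur case can be walked through in a few lines. Again, this is a descriptive sketch under some assumptions: `propose_code`, its parameters, and the set of already-used prefixes are all hypothetical names and sample data, and since the rules only say that "a change will be made" on a conflict without saying which change, the sketch flags the conflict and stops there.

```python
def propose_code(iso2, iso3, locale_count, used_prefixes, third_letter=None):
    """Propose an abbreviated language name following rules #2 and #3.

    iso2, iso3     -- the ISO 639 two and three letter codes for the language
    locale_count   -- how many locales use this language
    used_prefixes  -- two-letter prefixes already taken by OTHER languages
    third_letter   -- for multi-locale languages, the letter picked to
                      identify the locale (a human decision, so it is passed in)

    Returns (candidate, conflict). conflict is True when the candidate's
    first two letters collide with another language, i.e. Rule #1 fails
    and someone has to pick a different code by hand.
    """
    if locale_count > 1:
        # Rule #2: ISO two letter code plus a locale-specific third letter.
        if third_letter is None:
            raise ValueError("a multi-locale language needs a third letter")
        candidate = (iso2 + third_letter).upper()
    else:
        # Rule #3: a single-locale language takes the ISO three letter code.
        candidate = iso3.upper()

    conflict = candidate[:2] in used_prefixes
    return candidate, conflict


if __name__ == "__main__":
    # Sample prefixes only; the real table is much larger.
    taken = {"EN", "AR", "JP", "KO"}
    print(propose_code("ug", "uig", 1, taken))               # ('UIG', False)
    print(propose_code("en", "eng", 2, {"AR", "JP"}, "U"))   # ('ENU', False)
```

Note that the two letter code ug never enters into it for a single-locale language, which is exactly why the Language Bar ends up showing UI.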

Now of course this approach is going to upset anyone who prefers Uyghur when the whole Uighur or Uyghur question is raised, but as a choice it is not designed to choose sides. It is simply using the ISO 639 three letter code, which happens in this case to not have its first two letters match the ISO 639 two letter code.

The end result is a code that is uniquely qualified to upset people who feel that their language or locale is being misrepresented in the Language Bar....


This post brought to you by ئ (U+0626, a.k.a. ARABIC LETTER YEH WITH HAMZA ABOVE)

# Anonymous on 27 Sep 2006 9:23 AM:

If you wanted a two letter code for the language bar, why not use the ISO-639 TWO LETTER CODE instead of munging the three letter code into two letters?

# Michael S. Kaplan on 27 Sep 2006 10:41 AM:

Lots of reasons!

1) The data for the ISO two-letter codes did not exist in Windows till Vista, and there is over a decade of existing expectations for the language name and the two letter codes in the language bar.

2) Not all languages used in locales have two-letter codes available, and some of the ones with only three letter codes conflict when the three-letter code is truncated.

3) Another team owns the language bar, so it is a bit more complicated than just changing it ourselves.

# oyghan on 27 Sep 2006 1:30 PM:

1) I agree. We've asked repeatedly for an easy way to do so. but, sigh...

2) This is not the case of Uyghur. "ug" is still available.

3) It is important to have a discussion between team members :)

# Michael S. Kaplan on 27 Sep 2006 2:00 PM:

#3 is the rule that is used for Uyghur/Uighur. I know this leaves you with UI, which I know you don't want, but the complaint needs to be with ISO for assigning the three letter code that they did, not with Microsoft, which is using ISO's three letter code.

# teklimakan on 27 Oct 2006 1:46 AM:

Recommendation for the English transcription of the word

“ ئۇيغۇر ” /《维吾尔》

The constant advances of the reform and opening policy and the growing trade and telecommunications exchanges between people in all sectors and the outside world, as well as increased activities in publication, news reporting and international affairs have for some time renewed the need for a uniform and normalized ethnonym for the major nationality of Xinjiang, known in its own language as [ujγur].

At present, there is utter confusion on how to render and use in English the name of that nationality, with no fewer than seven different spellings attested: Uyghur, Uygur, Uighur, Uighuir, Uiguir, Uigur and Weiwuer. This situation causes a number of problems in our work and daily lives. Therefore, the Terminology Normalization Committee for Ethnic Languages of the Xinjiang Uyghur Autonomous Region based on research and consultations with relevant experts on this issue recommends that the spelling Uyghur, corresponding to the pronunciation [ujγur], be used as the English transcription of the word.

Government organizations and individuals are invited to conform to the present notice.

The Terminology Normalization Committee for Ethnic Languages of the Xinjiang Uyghur Autonomous Region

October 11, 2006
