Lots of Locale questions

by Michael S. Kaplan, published on 2010/01/08 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/01/08/9945487.aspx

The Suggestion Box has been getting way too busy during my extended absence from blogging, so I thought I'd clear out a few today. :-)

First, from Jeroen Ruigrok van der Werven, who asked:

Hi Michael,

any idea why the locale identifier is missing for Corsican at http://msdn.microsoft.com/en-us/library/ms776260

Also, what additional locales does Windows 7 have over Vista?

Now I'll be honest with you about these two topics (Locale Identifier Constants and Strings and Language Identifier Constants and Strings): I don't like them. Each is a huge, static snapshot of a list of locales, but neither says specifically when it is from, and knowing what period the list is designed to encompass is really the only useful purpose for HAVING such a static list. You could use the list itself to work backward and discern the dates, under the "if you already know the answer you'll be able to find out the answer" doctrine, but that's kind of not the point.

The previous paragraph gives a hint as to how one can understand the answer to the original question -- obviously the one missing an LCID came before the one not missing it. Further along the lines of FDA (Forensic Documentation Analysis, a common science seen on the TV show CSI: MSDN), one can look at the top line info on each page, the latter with

MSDN -> MSDN Library -> Win32 and COM Development -> User Interface -> International Support -> Globalization Services -> National Language Support -> National Language Support Reference -> National Language Support Constants -> Language Identifier Constants and Strings

and the former with

MSDN -> MSDN Library

The fact that the one has several locales marked Vista and later and the other has a locale not in the first makes it clear that one is from Vista and the other is from Windows 7, and that the Vista topic is "orphaned" from the full indexing of a Table of Contents.

So the topic without the locale in question should probably just be removed or something.

The second question comes from Jan Kučera:

Hi, glad to see you back, Michael!

Actually I've already tried twice to post the following question some time ago, but it did not go through. However, I know how you like sorting topics, so I thought I might try it once more… here we go:

Say I have a web page, and I'm providing the content in several languages. Now the question is, what do you think is the best way to present the languages available, in which language and the most interesting – in which order?

I've told to myself – okay, let's see how the MSDN does it. If I open the MSDN web, I see "Česká Republika - Česky" in the top right corner. Hmm well, if you happen to not know the Czech language, I guess you have no idea this is the language selector. Anyway, here is the list expanded:

What is the reason for listing the country first and then the language? I also see both country and language are displayed in the native language, which prevents you discovering the selector if the page is displayed in a language you are not familiar with. Now after couple of your posts about sorting, I see this list is sorted neither by language, nor by country (the non-Latin characters would go down, right?). So my first guess is that the list is sorted by country language in English – an item not in the list – a bit confusing, especially if you don't know the English names – though quite interesting idea for me. And now, what sort is used? Assuming the 中国 thing is China, it seems to be sorted using English rules, because in Czech (in which the web is shown), this would go after Colombia. Wasn't it me to whom did you advised to use the sorting expected by the user? Funny is that if you click to display more languages, the combo box is sorted different way, I would say by native country names (non-Latin at the bottom), accent insensitive (Česká republika before Chile) using English rules (Chile before Colombia) – at least that page is in English only. Though for me, looking for Česká republika in the middle of 'C' names is really weird.

Looking at another sites, I see everyone implemented it differently. So...I wonder, do you have any thoughts what could be the most correct way?


I don't know how back I was in April when Jan put this question up, though I think I'm back now. :-)

There are several questions packed in there; let's try to get to all of them.

First, listing the country followed by the language is just a choice that several of the pages and subwebs of microsoft.com make to underscore the reality of the different subsidiaries (both to help foster a sense of "ownership" and to make it clearer who is best placed to look after the interests of accuracy, whether subsidiary or localizer, etc.).

Now, putting the region/language in the native language is a common way to keep readers on the pages they are most likely to find useful - if you can't read it, then why would you click on it? :-)

As for the two lists, the first is the one in the dropdown:

where the second list is on the separate page:

It is clearly based on the less sexy list sorted alphabetically, though not by a specific language, as Jan mentioned.

One of the bad things about it is that it will place all of the non-Latin script names at the end, which almost has the unfortunate connotation of putting some entries at the "back of the bus". So there is the smaller list, which is indeed sorted in the order of the English spelling of each entry, where you may not see the logic of the order but can almost certainly find your language if it's there. And there is the bigger list, where it is fun to look at the source, since you'll see the locale names there, too:

<option value="es-ar">Argentina (Espa&#241;ol)</option>
<option value="en-au">Australia (English)</option>
<option value="nl-be">Belgi&#235; (Nederlands)</option>
<option value="fr-be">Belgique (Fran&#231;ais)</option>
<option value="es-bo">Bolivia (Espa&#241;ol)</option>
<option value="pt-br">Brasil (Portugu&#234;s)</option>
<option value="en-ca">Canada (English)</option>
<option value="fr-ca">Canada (Fran&#231;ais)</option>
<option value="cs-cz">Česk&#225; republika (Česky)</option>
<option value="es-cl">Chile (Espa&#241;ol)</option>
<option value="es-co">Colombia (Espa&#241;ol)</option>
<option value="es-cr">Costa Rica (Espa&#241;ol)</option>
<option value="da-dk">Danmark (Dansk)</option>
<option value="de-de">Deutschland (Deutsch)</option>
<option value="es-ec">Ecuador (Espa&#241;ol)</option>
<option value="es-sv">El Salvador (Espa&#241;ol)</option>
<option value="es-es">Espa&#241;a (Espa&#241;ol)</option>
<option value="fr-fr">France (Fran&#231;ais)</option>
<option value="es-gt">Guatemala (Espa&#241;ol)</option>
<option value="es-hn">Honduras (Espa&#241;ol)</option>
<option value="en-in">India (English)</option>
<option value="id-id">Indonesia (Bahasa Indonesia)</option>
<option value="en-ie">Ireland (English)</option>
<option value="it-it">Italia (Italiano)</option>
<option value="es-mx">M&#233;xico (Espa&#241;ol)</option>
<option value="nl-nl">Nederland (Nederlands)</option>
<option value="en-nz">New Zealand (English)</option>
<option value="es-ni">Nicaragua (Espa&#241;ol)</option>
<option value="nb-no">Norge (Norsk)</option>
<option value="de-at">&#214;sterreich (Deutsch)</option>
<option value="es-pa">Panam&#225; (Espa&#241;ol)</option>
<option value="es-py">Paraguay (Espa&#241;ol)</option>
<option value="es-pe">Per&#250; (Espa&#241;ol)</option>
<option value="pl-pl">Polska (Polski)</option>
<option value="pt-pt">Portugal (Portugu&#234;s)</option>
<option value="es-pr">Puerto Rico (Espa&#241;ol)</option>
<option value="es-do">Rep&#250;blica Dominicana (Espa&#241;ol)</option>
<option value="ro-ro">Rom&#226;nia (Rom&#226;nă)</option>
<option value="de-ch">Schweiz (Deutsch)</option>
<option value="en-sg">Singapore (English)</option>
<option value="sk-sk">Slovensko (Slovensky)</option>
<option value="en-za">South Africa (English)</option>
<option value="fr-ch">Suisse (Fran&#231;ais)</option>
<option value="fi-fi">Suomi (Suomi)</option>
<option value="sv-se">Sverige (Svenska)</option>
<option value="tr-tr">Turkiye (T&#252;rk&#231;e)</option>
<option value="en-gb">United Kingdom (English)</option>
<option selected="selected" value="en-us">United States (English)</option>
<option value="es-uy">Uruguay (Espa&#241;ol)</option>
<option value="es-ve">Venezuela (Espa&#241;ol)</option>
<option value="el-gr">Ελλάδα (Ελληνικά)</option>
<option value="bg-bg">България (Български)</option>
<option value="kk-kz">Қазақстан (Русский)</option>
<option value="ru-ru">Россия (Pусский)</option>
<option value="uk-ua">Україна (Українська)</option>
<option value="ko-kr">한국(한국어)</option>
<option value="zh-cn">中国(简体中文)</option>
<option value="zh-tw">台灣(繁體中文)</option>
<option value="ja-jp">日本 (日本語)</option>
<option value="ar-sa">الشرق الأوسط - العربية</option>
<option value="he-il">ארצות הברית - אנגלית</option>

See what I mean? Interesting, right? And clearly not in the same order as the smaller list.
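For anyone who wants to pull the locale names out of a listing like this rather than read them by eye, here is a minimal sketch using Python's standard html.parser module. The class name OptionCollector, the helper parse_options, and the three-line SAMPLE are made up for illustration; this is not how the page itself was built or processed.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collects (value, label) pairs from <option> elements (illustrative helper)."""

    def __init__(self):
        # convert_charrefs=True turns references such as &#241; into characters.
        super().__init__(convert_charrefs=True)
        self.options = []
        self._value = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <option> that has a value.
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._value is not None:
            self.options.append((self._value, "".join(self._text)))
            self._value = None


# Three entries copied from the listing above, as sample input.
SAMPLE = """
<option value="cs-cz">Česk&#225; republika (Česky)</option>
<option value="es-cl">Chile (Espa&#241;ol)</option>
<option selected="selected" value="en-us">United States (English)</option>
"""


def parse_options(markup):
    collector = OptionCollector()
    collector.feed(markup)
    collector.close()
    return collector.options


for locale_name, label in parse_options(SAMPLE):
    print(locale_name, label)
```

Each pair keeps the locale name next to the label that is actually displayed, which is what you would need if you wanted to experiment with a different ordering.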

It is also kind of interesting which letters they chose to write as NCRs (numeric character references), since they did not use them for all of the non-ASCII letters.
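A quick way to see which letters got the NCR treatment is to separate the escaped characters from the ones that appear literally. The sketch below does that for four labels copied from the listing; the helper names escaped_chars and literal_non_ascii are invented here. In this sample the escaped letters all sit at U+00FF or below, while Č and ă, which are above that range, are literal. That is only an observation about the markup, not a claim about how the page was generated.

```python
import html
import re

# A few labels copied from the option list above.
labels = [
    "Espa&#241;ol",
    "Česk&#225; republika",
    "Rom&#226;nia (Rom&#226;nă)",
    "&#214;sterreich",
]

# Decimal numeric character references such as &#241;
NCR = re.compile(r"&#(\d+);")


def escaped_chars(label):
    """Characters that were written as decimal NCRs in the markup."""
    return [chr(int(number)) for number in NCR.findall(label)]


def literal_non_ascii(label):
    """Non-ASCII characters that appear literally rather than as NCRs."""
    without_ncrs = NCR.sub("", label)
    return [ch for ch in without_ncrs if ord(ch) > 127]


for label in labels:
    print(html.unescape(label), escaped_chars(label), literal_non_ascii(label))
```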

I actually get asked the final question, about what the best order would be, all the time: at least 10 times this year (mostly from inside Microsoft but a few from the outside). There really aren't good rules for this, though I find the English-only ordering to be pretty wrong myself. It is not accent insensitive, since in the English sort that Č really is just a letter with an accent, which is a tertiary distinction, so a primary distinction later in the string will win. Maybe they'll do better if we switch languages - say to French:

Well, they did localize the More... to Plus... but the list itself is the same, and the list you get when you click through is identical too, which I would call unfortunate, as I was hoping they'd be using the locale name, which they have in the list, for the sorting. They did not do so, and that really seems less than ideal to me. But there aren't a whole lot of standards out there for sorting these lists, so almost anything one does isn't "wrong" in a technical sense.
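To make the point about Česká republika landing in the middle of the C names a bit more concrete, here is a rough sketch in Python. It is emphatically not the real English collation (that would take a proper collation library); it only imitates the idea that base letters are compared first and the accent matters just as a tie-breaker. The function names primary_key and accent_as_tiebreak are invented for this illustration.

```python
import unicodedata


def primary_key(name):
    """Base letters only: decompose, drop combining marks, ignore case."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def accent_as_tiebreak(name):
    # Compare base letters first; the raw string only breaks exact ties.
    return (primary_key(name), name)


names = [
    "Colombia", "Chile", "Česká republika", "Canada",
    "Österreich", "Norge", "Panamá",
]

# Plain code point order pushes the accented initials to the end.
by_code_point = sorted(names)

# Base-letter order puts Česká republika between Canada and Chile,
# and Österreich between Norge and Panamá, as in the listing above.
by_base_letters = sorted(names, key=accent_as_tiebreak)

print(by_code_point)
print(by_base_letters)
```

A Czech sort would treat Č as its own letter after C, which is why Jan expected the entry somewhere else entirely; this sketch makes no attempt at that.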

The web is rife with different answers to this question, though.... on the whole I don't really like any of them....

greenlight on 8 Jan 2010 2:41 PM:

Wikipedia has an interesting sort as well. E.g. Japanese is sorted under "n" (Nihongo), Chinese is sorted last (Zhongguohua).

And they're not just using the subdomains since "Acèh" (ace.wikipedia.org) sorts before "Аҧсуа" (ab.wikipedia.org)

Jan Kučera on 9 Jan 2010 5:25 AM:

Nice, thanks! I also see I did some confusing mistakes in the question, but it made the point.

- What is the reason for listing the country first and then the language?

So no linguistic reason here, just to separate the markets. Actually it is a more "Country/Region" selector as we can see above the larger list (no hint in the small list). Makes me feel a bit that the language change is only a side-effect of the region selection. :)

And it makes impossible to see what is offered in the Spain market unless you can read Spanish, though this is sure not a requirement for living there.

Also seems that if all you want is to offer your web site or application in different languages, country list not appropriate. Though I am not sure how many teams out there do separate translations for en-US and en-GB for example.

- Now the putting the country/language in the native language is a common way to keep the readers on the pages they are most likely to find useful - if you can't read it, then why would you click on it? :-)

Leaving out the curious linguistic fans, the drawback of this is if you can't read the page at all, how do you find where to switch it? I mean if it opens to you that way already.

I don't know if all scripts have a possibility to be transliterated to Latin, but since the Unicode names are kept in English I believe it should be possible, and as most of the time localization happens to the languages covered by Unicode, it could be an interesting idea to use the transliterated language names for sorting. Though I'm not sure how this would work with hieroglyphes for example. :-)

Anyway, since there does not seem to be a correct answer, I would be interested what preferences do other readers have about the content and sort of the language selection list.

Michael S. Kaplan on 9 Jan 2010 8:20 PM:

I talk in this blog about weaknesses in the subsidiary model, and some of the issues you raise are related; others are new issues I had not mentioned....

Mihai on 15 Jan 2010 10:44 AM:

I kind of prefer to have each language written in its native form and its own script.

If I am interested in Thai, then it is also likely I will be able to recognize the Thai from the list.

With transliteration (or with English), if a Thai native does not know the Latin script, he is stuck, he cannot recognize his own language.

Or think the other way around: you are American, and stumble on the Japanese page (which you don't understand). If the list of languages were localized to match the page (Japanese), how would you recognize English, to switch back to it?

To get a taste: go to Google, "Language Tools", and in "Use the Google Interface in Your Language" select Telugu.

Now switch back to English (ignoring "Google.com in English" which is again discriminating against non-English speakers).
