by Michael S. Kaplan, published on 2005/05/17 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/17/418372.aspx
I think I may have said in the past that the SUBLANGID is an odd beast.
The values are defined in the winnt.h header file in the SDK (and ntdef.h in the DDK). Here is an excerpt of the ones there (most of which have the value 1 or 2):
#define SUBLANG_DEFAULT 0x01 // user default
#define SUBLANG_SYS_DEFAULT 0x02 // system default
#define SUBLANG_ARABIC_SAUDI_ARABIA 0x01 // Arabic (Saudi Arabia)
#define SUBLANG_ARABIC_IRAQ 0x02 // Arabic (Iraq)
#define SUBLANG_AZERI_LATIN 0x01 // Azeri (Latin)
#define SUBLANG_AZERI_CYRILLIC 0x02 // Azeri (Cyrillic)
#define SUBLANG_CHINESE_TRADITIONAL 0x01 // Chinese (Taiwan)
#define SUBLANG_CHINESE_SIMPLIFIED 0x02 // Chinese (PR China)
#define SUBLANG_CROATIAN_CROATIA 0x01 // Croatian (Croatia)
#define SUBLANG_DUTCH 0x01 // Dutch
#define SUBLANG_DUTCH_BELGIAN 0x02 // Dutch (Belgian)
#define SUBLANG_ENGLISH_US 0x01 // English (USA)
#define SUBLANG_ENGLISH_UK 0x02 // English (UK)
#define SUBLANG_FRENCH 0x01 // French
#define SUBLANG_FRENCH_BELGIAN 0x02 // French (Belgian)
#define SUBLANG_GERMAN 0x01 // German
#define SUBLANG_GERMAN_SWISS 0x02 // German (Swiss)
#define SUBLANG_ITALIAN 0x01 // Italian
#define SUBLANG_ITALIAN_SWISS 0x02 // Italian (Swiss)
#define SUBLANG_MALAY_MALAYSIA 0x01 // Malay (Malaysia)
#define SUBLANG_MALAY_BRUNEI_DARUSSALAM 0x02 // Malay (Brunei Darussalam)
#define SUBLANG_NORWEGIAN_BOKMAL 0x01 // Norwegian (Bokmal)
#define SUBLANG_NORWEGIAN_NYNORSK 0x02 // Norwegian (Nynorsk)
#define SUBLANG_PORTUGUESE 0x02 // Portuguese
#define SUBLANG_PORTUGUESE_BRAZILIAN 0x01 // Portuguese (Brazilian)
#define SUBLANG_SERBIAN_LATIN 0x02 // Serbian (Latin)
#define SUBLANG_SERBIAN_CYRILLIC 0x03 // Serbian (Cyrillic)
#define SUBLANG_SPANISH 0x01 // Spanish (Castilian)
#define SUBLANG_SPANISH_MEXICAN 0x02 // Spanish (Mexican)
#define SUBLANG_SPANISH_MODERN 0x03 // Spanish (Modern)
#define SUBLANG_SWEDISH 0x01 // Swedish
#define SUBLANG_SWEDISH_FINLAND 0x02 // Swedish (Finland)
#define SUBLANG_UZBEK_LATIN 0x01 // Uzbek (Latin)
#define SUBLANG_UZBEK_CYRILLIC 0x02 // Uzbek (Cyrillic)
Some of it boils down to that evil use of the word DEFAULT coming back to bite us you know where. After all, the decision of which SUBLANGID comes in what order is due to an arbitrary combination of alphabetical order and historical assignment. If we did not give assignments for any of the SUBLANGID==1 entries, it would imply that the first LCID in the series was somehow, you know, like the default, as opposed to the rest of the LCIDs in the series. Because in most cases, it isn't.
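To make the point about the value 1 concrete, here is a minimal sketch of how a SUBLANGID only means something once it is combined with a primary language. The MAKELANGID macro is redeclared locally with the same bit layout winnt.h uses (sublanguage in the top 6 bits, primary language in the low 10) so the sketch builds without the SDK; the LANG_* values are the ones from winnt.h, and the function names are hypothetical helpers, not anything from the header.

```c
#include <stdint.h>

/* Local stand-in with the same layout as the winnt.h macro:
   sublanguage in bits 10-15, primary language in bits 0-9. */
#ifndef MAKELANGID
#define MAKELANGID(p, s) ((uint16_t)((((uint16_t)(s)) << 10) | (uint16_t)(p)))
#endif

/* Primary language IDs, as defined in winnt.h. */
#define LANG_ENGLISH    0x09
#define LANG_PORTUGUESE 0x16
#define LANG_SWEDISH    0x1d

/* Sublanguage IDs copied from the excerpt above. */
#define SUBLANG_ENGLISH_US           0x01
#define SUBLANG_PORTUGUESE           0x02
#define SUBLANG_PORTUGUESE_BRAZILIAN 0x01
#define SUBLANG_SWEDISH              0x01
#define SUBLANG_SWEDISH_FINLAND      0x02

/* Hypothetical helpers: each returns the combined 16-bit LANGID. */
uint16_t langid_english_us(void)     { return MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US); }
uint16_t langid_portuguese_br(void)  { return MAKELANGID(LANG_PORTUGUESE, SUBLANG_PORTUGUESE_BRAZILIAN); }
uint16_t langid_portuguese_pt(void)  { return MAKELANGID(LANG_PORTUGUESE, SUBLANG_PORTUGUESE); }
uint16_t langid_swedish_se(void)     { return MAKELANGID(LANG_SWEDISH, SUBLANG_SWEDISH); }
uint16_t langid_swedish_fi(void)     { return MAKELANGID(LANG_SWEDISH, SUBLANG_SWEDISH_FINLAND); }
```

With these definitions the same sublanguage value 1 lands in 0x0409 (English, USA), 0x0416 (Portuguese, Brazilian), and 0x041d (Swedish), while Portuguese for Portugal is the one that gets 2 (0x0816). Nothing in the number says which member of a series is "the default".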
Of course if you ask me, the train already left the station for the ones that have no country in them. Which is to say that SUBLANG_SWEDISH, SUBLANG_PORTUGUESE, SUBLANG_ITALIAN, SUBLANG_GERMAN, and SUBLANG_FRENCH already sort of say something along those lines by not being SUBLANG_SWEDISH_SWEDEN, SUBLANG_PORTUGUESE_PORTUGAL, SUBLANG_ITALIAN_ITALY, SUBLANG_GERMAN_GERMANY, and SUBLANG_FRENCH_FRANCE, respectively. Don't they?
Might have saved a few lines in the header file, if nothing else....
Although sometimes the comments name a country (like Finland), other times they name a way of doing something in a country (like Mexican). So there is no need to read into patterns; there are so many different ones that you can do almost anything and still be consistent with an entry that is already there.
I guess we could fix the comments to be more consistent -- there is no backcompat issue if we changed the one to Finnish or the other to Mexico; the code would still compile the same way.
Now they do have to all be in ASCII, since the C standard did not recognize the ability to have anything other than ASCII in header files. I guess that only hurts the comment on SUBLANG_NORWEGIAN_BOKMAL, but everyone can probably pretend it is actually "// Norwegian (Bokmål)" and call it a day. I tend to think of those comments as a lot of overhead to maintain, even if you ignore the occasional geopolitically sensitive issue....
Also, as far as I can tell, our Right Honourable Data Lady[1] never really had to deal with these values directly over the years -- she always assigned LCIDs and some dev would do the math to fill in the header file (after getting the text to use as the comment, of course -- the best protection against the aforementioned geopolitical issues is involving people who understand them!).
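The "math" in question is just bit slicing. As a rough sketch (the struct and function names here are hypothetical, not SDK names), an LCID can be taken apart the same way the LANGIDFROMLCID, PRIMARYLANGID, SUBLANGID, and SORTIDFROMLCID macros do it: the low 16 bits are the LANGID, whose low 10 bits are the primary language and top 6 bits the sublanguage, and the next 4 bits are the sort ID.

```c
#include <stdint.h>

/* Hypothetical container for the pieces of an LCID. */
typedef struct {
    uint16_t primary; /* low 10 bits of the LANGID          */
    uint16_t sub;     /* high 6 bits of the LANGID          */
    uint16_t sort;    /* 4-bit sort ID just above the LANGID */
} lcid_parts;

/* Split an LCID using the same masks and shifts as the winnt.h macros. */
lcid_parts split_lcid(uint32_t lcid)
{
    lcid_parts parts;
    uint16_t langid = (uint16_t)(lcid & 0xffff);      /* LANGIDFROMLCID */

    parts.primary = (uint16_t)(langid & 0x3ff);       /* PRIMARYLANGID  */
    parts.sub     = (uint16_t)(langid >> 10);         /* SUBLANGID      */
    parts.sort    = (uint16_t)((lcid >> 16) & 0xf);   /* SORTIDFROMLCID */
    return parts;
}
```

For example, an assigned LCID of 0x0c0a splits into primary language 0x0a and sublanguage 3, which is where the SUBLANG_SPANISH_MODERN value in the excerpt comes from; 0x0816 splits into 0x16 and 2, matching SUBLANG_PORTUGUESE.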
Back in the Summer of 2001, I was asked by the Unicode Technical Committee to provide updated information for a comparative table of Language Codes and Country Codes. After consulting with various people at Microsoft it was decided that a list of SUBLANGID values as "country codes" was pretty much a useless idea, so we went with some country codes we had defined, instead.
Which are not useless in and of themselves -- they are mostly[2] just international dialing codes. Though I am guessing that if people are going to the Unicode site they are not looking for a number to dial. But they are at least more useful than a bunch of random numbers that, without the context of macros like MAKELANGID and information about the construction of LCIDs, serve no useful purpose ever. Each country code is the same value as that returned by GetLocaleInfo with the LOCALE_ICOUNTRY LCTYPE. Though note the documentation for that constant gives us a cooler explanation:
LOCALE_ICOUNTRY Country/region code, based on international phone codes, also referred to as IBM country/region codes.
So maybe we can blame IBM, especially for that "mostly" part? :-)
Oh, never mind. It was just a thought....
Which reminds me that I should be having someone give us new CTRY_* constants for all of the new locales in countries that we have never had locales in before. Darn, I had forgotten about those. At least I know I was not the only one! :-)
Some of the codes are really not needed, like SUBLANG_LITHUANIAN and SUBLANG_KOREAN, since they are lone reeds and are really the only one we expect for the respective languages. But then again, we did not use SUBLANG_LITHUANIAN_LITHUANIA and SUBLANG_KOREAN_KOREA so someone knew that was where we were heading?
Though perhaps I am mistaken about that, who knows? It certainly means that the dozens of entries that are not included at all have a chance to feel slighted, though I hope they do not. I wish we could take some of them out, only we can't.
Anyway, we have a bunch of codes defined. We cannot ever undefine them (who knows where they may be used?). But we could certainly wait to define them until/unless we need them.
Now remember that for NLS, where there is no way to get data for neutrals, defining PRIMARYLANGID and SUBLANGID values and not LCID values is not entirely useful anyway. But those PRIMARYLANGID values are useful combined with SUBLANG_NEUTRAL for the sake of resource loading, and you can't just have those values dangling out there alone.
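As a small illustration of what "combined with SUBLANG_NEUTRAL" looks like, the sketch below derives the neutral LANGID for any specific one. It assumes only the winnt.h bit layout and SUBLANG_NEUTRAL being 0x00; the function names are hypothetical, and this is not meant to reproduce the resource loader's actual fallback logic.

```c
#include <stdint.h>

#define SUBLANG_NEUTRAL 0x00 /* language neutral, as in winnt.h */

/* Hypothetical helper: keep the primary language (low 10 bits) and
   replace the sublanguage (top 6 bits) with SUBLANG_NEUTRAL. */
uint16_t neutral_langid(uint16_t langid)
{
    uint16_t primary = (uint16_t)(langid & 0x3ff);
    return (uint16_t)((SUBLANG_NEUTRAL << 10) | primary);
}

/* Hypothetical helper: two LANGIDs share a neutral if their primaries match. */
int same_primary_language(uint16_t a, uint16_t b)
{
    return neutral_langid(a) == neutral_langid(b);
}
```

Both 0x0416 and 0x0816 reduce to 0x0016 here, so a resource tagged with the Portuguese neutral can be matched from either specific LANGID.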
So in the end it all makes sense as to why they exist. But it took a long post like this one to give it enough context that one could say that. :-)
1 - There is a really funny story (IMHO) about how Cathy was "given" that title, but I will ask for her permission before telling it (I may ask her to tell it to me again to make sure I do not fumble it!)
2 - I think they are all international dialing codes, other than CTRY_CANADA, which is defined as 2 in winnls.h even though Canada's dialing code is 1.
This post brought to you by "ফ" (U+09ab, a.k.a. BENGALI LETTER PHA)