Language groups -- the vestigial tail of NLS

by Michael S. Kaplan, published on 2005/02/16 11:02 -05:00, original URI:

Today, I am going to talk about language groups. They are a little confusing....

First we will take their names, straight from our master NLS header file, winnls.h (taken from the one that ships with VS.NET 2003):

//  Language Group ID Values.
#define LGRPID_WESTERN_EUROPE        0x0001   // Western Europe & U.S.
#define LGRPID_CENTRAL_EUROPE        0x0002   // Central Europe
#define LGRPID_BALTIC                0x0003   // Baltic
#define LGRPID_GREEK                 0x0004   // Greek
#define LGRPID_CYRILLIC              0x0005   // Cyrillic
#define LGRPID_TURKISH               0x0006   // Turkish
#define LGRPID_JAPANESE              0x0007   // Japanese
#define LGRPID_KOREAN                0x0008   // Korean
#define LGRPID_TRADITIONAL_CHINESE   0x0009   // Traditional Chinese
#define LGRPID_SIMPLIFIED_CHINESE    0x000a   // Simplified Chinese
#define LGRPID_THAI                  0x000b   // Thai
#define LGRPID_HEBREW                0x000c   // Hebrew
#define LGRPID_ARABIC                0x000d   // Arabic
#define LGRPID_VIETNAMESE            0x000e   // Vietnamese
#define LGRPID_INDIC                 0x000f   // Indic
#define LGRPID_GEORGIAN              0x0010   // Georgian
#define LGRPID_ARMENIAN              0x0011   // Armenian

This list will help you start feeling the confusion. They seem to be based on language. No, that's not right, it must be script. No, maybe based on region. No? Perhaps something else entirely.

Heck, the English - New Zealand locale shows up under the "Western European" language group. Riddle me that one, won't you?

In Windows 2000, language groups matched the big list of "languages your system supports" in Regional Options, and that is when language groups were analogous to a feature of the operating system.

Ok, maybe NLS Terminology will set us straight on what they are supposed to be:

Purpose: Provides all keyboard layouts, IMEs, TrueType fonts, font links, LPKs, bitmap fonts and code page translation tables needed by the system for a group of languages. Therefore impacts all other settings in this list.

Expository text: The language group controls which system locale, user locales, input locales, and user interface (UI) languages can be selected. For example, Windows installs the Western Europe and United States language group by default. This default cannot be removed. For each localized version, the specified language group is the default and cannot be removed. Thus, if the English version of Windows is installed in a non-English speaking country/region, the user will typically install another language group.

When adding a language group, Windows copies (but does not activate) the necessary keyboard files, Input Method Editors (IMEs), TrueType Font files, bitmap font files, and National Language Support (.nls) files. Adding a language group also adds registry values for font linking and installs scripting engines for complex script languages (Arabic, Hebrew, Indic, and Thai).

Giving (for example) Armenian its own language group did not really serve much purpose here, since the font was pretty small, there is no IME, and no special system support is required as with other language groups. So they do not feel like a big group of equal partners....

Then, starting in Windows XP and Server 2003, the notion has largely been replaced by two checkboxes in the second tab of Regional and Language Options -- essentially giving us just three groups:

  1. East Asian languages (basically ideographic scripts e.g. Chinese)
  2. Complex script languages (e.g. Thai, Hebrew, Indic, Arabic script)
  3. Everyone else the system supports

The idea was that Windows would support everybody unless it required a ton of IME/font support files (category 1) or turning on complex script support throughout the OS (category 2).

And in XP and Server 2003, even trying to install language group support à la Q289125 will go in, decide which of the above categories the request falls into, and install that entire category's underlying technological support. So now language groups (which were never too clearly defined anyway) are at the level of an appendix or a vestigial tail!

What happens with them going forward? Who can say? In a world where parts of Australia are classified as being Western European, anything is possible! :-)


This post brought to you by "Ƣ" (U+01a2, a.k.a. LATIN CAPITAL LETTER OI)

# Jeff Lewis on 16 Feb 2005 8:49 AM:

Just another example of how public schools are failing us.... ;-)

# Chris on 16 Feb 2005 1:21 PM:

> In a world where parts of Australia are classified as being Western European
> the English - New Zealand locale shows up under the "Western European" language group

I don't think that the Kiwis appreciate being lumped in with the Aussies much less with the Europeans!

# Mike Williams on 16 Feb 2005 1:36 PM:

It's only because Americans confuse Australia and Austria all the time. The first week I was at Microsoft Sydney I intercepted a call from Roger Heinen's PA, who wanted to ensure an English-speaking person would be present for his conference call the next day. A couple of years later, one of our product managers got a call from a marketing manager in Office who said she was going to Photokina in Germany and that she should drop in on Sydney "while she was in the area".

Oh, and yes, almost all mail from Redmond to me was sent care of their European dispatch office, and almost always said MS AUSTRIA.

# Michael Kaplan on 16 Feb 2005 2:27 PM:

Chris -- heh heh heh. I was not lumping them together, I was giving another example of a non-Western European locale (English - Australia)! :-)

Mike -- Also a good one, but not the reason in this case. It's mainly because the lines are mostly drawn by code page, so being in cp1252 makes you an eligible member of the club....

# Dean Harding on 16 Feb 2005 2:33 PM:

"Anybody could miss Canada... all tucked away down there." -- Homer Simpson.

I guess that's what you get when you expect Windows to teach you geography :)

# Mike Williams on 16 Feb 2005 2:58 PM:

Oh sure, I appreciate the code-page reasons, but the comment does say "// Western Europe & U.S.", thereby omitting the rest of the Americas, Australia, South Africa, Singapore, and a great number of other users of the Latin alphabet. I really don't think Microsoft is aware how often it bruises other cultures, and even when it is made aware, it doesn't do anything about it unless there is an immediate threat to business.

# Michael Kaplan on 16 Feb 2005 3:01 PM:

I understand the generic argument, but I can promise you this is the one group that knows more about the world than just about anything else (including long. and lat. if you take GEO stuff into account!).

It probably would have been better to not have a feature that is so functionally schizophrenic as language groups. One version later and they are no longer needed but now we are stuck with them forever. :-(

# Yuhong Bao on 9 Feb 2011 2:16 PM:

Well, what other collective name would you suggest for the region that speaks the languages covered by each Windows codepage?

# Michael S. Kaplan on 9 Feb 2011 5:38 PM:

Grouping by code page is a flawed view.

# Yuhong Bao on 9 Feb 2011 10:40 PM:

However, it is not the only thing guilty of this. I am sure there are a lot of docs that refer to cp1252 as the "Western European" code page, dating back to at least the early 1990s.

referenced by

2009/09/01 For language settings, Walmart ain't all that; 7-Eleven is what gets it done!

2006/12/11 IsComplexEnoughForYou?

2006/02/10 Installing supplemental langauge support programatically
