What's up with the language names?

by Michael S. Kaplan, published on 2005/12/05 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/05/499949.aspx

Late last week, there was a bug in some builds of Vista where one of the language names in the Regional and Language Options and Input Languages dialogs was apparently being duplicated. I was working with my manager to fix a different issue, and for a while we thought it might be related to the fix we were working on. After establishing it wasn't us, we decided a separate bug should probably just be reported....

Then on Saturday I was reviewing someone else's fix for a separate bug, which happened to be the cause of the very duplicated-name problem we had seen. It all came together.

And then just yesterday, Serge Wautier asked me in the Suggestion Box:

Hi Michael,

Why does GetLocaleInfo(...,LOCALE_SLANGUAGE,...) include the country name when the sublanguage is not unique ? After all, there is LOCALE_SCOUNTRY for the ones who want.

Also, LOCALE_SENGLANGUAGE does not include the country name, apparently making it kind of inconsistent with its non-ENG pal.

How do I retrieve the localized name of a language without the country ?



And yes, this is related!

Let's take a look at all of the different language and country names here. First the managed:

And here is the unmanaged, all coming from GetLocaleInfo and some of its LCType values:

A few important conclusions that can be drawn from contrasting these two lists and adding some additional information:

  1. The .NET Framework names are clearly more consistent with each other across the various language names (English, Native, and localized name).
  2. In almost all cases, Windows keeps the language and country names separate.
  3. In the one case where it does not (LOCALE_SLANGUAGE), the string is used as the name in the list for both Regional and Language Options and the Input Language selection dialog -- both of which have the requirement of not including the country/region name unless it is needed to distinguish between multiple locales with the same language name.
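
The rule in item 3 can be sketched in a few lines of Python. This is only an illustration of the requirement, not how Windows produces LOCALE_SLANGUAGE (those strings come from the locale data itself); the function name `display_names` and the sample pairs are hypothetical.

```python
from collections import Counter


def display_names(locales):
    """Build one list label per (language, region) pair.

    The region is appended only when the same language name occurs
    more than once, mirroring the dialog requirement described above.
    """
    counts = Counter(language for language, _ in locales)
    return [
        f"{language} ({region})" if counts[language] > 1 else language
        for language, region in locales
    ]


# Hypothetical sample data, not actual Windows locale data.
SAMPLE = [
    ("Spanish", "Spain"),
    ("Spanish", "Mexico"),
    ("Icelandic", "Iceland"),
]
LABELS = display_names(SAMPLE)
```

With this sample, both Spanish entries carry their region while Icelandic stands alone. If two locales shared a language name and neither label carried a region, the list would show the same name twice.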

As in most cases, the needs of a particular customer scenario outweigh the need to be consistent at a potential cost to usability, which is why this difference in LOCALE_SLANGUAGE has been there for quite some time.

For the .NET Framework, the separate language name without region can of course be found by using the neutral CultureInfo object rather than the specific one, an option that Windows has never had....

In an odd way, this difference reveals something about how the functionality was created on the two platforms. The Windows platform had its locale information added item by item, as it was required for other functionality in NLS or other parts of Windows -- sometimes consistent and sometimes not (for both the names of the various constants and what they return).

The .NET Framework, on the other hand, had the advantage of being able to look at the over 200 data items and build up a consistent model for the bulk of them. So it has a very cool advantage in that respect....

I suspect over time as items are added to the .NET Framework based on customer requests that it may lose some of that advantage -- once the scenarios start to overtake all of that careful planning.

That bug? It was a case where a new locale was added and both LOCALE_SLANGUAGE values were the same since the regions were not there. But the bug has been fixed and Serge's question has been answered and all is well. :-)


This post brought to you by "ޗ" (U+0797, THAANA LETTER CHAVIYANI)

# Serge Wautier on 5 Dec 2005 3:02 PM:

When the title of this post appeared in my reader, I couldn't believe you addressed my request/suggestion in only slightly more than 12 hours.

Wow !!! You rock !

> and Serge's question has been answered and all is well.

Er... I'll assume the reply to my last question (How do I retrieve the localized name of a language without the country ?) - in unmanaged contexts - is then either "you don't" or "you have a wrong requirement". I believe the requirement is not wrong especially if using MAKELANGID(PrimaryLangId, SUBLANG_NEUTRAL). So I'm left with "you don't", which was unfortunately what I feared :-(

Thanks anyway. As always, your explanation/clarification would be worth adding to the docs ! (For the ones who still do not consider this blog as being the reference itself ;-)

# Michael S. Kaplan on 5 Dec 2005 4:07 PM:

It is in fact as you indicate -- there is no way to get only the localized language, without country and script (other than using the .NET Framework).

# Maurits [MSFT] on 5 Dec 2005 7:37 PM:

As a workaround, does the following extract the language name? (Forgive the pseudo-Perl...)

# get the locale slanguage
$language = get_locale_slanguage();

# $language is now "Español (España)"

# strip out the " (España)" part
# or anything of the form " (...)"
$language =~ s/ \(.*\)$//; # ew, regular expressions

# $language is now "Español"

# Michael S. Kaplan on 5 Dec 2005 9:43 PM:

I don't think every version of Windows has consistently put the language first and everything else (region and script) in parentheses.

Plus there is no way to say what a localizer may do here with the string once it is in their reach....

referenced by

2012/06/19 Maybe they're just showing off their fancy fonts? ;-)

2007/06/11 LCID vs. Powerman and the Locale-name-go-round

2007/03/11 Yesterday's 'justified design' is tomorrow's 'bug we fixed'

2005/12/09 And they sort that list HOW, exactly?

2005/12/06 Questions about the unit and record separators
