What's up with the language names?
by Michael S. Kaplan, published on 2005/12/05 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/05/499949.aspx
Late last week, there was a bug in some builds of Vista where one of the language names in the Regional and Language Options and Input Languages dialogs was apparently being duplicated. I was working with my manager to fix a different issue, and for a while we thought it might be related to the fix we were working on. After establishing it wasn't us, we decided a separate bug should probably just be reported....
Then on Saturday I was reviewing a fix someone else was making for a separate bug, which happened to be the cause of the very duplicated name problem we had seen. It all came together.
And then just yesterday, Serge Wautier asked me in the Suggestion Box:
Why does GetLocaleInfo(...,LOCALE_SLANGUAGE,...) include the country name when the sublanguage is not unique ? After all, there is LOCALE_SCOUNTRY for the ones who want.
Also, LOCALE_SENGLANGUAGE does not include the country name, apparently making it kind of inconsistent with its non-ENG pal.
How do I retrieve the localized name of a language without the country ?
And yes, this is related!
Let's take a look at all of the different language and country names here. First the managed:
- CultureInfo.DisplayName - Gets the culture name in the format "<languagefull> (<country/regionfull>)" in the language of the localized version of .NET Framework.
- CultureInfo.EnglishName - Gets the culture name in the format "<languagefull> (<country/regionfull>)" in English.
- CultureInfo.NativeName - Gets the culture name in the format "<languagefull> (<country/regionfull>)" in the language that the culture is set to display.
- RegionInfo.DisplayName - Gets the full name of the country/region in the language of the localized version of .NET Framework.
- RegionInfo.EnglishName - Gets the full name of the country/region in English.
- RegionInfo.NativeName - Gets the name of a region formatted in the native language of the region.
And here is the unmanaged, all coming from GetLocaleInfo and some of its LCType values:
- LOCALE_SCOUNTRY - Full localized name of the country/region. This is based on the localization of the product, thus it changes for each localized version.
- LOCALE_SENGCOUNTRY - Full English name of the country/region. This is always restricted to characters that can be mapped into the ASCII 127-character subset.
- LOCALE_SENGLANGUAGE - Full English name of the language from the International Organization for Standardization (ISO) Standard 639. This is always restricted to characters that can be mapped into the ASCII 127-character subset. This is not always equivalent to the English version of LOCALE_SLANGUAGE.
- LOCALE_SLANGUAGE - Full localized name of the language. This name is based on the localization of the product, thus the value changes for each localized version.
- LOCALE_SNATIVECTRYNAME - Native name of the country/region. For example, for the Spanish (Spain) locale, the native country/region name would be España.
- LOCALE_SNATIVELANGNAME - Native name of the language. For example, the native language name for Armenian (Armenia) is Հայերեն.
A few important conclusions that can be drawn from contrasting these two lists and adding some additional information:
- The .NET Framework names are clearly more consistent with each other across the various language names (English, Native, and localized name).
- In almost all cases, Windows keeps the language and country names separate.
- In the one case where it does not (LOCALE_SLANGUAGE), the string is used as the name in the list for both Regional and Language Options and the Input Language selection dialog -- both of which have the requirement of not including the country/region name unless it is needed to distinguish between multiple locales with the same language name.
As in most cases, the needs of a particular customer scenario outweigh the need to be consistent when consistency would come at a cost to usability, so this difference in LOCALE_SLANGUAGE has been there for quite some time.
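The display rule described above (add the country/region only when the language name alone would be ambiguous) can be sketched in a few lines of Python. This is only an illustration of the rule, not how Windows implements it. The `LOCALES` table and the `display_names` helper are made up for the example, and the sample strings are not read from any Windows API.

```python
from collections import Counter

# Hypothetical sample data: (locale tag, language name, region name).
# Illustrative values only; nothing here is read from Windows.
LOCALES = [
    ("es-ES", "Spanish", "Spain"),
    ("es-MX", "Spanish", "Mexico"),
    ("hy-AM", "Armenian", "Armenia"),
]


def display_names(locales):
    """Return {tag: name}, appending the region only when the
    language name alone is shared by more than one locale."""
    counts = Counter(language for _, language, _ in locales)
    names = {}
    for tag, language, region in locales:
        if counts[language] > 1:
            # Ambiguous language name: disambiguate with the region.
            names[tag] = f"{language} ({region})"
        else:
            # Unique language name: the region is not needed.
            names[tag] = language
    return names


if __name__ == "__main__":
    for tag, name in display_names(LOCALES).items():
        print(tag, "->", name)
```

With this sample table the two Spanish entries get their region appended, while Armenian is listed by language name alone.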
For the .NET Framework, the separate language name without region can of course be found by using the neutral CultureInfo object rather than the specific one, an option that Windows has never had....
In an odd way, this difference reveals something about how the functionality was created on the two platforms. The Windows platform had that locale information added item by item, as it was required for other functionality in NLS or other parts of Windows -- sometimes consistent and sometimes not (for both the names of the various constants and what they return).
The .NET Framework, on the other hand, had the advantage of being able to look at the over 200 data items and build up a consistent model for the bulk of them. So it has a very cool advantage in that respect....
I suspect over time as items are added to the .NET Framework based on customer requests that it may lose some of that advantage -- once the scenarios start to overtake all of that careful planning.
That bug? It was a case where a new locale was added and both LOCALE_SLANGUAGE values were the same since the regions were not there. But the bug has been fixed and Serge's question has been answered and all is well. :-)
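A check for this kind of duplication is easy to sketch. The Python below is a hypothetical illustration, not the actual fix: `find_duplicate_names` is an invented helper, and the locale tags and names are placeholders rather than the locale involved in the bug.

```python
from collections import defaultdict


def find_duplicate_names(names):
    """Given {locale tag: display name}, return {name: [tags]} for
    every display name that is shared by more than one locale."""
    groups = defaultdict(list)
    for tag, name in names.items():
        groups[name].append(tag)
    return {name: tags for name, tags in groups.items() if len(tags) > 1}


if __name__ == "__main__":
    # Placeholder data: two locales whose names lack a region suffix.
    sample = {
        "xx-AA": "Language X",
        "xx-BB": "Language X",
        "yy-CC": "Language Y",
    }
    print(find_duplicate_names(sample))
```

Any name that comes back from such a check would show up twice in a list that displays the names side by side.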
This post brought to you by "ޗ" (U+0797, THAANA LETTER CHAVIYANI)
# Serge Wautier on 5 Dec 2005 3:02 PM:
When the title of this post appeared in my reader, I couldn't believe you addressed my request/suggestion in only slightly more than 12 hours.
Wow !!! You rock !
> and Serge's question has been answered and all is well.
Er... I'll assume the reply to my last question (How do I retrieve the localized name of a language without the country ?) - in unmanaged contexts - is then either "you don't" or "you have a wrong requirement". I believe the requirement is not wrong especially if using MAKELANGID(PrimaryLangId, SUBLANG_NEUTRAL). So I'm left with "you don't", which was unfortunately what I feared :-(
Thanks anyway. As always, your explanation/clarification would be worth adding to the docs ! (For the ones who still do not consider this blog as being the reference itself ;-)
# Michael S. Kaplan on 5 Dec 2005 4:07 PM:
It is in fact as you indicate -- there is no way to get only the localized language, without country and script (other than using the .NET Framework).
# Maurits [MSFT] on 5 Dec 2005 7:37 PM:
As a workaround, does the following extract the language name? (Forgive the pseudo-Perl...)
# get the locale slanguage
$language = get_locale_slanguage();
# $language is now "Español (España)"
# strip out the " (España)" part
# or anything of the form " (...)"
$language =~ s/ \(.*\)$//; # ew, regular expressions
# $language is now "Español"
# Michael S. Kaplan on 5 Dec 2005 9:43 PM:
I don't think every version of Windows has always put the language first and then everything else (region and script) in parentheses.
Plus there is no way to say what a localizer may do here with the string once it is in their reach....
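To make the fragility concrete, here is a rough Python rendering of the regex workaround. The function name is invented for the example, and the second sample string is a hypothetical format that a localizer might choose, not one known to ship in Windows.

```python
import re


def strip_trailing_parenthetical(name):
    """Remove a trailing ' (...)' group, mirroring s/ \\(.*\\)$//."""
    return re.sub(r" \(.*\)$", "", name)


if __name__ == "__main__":
    # Works when the region is in trailing parentheses.
    print(strip_trailing_parenthetical("Español (España)"))
    # Hypothetical alternative format: nothing matches, so the
    # region stays in the string.
    print(strip_trailing_parenthetical("Español - España"))
```

The first call yields just the language name. The second returns the string unchanged, so the caller cannot tell whether the region was absent or simply formatted differently.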