The road to standards compat is paved with app back-INcompat

by Michael S. Kaplan, published on 2011/08/16 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/08/16/10196054.aspx


The other day, Jacob Schäffer responded to Windows isn't Office (and vice versa) in a comment:

The fact is that I'm hunting a *stable* way to use Locale Names for lookup *AND* need access to locale data on XP as well as on newer OS versions - using unique Locale Names for input. However, since the Locale Names for some locales can't be directly built on XP - even with good will - this appears to be a problem for me, since the environment I work in is VBA7 in Office 2010 (which - by the way - doesn't implement ANY way to look up locale information by Locale Name, but only by LCID).

What I see is that the Windows environment can NOT deliver a *stable* mapping from Locale Names to Locale Identifiers unless I implement all sorts of workarounds. That's perfectly fine with me, since the world goes on and *if* I need backward compatibility I'm asking for trouble. Period.

Anyway, please point to an ISO standard that defines "029" as a proper country identifier. The MSDN documentation for the LOCALE_SISO3166CTRYNAME says that it should return the ISO 3166 name for a country. I'd like to know which country "029" represents and where to find the ISO standard that says so ???

Now, let's assume that such a standard doesn't exist. Then, can developers rely on LOCALE_SISO3166CTRYNAME after all, or is the ISO standard insufficient in this regard ???

All the best /Jacob

One of the big problems here is that Jacob's Windows versions of interest span the period between when Microsoft's NLS data was pretty much based on Microsoft's terms and when Microsoft decided to try and follow the relevant standards that were in the process of being formalized at the time.

Most of the actual differences are listed in Microsoft Knowledge Base Article 939949 (Error message when you run an application or try to access a Web site on a computer that has a particular .NET Framework 2.0 software update installed: "Culture name 'Culture' is not supported").

The article enumerates 13 changes:

Old culture name    New culture name
az-AZ-Latn          az-Latn-AZ
uz-UZ-Latn          uz-Latn-UZ
sr-SP-Latn          sr-Latn-CS
az-AZ-Cyrl          az-Cyrl-AZ
uz-UZ-Cyrl          uz-Cyrl-UZ
sr-SP-Cyrl          sr-Cyrl-CS
bs-BA-Cyrl          bs-Cyrl-BA
sr-BA-Latn          sr-Latn-BA
sr-BA-Cyrl          sr-Cyrl-BA
bs-BA-Latn          bs-Latn-BA
iu-CA-Latn          iu-Latn-CA
div-MV              dv-MV
en-CB               en-029

Most of the first eleven represent Microsoft trying to build names the same way it built LCIDs, in language-region[-script] order. But after the folks who wrote one of the early name RFCs refused to entertain the notion of "name aliases", Microsoft eventually bit the bullet and decided it was better to break backward compatibility with less commonly used locale names than have yet another Microsoft-specific standard....

The last item replaced an ill-advised "private" name for "English in the Caribbean" (which had no ISO 3166 name) with an ISO 3166 numeric code (used by the UN, too).

And the second-last item corrected a mistake where Win2000 originally thought Divehi had only an ISO 639-2 name, and no ISO 639-1 name.
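For anyone who, like Jacob, has to accept names from both eras, one workaround is a plain lookup table built from the thirteen renames listed above. The following is only a minimal sketch in Python: the dictionary contents come straight from the KB article's table, but the names LEGACY_CULTURE_NAMES and modernize_culture_name are hypothetical, and the case-insensitive matching is an assumption made here rather than anything Windows or .NET is documented to do.

```python
# Old-to-new culture names, copied from the KB 939949 table quoted above.
LEGACY_CULTURE_NAMES = {
    "az-AZ-Latn": "az-Latn-AZ",
    "uz-UZ-Latn": "uz-Latn-UZ",
    "sr-SP-Latn": "sr-Latn-CS",
    "az-AZ-Cyrl": "az-Cyrl-AZ",
    "uz-UZ-Cyrl": "uz-Cyrl-UZ",
    "sr-SP-Cyrl": "sr-Cyrl-CS",
    "bs-BA-Cyrl": "bs-Cyrl-BA",
    "sr-BA-Latn": "sr-Latn-BA",
    "sr-BA-Cyrl": "sr-Cyrl-BA",
    "bs-BA-Latn": "bs-Latn-BA",
    "iu-CA-Latn": "iu-Latn-CA",
    "div-MV": "dv-MV",
    "en-CB": "en-029",
}

# Assumption: callers may pass names in any letter case, so index by lowercase.
_LOOKUP = {old.lower(): new for old, new in LEGACY_CULTURE_NAMES.items()}


def modernize_culture_name(name: str) -> str:
    """Return the new name for a renamed culture; otherwise return the input unchanged."""
    return _LOOKUP.get(name.strip().lower(), name)


if __name__ == "__main__":
    for candidate in ("en-CB", "div-MV", "sr-SP-Latn", "en-US"):
        print(candidate, "->", modernize_culture_name(candidate))
```

A table is used instead of a rule that swaps the second and third subtags because several of the renames are not simple reorderings: sr-SP became sr-...-CS, div became dv, and CB became 029.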

Can we trust the changes? Well, whether you can or not, you can trust the direction. It's coming from a good place.

In the end, Microsoft decided to do "the right thing", and all it cost was a bit of compatibility....


Doug Ewell on 16 Aug 2011 10:25 AM:

Allowing language-region-script as a "name alias" for language-script-region would have made parsing and matching of language tags much more difficult.

Numeric code elements are all generated by UN Statistics Division, and later used in ISO 3166-1, but the ones for supranational regions (like '029' for "Caribbean") are only used in UN M.49 (and the Language Subtag Registry). ISO 3166 doesn't concern itself with such regions.

Thanks to MS for following the standards here.

Mihai on 17 Aug 2011 11:48 AM:

Agree with Doug: thanks to MS for (almost) following the standards here.

:-)

Added the "almost" because of the not-so-standard way the sorting is specified. But it only affects a few locales. Not to mention that at the time MS made this decision the standard way was to use @, and the -u- extension was not on the radar yet (reference to UTS #35 :-)

So sometimes being too early can get you in trouble (see UCS2/UTF-16 :-)

But maybe in the future? One can hope :-)

Jacob Schäffer on 21 Aug 2011 11:46 AM:

Excellent, thanks. I recognise that standards are followed as far as possible and seem to have been since Windows Vista. Great!

I have a suggestion, though. Please add the above-mentioned list to the NLS API References at MSDN along with a proper explanation as to why things changed. A good place to start might be at "Language Identifier Constants and Strings" (msdn.microsoft.com/.../dd318693(v=VS.85).aspx) and similarly for the variety of .NET "CultureInfo" documentation and - at least - at the NLS information page at the Go Global Developer Center (msdn.microsoft.com/.../bb896001).


referenced by

2011/11/22 The evolving Story of Locale Support, part 10: Perhaps it is best to think of it as unintelligent design?
