How many ways can you identify a language?

by Michael S. Kaplan, published on 2006/03/19 12:01 -05:00, original URI:

Well, the answer to that question is too freaking many if you are talking about the CultureInfo class in the .NET Framework, and referring not to the actual names of a CultureInfo (like DisplayName, EnglishName, and NativeName) but to the actual [nominally descriptive] identifiers:

LCID
Name
IetfLanguageTag
ThreeLetterWindowsLanguageName
ThreeLetterISOLanguageName
TwoLetterISOLanguageName

Now the first three on this list can be used to create a CultureInfo, the fourth is usually an ISO 639 three letter code, as I described in this post, and the last two are somewhat self explanatory if you know about those ISO standards.

You can assume that the LCID is there mostly for compatibility with Windows (not a huge stretch, given its documented description: "In most cases, the culture identifier is mapped to the corresponding National Language Support (NLS) locale identifier").

You can further assume that Name is the original ideal identifier in the .NET Framework that is somewhat more grounded in standards than the proprietary LCID mechanism.

And then you can further assume that the IetfLanguageTag is a later feature, added in version 2.0 of the .NET Framework to fill in the holes in the Name property, since if you need to conform to standards then Name somewhat becomes a four letter word, if you know what I mean.

Then you can further assume that the rest of the properties are informational pieces, supported because they were in Windows, because you never know when you will need them, and also because they are interesting building blocks for the Name and the IetfLanguageTag.

Is it surprising that Scott Hanselman (my prototypical example of a smart developer!) would run into issues when he needs to create a CultureInfo from data that he has, as described in his post Not quite enumerating (iterating) enums? No, in my opinion, it is not.

While each of those names, and each of those identifiers, has a rich history in Windows and a precise reason for being where it is, the end result is still not only confusing but in the end quite possibly unable to produce the results that people need.

And it is a problem that is just as easy to run into when ASP.NET is trying to make decisions about the HTTP Accept-Language header, or Avalon (WPF) is trying to decide on something to put in the xml:lang attribute in its XAML file, or MUI needs to move away from LCIDs in the Vista release, or in any of another thousand scenarios that people have been running into (or being beaten up by) for the last few years.

But in many (perhaps even most) cases, people do not need the power of a CultureInfo object; they simply need an identifier, and maybe occasionally a name. People find the CultureInfo because it is the only game in town, not because it is necessarily the right tool for the job.

Perhaps that suggests the perfectly reasonable need for a new lighter weight object (or objects), possibly based on the full measure of ISO 639 and ISO 3166, or maybe even RFC 3066. At which point the CultureInfo is just the small set of items for which all of the data exists and has gone through the proper review.
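To make the idea a little more concrete, here is a minimal sketch of what such a lighter weight identifier might look like. It is written in Python rather than .NET purely for brevity, and everything in it (the LanguageTag class, its parse method, the _TAG_SHAPE pattern) is hypothetical and not part of any existing API. It also assumes only the simplest language[-region] form of a tag and checks the shape of the subtags, not whether they are actually registered in ISO 639 or ISO 3166.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Shape check only: a 2- or 3-letter language subtag (ISO 639 style),
# optionally followed by a 2-letter region subtag (ISO 3166 style).
# Hypothetical helper; it does not consult any registry of valid codes.
_TAG_SHAPE = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")


@dataclass(frozen=True)
class LanguageTag:
    """A hypothetical lightweight identifier: no culture data, just the tag."""

    language: str
    region: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "LanguageTag":
        match = _TAG_SHAPE.match(text.strip())
        if match is None:
            raise ValueError(f"not a language[-region] tag: {text!r}")
        language, region = match.groups()
        # Conventional casing: lowercase language, uppercase region.
        return cls(language.lower(), region.upper() if region else None)

    def __str__(self) -> str:
        if self.region is None:
            return self.language
        return f"{self.language}-{self.region}"


tag = LanguageTag.parse("EN-us")
print(tag, tag.language, tag.region)  # prints: en-US en US
```

The point of the sketch is only that an identifier can exist and be compared, stored, and printed without any of the formatting, sorting, or calendar data that a full CultureInfo carries around.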

On the other hand, the way we got into this confusing tangle was through perfectly reasonable men and women adding locale data items to meet perfectly reasonable customer needs.

So I wouldn't want to rush in to solve the problem by throwing new classes and methods and properties at it, just to find out that the confusion has only been added to.

A pretty problem, to be sure. One that is going to require some thought....


This post brought to you by "Ｌ" (U+ff2c, a.k.a. FULLWIDTH LATIN CAPITAL LETTER L)

