by Michael S. Kaplan, published on 2010/12/10 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/12/10/10102521.aspx
Regarding the main title of today's blog, this still is not a @#%&*! series.
For today's blog's alternate title, I am relying on the fact that the vast majority of technical content on this blog refers to something with internationalization or globalization or localizability overtones.
I mean, I am not saying that your parents are stupid. I don't even know most of your parents, after all.
This is not an anti-Microsoft conspiracy theory claiming that using .Net indicates your parents screwed something up.
And I am not saying that Anders Hejlsberg, the man who for all intents and purposes is the father of C#, is stupid. Because he isn't.
And I am really not saying that most of the .Net Parent properties are stupid, because (a) I don't know every one of them, and (b) most of the ones I do know aren't stupid. Statistically speaking, some may be, but none of the ones I know.
Well, with one exception.
CultureInfo.Parent is stupid. Really stupid.
Perhaps this would be an appropriate time (after the provocative eye-catching title and cutesy introduction with art and bold central statement - my basic formula) to explain the basis for the claims of the aforementioned statement.
I'll start from another topic, that talks about internal usage patterns of the property rather than the property itself.
Like PropertyInfo.SetValue Method (..., CultureInfo)'s description of its CultureInfo parameter:
The CultureInfo object that represents the culture for which the resource is to be localized. Note that if the resource is not localized for this culture, the CultureInfo.Parent method will be called successively in search of a match. If this value is null, the CultureInfo is obtained from the CultureInfo.CurrentUICulture property.
Or ResourceManager.GetString Method (..., CultureInfo)'s description of its CultureInfo parameter:
The CultureInfo object that represents the culture for which the resource is localized. Note that if the resource is not localized for this culture, the lookup will fall back using the current thread's Parent property, stopping after looking in the neutral culture.
Not for nothing, but the fact that these two descriptions and others like them are different is also a bug. As are some of the descriptions themselves.
Anyway, you get the idea -- the usage of this property is for resource fallback.
So that if you look for resources of one culture but cannot find them then it knows to fall back to another culture, and so on.
This is not stupid.
Well, it is not too stupid. It is a little stupid since in most cases this can also be done by simply looking at the string and chopping pieces off successively. But this potential bit of stupidity is obviated by the fact that there are exceptions to that principle, like
zh-TW ---> zh-CHT --> {Invariant}
and such. Thus having it in a property is sensible.
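To make the "chop pieces off, except when you can't" idea concrete, here is a minimal sketch in Python (not .Net code, and not how CultureInfo.Parent is actually implemented). The names `PARENT_OVERRIDES`, `parent_of` and `fallback_chain` are made up for illustration, and the override table holds only the one exception quoted above; the empty string stands in for the invariant culture.

```python
# Hypothetical override table: only the exception quoted in the post.
# "" stands for the invariant culture.
PARENT_OVERRIDES = {
    "zh-TW": "zh-CHT",
    "zh-CHT": "",
}


def parent_of(name: str) -> str:
    """Return the parent culture name for `name`.

    Uses the override table when there is an entry; otherwise drops the
    last dash-delimited piece. A name with no dash falls back to "".
    """
    if name in PARENT_OVERRIDES:
        return PARENT_OVERRIDES[name]
    head, sep, _tail = name.rpartition("-")
    return head if sep else ""


def fallback_chain(name: str) -> list:
    """List `name` and each successive parent, ending at the invariant ""."""
    chain = [name]
    while chain[-1] != "":
        chain.append(parent_of(chain[-1]))
    return chain


print(fallback_chain("fr-CA"))
print(fallback_chain("zh-TW"))
```

The first call is plain string chopping; the second only works because the data says so, which is the whole argument for having a property rather than a string parser.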
Okay, now let's look at the property's own documentation and its description of this CultureInfo.Parent property whose usage is widely, if inconsistently, described:
The cultures have a hierarchy in which the parent of a specific culture is a neutral culture, the parent of a neutral culture is the InvariantCulture, and the parent of the InvariantCulture is the invariant culture itself. The parent culture encompasses only the set of information that is common among its children.
If the resources for the specific culture are not available in the system, the resources for the neutral culture are used. If the resources for the neutral culture are not available, the resources embedded in the main assembly are used. For more information on the resource fallback process, see Packaging and Deploying Resources.
Um, where do I start?
Well, there is the whole Packaging and Deploying Resources topic that often has different rules, but we'll set that aside. Yes, it makes the rules more complicated, but first let's focus on the simple design without dragging in signatures.
Okay, so we have established that Microsoft means for this property to support something more interesting than I can do myself parsing the name for dash delimiters.
But all of the interesting work that happens in LIPs to fall back like ca to es? Not in there.
The fact that most languages (like Arabic and English and French) only ship one version despite having 5-10 or more locales, yet the fallback does not fall back to the one version of that language that would help them out? That's not in there either.
Office's LCID-based fallback model? Also not covered well -- they have to do their own thing.
And the early adopter of Windows language support via things like LOCALE_IDEFAULTLANGUAGE? That isn't supported, either.
Don't even get me started on claims in docs like
The parent culture encompasses only the set of information that is common among its children.
that fail the smell test for cases like sr-Latn-CS and sr-Cyrl-CS that have different scripts and code pages yet both fell back to sr for years. Same story with az-Cyrl-AZ and az-Latn-AZ, and with uz-Cyrl-UZ and uz-Latn-UZ.
Then let's talk about how everything falls back to Invariant -- which has a name of the empty string and thus cannot ever have a directory with resources in it since a directory with no name is illegal on Windows.
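A rough Python sketch of that lookup, under some loud assumptions: `satellites` is a hypothetical dictionary standing in for culture-named resource directories, `main_assembly` stands in for the resources embedded in the main assembly (which is where the documentation quoted earlier says the fallback ends), and `parent_of` is the naive dash-chopping rule rather than the real CultureInfo.Parent data.

```python
def parent_of(name: str) -> str:
    """Naive parent: drop the last dash-delimited piece; "" is invariant."""
    head, sep, _tail = name.rpartition("-")
    return head if sep else ""


def find_string(key, culture, satellites, main_assembly):
    """Walk culture -> parent -> ... and return (source_name, value).

    satellites: dict mapping a culture-named folder to its string table
                (hypothetical stand-in for per-culture directories).
    main_assembly: string table embedded in the main assembly.
    """
    name = culture
    while name != "":
        table = satellites.get(name, {})
        if key in table:
            return name, table[key]
        name = parent_of(name)
    # Invariant has the empty name, so there is no folder to probe;
    # the only place left to look is the main assembly itself.
    if key in main_assembly:
        return "", main_assembly[key]
    raise KeyError(key)


satellites = {"fr": {"hello": "bonjour"}}
main_assembly = {"hello": "hello", "bye": "bye"}

print(find_string("hello", "fr-CA", satellites, main_assembly))
print(find_string("bye", "fr-CA", satellites, main_assembly))
```

Note that the invariant step never touches `satellites` at all: with an empty name there is nothing to use as a folder key, so everything that reaches it has to come from somewhere other than a directory.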
Okay. So we have a model that isn't going to match the majority of incoming or outgoing traffic for Windows, Office, or anyone following Windows or Office (the largest and for most large scale purposes the only significant adopter of MUI on the platform), including .Net's own defaults (that start from those of Windows). And which has failed to match BCP-47 in key interesting areas for many years (as China and others might note).
This is just stupid. A terrible design bolted atop a platform that has a design that for better or worse supports more than 60% of Microsoft's revenue.
Now one could blame Windows/DEVDIV rivalries for these dumb incongruities or a slow burning dud of a Silverlight sorta fiasco later recovered, but one would be wrong in this case, since CultureInfo and CultureInfo.Parent were designed and implemented by Windows -- and they own the data as well, though not the technology that does the loading.
"But we were just following orders!" the resource loading code would claim at the "Code Crimes" trial.
Perhaps it is yet another conspiracy theory -- Microsoft screwing over the Developer Division by providing them with an unusable language resource loading model fallback plan that is not compatible with Windows even when managed components try to run on Windows?
Nah.
I'm overthinking it again like I did in Anti-Microsoft conspiracy theories are fun #5 (aka Microsoft is not supporting the terrorists, dammit!) and Anti-Microsoft conspiracy theories are fun #3 (aka Why the hell can't they just update Uniscribe?), ascribing to brilliance or malice what is almost certainly neither.
It's just stupid....
Mihai on 10 Dec 2010 12:15 PM:
For a bit of an outside perspective: overall I find most of the .NET globalization classes very well designed (especially when compared with other APIs/frameworks). There are some kinks in implementation? Yes. But overall the foundation is ok. And most of it was there since .NET version 1.0, which is not very often the case :-)
Anyway, back to the parent thing: I think the approach right now is kind of broken everywhere: RFC 4647, .NET, Java, you name it.
Because (for instance) nobody makes any effort to match "sideways" or to a "mutually intelligible language".
If I ask for zh-TW and there is no zh-Hant, nobody tries zh-MO or zh-HK. In many cases there is no fallback from mo-Latn (Moldavian, which "is a language" just because of political agendas) to ro (or ro-RO or ro-MD). Or even fallback between no (deprecated), nn and nb. It might also be ok to serve mo-Latn or ro when one asks for mo-Cyrl-MD (Moldavia switched to Latin script 20 years ago; you would expect that the old guy who still prefers Cyrillic can read Latin).
There are areas where you can work around these limitations (copy the ro resources as mo). But if you don't control the list of what's available, you can't do that (for instance when you search for content in a certain language).
Michael S. Kaplan on 10 Dec 2010 3:26 PM:
Mihai, I entirely agree. In fact the only thing wrong with even this one property is the data underneath it, which is the cause of all of the problems I mention. There is a lot of potential for good results here -- but not with the data used there now. :-(
Michael S. Kaplan on 10 Dec 2010 3:32 PM:
The one other property in the globalization classes that is truly awful is the console fallback -- which is also a plan for resource fallback. It is just terrible how bad the namespace is at resource stuff!
alexcohn on 13 Dec 2010 12:53 PM:
I beg to differ. There are too many cultural (i.e. political) issues involved to expect that an automatic choice made by software will not cause disappointment or even rage. In Moldova, there are people for whom Cyrillic script is a matter of principle. I believe that the same is true for Serbia.
What's wrong with the application explicitly asking for the user's preference?
Well, the system or a web browser could keep the default or an ordered list of defaults - but please, don't attempt to decide FOR the user.