Leave it to Microsoft to take the most confusing thing and make it worse!

by Michael S. Kaplan, published on 2011/05/03 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/05/03/10160402.aspx

The other day when Tom asked me a question (discussed previously in "Sometimes the things that used to be different aren't anymore"), he didn't just ask me that one question....

His second question was about something else, though. So putting it in that same blog might have been kind of awkward.

A new blog seemed like a much more reasonable idea. :-)

Tom's following question?

Do all those cultures with LOCALE_SMONGROUPING set to “3” really actually want it set to “3;0”, but there was just a mistake in understanding how to encode the preference?  (Similar question could probably apply to LOCALE_SGROUPING.)

Hmmmm. More locale stuff.

Let's start with the definitions again. Documentation is our friend, isn't it? I mean, it didn't work out so well last time, but maybe that was a fluke. Take 2:

Now it would have been nice if LOCALE_SMONGROUPING could have had samples like LOCALE_SGROUPING did. But basically it is all kind of inferred -- one is for numbers in general, and the other is for currency values.

Tom's question is interesting when you consider how easy it can be for someone to misunderstand the format and give the wrong value. One could easily imagine the format being wrong for a given locale if such a mistake is made....
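To make that concrete, here is a minimal Python sketch (not anything from Windows itself) of the Win32 reading of a grouping string, where sizes apply from right to left and a trailing 0 means "repeat the preceding size". The name group_win32 is hypothetical, and treating a lone "0" as "no grouping" is an assumption of this sketch.

```python
from itertools import chain, repeat


def group_win32(digits, spec, sep=","):
    """Group a string of digits using a LOCALE_SGROUPING-style value."""
    sizes = [int(part) for part in spec.split(";")]
    # Win32 rule: a trailing 0 means the preceding size repeats.
    repeat_last = len(sizes) > 1 and sizes[-1] == 0
    if repeat_last:
        sizes.pop()
    if sizes == [0]:
        return digits  # assumption: a lone 0 means "no grouping"
    if any(size <= 0 for size in sizes):
        raise ValueError(f"unexpected grouping value: {spec!r}")
    # Listed sizes first, then the last one forever if it repeats.
    plan = chain(sizes, repeat(sizes[-1]) if repeat_last else ())
    groups, rest = [], digits
    for size in plan:
        if len(rest) <= size:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)  # whatever is left stays ungrouped
    return sep.join(reversed(groups))


print(group_win32("1234567", "3;0"))    # 1,234,567
print(group_win32("1234567", "3"))      # 1234,567
print(group_win32("1234567", "3;2;0"))  # 12,34,567
```

Leaving off the ";0" is all it takes to go from "1,234,567" to "1234,567", which is the kind of mistake Tom is asking about.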

Perhaps more instructive is a look at the ones where these two values are different:

Culture Name LOCALE_SGROUPING LOCALE_SMONGROUPING
en-BZ English (Belize) 3;0 3
en-TT English (Trinidad and Tobago) 3;0 3
es-SV Spanish (El Salvador) 3;0 3
es-HN Spanish (Honduras) 3;0 3
es-NI Spanish (Nicaragua) 3;0 3
es-PR Spanish (Puerto Rico) 3;0 3
iu-Latn-CA Inuktitut (Latin, Canada) 3 3;0
moh-CA Mohawk (Mohawk) 3 3;0
ii-CN Yi (PRC) 3 3;0
ne-NP Nepali (Nepal) 3;2;0 3;0
si-LK Sinhala (Sri Lanka) 3;2;0 3;0
km-KH Khmer (Cambodia) 3 3;0
es-US Spanish (United States) 3 3;0

All the other locales have the values of these two properties identical.

Now I'll admit to being a little suspicious of pretty much all of them, but in particular of the notion that money is treated differently than other numbers. Though that could be due to my living for so long in a place where they are both the same.

Like much that is in locale data, it is entirely intuitive in your own locale and proof that some people out there are crazy in every other locale!

I haven't been to every locale in the world long enough to check them all out, but I know that some of them are formatted exactly as Windows suggests. And I doubt that is due to Windows being wrong but everyone using Windows.

At least not every time. :-)

In .Net, the attempt to make for something slightly more intuitive can be seen in NumberFormatInfo.NumberGroupSizes and NumberFormatInfo.CurrencyGroupSizes. That didn't turn out so well though.

You see, it has to do not with how they are documented, but with how they are implemented (and then documented without the appropriate irony).

First let me take the same table as above and give you the .Net version of it (note that rather than a semicolon-delimited string, the property is a one-dimensional array):

Culture Name NumberGroupSizes CurrencyGroupSizes
en-BZ English (Belize) {3} {3,0}
en-TT English (Trinidad and Tobago) {3} {3,0}
es-SV Spanish (El Salvador) {3} {3,0}
es-HN Spanish (Honduras) {3} {3,0}
es-NI Spanish (Nicaragua) {3} {3,0}
es-PR Spanish (Puerto Rico) {3} {3,0}
iu-Latn-CA Inuktitut (Latin, Canada) {3,0} {3}
moh-CA Mohawk (Mohawk) {3,0} {3}
ii-CN Yi (PRC) {3,0} {3}
ne-NP Nepali (Nepal) {3,2} {3}
si-LK Sinhala (Sri Lanka) {3,2} {3}
km-KH Khmer (Cambodia) {3,0} {3}
es-US Spanish (United States) {3,0} {3}

I assume you can see the difference?

Yes, that is right. Win32 says:

If the last value is 0, the preceding value is repeated.

While .Net says:

If the last element of the array is not 0, the remaining digits are grouped based on the last element of the array. If the last element is 0, the remaining digits are not grouped.

Yes, that's right -- the semantics were pretty much reversed.
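A minimal Python sketch of the .Net reading quoted above may help show the reversal. This is an illustration only, not the actual .Net implementation; the name group_dotnet is hypothetical, and treating an empty array or {0} as "no grouping" is an assumption of the sketch.

```python
from itertools import chain, repeat


def group_dotnet(digits, sizes, sep=","):
    """Group a string of digits using a NumberGroupSizes-style array."""
    sizes = list(sizes)
    if not sizes or sizes == [0]:
        return digits  # assumption: treated as "no grouping"
    # .Net rule: a trailing 0 means the remaining digits are NOT grouped.
    stop_after_listed = sizes[-1] == 0
    if stop_after_listed:
        sizes.pop()
    if any(size <= 0 for size in sizes):
        raise ValueError(f"unexpected group sizes: {sizes!r}")
    # Without a trailing 0, the last listed size keeps repeating.
    plan = chain(sizes, () if stop_after_listed else repeat(sizes[-1]))
    groups, rest = [], digits
    for size in plan:
        if len(rest) <= size:
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    groups.append(rest)
    return sep.join(reversed(groups))


print(group_dotnet("1234567", [3]))     # 1,234,567
print(group_dotnet("1234567", [3, 0]))  # 1234,567
print(group_dotnet("1234567", [3, 2]))  # 12,34,567
```

So {3} in .Net produces what "3;0" produces in Win32, and {3,0} produces what a bare "3" does.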

And to make matters weirder, Microsoft Locale Builder uses the .Net rules, yet it uses the native code (semicolon-delimited) format!

So, since both products have the same source now (both in the Microsoft locale data and in custom locales/cultures), the code in one of them has to reverse the information when building the string.
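The reversal itself is mechanical. Here is a hedged Python sketch of what such a conversion could look like; it is not the actual Windows or .Net code, the function names win32_to_dotnet and dotnet_to_win32 are hypothetical, and mapping a lone "0" to {0} is an assumption.

```python
def win32_to_dotnet(spec):
    """Convert a semicolon-delimited Win32 value to a .Net-style list."""
    sizes = [int(part) for part in spec.split(";")]
    if sizes == [0]:
        return [0]  # assumption: "no grouping" on both sides
    if len(sizes) > 1 and sizes[-1] == 0:
        return sizes[:-1]  # Win32 "repeat" becomes no trailing 0
    return sizes + [0]     # Win32 "stop" becomes a trailing 0


def dotnet_to_win32(sizes):
    """Convert a .Net-style list back to a Win32 grouping string."""
    sizes = list(sizes)
    if sizes == [0]:
        return "0"  # assumption, mirroring win32_to_dotnet
    if sizes[-1] == 0:
        core = sizes[:-1]  # .Net "stop" becomes no trailing 0
    else:
        core = sizes + [0]  # .Net "repeat" becomes a trailing 0
    return ";".join(str(size) for size in core)


# Values taken from the two tables in this post.
for spec in ("3;0", "3", "3;2;0"):
    converted = win32_to_dotnet(spec)
    print(spec, "->", converted, "->", dotnet_to_win32(converted))
```

Run against the table values, "3;0" maps to [3], "3" maps to [3, 0], and "3;2;0" maps to [3, 2], matching the .Net table above.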

Not that it really matters, but they did technically choose the slower way to store it all -- want to make a wild guess which way that might be? :-)

To get back to Tom's question: I believe most of the values are correct, because over the years small corrections have been made based on bug reports. But the values are confusing enough that it is easy to imagine the results being wrong any time the person providing the data doesn't look at a sample showing how the setting actually formats values, at least until someone finally reports a bug.

This reversal, and this design, really makes very little sense to me, for what it's worth. I mean, except for the sentiment in the title: Leave it to Microsoft to take the most confusing thing and make it worse!

I might have to start insisting that Tom provide a beer -- one per question. The rules will vary for others but generally speaking I can't process one of his more interesting edge case questions without wanting a drink. :-)

ErikF on 3 May 2011 12:32 PM:

It's a good thing that the Brazilian real used before 1942 (en.wikipedia.org/.../Brazilian_real_%28old%29) isn't in use any more. How would you set up a system that allows for arbitrary symbols for separating arbitrary locations in a number? It sounds like a recipe for disaster! I'll let you localization guys sort that stuff out, thanks; I think that I take enough aspirin already!

Emperor XLII on 4 May 2011 7:05 AM:

Just from your description, the .NET behavior seems more logical to me. Effectively, the .NET rule is "repeat the last value", and it seems obvious that repeating zero will always produce zero (i.e. nothing). Whereas the native rule is "if the last value is zero, go back up the list and use the previous value, otherwise stop" (i.e. more complicated to explain . . . or maybe it's just my .NET bias showing through :).

Michael S. Kaplan on 4 May 2011 7:16 AM:

Ah but you have to weigh that against both the decade of prior behavior and the way everything was implemented -- seems geared to maximum befuzzling! :-)

Bjartur Thorlacius on 4 May 2011 2:11 PM:

As an Icelander, seeing the language code for Icelandic as written in Canada (using Latin script) struck me as odd. I was beginning to wonder whether the Icelandic spoken in the West-Iceland "colony" (i.e. the Icelandic ghetto in Canada) had its own locale in Windows.

The language code for Inuktitut is iu.

Indeed! Fixed now. Sorry about that! -- Michael

referenced by

2011/05/04 Regarding the overthinking and underimplementing of names
