Burn Windows Burn (aka If we want to unsay *this* one, we cannot say "Mu")

by Michael S. Kaplan, published on 2010/03/06 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/03/06/9972239.aspx


This blog was written by a chap named Alan Smithee....

The use of Mu as a way to make it as if one had not said something is widely known (required Wikipedia reference here).

But in noting a particular reported bug and wanting to "unsay" it or somehow make it as if the bug was not unleashed upon the populace, I found a time where I could not say Mu.

Which is just as well since I cannot "fix" the bug in the version where it exists.

Even though it was fixed shortly thereafter.

Wait.

Maybe I should tell the whole story.

Not too long ago, John asked via the Contact link:

Hello!

I'm not sure if this is the best place to leave a message of this kind, however after wandering aimlessly through the web (and less so through your blog :-) I figured it was as good a place as any.

A couple of years back I arrived at your blog having "discovered" Vista's German dictionary/phonebook (of the letter ü/Ü) sort order bug (http://blogs.msdn.com/michkap/archive/2007/09/08/4831056.aspx)

Now, a couple of years and a few more dictionaries later, I seem to have run into another problem. However this one doesn't seem to be documented anywhere (at least none of my many dozens of searches with Google/Bing/Yahoo were able to turn up any hint of this).

The problem? When sorting Greek words the sort order for the Capital Letter Mu (Μ) is wrong! Basically what is happening is that the letter Μ (under Vista) is being sorted *after* the small letter μ. Curiously *only* the letter Μ is affected, and this change was introduced in Vista (sort order in XP is unaffected).

I don't suppose you're aware of this problem? And if you are, do you know of any way to *fix* it? The German ü/Ü problem was (fairly) easily fixed by considering the ü as "ue" (the apps in question are not unicode aware, so I couldn't make use of the original solution mentioned in the blog). But for this problem in Greek...

If the problem were limited to the initial letter it may not be too difficult to come up with a hacked solution, but the problem also affects abbreviations e.g. sort order in XP:

άλωση
αλώσιμος
Α.Μ.
άμα
αμαγείρευτος

and in Vista

αμφοτέρωθεν
άμωμος
Α.Μ.
αν
ανά

I'm not going to hold my breath for a solution to this problem, but at the least it might make for a new blog article one day :-)

Many thanks for listening,
John

Interesting question, that -- not just for its own sake, but the words themselves!

I'll explain what I mean.

It is this statement in particular:

Basically what is happening is that the letter Μ (under Vista) is being sorted *after* the small letter μ

Now all linguistic collations on Windows and .NET other than the Hungarian technical sort (described in Technically it *is* a hungarian sort) sort lowercase letters before uppercase ones (as I mention in Which comes first, 'a' or 'A' ?).

So with an observation like John's I was at first confused -- the capital letters are always after the small ones. That is their weight.

But then I saw what he was talking about -- the small and the capital, which should have only a tertiary (case) distinction (ref: here), have a primary distinction, as if they were entirely different letters. Looking at the underlying weights:

0x03bb 15 22 2 2  ;Greek Small Lambda
0x039b 15 22 2 18 ;Greek Capital Lambda
0x03bc 15 24 2 2  ;Greek Small Mu
0x039c 15 26 2 18 ;Greek Capital Mu
0x03bd 15 28 2 2  ;Greek Small Nu
0x039d 15 28 2 18 ;Greek Capital Nu

you can see the problem. Well, I was wrong -- there is a tertiary distinction (the 2 vs. 18). But I was also right in that there is a primary distinction (the 24 vs. 26).
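To make the two kinds of difference concrete, here is a minimal sketch (Python, and certainly not what Windows itself runs) that takes the six Vista rows above and reports the first level at which each small/capital pair differs. It assumes the four columns are script member, alphabetic weight, diacritic weight and case weight, with the first two together acting as the primary weight; the names VISTA_WEIGHTS and first_difference are made up for illustration.

```python
import unicodedata

# Vista rows copied from the table above.
# Assumed column meaning: (script member, alphabetic weight, diacritic weight, case weight)
VISTA_WEIGHTS = {
    "\u03bb": (15, 22, 2, 2),    # Greek Small Lambda
    "\u039b": (15, 22, 2, 18),   # Greek Capital Lambda
    "\u03bc": (15, 24, 2, 2),    # Greek Small Mu
    "\u039c": (15, 26, 2, 18),   # Greek Capital Mu
    "\u03bd": (15, 28, 2, 2),    # Greek Small Nu
    "\u039d": (15, 28, 2, 18),   # Greek Capital Nu
}


def first_difference(a, b, table=VISTA_WEIGHTS):
    """Return the first level at which the weights of a and b differ."""
    wa, wb = table[a], table[b]
    if wa[:2] != wb[:2]:      # script member + alphabetic weight
        return "primary"
    if wa[2] != wb[2]:        # diacritic weight
        return "secondary"
    if wa[3] != wb[3]:        # case weight
        return "tertiary"
    return "equal"


# Compare each small letter with its capital counterpart.
for small, capital in [("\u03bb", "\u039b"), ("\u03bc", "\u039c"), ("\u03bd", "\u039d")]:
    print(unicodedata.name(small), "vs", unicodedata.name(capital),
          "->", first_difference(small, capital))
```

Under those assumptions the lambda and nu pairs differ only at the tertiary level, while the mu pair already differs at the primary level.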

This problem is definitely a bug.

Luckily it was fixed in Server 2008 and is correct in all versions after that too (Windows 7 and Server 2008 R2). The new weights:

0x03bb 15 22 2 2  ;Greek Small Lambda
0x039b 15 22 2 18 ;Greek Capital Lambda
0x03bc 15 24 2 2  ;Greek Small Mu
0x039c 15 24 2 18 ;Greek Capital Mu
0x03bd 15 28 2 2  ;Greek Small Nu
0x039d 15 28 2 18 ;Greek Capital Nu
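As a rough illustration of what that one changed number does to ordering, the sketch below builds a simplified multi-level key from the rows in this post and sorts three made-up two-letter strings with the Vista value for Capital Mu (26) and with the fixed value (24). It assumes the first two columns form the primary weight, the third the secondary and the fourth the tertiary, and that all primaries are compared before any secondaries; real Windows sort keys are more involved than this. The names make_table and sort_key are hypothetical.

```python
def make_table(capital_mu_alphabetic_weight):
    """Weights from the post; only the Capital Mu alphabetic weight varies."""
    return {
        "\u03bb": (15, 22, 2, 2),                              # Greek Small Lambda
        "\u039b": (15, 22, 2, 18),                             # Greek Capital Lambda
        "\u03bc": (15, 24, 2, 2),                              # Greek Small Mu
        "\u039c": (15, capital_mu_alphabetic_weight, 2, 18),   # Greek Capital Mu
        "\u03bd": (15, 28, 2, 2),                              # Greek Small Nu
        "\u039d": (15, 28, 2, 18),                             # Greek Capital Nu
    }


VISTA = make_table(26)   # the buggy value
FIXED = make_table(24)   # the Server 2008 and later value


def sort_key(word, table):
    """Simplified key: all primary weights, then all secondary, then all tertiary."""
    weights = [table[ch] for ch in word]
    primary = [(w[0], w[1]) for w in weights]
    secondary = [w[2] for w in weights]
    tertiary = [w[3] for w in weights]
    return (primary, secondary, tertiary)


# Three invented test strings: μν, Μλ, μλ
words = ["\u03bc\u03bd", "\u039c\u03bb", "\u03bc\u03bb"]

vista_order = sorted(words, key=lambda w: sort_key(w, VISTA))
# μλ, μν, Μλ  (Capital Mu lands after everything starting with small mu)

fixed_order = sorted(words, key=lambda w: sort_key(w, FIXED))
# μλ, Μλ, μν  (case only breaks the tie between μλ and Μλ)
```

With the Vista weight, anything starting with Capital Mu falls after everything starting with small mu, which matches the kind of displacement John saw with Α.Μ. in his word lists.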

This is not much comfort if you are using Vista, of course, even if I were able to say Mu to unsay this one (which I can't, for multiple reasons), but this could be yet another good reason to consider an upgrade. :-)


Mihai on 8 Mar 2010 10:36 AM:

There are other funny things about Hungarian technical sort when mixing the wide versions of the characters.

Below I will use the wide versions (U+FF21 and U+FF41) at left, and regular versions (U+0041 & U+0061) at right.

hu-HU_technl:

  ａ < a

  ａ > A

  Ａ > A

  Ａ > a

Ignore NORM_IGNOREWIDTH:

  ａ < a

  ａ == A

  Ａ == a

  Ａ > A

hu-HU seems to behave as expected.

(All on Win 7)


referenced by

2010/03/09 Coloring outside the lines in the a-ness of the Hungarian Technical Sort
