2001, a Correctness Odyssey (aka What's the matter with Ü?)

by Michael S. Kaplan, published on 2007/09/08 08:28 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2007/09/08/4831056.aspx


Yes, as the title indicates this is post # 2001 of Sorting it all Out!

Over the last few years on this blog, and in the many years before it, there have been recurring themes related to the work I do and the things I care about. Today I am going to talk about one of those themes in particular.

You see, for several years now I have been both an observer and a fighter in the ongoing, never-ending battle between consistency/compatibility and correctness: the desire to return the same results as before, even if they are wrong, versus the desire to return the right results.

The struggle has been behind posts like Compatibility is inconsistent; consistency is not compatible... and Consistency and correctness are both four letter words, among many others.

While I understand the reasons for both, I am usually more convinced by the arguments of correctness, as I shudder to think about fighting for the right to be wrong....

I am not, however, a zealot about this (despite opinions you may have heard to the contrary), and there are many times that I have worked to make sure that consistent/compatible solutions are made available.

The real problem in that C/C vs. C battle is that one never knows where to draw the line.

Until now, I mean....

You may recall when I pointed out in All right, mistakes were made #1 and All right, mistakes were made #2 how Ü and ü were being improperly expanded into UE and ue, respectively, in all locales in Vista.

This is obviously a very serious, customer-reported issue. There has never yet been a release of Windows in which changes that fall into the major version category (which I talk about here and elsewhere) were made in a service pack. But the results were so obviously incorrect, and so obviously incompatible with the results of every prior version of Windows, that it seemed clear at the time, both to me and to the various folks triaging issues, that this might be the time to make an exception to that rule.

In fact, as I mentioned in All right, mistakes were made #2 (What the %#$* is wrong with German Phonebook sorting?), the fix was put into SP1 of Vista, as well as in Longhorn Server (now called Windows Server 2008).

However, while compatibility and consistency are usually put in the same category, it is crucial to decide what one needs to be consistent and compatible with. After some pretty bad application compatibility issues were found with applications that were not using GetNLSVersion/IsNLSDefinedString to look out for a major version change¹ (you may have seen some of this across the web: the problems with Zune and Vista SP1 were pretty widely reported), it was clear that something had to be done.

After a great deal of investigation and effort, the fix is being backed out of SP1 but kept in Server 2008. We have a nice long history of being willing to let client and server releases differ in this way, and other changes that were required for some server-based products outweighed the passive desire to keep client and server releases exactly the same.

And thus the model that had previously been informally architected is now confirmed and formally codified: new versions are for correctness; service packs are for consistency and compatibility.

At every stage, the people involved in the debate have acted professionally and tried to do the best thing for the customer, and this particular decision is one that I think people will be able to live with in the end. It is at some level unfortunate that we managed to lose something as basic as the correct alphabetical order for all of the languages that have Ü and ü in them² (plus the ones like English that have a specific place where it "ought" to intuitively go). But applications breaking completely is clearly worse, and it is only for this one version, which can now, if nothing else, claim to be consistent with itself....

 

1 - In fairness, the functions were not added in Server 2003, but ironically it was applications that did not even exist until after then that swayed opinions the most!
2 - In another burst of irony, someone from the Zune team also reported the incorrect way that Ü and ü were sorting in Vista, and asked when we planned to fix that!

 

This post brought to you by Ü and ü (U+00dc and U+00fc, a.k.a. LATIN CAPITAL LETTER U WITH DIAERESIS and LATIN SMALL LETTER U WITH DIAERESIS)


# Jonathan Wilson on Saturday, September 08, 2007 9:45 PM:

Is there any info out there about what enabling this change in Vista SP1 would actually do to an app? (A Google search didn't find anything.) Are there really apps out there that are going to break because they expect that Ü = UE on Vista?

# Michael S. Kaplan on Saturday, September 08, 2007 10:51 PM:

If one assumes that the results of a CompareString or LCMapString/sort key call will not change within a version, and they do change, then it can lead to index corruption, which in turn leads to being unable to find results, among other problems.
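To make the index corruption scenario concrete, here is a toy sketch. The key functions are hypothetical stand-ins and are not what CompareString or LCMapString actually compute. `key_pre_vista` sorts Ü/ü with U/u. `key_vista_rtm` expands them to UE/ue, as the post describes for Vista RTM. If an index is sorted under the first rule and then binary-searched under the second, it can fail to find an entry that is really there.

```python
# Toy model only: hypothetical key functions, not the real Windows sort tables.
import unicodedata


def primary(s: str) -> str:
    """Base letters only: decompose, drop combining marks, ignore case."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def key_pre_vista(s: str):
    # Older behaviour: Ü/ü sort with U/u (ties are broken on the original string).
    return (primary(s), s)


def key_vista_rtm(s: str):
    # Vista RTM behaviour as described in the post: Ü/ü expand to UE/ue.
    return (primary(s.replace("Ü", "UE").replace("ü", "ue")), s)


def binary_search(sorted_words, target, key) -> bool:
    """Binary search that trusts sorted_words to already be ordered by `key`."""
    lo, hi = 0, len(sorted_words)
    target_key = key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(sorted_words[mid]) < target_key:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sorted_words) and sorted_words[lo] == target


words = ["Ufer", "Über", "Uelzen", "Udo"]

# An index persisted while the old rules were in effect...
index = sorted(words, key=key_pre_vista)   # ['Über', 'Udo', 'Uelzen', 'Ufer']

# ...and then searched after the rules changed underneath it.
found_old = binary_search(index, "Über", key_pre_vista)   # True
found_new = binary_search(index, "Über", key_vista_rtm)   # False: the entry is "lost"
```

Nothing in the index changed. Only the comparison did, and the search now walks to the wrong slot.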

In this case there actually was an example (Zune) and anyone making that same assumption could have the same problems or worse....

# Sven Harazim on Monday, September 10, 2007 3:19 AM:

I wait for SP1 ;-)

# Rolf Frei on Monday, December 24, 2007 4:49 PM:

Seriously, I can't understand the move to remove it from SP1!!!

The fact is that my DB application no longer works correctly under Vista, as it did under older Windows versions. So the point here is that the change in Vista DID break backward compatibility with old applications. The database system I use relies on the Windows language drivers for sorting, filtering, etc. Now the complete German language driver in Vista is broken, and if I understand you correctly, this will not be fixed at all?!

I can change the language of my DB tables to ANSI and everything works, but no longer in the way my customers expect, since sorting and filtering then no longer work the way any German user would expect.

# Michael S. Kaplan on Monday, December 24, 2007 5:14 PM:

I am not going to disagree with you, Rolf. That was my call too -- I originally pushed for the fix, got it approved in triage and war, and checked it in. It was only later that forces more powerful than I set the bar higher (also slightly differently) and the fix had to be backed out.

But FWIW the fix IS in Server 2008, and if nothing else the German phone book sort does still happen to work even in Vista. Not exactly perfect, but that is where it is....

# Michael S. Kaplan on Monday, December 24, 2007 5:28 PM:

One workaround: you can use U/u + combining diaeresis (U+0055/U+0075 + U+0308), and they will both sort properly (in every collation except the German phonebook sort)....
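The workaround above can be sketched roughly as follows. `decompose_u_umlaut` is a hypothetical helper, not a Windows API. It replaces only precomposed U+00DC/U+00FC with U/u followed by U+0308 COMBINING DIAERESIS. It leaves strings alone under the German phonebook sort (LCID 0x10407), which is the one collation the comment excludes.

```python
# Hypothetical helper illustrating the decomposition workaround.
GERMAN_PHONEBOOK_LCID = 0x10407  # de-DE with the phone book sort order
COMBINING_DIAERESIS = "\u0308"   # U+0308 COMBINING DIAERESIS


def decompose_u_umlaut(s: str, lcid: int) -> str:
    """Rewrite Ü/ü as U/u + U+0308, except under the German phonebook sort."""
    if lcid == GERMAN_PHONEBOOK_LCID:
        # The decomposed form is described as not sorting properly here.
        return s
    return (s.replace("\u00dc", "U" + COMBINING_DIAERESIS)
             .replace("\u00fc", "u" + COMBINING_DIAERESIS))


# Both sides of a comparison (and every stored index entry) must be transformed
# the same way, or the stored and compared forms will disagree.
query = decompose_u_umlaut("Müller", 0x0407)   # 'Mu\u0308ller'
```

This sketch deliberately does not call `unicodedata.normalize("NFD", ...)`. Full NFD would also decompose ä, ö, é and every other precomposed letter, and the workaround as stated does not call for that.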

# Rolf Frei on Tuesday, December 25, 2007 9:55 AM:

As my application is non-Unicode, this isn't an option at all.

Whoever decided to remove that fix from SP1 is an idiot! Sorry for those words, but I think he hasn't really checked the problem at all. The Vista behaviour breaks old applications, and that fix would make them work again.

To say it will not be included so as not to break applications is absolutely stupid, as Vista already breaks applications now, and a fix would repair the broken applications, not break new ones. Any new application under Vista is already broken, as it doesn't behave as it should for any German user.

Whoever removed that fix from SP1 should jump in here to check what the problem really is.

In addition, I have written a small routine with CompareString that sorts all characters under XP and under Vista, and I was just surprised how much the sorting has changed under Vista. Especially the lowest-sorting character (ASCII 179), which gets sorted below ASCII 0?!

At the following links you can see the ordered list of the German ASCII table on XP and on Vista, and how different they are:

http://www.eicom.ch/rolf/Charmap_XP.png

http://www.eicom.ch/rolf/Charmap_Vista.png

# Michael S. Kaplan on Tuesday, December 25, 2007 11:55 AM:

The major version of sorting was changed for Vista, which means that applications should be checking for this fact and re-indexing as needed.
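Here is a minimal sketch of that "check the version, re-index as needed" pattern, under some stated assumptions. On Windows the version comes from GetNLSVersion, which fills an NLSVERSIONINFO structure that includes dwNLSVersion and dwDefinedVersion. Python cannot call that here, so `SortVersion`, `PersistedIndex` and `open_index` are hypothetical names. The version numbers and the collation key are toy values.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class SortVersion:
    # Hypothetical mirror of dwNLSVersion / dwDefinedVersion from GetNLSVersion.
    nls_version: int
    defined_version: int


@dataclass
class PersistedIndex:
    version: SortVersion                      # recorded when the index was built
    entries: List[str] = field(default_factory=list)


def open_index(index: PersistedIndex, current: SortVersion,
               sort_key: Callable[[str], object]) -> bool:
    """Re-sort the index if it was built under a different sort version.

    Returns True when a rebuild happened. A real application might compare
    only the major part of the version; this sketch rebuilds on any change.
    """
    if index.version == current:
        return False
    index.entries = sorted(index.entries, key=sort_key)
    index.version = current
    return True


def vista_rtm_key(s: str) -> str:
    # Toy collation: expand Ü/ü to UE/ue, then ignore case.
    return s.replace("Ü", "UE").replace("ü", "ue").casefold()


idx = PersistedIndex(SortVersion(1, 1), ["Über", "Udo"])   # sorted under old rules
rebuilt = open_index(idx, SortVersion(2, 1), vista_rtm_key)
```

The important design point is that the version is stored alongside the persisted data. The check then costs one comparison at open time instead of depending on the OS never changing.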

And there are different degrees of broken here: the bugs hanging in the balance were index corruption, file list corruption, and applications being unable to boot. In the end, it is hard to sustain the claim that such applications should be punished for not checking the sort version after a service pack, when we have never made such a change in a service pack before, just to make sure that Ü/ü sort properly.

By taking out the Ü/ü fix, these more serious problems have been averted.

# Rolf Frei on Wednesday, December 26, 2007 9:31 AM:

The main problem is that CompareString under Vista now treats u <> ü (and the same for the other umlauts) and ü = ue, both of which are wrong.

Under German rules it must be u=ü (or o=ö, a=ä, ...) and ü<>ue (or ö<>oe, ä<>ae, ...). Vista violates these German language rules and produces wrong results. Sorry, but this is a classic BUG and must be fixed.

I'm quite sure that if this were something in the English language system and you Americans were affected by it, MS would release a fix immediately, but as you are not directly affected by this bug, it is not that important for you (MS). That's a shame, and I'm so frustrated now that we Germans are left on our own and MS is not willing to fix the fault it has made!!!

That the server version of Vista will get the fix but Vista itself will not is another strange thing I can't understand....

# Michael S. Kaplan on Wednesday, December 26, 2007 12:03 PM:

You are preaching to the choir here about the overall issue, you really are.

But there is no German sort that says that u==ü, o==ö, a==ä on any version of Windows since there is at least a diacritic difference there....
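To illustrate what "at least a diacritic difference" means, here is a toy two-level key. Windows sort keys keep separate weights (roughly alphabetic, diacritic and case). Under such a scheme, u and ü can be equal on the first level and still compare unequal overall, while ü and ue already differ on the first level. `two_level_key` is a hypothetical simplification, not the actual Windows weights, and it folds case away entirely.

```python
import unicodedata


def two_level_key(s: str):
    """Toy key: (base letters, tuple of diacritics attached to each letter)."""
    primary = []
    diacritics = []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch) and diacritics:
            diacritics[-1] += ch          # mark belongs to the previous letter
        else:
            primary.append(ch.casefold())
            diacritics.append("")
    return ("".join(primary), tuple(diacritics))


u, u_umlaut, ue = two_level_key("u"), two_level_key("ü"), two_level_key("ue")
# u and ü share the first level ("u") but differ on the diacritic level,
# so they are not equal; ü and ue differ already on the first level.
```
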

# Ralf on Tuesday, February 26, 2008 6:42 AM:

I tested it with the RTM of Vista SP1, and I think it did not change!

Do you know something about it ?

# Michael S. Kaplan on Tuesday, February 26, 2008 8:31 AM:

Ralf,

The whole point of this post was to explain why the fix was backed out of SP1 (it was originally there but was removed, though it was left in Server 2008).

# Kirill Müller on Wednesday, March 05, 2008 4:02 PM:

Michael,

is there any chance that a hotfix for this problem will be published or available on request?

Our database system uses client-generated index data (using CompareString(NORM_IGNORECASE | SORT_STRINGSORT)) that can be accessed from clients running different operating systems (or different NLS versions, if you like). There is no central server, only file I/O. If I get the point (from your post "What makes a string meaningful?"), the correct approach would be to maintain separate indexes for each NLS version that has access to the database. A little bit of overkill, it kind of doubles (triples? quadruples?) index space consumption for our system. Is it ever possible that the sorting order of two meaningful strings changes between NLS versions?

You mentioned a workaround: "... you can use U/u + combining diaeresis (U+0055/U+0075 + U+0308) and they will both sort properly (in every collation except for the German phonebook sort)...." So would this pseudocode work?

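'''''''''''''''''''''''

' IsWindowsVista() and Compare() are placeholders for the real checks/calls
combining_diaeresis = ChrW(&H308)   ' U+0308 COMBINING DIAERESIS

' Skip the workaround under the German phonebook sort (LCID 0x10407)
If IsWindowsVista() And lcid <> &H10407 Then
  ' Replace each precomposed U-umlaut with the base letter + combining diaeresis
  s1 = Replace(s1, "ü", "u" & combining_diaeresis)
  s1 = Replace(s1, "Ü", "U" & combining_diaeresis)
  s2 = Replace(s2, "ü", "u" & combining_diaeresis)
  s2 = Replace(s2, "Ü", "U" & combining_diaeresis)
End If

Compare(s1, s2)

'''''''''''''''''''''''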

Why does setting the application compatibility mode to XPSP2 not fix the problem? This seems to enable the "EmulateSorting" shim which sounded quite promising to me...

http://technet2.microsoft.com/WindowsVista/en/library/ac42bd59-3bd8-4b7d-b3ee-a4ec608d86181033.mspx?mfr=true

Thanks in advance for any advice on this. Your blog was the only source of information I found on this, as I would call it, show-stopper. Pity. This might be a minor issue for the vast majority of users, but it is still worth an official comment.

Best regards,

Kirill

# Michael S. Kaplan on Wednesday, March 05, 2008 4:53 PM:

There is no way, as far as I know, to get a hotfix here (the original fix was rejected for Vista SP1 and had to be backed out, though it was accepted for Server 2008).

But the workaround should work here -- if you denormalize the string so U Umlaut becomes U + Combining Umlaut then you shouldn't run into this problem....

The shim would have been a good place to get something in, but it was not done, though in retrospect it probably should have been....

# Michael S. Kaplan on Wednesday, March 05, 2008 4:56 PM:

FWIW, I agree that 'official comments' would be nice, though I don't have any control over that at all these days. I generally put stuff here when I have no idea when/if it will be put somewhere official...


referenced by

2010/03/06 Burn Windows Burn (aka If we want to unsay *this* one, we cannot say "Mu")

2008/03/26 Vietnamese still ain't quite right

2008/02/19 Insanity defined: In the real world -0 == 0, in Vista -0 < 0, and in Windows Server 2008 -0 ≮ 0

2008/01/05 Zune 2.0 software is able to support Greek + Cyrillic + more Latin

2007/09/15 A&P of Sort Keys, part 5 (aka EXPANSIONing your horizons)
