"àèìòù" < "äëïöü" but "àèìòù " > "äëïöü"

by Michael S. Kaplan, published on 2006/10/31 04:37 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/31/906323.aspx


You may remember my post "I need my SPACE, symbolically speaking" from this past March.

There are some interesting consequences of this behavior that I thought I would talk about a bit further, since they have been the subject of several recent bug reports....

Let's take a simple string like

àèìòù (U+00e0 U+00e8 U+00ec U+00f2 U+00f9)

and compare it with

äëïöü (U+00e4 U+00eb U+00ef U+00f6 U+00fc)

Just pass them both to CompareStringW using 0x0409 for the LCID, and you will find that "àèìòù" < "äëïöü". But if you add a space to the first string, then you will see that "àèìòù " > "äëïöü".

Huh? How'd that happen?

Well, let's look at the sort keys for each of the three strings:

"àèìòù"
0e 02 0e 21 0e 32 0e 7c 0e 9f 01 0f 0f 0f 0f 0f 01 01 01 00

"äëïöü"
0e 02 0e 21 0e 32 0e 7c 0e 9f 01 13 13 13 13 13 01 01 01 00

"àèìòù "
0e 02 0e 21 0e 32 0e 7c 0e 9f 07 02 01 0f 0f 0f 0f 0f 01 01 01 00

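Aha, things may be a little clearer now. The letters have consistent primary weights (0e 02, 0e 21, 0e 32, 0e 7c, 0e 9f), as do the diacritics (0f for the grave accent, 13 for the diaeresis). So the first comparison sees equal primary weights and is decided by the secondary (diacritic) weights, where 0f < 13. But in the second comparison the trailing space adds its own primary weight (07 02) right where the other key has its 01 separator, so the difference now shows up in the primary weights, and since 07 > 01 the order is suddenly reversed. Oops!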
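Assuming, as the Windows documentation describes, that sort keys are meant to be compared as raw bytes (the way memcmp would), both results can be reproduced from the three keys above without calling CompareStringW at all. This minimal Python sketch does just that; `KEYS`, `compare_keys` and `first_difference` are names made up for the illustration.

```python
# Sort keys copied verbatim from the post (LCID 0x0409, no flags).
# bytes.fromhex ignores the spaces between byte pairs (Python 3.7+).
KEYS = {
    "àèìòù": bytes.fromhex("0e 02 0e 21 0e 32 0e 7c 0e 9f 01 0f 0f 0f 0f 0f 01 01 01 00"),
    "äëïöü": bytes.fromhex("0e 02 0e 21 0e 32 0e 7c 0e 9f 01 13 13 13 13 13 01 01 01 00"),
    "àèìòù ": bytes.fromhex("0e 02 0e 21 0e 32 0e 7c 0e 9f 07 02 01 0f 0f 0f 0f 0f 01 01 01 00"),
}


def compare_keys(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 by comparing the keys byte by byte, as memcmp would."""
    return (a > b) - (a < b)


def first_difference(a: bytes, b: bytes):
    """Return (offset, byte_in_a, byte_in_b) for the first differing byte, or None."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, x, y
    return None  # equal, or one key is a prefix of the other


for left, right in [("àèìòù", "äëïöü"), ("àèìòù ", "äëïöü")]:
    result = compare_keys(KEYS[left], KEYS[right])
    sign = "<" if result < 0 else (">" if result > 0 else "==")
    offset, x, y = first_difference(KEYS[left], KEYS[right])
    print(f'"{left}" {sign} "{right}": first difference at byte {offset} ({x:02x} vs {y:02x})')
```

The first pair differs at byte 11 (0f vs 13, inside the diacritic section), while the second pair differs at byte 10 (07 vs 01): that is where the primary section of "äëïöü" ends and where the space's primary weight lands, so the primary difference wins.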

Now this will happen with any symbol (or, for that matter, with anything that has a primary weight, though for some reason SPACE and similar characters give results that seem less intuitive!). Simply passing NORM_IGNORESYMBOLS, however, will cause the space or other symbol to be ignored.
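To see why ignoring the symbol restores the first ordering, here is a toy Python model. It is not the real Windows sorting algorithm: `TABLE` and `toy_sort_key` are hypothetical names, the only weights it knows are the ones visible in the three sort keys above, and giving the space a primary weight but no diacritic weight is an assumption that simply reproduces the key shown for "àèìòù ". The `ignore_symbols` flag is only a rough stand-in for NORM_IGNORESYMBOLS.

```python
# Hypothetical toy model: it knows only the ten accented vowels and SPACE from the post,
# with weights read off the post's sort keys.
# Each entry maps a character to (primary weight bytes, diacritic weight or None, is_symbol).
TABLE = {}
for grave, diaeresis, primary in [("à", "ä", 0x02), ("è", "ë", 0x21), ("ì", "ï", 0x32),
                                  ("ò", "ö", 0x7C), ("ù", "ü", 0x9F)]:
    TABLE[grave] = (bytes([0x0E, primary]), 0x0F, False)      # grave accent weight 0f
    TABLE[diaeresis] = (bytes([0x0E, primary]), 0x13, False)  # diaeresis weight 13
TABLE[" "] = (bytes([0x07, 0x02]), None, True)  # SPACE: primary weight only, treated as a symbol


def toy_sort_key(s: str, ignore_symbols: bool = False) -> bytes:
    """Build a key laid out like the post's: primary 01 diacritic 01 01 01 00."""
    primary = bytearray()
    diacritic = bytearray()
    for ch in s:
        weight, accent, is_symbol = TABLE[ch]
        if ignore_symbols and is_symbol:
            continue  # stand-in for NORM_IGNORESYMBOLS: the symbol contributes no weights
        primary += weight
        if accent is not None:
            diacritic.append(accent)
    # The trailing 01 01 01 00 mirrors the post's keys, whose remaining sections are empty.
    return bytes(primary) + b"\x01" + bytes(diacritic) + b"\x01\x01\x01\x00"


umlauts, spaced = "äëïöü", "àèìòù "
print(toy_sort_key(spaced) > toy_sort_key(umlauts))                       # True
print(toy_sort_key(spaced, ignore_symbols=True) < toy_sort_key(umlauts))  # True
```

With the symbol dropped, the key for "àèìòù " becomes identical to the key for "àèìòù", so the comparison is once again decided by the diacritic weights.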

Now this is the first example; I will give some of the others in a later post, along with some thoughts on how the issue of intuitive results could perhaps be looked into, and why the solution is less obvious than it may seem at first....

Have I scared anyone yet? If so, then Happy Halloween! :-)

This post brought to you by " " (U+0020, a.k.a. SPACE)


Name on 16 Jun 2011 9:14 PM:

did you mean alt?

Michael S. Kaplan on 16 Jun 2011 10:15 PM:

I do not understand the question....


referenced by

2006/11/01 If you add enough characters to a sort, intuitive distinction can suffer
