Punctuation... now, isn't that SPECIAL [weights] ?

by Michael S. Kaplan, published on 2006/11/26 02:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/11/25/1151550.aspx


Well, apologies to Dana Carvey and all, but actually, it isn't!

The other day when I talked about "The problem of string comparisons, WORD sorts, and the minus that is treated like the hyphen", Oleg was thinking about the documentation for sort keys in LCMapString and maybe even about my post "How do sort keys work?". And he realized that the prior mappings that explained how sort keys worked had a flaw in them. After I pointed out that special weights were actually for some particular differences in Kana, Oleg commented:

Then example string could be SO-DIMMソケット×2.

The sort key for this string is:

    0e 91 0e 7c 0e 1a 0e 32 0e 51 0e 51 22 16 22 0d 22 1c 22 1e 08 1c 0c 33 01 01 12 12 12 12 12 12 01 c6 c6 c4 ff 02 c4 c4 c4 c4 ff ff 01 80 0f 06 82 00

In this key "Special weights" piece is:

    c6 c6 c4 ff 02 c4 c4 c4 c4 ff ff

and the "punctuation weights" piece is:

    80 0f 06 82

And the LCMapString with the LCMAP_SORTKEY flag stores a sort key in the buffer, as an array of byte values in the following format:

    [all Unicode sort weights] 0x01 [all Diacritic weights] 0x01 [all Case weights] 0x01 [all Special weights] 0x01 [Punctuation weights] 0x00

"Punctuation weights" piece is specific for WORD sort and contains information about hyphen/apostrophe characters.

Is this correct description now?

Indeed, Oleg's description is correct here. Because punctuation weights are not special weights. And THIS is how sort keys really work....
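
For anyone who wants to poke at this layout, here is a minimal Python sketch that splits the example key from Oleg's comment on its 0x01 separators. The names `split_sort_key` and `FIELD_NAMES` are made up for illustration and are not part of any Windows API. The sketch also assumes that no weight byte is ever 0x00 or 0x01, which holds for this key but is treated here as an assumption rather than a documented guarantee.

```python
# The example sort key for "SO-DIMMソケット×2", copied from the comment above.
SORT_KEY_HEX = (
    "0e 91 0e 7c 0e 1a 0e 32 0e 51 0e 51 22 16 22 0d 22 1c 22 1e 08 1c 0c 33 "
    "01 01 12 12 12 12 12 12 01 c6 c6 c4 ff 02 c4 c4 c4 c4 ff ff 01 "
    "80 0f 06 82 00"
)

# Hypothetical labels for the five sections, in the order given in the format line.
FIELD_NAMES = ("unicode", "diacritic", "case", "special", "punctuation")


def split_sort_key(key: bytes) -> dict:
    """Split a sort key into its five weight sections.

    Assumes the key ends in 0x00 and that 0x01 appears only as a separator.
    """
    if not key.endswith(b"\x00"):
        raise ValueError("sort key is missing its 0x00 terminator")
    fields = key[:-1].split(b"\x01")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} sections, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    sections = split_sort_key(bytes.fromhex(SORT_KEY_HEX))
    for name, value in sections.items():
        # Show each section as space-separated hex bytes.
        print(f"{name:12} {' '.join(f'{b:02x}' for b in value)}")
```

Run against the example key, the diacritic section comes back empty, and the special and punctuation sections match the two pieces Oleg called out.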

I'll try to get onto updating the documentation to give a slightly abbreviated version of this in future updates. :-)

 

This post brought to you by ソ (U+30bd, a.k.a. KATAKANA LETTER SO)




referenced by

2010/06/10 WORD SORT...Why'd it have to be...WORD SORT?

2007/10/30 EXPECTED is in the eye of the [non-expecting type of ]expectant
