Unlike LCMapString, the sort keys for English characters precede the sort keys for Korean

by Michael S. Kaplan, published on 2004/12/31 12:38 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/31/344888.aspx


The title for this post actually comes from the SortKey help topic:

Each character in a string is given several categories of sort weights, including script, alphabetic, case, and diacritic weights. A sort key serves as the repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. SortKey is equivalent to the Windows API method LCMapString with the LCMAP_SORTKEY flag. However, unlike LCMapString, the sort keys for English characters precede the sort keys for Korean characters.

Someone asked me what the hell that text refers to!

Well, a decision was made back in the early days of Windows (that incidentally many have had cause to regret) to cause ideographs for Korean to be sorted in front of all of the other letters (including the Latin script letters of English). This code exists on all of the Windows NT-based platforms and on the Windows CE platforms, but when the time came to support sort keys on the .NET Framework a decision was made to explicitly not do this.

There is no real linguistic basis for either behavior; it's arbitrary either way.

However, since the .NET Compact Framework uses the WinCE OS tables to do its work, its results on WinCE will differ from those of the .NET Framework everywhere else.

It is worth mentioning that the text in the SortKey topic is a little confusing, since it does not make clear that this only happens for the Korean LCID. And since the Windows behavior is not completely documented, the topic does not cover the fact that neither Extension A nor Extension B ideographs are supported by it (though at present none of them are given intentional Korean-specific weight, a fact that will change in future versions).
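To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy model, not the real thing: the function names and the weight values 1, 2 and 3 are invented for illustration and have nothing to do with the actual Windows or .NET sorting tables. It assumes the special ordering applies only to the main CJK Unified Ideographs block (U+4E00 to U+9FFF), which is how I read the note above about Extension A and Extension B being left out.

```python
# Toy model only: these weights are invented for illustration and are NOT
# the values used by the Windows or .NET sorting tables.

def is_unified_ideograph(ch):
    """True for the main CJK Unified Ideographs block (U+4E00..U+9FFF).

    Extension A (U+3400..U+4DBF) and Extension B are deliberately excluded,
    on the assumption that the special ordering does not cover them.
    """
    return 0x4E00 <= ord(ch) <= 0x9FFF


def script_weight(ch, han_first):
    """Return a hypothetical script weight; a smaller weight sorts earlier."""
    if is_unified_ideograph(ch):
        return 1 if han_first else 3
    return 2  # everything else, including the Latin letters of English


def toy_sort_key(text, han_first):
    """Build a simplified key: one (script weight, code point) pair per character."""
    return [(script_weight(ch, han_first), ord(ch)) for ch in text]


words = ["apple", "韓國", "zebra"]

# Windows-style ordering under the Korean LCID: ideographs sort first.
windows_like = sorted(words, key=lambda w: toy_sort_key(w, han_first=True))

# .NET Framework-style ordering: the Latin letters sort first.
dotnet_like = sorted(words, key=lambda w: toy_sort_key(w, han_first=False))

print(windows_like)
print(dotnet_like)
```

The only thing that changes between the two runs is where the ideographs' script weight sits relative to everything else, which is the whole of the difference the SortKey topic is trying to describe.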

In any case, that's why Korean has the Korean coming first. Though when you consider the fact that this affects over 20,000 ideographs, the image of someone asking if they could cut in line at the supermarket "because they only have 20,000 items" is a little scary. :-)

This post brought to you by "ᇴ" (U+11f4, a.k.a. HANGUL JONGSEONG KAPYEOUNPHIEUPH)


# George on 31 Dec 2004 1:01 PM:

Why can't you just remove this re-ordering in a future version? It seems like a weird step best removed.

referenced by

2010/11/09 I [will have] told you so! Well, perhaps too late (all things considered)...

2008/02/19 The most important language in the whole wide world is yours, and you hardly even know yours! -- NOT!

2007/10/08 A&P of Sort Keys, part 12 (aka Han sorts first!)
