The most important language in the whole wide world is yours, and you hardly even know yours! -- NOT!

by Michael S. Kaplan, published on 2008/02/19 10:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/02/19/7791306.aspx


Some of you who grew up with Sesame Street may recall the old short they ran with the catchy lyric that inspired the title:

The most important person in the whole wide world is you, and you hardly even know you!

It is something that Paul C. Vitz (in his Psychology as Religion: The Cult of Self-Worship) has denigrated due to how it "...fills the empty self, but it perpetuates passivity and weakness."

Harsh words, but perhaps good ones to hear. I kind of ignored the Sesame Street bit, since even at a young age I knew that I was the most important person in the whole wide world to me, even if not to the other misguided souls held by gravity to the third rock from the sun. I didn't have a ton of respect for the people who needed this kind of thing to feel good about themselves, much preferring to "hanker for a hunka cheese" as the other little short would point out. In fact, I suspect that this kind of sappy crap is where my cynicism was first born. :-)

Besides, as I just overheard, "You don't go and change the color of the carpet when you are invited over for dinner." If you really were the most important person in the whole wide world, such behavior would be quite socially acceptable, as long as you did it. Do we need further proof of the flaw in the reasoning? :-)

In any case, I was thinking about all of this just yesterday, when a question came in to product support from a customer who was seeing some strange behavior changes between versions of Windows:

Below are two lines of code that customer used to reproduce the discrepancy between Vista and XP.
Why the difference in behavior?  Is there a good work around?

printf("  using 0x409: %d\n", CompareStringW(0x409, 0, L"대담한", -1, L"Roman", -1));
printf("  using 0x412: %d\n", CompareStringW(0x412, 0, L"대담한", -1, L"Roman", -1));

Using these two lines of code on Windows XP (32-bit) the output was:
  using 0x409: 3
  using 0x412: 1

Under Vista (64-bit) the output was:
  using 0x409: 3
  using 0x412: 3

Under Vista (32-bit) the output was:
  using 0x409: 3
  using 0x412: 3

Just to decipher some of the hard-coded numbers for people who don't speak the MicrosoftReturnValue or MicrosoftLCID dialects, as described in MSDN in the Language Identifier Constants and Strings topic and the CompareString/CompareStringEx topics:

0x0409 == MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US)

0x0412 == MAKELANGID(LANG_KOREAN, SUBLANG_KOREAN)

1 == CSTR_LESS_THAN

3 == CSTR_GREATER_THAN

Some may recall a previous blog, in fact one of the blogs from the very first month I started blogging, entitled Unlike LCMapString, the sort keys for English characters precede the sort keys for Korean. The blog was trying to explain some confusing text in the depths of the notes for the managed SortKey class help topic:

Working with the value of a SortKey object is equivalent to calling the Windows API LCMapString method with the LCMAP_SORTKEY value specified. However, for the SortKey object, the sort keys for English characters precede the sort keys for Korean characters.

The truth is that this Windows behavior that puts Han and Hangul first before all the other scripts in the Korean locale exists for both sortkey generation and string comparison, even though it is only really ever documented in obscure places like that SortKey topic, the blog post referenced above, and my later A&P of Sort Keys, part 12 (aka Han sorts first!).

Between you and me, the idea of "documentation" that is only contained in this blog is not something I am entirely comfortable with, and not only for the reasons that inspire disclaimers like Raymond's. This is a topic that I plan to blog about another day since it includes just the mix of Policy, LCA, insanity, and inanity so as to delight and inspire the cynic in me. Stay tuned if this topic interests you! :-)

Now which script comes first -- Latin or Hangul, Han/Hangul or everything else -- is entirely arbitrary. The exact reasoning behind the Korean behavior being different is at least a decade old and no longer fully understood beyond a vague "request from the subsidiary", the details of which people within the subsidiary can't recall (or at least they couldn't the last time I asked!). The generic idea that being first is somehow "better" is as good a theory as any. I suppose we are just lucky that no other subsidiary had such a thing done for their language, too!

Reader George commented in that early blog:

Why can't you just remove this re-ordering in a future version? It seems like a weird step best removed.

Funny George should mention that, huh?

That is exactly what happened in Windows >= Vista -- this weird re-ordering was removed from the product.

Of course as a side effect we are seeing the very issue that is central to the original customer complaint that inspired this blog!

There is no real "workaround", though. The update to the collation tables that happened in Vista was accompanied by a major version change as well, which means such changes can be expected....

Now you may disagree with this direction, in which case you should leave your address in the comments with an invitation to dinner and I will come by to change the color of your carpets as you explain your point of view on the matter to me. :-)

 

This blog brought to you by 대 (U+b300, aka HANGUL SYLLABLE TIKEUT AE)


# Zooba on 19 Feb 2008 5:18 PM:

I'd love some new carpet, but unfortunately I agree completely :-P



