You think that's bad? Just wait, it gets worse...

by Michael S. Kaplan, published on 2006/08/12 03:11 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/08/12/696161.aspx

The story I am telling here is completely true. I have only omitted project and people names to protect the guilty, and perhaps also the [rightfully] embarrassed....

The original mail that came to me was from someone who was getting some unexpected results from CompareString. The mail eventually boiled down to a simple question:

...CHT Vista build, I am passed an lcid of 0x10804 with the query. Is that correct? I can’t find that value on MSDN.

I admit I felt like the teacher in How I Got Into College who looked at Marlon Browne's ungraded SAT (the bunch of dots on the page) and immediately expressed concern at his score, but the problem here was obvious -- calling the LCID value "for Traditional Chinese" a huge crap bag would have been an insult to large bags filled with crap!

You think that's bad? Just wait, it gets worse...

I immediately explained that this LCID value was completely bogus in two senses:

The LANGID portion is for PRC, which would be for Simplified Chinese;
The SORTID portion is invalid for any CJK sort.
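
To see where those two complaints come from, here is a minimal sketch that takes the LCID apart. It is self-contained on purpose: the helper names are made up for illustration, and they only mirror the bit layout that the winnt.h macros LANGIDFROMLCID, SORTIDFROMLCID, PRIMARYLANGID and SUBLANGID use (low 16 bits for the LANGID, the next 4 bits for the sort ID, and a 10-bit/6-bit split inside the LANGID).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the winnt.h macros; not Windows APIs. */

/* Low 16 bits of an LCID hold the LANGID. */
static uint16_t langid_from_lcid(uint32_t lcid)
{
    return (uint16_t)(lcid & 0xFFFFu);
}

/* Bits 16..19 of an LCID hold the sort ID. */
static uint16_t sortid_from_lcid(uint32_t lcid)
{
    return (uint16_t)((lcid >> 16) & 0xFu);
}

/* Low 10 bits of a LANGID hold the primary language. */
static uint16_t primary_langid(uint16_t langid)
{
    return (uint16_t)(langid & 0x3FFu);
}

/* High 6 bits of a LANGID hold the sublanguage. */
static uint16_t sub_langid(uint16_t langid)
{
    return (uint16_t)(langid >> 10);
}
```

Fed 0x10804, these give a LANGID of 0x0804 (primary language 0x04, which is LANG_CHINESE, and sublanguage 0x02, which is SUBLANG_CHINESE_SIMPLIFIED) plus a sort ID of 0x1.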

They forwarded on to me the way that they were constructing the various Chinese LCIDs:

MAKELCID( MAKELANGID(LANG_CHINESE,         // Chinese
                     SUBLANG_CHINESE_SIMPLIFIED),
          SORT_CHINESE_UNICODE )

MAKELCID( MAKELANGID(LANG_CHINESE,         // Chinese/china
                     SUBLANG_CHINESE_SIMPLIFIED),
          SORT_CHINESE_UNICODE )

MAKELCID( MAKELANGID(LANG_CHINESE,         // Chinese/taiwan
                     SUBLANG_CHINESE_TRADITIONAL),
          SORT_CHINESE_UNICODE )

Yikes! This was getting worse and worse.

(The values are all struck out so that no one tries to use them!)

Now that first one was the source of the problem -- they were assuming there was a generic "Chinese" that was neither Simplified nor Traditional, and that by passing it they would get some nice generic results -- especially when they plunked in that Unicode support.

You think that's bad? Just wait, it gets worse...

They were also constructing their Japanese and Korean LCIDs in a similar way:

MAKELCID( MAKELANGID(LANG_JAPANESE,        // Japanese
                     SUBLANG_DEFAULT),
          SORT_JAPANESE_UNICODE )

MAKELCID( MAKELANGID(LANG_KOREAN,          // Korean
                     SUBLANG_DEFAULT),
          SORT_KOREAN_UNICODE )

Again, the idea was that a generic, friendly sounding flag involving the word Unicode would make everything good. Nice generic results, right?

Well, I can answer that question with a question. Would failure of the function with GetLastError() returning ERROR_INVALID_PARAMETER be generic enough of a result? :-(

Perhaps looking at the definition of these flags with the latest winnt.h might shed some light here:

#define SORT_JAPANESE_UNICODE            0x1     // Japanese Unicode order (no longer supported)

#define SORT_CHINESE_UNICODE             0x1     // Chinese Unicode order (no longer supported)

#define SORT_KOREAN_UNICODE              0x1     // Korean Unicode order (no longer supported)

The Japanese and Korean Unicode sorts are those awful abominations I discussed earlier (here and here), and they were removed back in Windows XP, and the "Chinese Unicode sort" didn't exist even then (I think it was removed before NT 4.0 shipped, if not sooner?) -- and lacking the whole yen/won thing I imagine it had even less reason for being.
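
The post does not quote the corrected code, so the following is only a sketch of one plausible fix: build each LCID with the default sort (sort ID 0) instead of the retired 0x1 value. The constant values are the ones in winnt.h, but the names carry an `_ID` suffix and the two helper functions are hypothetical, so that the snippet stands alone without the Windows headers.

```c
#include <stdint.h>

/* Values as defined in winnt.h; names suffixed to avoid implying
   these are the real header macros. */
enum {
    LANG_CHINESE_ID                = 0x04,
    LANG_JAPANESE_ID               = 0x11,
    LANG_KOREAN_ID                 = 0x12,
    SUBLANG_DEFAULT_ID             = 0x01,
    SUBLANG_CHINESE_TRADITIONAL_ID = 0x01,
    SUBLANG_CHINESE_SIMPLIFIED_ID  = 0x02,
    SORT_DEFAULT_ID                = 0x0,
    SORT_UNICODE_RETIRED_ID        = 0x1  /* the SORT_*_UNICODE value */
};

/* Same bit layout as MAKELANGID: sublanguage in the high 6 bits. */
static uint16_t make_langid(uint16_t primary, uint16_t sub)
{
    return (uint16_t)((sub << 10) | primary);
}

/* Same bit layout as MAKELCID: sort ID above the 16-bit LANGID. */
static uint32_t make_lcid(uint16_t langid, uint16_t sortid)
{
    return ((uint32_t)sortid << 16) | langid;
}

/* Assumed fix: pair each language with the default sort. */
static uint32_t lcid_with_default_sort(uint16_t primary, uint16_t sub)
{
    return make_lcid(make_langid(primary, sub), SORT_DEFAULT_ID);
}
```

With these, Traditional Chinese comes out as 0x0404, Simplified Chinese as 0x0804, Japanese as 0x0411 and Korean as 0x0412, while the combination from the original mail (Simplified Chinese plus the retired sort) reproduces 0x10804.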

You think that's bad? Just wait, it gets worse...

After I straightened out their LCID story for these came the scariest part of all:

I’m wondering though why I didn’t get any complaints for these languages. In fact Korean was tested and works correctly.

It of course scared me for two reasons:

I had suspected, but now it was obviously confirmed, that they were not checking the return value of their CompareString call. Why does nobody ever check the return value?
Whoever was testing the results wasn't really doing any real testing of the language. Some new use of the word "tested" of which I was previously unaware?
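
To make the first point concrete, here is a rough sketch of what checking the result could look like. CompareString returns 0 on failure and 1, 2 or 3 (CSTR_LESS_THAN, CSTR_EQUAL, CSTR_GREATER_THAN) on success, and subtracting 2 from a nonzero result gives the usual C runtime convention. The enum names and both helpers below are hypothetical and take the raw result as a plain int, so the snippet does not need the Windows headers; how the caller in this story actually consumed the result is not known.

```c
#include <stdbool.h>

/* Mirrors the documented CompareString results; names are local. */
enum {
    CS_FAILED       = 0,  /* call failed; see GetLastError() on Windows */
    CS_LESS_THAN    = 1,
    CS_EQUAL        = 2,
    CS_GREATER_THAN = 3
};

/* Checked conversion: writes -1, 0 or +1 to *order and returns true
   on success; returns false and leaves *order untouched on failure. */
static bool ordering_from_compare_result(int cs_result, int *order)
{
    if (cs_result == CS_FAILED) {
        /* Real code would report or log GetLastError() here,
           e.g. ERROR_INVALID_PARAMETER for a bogus LCID. */
        return false;
    }
    *order = cs_result - 2;
    return true;
}

/* What code that never checks for 0 effectively does. */
static int ordering_unchecked(int cs_result)
{
    return cs_result - 2;
}
```

The unchecked version turns a failure into -2, which any caller testing for a negative value will happily treat as "less than". That is one way a broken call can look like it works.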

As I look to the future I think it is important to spend a lot more time evangelizing how to not only call the NLS functions, but to call them correctly!

This post brought to you by ₩ (U+20a9, a.k.a. WON SIGN)

# oidon on 12 Aug 2006 4:48 AM:

> Why does nobody ever check the return value?

Why was the API designed in such a way to indicate error with a return value when it should really throw an exception? Yeah, I know, legacy C-based API...

The idea behind GetLastError() is even worse.

There is a lot of poorly written code, but the APIs are often just as poor. Their only excuse is when the documentation is clearly written, as in this case.

# Michael S. Kaplan on 12 Aug 2006 10:49 AM:

Hi oidon,

Exceptions 15 years ago were a lot more expensive than they are now. Even now they are not entirely free. Creating code that throws at such a low level often requires one to sacrifice much in the way of performance; it is much better at higher levels than this....

If people write code poorly then they will do so whether you throw or make 'em return error values. People who write bad code will simply do so, as it is their nature.

# Dean Harding on 13 Aug 2006 7:51 AM:

Exceptions are also very language-specific. By using a simple C-based API, the API can be called directly by C, C++, C#, VB, and possibly more languages. If they'd used C++ exceptions, then only C++ would be able to directly call it. And only Microsoft C++ at that - C++ exceptions aren't part of the ABI of standard C++.
