Kowloon 951

by Michael S. Kaplan, published on 2007/05/12 04:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/12/2561904.aspx


(As a point of information, both the fourth and ninth most important realizations of my entire life happened many years ago in two different places not far from Kowloon Bay; therefore, I want to make it clear that this somewhat lame attempt at a clever title is in no way intended to show disrespect to this wonderful place!)

(Also, to the extent that the title of this post is an allusion, it is most assuredly an allusion to Ray Bradbury's Fahrenheit 451 and specifically not to Michael Moore's Fahrenheit 9/11.)

The other day, Ji Cheng asked:

Hello All,

I once successfully converted the encodings between Unicode and Big5 in VB 6 and VS.NET 2003, but failed in VS.NET 2005. I have checked it in Unicode.org, the results from VB6 and .NET 2003 are correct but VS2005 is incorrect.  Is this some kind of bugs in .NET Framework 2.0?

You can test Big5 code ‘9068’ - "邨", and the correct Unicode should be '90A8', but if I input ‘9068’ as a Big5 code, the corresponding Unicode returns "E473".... Besides, Most of the differences happen in the Big5 Code ranges FA41 ~ FEFE, 8E40 ~ A0FE, 8140 ~8DFE and C6A1 ~ C8FE

Does anyone have some ideas about it?

Any help is much appreciated. Thanks in advance.

We went back and forth for a bit (my VB knowledge is not what it used to be since .NET came around!) and before too long both Shawn and I realized the cause of the inconsistent phenomena -- it appeared to be yet another case of Kowloon 951.

This first attempt to support the initial version of the Hong Kong Supplemental Character Set (HKSCS) included a replacement for Windows code page 950 (also known as Big5, though even to this day no Taiwan native has been able to name the five big companies that this de facto standard is reportedly based on), sometimes affectionately or not so affectionately known internally as code page 951.

This code page, which leaned heavily on the Unicode Private Use Area (PUA), has for the most part turned out about as well as every other attempt to take the PUA and make it something semi-private.

A complete and total functionality and interoperability nightmare.

(The critics of this plan were reportedly not seeing their opinions subject to censorship, and I would not want my blog post title allusion to even jokingly indicate otherwise; but in any case the critics were overruled by those who wanted to get this support into Windows without waiting for the HKSCS/Unicode issues to be worked out, something that did happen in time for the update to HKSCS and to Vista)

Now combine that "rogue" code page (okay, not really a rogue code page, but it feels like one!) with the fact that .NET Framework versions 1.1 and earlier relied on the installed Windows code pages, while versions 2.0 and later carry their own code page tables around. The result is that one is guaranteed to lose all chance at reasonable, consistent interoperability when any machine that installs that update is involved.
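For the curious, the U+E473 in Ji Cheng's report is what you would expect if the bytes are treated as a user-defined (EUDC) character and mapped linearly into the PUA. Here is a minimal Python sketch of that arithmetic. The range boundaries and PUA base values are my assumption about how code page 950 lays out its user-defined ranges; they are not taken from the question, and the function names are purely illustrative.

```python
# Assumed layout of the user-defined (EUDC) ranges of Windows code page 950.
# Each entry: (first lead byte, last lead byte, first trail byte, PUA base).
# These values are an assumption for illustration, not quoted from the post.
EUDC_RANGES = [
    (0xFA, 0xFE, 0x40, 0xE000),
    (0x8E, 0xA0, 0x40, 0xE311),
    (0x81, 0x8D, 0x40, 0xEEB8),
    (0xC6, 0xC8, 0xA1, 0xF6B1),
]

TRAILS_PER_LEAD = 157  # 63 trail bytes in 0x40-0x7E plus 94 in 0xA1-0xFE


def trail_index(trail):
    """Position of a Big5 trail byte within its row (0..156)."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40
    if 0xA1 <= trail <= 0xFE:
        return trail - 0xA1 + 63
    raise ValueError(f"invalid Big5 trail byte: {trail:#04x}")


def eudc_to_pua(big5_code):
    """Map a Big5 code in a user-defined range to a PUA code point."""
    lead, trail = big5_code >> 8, big5_code & 0xFF
    for first_lead, last_lead, first_trail, base in EUDC_RANGES:
        if first_lead <= lead <= last_lead:
            offset = ((lead - first_lead) * TRAILS_PER_LEAD
                      + trail_index(trail) - trail_index(first_trail))
            if offset < 0:
                raise ValueError(f"{big5_code:#06x} is below the range start")
            return base + offset
    raise ValueError(f"{big5_code:#06x} is not in a user-defined range")


print(f"U+{eudc_to_pua(0x9068):04X}")  # prints U+E473
```

Under these assumptions, 0x9068 lands two full rows plus 40 positions past 0x8E40, which is U+E311 + 354 = U+E473. A table that knows about HKSCS would instead hand back the real ideograph at U+90A8, which is exactly the disagreement described above.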

Please do your best to move on to Microsoft's (and Hong Kong's) much more successful later efforts, and avoid Kowloon 951 with whatever abilities you might have at your disposal. :-)

 

This post brought to you by 𣟗 (U+237d7, an Extension B CJK ideograph that used to be mapped to U+E866 in the PUA)



referenced by

2011/01/13 Doing it for appearances, Hong Kong style!

2007/09/27 Don't look directly at the 951 code page if you can avoid it
