by Michael S. Kaplan, published on 2005/07/26 09:30 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/07/26/443375.aspx
Yes, I said it -- all code page architectures are created equal. But in the most Orwellian sense, some are more equal than others....
First, a digression into a favorite Ogden Nash poem of mine. It is very short, and I pretty much have it memorized:
Let's talk about eggs:
Eggs have no legs.
Let's talk about chickens:
Chickens do have legs.
The plot thickens --
eggs come from chickens!
But they have no legs under 'em
What a conundrum!
Why this poem popped into my head may become apparent shortly. If not, it is still a nice poem (Ogden Nash at his finest!).
Anyway....
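If you look at the official, sanctioned encoding architectures owned by the GIFT team, there are three of them: the Win32 code page functions (MultiByteToWideChar, WideCharToMultiByte, and friends), MLang (the COM-based conversion interfaces that originally shipped with Internet Explorer), and the System.Text.Encoding classes in the .NET Framework BCL.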
(There is a fourth model for kernel mode and the Rtl* functions that can be used in both kernel and user mode, but I will cover that another day; for my purposes here, just think of it as being like Win32 but more limited!)
If these were three entirely separate models, it all might be easier. However, they are not.
Talk about conundrums! These three models are deeply interrelated, yet their behavior differs so often that I doubt anyone will ever be able to sort out all of the differences.
It all represents complex code in three code bases, written across nine versions of Windows, three versions of IE, and three versions of the BCL, using unmanaged, managed, and COM-based code. It is very hard to figure out what is a bug to fix, what is a bug we are stuck with for back-compat reasons, and what is an intentional feature that only looks like a bug because the behavior was not documented well enough. You can get a headache trying to figure it out sometimes (and many have!).
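To make the "bug or feature?" problem a little more concrete, here is a small sketch. It uses Python's standard codecs rather than Win32, MLang, or the .NET Framework (an assumption made only so the example runs anywhere, not a claim about how any of those three behave), and decodes the same two bytes with two converters that both handle Shift-JIS: `shift_jis`, which follows the JIS X 0208 mapping, and `cp932`, which follows Microsoft's code page 932 table. The helper name `decode_both` is hypothetical.

```python
import unicodedata


def decode_both(raw: bytes) -> dict:
    """Decode the same bytes with two 'Shift-JIS' codecs and report what each produced."""
    results = {}
    for codec in ("shift_jis", "cp932"):
        text = raw.decode(codec)
        # Record each resulting code point and its Unicode character name.
        results[codec] = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]
    return results


# 0x81 0x60 is JIS X 0208 row 1, cell 33, written in Shift-JIS form.
for codec, chars in decode_both(b"\x81\x60").items():
    print(codec, chars)
```

The same bytes come back as U+301C WAVE DASH from one codec and U+FF5E FULLWIDTH TILDE from the other, while plain ASCII input decodes identically in both. Neither answer is wrong by its own table, which is exactly what makes such differences so hard to classify.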
So what does it all mean?
Well, as Shawn Steele, the owner of the bulk of this complex set of code bases, likes to say, people ought to just be using Unicode. And Shawn is spot on here: the more complex the code page work you do, the more likely you are to run into problems.
Now, I do not lump UTF-8 (or even UTF-32 in the .NET Framework) in with the rest of those code pages, since they are Unicode encoding forms, but just about everything else ought to follow a "use it if you have to convert something, but once it is converted, stop using it!" model.
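A minimal sketch of that "convert, then stop using it" model, assuming Python's codecs as a stand-in for converters such as Win32's MultiByteToWideChar or .NET's Encoding.GetEncoding: legacy bytes are decoded once at the input boundary, all processing happens on Unicode strings, and output goes back out as UTF-8. The function names `read_legacy`, `process`, and `write_unicode` are hypothetical, and the `cp1252` input is just an example.

```python
def read_legacy(raw: bytes, codepage: str) -> str:
    """Boundary step: convert legacy code page bytes to Unicode exactly once."""
    # errors="strict" makes malformed input fail loudly instead of
    # silently inserting replacement characters.
    return raw.decode(codepage, errors="strict")


def process(text: str) -> str:
    """All real work happens on Unicode strings; no code page logic in here."""
    return text.upper()


def write_unicode(text: str) -> bytes:
    """Output step: emit a Unicode encoding form (UTF-8), not the legacy code page."""
    return text.encode("utf-8")


legacy_input = b"caf\xe9"  # "café" in Windows code page 1252
text = read_legacy(legacy_input, "cp1252")
result = write_unicode(process(text))
print(result)  # b'CAF\xc3\x89'
```

The point of the design is that a code page name appears in exactly one place; nothing downstream of `read_legacy` needs to know where the text originally came from.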
But please just try to use Unicode, which the operating system and the .NET Framework prefer, and which they were basically designed for....
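One reason for that preference is that any single legacy code page covers only a small slice of Unicode, while UTF-16 (the form Windows and .NET strings use internally) and UTF-8 can represent all of it. The sketch below, again using Python codecs as an illustrative assumption and a hypothetical helper `round_trips`, checks whether this post's sponsor character survives an encode and decode in a few encodings when unrepresentable characters are replaced with '?'.

```python
import unicodedata


def round_trips(text: str, encoding: str) -> bool:
    """True if text survives encode -> decode unchanged in the given encoding."""
    # errors="replace" mimics best-effort converters that quietly
    # substitute '?' for characters the target encoding cannot represent.
    return text.encode(encoding, errors="replace").decode(encoding) == text


sample = "\u0ce1"  # the character this post is "brought to you by"
print(unicodedata.name(sample))  # KANNADA LETTER VOCALIC LL

for enc in ("cp1252", "cp932", "utf-16-le", "utf-8"):
    print(f"{enc:9} round-trips: {round_trips(sample, enc)}")
```

Both code pages silently turn the character into '?', while both Unicode encoding forms hand it back intact.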
This post brought to you by "ೡ" (U+0ce1, a.k.a. KANNADA LETTER VOCALIC LL)
Paul Ballard on 26 Jul 2005 11:06 AM:
Ivo on 26 Jul 2005 2:53 PM:
Michael S. Kaplan on 26 Jul 2005 3:13 PM:
HASEGAWA Yosuke on 26 Jul 2005 10:04 PM:
Michael S. Kaplan on 27 Jul 2005 2:35 AM:
Michel Lemay on 5 Aug 2005 10:45 AM:
Yuhong Bao on 2 Dec 2008 11:10 PM:
"(there is a fourth model for Kernel mode and the Rtl* functions that can be used in both kernel and user mode, but I will cover that another day -- for my purposes here just consider it for now like Win32 but more limited!)"
I'd just call it the Native API model.
Michael S. Kaplan on 3 Dec 2008 12:43 AM:
Well, that might cause more confusion for some people as "Native" has become the preferred term for the C++ team when referring to code that is not managed code (they prefer "native" to "unmanaged").
I know kernel mode devs had the term first, but there are fewer of them so they may not win that one. :-)
Yuhong Bao on 14 Nov 2010 7:52 PM:
"It represents complex pieces of code in three code bases written across nine versions of Windows, three versions of IE, and three versions of the BCL, using unmanaged, managed, and COM based code. It is very hard to figure out what is a bug to fix, what is a bug we are stuck with for backcompat reasons, what is an intentional feature that only looks like a bug because the behavior was not documented well enough. You can get a headache trying to figure it out sometimes (and many have!)."
Yeah, the fundamental flaw is that MLang was originally part of IE and was layered on top of the Win32 code page model. As such, it had to run on multiple versions of Windows, accounting for changes in the underlying Win32 code page model between those versions. Often when the Win32 code page model changed, MLang had to be changed as well (for example, to remove a workaround for a bug that had been fixed in the Win32 code page model on some versions of Windows). Eventually MLang became part of Windows itself, but it still retains most of the cruft.
Yuhong Bao on 14 Nov 2010 8:04 PM:
Add to that the fact that IE (which MLang was part of) was updated independently of Windows, so if a Windows bug fix interfered with an MLang workaround, IE would have to be updated at the same time to fix MLang.
referenced by
2006/12/25 Anyone out there switching modes in JIS?