by Michael S. Kaplan, published on 2005/08/24 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/08/24/455446.aspx
Yes, there is a problem with code page 20269. And there has been, since birth.
It is intended to be an implementation of ISO 6937. Unfortunately, it cannot really be used for its intended purpose: providing a combining-character form for Latin-1. The ISO standard works as follows:
ISO 6937 encodes, as characters, both single letters and combinations of a letter with a diacritic. Only those that occur in a list are legal: the "repertoire" of ISO 6937. The diacritic shall precede the letter, but is not a character in itself. A diacritic as a free-standing character is created by coding a space after the byte that represents the "diacritical mark". In this way some characters are coded with one byte, others with two. The number of codeable characters is finite: basically the 333 characters defined in the repertoire.
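The one-or-two-byte scheme described in that quote can be illustrated with a small tokenizer. This is a sketch under stated assumptions: the hypothetical `split_iso6937` function only recognizes the nine diacritic bytes from the cp 20269 table in this post (ISO 6937 defines a few more), and it does not check the repertoire.

```python
# Non-spacing diacritic bytes from the cp 20269 table in this post.
# Assumption: this subset is enough for illustration; ISO 6937 defines more.
DIACRITIC_BYTES = {0xC6, 0xC7, 0xC8, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF}


def split_iso6937(data: bytes) -> list[bytes]:
    """Split ISO 6937-style bytes into 1- or 2-byte characters (hypothetical helper)."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] in DIACRITIC_BYTES:
            # A diacritic is not a character by itself: it must be followed
            # by the letter it modifies, or by a space for a free-standing mark.
            if i + 1 >= len(data):
                raise ValueError(f"dangling diacritic byte 0x{data[i]:02X} at end of input")
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars


# 'a' with diaeresis, plain 'x', then a free-standing diaeresis (0xC8 + space).
print(split_iso6937(b"\xC8ax\xC8 "))  # → [b'\xc8a', b'x', b'\xc8 ']
```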
(The scheme of 6937 was abandoned in favor of the ISO-8859 scheme, which uses precomposed characters.)
Now both Windows and Unicode do things the other way around (base character followed by combining character). In order to properly handle conversions for ISO 6937, each of the following characters would have to be swapped with the base character it belongs to: when calling WideCharToMultiByte(20269,...), the combining character would have to move in front of the base character that precedes it, and when calling MultiByteToWideChar(20269,...), the diacritic byte would have to move after the base character that follows it.
| Unicode | cp 20269 | Character |
|---|---|---|
| U+0306 | 0xC6 | Combining Breve |
| U+0307 | 0xC7 | Combining Dot Above |
| U+0308 | 0xC8 | Combining Diaeresis |
| U+030A | 0xCA | Combining Ring Above |
| U+030B | 0xCD | Combining Double Acute |
| U+030C | 0xCF | Combining Hacek |
| U+0327 | 0xCB | Combining Cedilla |
| U+0328 | 0xCE | Combining Ogonek |
| U+0332 | 0xCC | Combining Low Line |
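A minimal Python sketch of the reordering the table implies might look like this. It assumes input limited to ASCII letters, space, and the nine combining marks listed above; the helpers `to_iso6937` and `from_iso6937` are hypothetical and are not meant to reproduce what WideCharToMultiByte or MultiByteToWideChar actually do for code page 20269. On the Unicode side the text is first normalized to NFD with `unicodedata.normalize`, so a precomposed letter such as U+00E4 becomes base + combining mark before the swap. The repertoire is not checked.

```python
import unicodedata

# Combining mark -> cp 20269 diacritic byte (from the table above).
COMBINING_TO_BYTE = {
    "\u0306": 0xC6,  # breve
    "\u0307": 0xC7,  # dot above
    "\u0308": 0xC8,  # diaeresis
    "\u030A": 0xCA,  # ring above
    "\u030B": 0xCD,  # double acute
    "\u030C": 0xCF,  # hacek (caron)
    "\u0327": 0xCB,  # cedilla
    "\u0328": 0xCE,  # ogonek
    "\u0332": 0xCC,  # low line
}
BYTE_TO_COMBINING = {b: c for c, b in COMBINING_TO_BYTE.items()}


def _check_base(ch: str) -> str:
    # Assumption: only ASCII letters and space are handled as base characters.
    if ch != " " and not (ch.isascii() and ch.isalpha()):
        raise ValueError(f"unsupported character {ch!r}")
    return ch


def to_iso6937(text: str) -> bytes:
    """Unicode order (base, combining) -> ISO 6937 order (diacritic, base)."""
    chars = unicodedata.normalize("NFD", text)  # split precomposed letters
    out = bytearray()
    i = 0
    while i < len(chars):
        base = _check_base(chars[i])  # also rejects unattached combining marks
        if i + 1 < len(chars) and chars[i + 1] in COMBINING_TO_BYTE:
            out.append(COMBINING_TO_BYTE[chars[i + 1]])  # diacritic first...
            out.append(ord(base))                         # ...then the base
            i += 2
        else:
            out.append(ord(base))
            i += 1
    return bytes(out)


def from_iso6937(data: bytes) -> str:
    """ISO 6937 order (diacritic, base) -> Unicode order (base, combining)."""
    out = []
    i = 0
    while i < len(data):
        if data[i] in BYTE_TO_COMBINING:
            if i + 1 >= len(data):
                raise ValueError("diacritic byte at end of input")
            out.append(_check_base(chr(data[i + 1])))  # base first...
            out.append(BYTE_TO_COMBINING[data[i]])      # ...then the mark
            i += 2
        else:
            out.append(_check_base(chr(data[i])))
            i += 1
    return "".join(out)


print(to_iso6937("\u00E4"))                  # → b'\xc8a'
print(from_iso6937(b"\xC8a") == "a\u0308")   # → True
```

The swap only ever looks one character ahead, which is enough here because ISO 6937 allows at most one diacritic per letter; a second mark in a row is rejected rather than guessed at.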
Technically, we should only do this for characters within the legal list of 333 characters; all others should fail to convert. But the simple reversal above might be enough....
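A repertoire check along those lines might look like the following sketch. The hypothetical `check_pairs` function and the six-entry `REPERTOIRE_SUBSET` are illustrative assumptions only; the real list has 333 characters and would have to be taken from the standard itself. As a further assumption, a diacritic followed by a space (the free-standing form) is always accepted.

```python
# Diacritic bytes from the cp 20269 table in this post.
DIACRITIC_BYTES = {0xC6, 0xC7, 0xC8, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF}

# Hypothetical, deliberately tiny subset of the ISO 6937 repertoire, stored as
# (diacritic byte, base letter) pairs. Illustration only: not the real list.
REPERTOIRE_SUBSET = {
    (0xC8, "a"), (0xC8, "o"), (0xC8, "u"),  # a, o, u with diaeresis
    (0xCB, "c"),                            # c with cedilla
    (0xCF, "s"), (0xCF, "z"),               # s, z with hacek
}


def check_pairs(data: bytes, repertoire: set[tuple[int, str]] = REPERTOIRE_SUBSET) -> None:
    """Raise ValueError if data contains a diacritic + letter pair outside the repertoire."""
    i = 0
    while i < len(data):
        b = data[i]
        if b not in DIACRITIC_BYTES:
            i += 1  # single-byte character; not checked in this sketch
            continue
        if i + 1 >= len(data):
            raise ValueError(f"dangling diacritic byte 0x{b:02X}")
        base = chr(data[i + 1])
        # Assumption: diacritic + space (the free-standing form) is always allowed.
        if base != " " and (b, base) not in repertoire:
            raise ValueError(f"0x{b:02X} + {base!r} is not in the repertoire")
        i += 2


check_pairs(b"\xC8a\xCBc")   # accepted: a with diaeresis, c with cedilla
# check_pairs(b"\xC8q")      # would raise ValueError: q with diaeresis is not listed
```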
Since 20269 is a table-based code page, this kind of special handling is not being done and really cannot be done; to fix it, a new (algorithmic or 'baby DBCS') code page would have to be defined. And we are not defining new code pages, so this one is going to have to be filed under the "do not expect useful results without doing a lot of work yourself" category....
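To see why a table alone cannot do this, consider a sketch of byte-by-byte table lookup, which is roughly what a table-based conversion amounts to. The `BYTE_TABLE` excerpt and the `table_decode` function are hypothetical; they use only two entries from the table above, and every other byte falls back to `chr`, which is correct only for ASCII.

```python
# Hypothetical two-entry excerpt of a byte -> code point table (values from
# the cp 20269 table above). Other bytes fall back to chr(b): ASCII only.
BYTE_TABLE = {0xC8: "\u0308", 0xCB: "\u0327"}


def table_decode(data: bytes) -> str:
    # One byte in, one code point out: the input order is preserved as-is.
    return "".join(BYTE_TABLE.get(b, chr(b)) for b in data)


result = table_decode(b"\xC8a")
print([f"U+{ord(c):04X}" for c in result])  # → ['U+0308', 'U+0061']
```

The diaeresis comes out ahead of the `a`, the reverse of the Unicode order U+0061 U+0308. Getting it right means looking at the neighboring character, which a one-to-one lookup table never does.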
Not the end of the world or anything, but it seemed worthy of at least a blog entry. :-)
This post brought to you by "A" (U+0041, a.k.a. LATIN CAPITAL LETTER A)