You probably don't want to use Microsoft's code page 20269

by Michael S. Kaplan, published on 2005/08/24 03:01 -04:00, original URI:

Yes, there is a problem with code page 20269. And there has been, since birth.

It is intended to be an implementation of ISO 6937. Unfortunately, it cannot really be used for its intended purpose, which is to provide a form of Latin-1 that uses combining characters. The ISO standard works as follows:

ISO 6937 encodes as characters both single letters and combinations of a letter with a diacritic. Only the combinations that occur in a list, the "repertoire" of ISO 6937, are legal. The diacritic must precede the letter, but is not a character in itself. A free-standing diacritic is encoded by placing a space after the byte that represents the diacritical mark. In this way some characters are encoded with one byte and others with two. The number of encodable characters is finite: essentially the 333 characters defined in the repertoire.
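A minimal Python sketch of what this means for a reader of ISO 6937 data: the byte stream has to be split into one- or two-byte characters, with each diacritic byte pulling in the byte after it. It assumes, for illustration, that the diacritic bytes are 0xC1 through 0xCF (the table later in this post lists several of them); `split_characters` is a hypothetical helper name, and it does not check entries against the 333-character repertoire.

```python
# Assumption: bytes 0xC1-0xCF are the ISO 6937 non-spacing diacritics.
DIACRITIC_BYTES = range(0xC1, 0xD0)


def split_characters(data: bytes) -> list[bytes]:
    """Split an ISO 6937 byte string into one- or two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] in DIACRITIC_BYTES:
            # The diacritic is not a character by itself: it combines with
            # the byte after it (a letter, or a space for a free-standing mark).
            if i + 1 >= len(data):
                raise ValueError("diacritic byte with nothing after it")
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars
```

For example, `b"\xc8a \xc8 x"` splits into four characters: a-with-diaeresis (two bytes), a space, a free-standing diaeresis (two bytes), and `x`.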

(The scheme of 6937 was abandoned in favor of the ISO-8859 scheme, which uses precomposed characters.)

Now both Windows and Unicode do things the other way around (base character followed by combining character). To handle conversions for ISO 6937 properly, each of the following characters would have to be swapped with the base character preceding it when calling WideCharToMultiByte(20269,...), and each corresponding byte would have to be swapped with the character following it when calling MultiByteToWideChar(20269,...):

Unicode   CP 20269 byte   Character
U+0306    0xC6            Combining Breve
U+0307    0xC7            Combining Dot Above
U+0308    0xC8            Combining Diaeresis
U+030A    0xCA            Combining Ring Above
U+030B    0xCD            Combining Double Acute
U+030C    0xCF            Combining Hacek
U+0327    0xCB            Combining Cedilla
U+0328    0xCE            Combining Ogonek
U+0332    0xCC            Combining Low Line
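The reversal described above can be sketched in Python. This is a minimal sketch, not what Windows does: it works on Python strings and bytes instead of calling WideCharToMultiByte/MultiByteToWideChar, `encode_20269` and `decode_20269` are hypothetical names, and it assumes (as a simplification) that base characters are plain ASCII and that only the combining marks in the table above occur. Input is normalized to NFD first so that a precomposed letter such as ä becomes a base plus a combining mark.

```python
import unicodedata

# Combining marks from the table above and their CP 20269 bytes.
COMBINING_TO_BYTE = {
    "\u0306": 0xC6,  # combining breve
    "\u0307": 0xC7,  # combining dot above
    "\u0308": 0xC8,  # combining diaeresis
    "\u030A": 0xCA,  # combining ring above
    "\u030B": 0xCD,  # combining double acute
    "\u030C": 0xCF,  # combining hacek (caron)
    "\u0327": 0xCB,  # combining cedilla
    "\u0328": 0xCE,  # combining ogonek
    "\u0332": 0xCC,  # combining low line
}
BYTE_TO_COMBINING = {b: c for c, b in COMBINING_TO_BYTE.items()}


def encode_20269(text: str) -> bytes:
    """Unicode order (base, mark) -> ISO 6937 order (mark byte, base byte)."""
    text = unicodedata.normalize("NFD", text)
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in COMBINING_TO_BYTE:
            # A mark with no base left to attach to (e.g. a second mark).
            raise ValueError(f"unattached combining mark U+{ord(ch):04X}")
        if nxt in COMBINING_TO_BYTE:
            # Swap: the diacritic byte goes before its base character.
            # A space as the base yields the free-standing form (mark + space).
            out.append(COMBINING_TO_BYTE[nxt])
            out += ch.encode("ascii")  # non-ASCII bases raise (simplification)
            i += 2
        else:
            out += ch.encode("ascii")
            i += 1
    return bytes(out)


def decode_20269(data: bytes) -> str:
    """ISO 6937 order (mark byte, base byte) -> Unicode order (base, mark)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in BYTE_TO_COMBINING:
            if i + 1 >= len(data):
                raise ValueError("diacritic byte at end of input")
            # Swap: emit the following base character first, then the mark.
            out.append(data[i + 1:i + 2].decode("ascii"))
            out.append(BYTE_TO_COMBINING[b])
            i += 2
        else:
            out.append(data[i:i + 1].decode("ascii"))
            i += 1
    return "".join(out)
```

With this sketch, "Zähne" encodes to `b"Z\xc8ahne"` and decodes back to the decomposed string `"Za\u0308hne"`. Anything outside the table (an acute accent, a non-ASCII base, two marks on one letter) raises an error rather than being converted incorrectly.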

Technically, we should only do this for characters within the legal list of 333; all others should fail to convert. But the simple reversal above might be enough....
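The stricter check could look something like this Python sketch. It is hypothetical throughout: `unsupported_combinations` is an illustrative name, and `REPERTOIRE_EXCERPT` holds only a handful of letters as a stand-in for the real list of 333 characters. A base + mark pair counts as legal only if it composes (under NFC) to a character in that set.

```python
import unicodedata

# Hypothetical excerpt only: the real ISO 6937 repertoire has 333 entries.
REPERTOIRE_EXCERPT = {
    "\u00e4",  # a with diaeresis
    "\u00f6",  # o with diaeresis
    "\u00fc",  # u with diaeresis
    "\u010d",  # c with caron
    "\u0161",  # s with caron
    "\u017e",  # z with caron
    "\u00e7",  # c with cedilla
}


def unsupported_combinations(text: str) -> list[str]:
    """Return base+mark pairs whose composed form is not in the excerpt."""
    decomposed = unicodedata.normalize("NFD", text)
    bad = []
    for i, ch in enumerate(decomposed):
        # A nonzero canonical combining class marks a combining character.
        if i > 0 and unicodedata.combining(ch):
            pair = decomposed[i - 1] + ch
            if unicodedata.normalize("NFC", pair) not in REPERTOIRE_EXCERPT:
                bad.append(pair)
    return bad
```

A converter would run a check like this first and refuse the conversion if the returned list is non-empty, instead of silently producing a byte sequence that ISO 6937 does not define.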

Since 20269 is a table-based code page, this kind of special handling is not being done and really cannot be done; fixing it would require defining a new (algorithmic or 'baby DBCS') code page. And we are not defining new code pages, so this one is going to have to be filed under the "do not expect useful results without doing a lot of work yourself" category....

Not the end of the world or anything, but it seemed worthy of at least a blog entry. :-)


This post brought to you by "A" (U+0041, a.k.a. LATIN CAPITAL LETTER A)


referenced by

2008/09/14 Johab to be kidding me!

2007/08/30 The main criteria in determing whether a code page sucks? Suckage, of course!

2007/07/17 Sometimes people use code pages even when the code pages are really lame

2006/04/22 Dial 911, code page 864 isn't breathing

2005/09/12 You probably don't want to use Microsoft's code page 21027
