You probably don't want to use Microsoft's code page 20269

by Michael S. Kaplan, published on 2005/08/24 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/08/24/455446.aspx

Yes, there is a problem with code page 20269. And there has been, since birth.

It is intended to be an implementation of ISO 6937. Unfortunately, it cannot really be used for its intended purpose: providing a way to encode combining characters for Latin-1. The ISO standard works as follows:

ISO 6937 has as its characters single letters and combinations of a letter with a diacritic. Only the combinations that appear in a list, the "repertoire" of ISO 6937, are legal. The diacritic shall precede the letter, but is not a character in itself. A free-standing diacritic is created by coding a space after the byte that represents the diacritical mark. In this way some characters are coded with one byte, others with two. The number of codeable characters is finite: essentially the 333 characters defined in the repertoire.
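To make the one-byte/two-byte scheme concrete, here is a minimal sketch that splits a byte string into ISO 6937-style units. It is only an illustration: `split_units` and `DIACRITIC_BYTES` are made-up names, the set holds just the nine diacritic byte values listed later in this post (the standard defines more), and nothing here checks the 333-character repertoire.

```python
# Diacritic lead bytes: only the nine values this post lists for code page 20269.
# This is an assumption for the sketch; ISO 6937 itself defines a few more.
DIACRITIC_BYTES = frozenset({0xC6, 0xC7, 0xC8, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF})


def split_units(data):
    """Split a byte string into one-byte and two-byte character units.

    A diacritic byte always claims the byte after it: a letter gives an
    accented letter, a space gives the free-standing diacritic.
    """
    units = []
    i = 0
    while i < len(data):
        if data[i] in DIACRITIC_BYTES:
            if i + 1 >= len(data):
                raise ValueError("diacritic byte at end of input has nothing to modify")
            units.append(data[i:i + 2])  # two-byte unit: diacritic, then letter or space
            i += 2
        else:
            units.append(data[i:i + 1])  # ordinary one-byte character
            i += 1
    return units
```

For example, `split_units(b"\xc8uber")` treats the diaeresis byte and the `u` as a single two-byte character, followed by three one-byte characters.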

(The scheme of 6937 was abandoned in favor of the ISO-8859 scheme, which uses precomposed characters.)

Now both Windows and Unicode do things the other way around (base character followed by combining character). To handle conversions for ISO 6937 properly, each of the following characters would have to change places with its base character: in the Unicode text passed to WideCharToMultiByte(20269,...) it has to move in front of the character preceding it, and in the Unicode text returned by MultiByteToWideChar(20269,...) it has to move behind the character following it.

Unicode	cp 20269	Character
U+0306	0xC6	Combining Breve
U+0307	0xC7	Combining Dot Above
U+0308	0xC8	Combining Diaeresis
U+030A	0xCA	Combining Ring Above
U+030b	0xCD	Combining Double Acute
U+030c	0xCF	Combining Hacek
U+0327	0xCB	Combining Cedilla
U+0328	0xCE	Combining Ogonek
U+0332	0xCC	Combining Low Line

Technically, we should only do this for characters within the legal list of 333; all others should fail to convert properly. But the simple reversal above might be enough....
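For anyone who wants to try the simple reversal themselves, here is a rough sketch of the reordering on the Unicode side. Everything in it is an assumption made for illustration: the function names `to_mark_first` and `to_base_first` are invented, the code never calls the Windows conversion functions, and it makes no attempt to check the legal repertoire. The idea is to run `to_mark_first` on the text before converting to 20269, and `to_base_first` on the text after converting from 20269.

```python
# The nine combining marks from the table above (Unicode side).
COMBINING = frozenset("\u0306\u0307\u0308\u030A\u030B\u030C\u0327\u0328\u0332")


def to_mark_first(text):
    """Turn base+mark order (Unicode) into mark+base order (ISO 6937)."""
    out = []
    for ch in text:
        if ch in COMBINING and out and out[-1] not in COMBINING:
            out.insert(len(out) - 1, ch)  # slide the mark in front of its base
        else:
            out.append(ch)  # base character, or a stray mark with no base
    return "".join(out)


def to_base_first(text):
    """Turn mark+base order (ISO 6937) into base+mark order (Unicode)."""
    out = []
    pending = []  # marks waiting for the base character that follows them
    for ch in text:
        if ch in COMBINING:
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # stray trailing marks are kept rather than dropped
    return "".join(out)
```

A mark followed by a space (the free-standing diacritic) comes out of `to_base_first` as a space followed by the combining mark, which is a common way to show an isolated mark in Unicode text.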

Since 20269 is a table-based code page, this kind of special handling is not being done and really cannot be done; fixing it would require defining a new (algorithmic or 'baby DBCS') code page. And we are not defining new code pages, so this one is going to need to be filed under the "do not expect useful results without doing a lot of work yourself" category....

Not the end of the world or anything, but it seemed worthy of at least a blog entry. :-)

This post brought to you by "A" (U+0041, a.k.a. LATIN CAPITAL LETTER A)
