Dial 911, code page 864 isn't breathing

by Michael S. Kaplan, published on 2006/04/22 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/22/580636.aspx


In the past I have had less than kind words to say about code pages 20127, 20269, and 1258. Well, with this post you will be able to add code page 864 to the list....

Over in the microsoft.public.platformsdk.mslayerforunicode newsgroup, shanab asked:

how i can encode from arabic(1256) to ibm(864) in c#.net?

i need help for this problem because i cannot encode and decode this.

Of course this question has nothing to do with MSLU, but it is common for people to see the word Unicode in the title and just post their question there. I think everyone is used to it. :-)

Now in theory, since both code pages exist on Windows you could just pivot them through Unicode -- a simple Encoding.GetEncoding(1256).GetString() to go from Windows code page 1256 to Unicode and a simple Encoding.GetEncoding(864).GetBytes() to go from Unicode to cp864.
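A minimal sketch of that pivot, written in Python rather than C# so it can be run anywhere. The function name `pivot_1256_to_864` is made up for illustration, and Python's bundled `cp1256` and `cp864` codecs stand in for the two `Encoding` objects. Python's cp864 table comes from its own mapping data and may not match the Windows table byte for byte.

```python
# Sketch only: Python's bundled codecs stand in for the .NET Encoding objects.

def pivot_1256_to_864(data: bytes) -> bytes:
    """Re-encode cp1256 bytes as cp864 by way of Unicode (hypothetical helper)."""
    text = data.decode("cp1256")  # cp1256 -> Unicode
    # Unicode -> cp864; anything cp864 cannot represent becomes "?"
    return text.encode("cp864", errors="replace")

sample = "سلام".encode("cp1256")  # four ordinary Arabic letters as cp1256 bytes
print(pivot_1256_to_864(sample))   # b'????'
```

Every letter comes out as a question mark, because the Unicode code points that cp1256 decodes to are not the ones cp864 encodes.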

But if we take a slightly modified version of the code from this post and run it to list the characters in cp864 (looking above 0x7f, of course!), here is what we get:

U+009b  0x9b
U+009c  0x9c
U+009f  0x9f
U+00a0  0xa0
U+00a2  0xc0
U+00a3  0xa3
U+00a4  0xa4
U+00a6  0xdb
U+00ab  0x97
U+00ac  0xdc
U+00ad  0xa1
U+00b0  0x80
U+00b1  0x93
U+00b7  0x81
U+00bb  0x98
U+00bc  0x95
U+00bd  0x94
U+00d7  0xde
U+00f7  0xdd
U+03b2  0x90
U+03c6  0x92
U+060c  0xac
U+061b  0xbb
U+061f  0xbf
U+0640  0xe0
U+0651  0xf1
U+0660  0xb0
U+0661  0xb1
U+0662  0xb2
U+0663  0xb3
U+0664  0xb4
U+0665  0xb5
U+0666  0xb6
U+0667  0xb7
U+0668  0xb8
U+0669  0xb9
U+2219  0x82
U+221a  0x83
U+221e  0x91
U+2248  0x96
U+2500  0x85
U+2502  0x86
U+250c  0x8d
U+2510  0x8c
U+2514  0x8e
U+2518  0x8f
U+251c  0x8a
U+2524  0x88
U+252c  0x89
U+2534  0x8b
U+253c  0x87
U+2592  0x84
U+25a0  0xfe
U+f8be  0xa6
U+f8bf  0xa7
U+f8c0  0xff
U+fe7d  0xf0
U+fe80  0xc1
U+fe81  0xc2
U+fe82  0xa2
U+fe83  0xc3
U+fe84  0xa5
U+fe85  0xc4
U+fe8b  0xc6
U+fe8d  0xc7
U+fe8e  0xa8
U+fe8f  0xa9
U+fe91  0xc8
U+fe93  0xc9
U+fe95  0xaa
U+fe97  0xca
U+fe99  0xab
U+fe9b  0xcb
U+fe9d  0xad
U+fe9f  0xcc
U+fea1  0xae
U+fea3  0xcd
U+fea5  0xaf
U+fea7  0xce
U+fea9  0xcf
U+feab  0xd0
U+fead  0xd1
U+feaf  0xd2
U+feb1  0xbc
U+feb3  0xd3
U+feb5  0xbd
U+feb7  0xd4
U+feb9  0xbe
U+febb  0xd5
U+febd  0xeb
U+febf  0xd6
U+fec1  0xd7
U+fec5  0xd8
U+fec9  0xdf
U+feca  0xc5
U+fecb  0xd9
U+fecc  0xec
U+fecd  0xee
U+fece  0xed
U+fecf  0xda
U+fed0  0xf7
U+fed1  0xba
U+fed3  0xe1
U+fed5  0xf8
U+fed7  0xe2
U+fed9  0xfc
U+fedb  0xe3
U+fedd  0xfb
U+fedf  0xe4
U+fee1  0xef
U+fee3  0xe5
U+fee5  0xf2
U+fee7  0xe6
U+fee9  0xf3
U+feeb  0xe7
U+feec  0xf4
U+feed  0xe8
U+feef  0xe9
U+fef0  0xf5
U+fef1  0xfd
U+fef2  0xf6
U+fef3  0xea
U+fef5  0xf9
U+fef6  0xfa
U+fef7  0x99
U+fef8  0x9a
U+fefb  0x9d
U+fefc  0x9e

A quick glance at what cp864 supports explains why I described the ability to do this as something theoretical. :-)

Apart from a handful of punctuation marks, the tatweel, and the shadda (U+060c, U+061b, U+061f, U+0640, U+0651), the only code points from the regular Arabic block are the digits U+0660 through U+0669, which incidentally are not in cp1256. Every Arabic letter in cp864 comes from the Arabic Presentation Forms, which are not characters you want to be using if you can help it, as I point out in "It does not always pay to be compatible".
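For anyone who wants to repeat the experiment, here is a rough sketch of how a listing like the one above can be produced and sorted into Unicode blocks. It is Python, not the original code, and it reads Python's own cp864 table, which may not match the Windows one exactly (for instance, bytes that Python treats as unassigned are simply skipped). The helper names `cp864_high_half` and `block_of` are invented for this example.

```python
from collections import Counter

def cp864_high_half():
    """Return {byte: character} for every assigned byte from 0x80 to 0xff."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            table[byte] = bytes([byte]).decode("cp864")
        except UnicodeDecodeError:
            pass  # unassigned in this codec's table
    return table

def block_of(ch):
    """Very coarse block labels, just enough for this listing."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0xFE70 <= cp <= 0xFEFF:
        return "Arabic Presentation Forms-B"
    return "other"

table = cp864_high_half()

# Same layout as the listing: code point first, byte second, sorted by code point.
for byte, ch in sorted(table.items(), key=lambda item: ord(item[1])):
    print(f"U+{ord(ch):04x}  0x{byte:02x}")

# How many characters fall in each block.
print(Counter(block_of(ch) for ch in table.values()))
```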

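Even worse, it does not support all four forms of even the basic Arabic characters. In the list above, ain gets all four (U+fec9 through U+fecc), but beh gets only its isolated and initial forms (U+fe8f and U+fe91). This is not the fault of cp864 in particular; it is a problem for any single-byte code page, since there is not enough room!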

So, the only real way to move between cp1256 and cp864 would be to write custom code that tries to move into the presentation forms and, any time a form is not supported, I suppose just puts in the wrong one. Which would be a lot of work to support something that does not work very well anyway....
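To give a feel for what such custom code would involve, here is a deliberately naive sketch in Python. It is an illustration under loud assumptions, not a recommended converter: it does no contextual shaping at all, the names `cp864_forms` and `to_cp864_naive` are invented, and it relies on Python's cp864 table plus the compatibility decompositions in `unicodedata` to find which presentation forms exist for each base letter. It always prefers the isolated form and otherwise falls back to whatever form the code page happens to have.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

def cp864_forms():
    """Map each base Arabic letter to the presentation forms cp864 can encode."""
    forms = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode("cp864")
        except UnicodeDecodeError:
            continue  # unassigned in this codec's table
        parts = unicodedata.decomposition(ch).split()
        # Keep single-letter mappings such as "<isolated> 0627"; skip ligatures.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0].strip("<>")] = ch
    return forms

def to_cp864_naive(text):
    """Swap each letter for a presentation form, right or wrong, then encode."""
    forms = cp864_forms()
    out = []
    for ch in text:
        available = forms.get(ch)
        if available:
            # Isolated if we have it, otherwise the first form found.
            ch = available.get("isolated") or next(iter(available.values()))
        out.append(ch)
    return "".join(out).encode("cp864", errors="replace")

print(to_cp864_naive("سلام"))
```

The question marks go away, but every letter comes out in its isolated form, so the text will not join up the way Arabic should. For a letter such as yeh with hamza (U+0626), where the listing above has only the initial form (U+fe8b), the sketch emits that form regardless of position. Real shaping would have to look at each letter's neighbors, and even then it would run into the missing forms.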

 

This post brought to you by "ﻱ" (U+fef1, a.k.a. ARABIC LETTER YEH ISOLATED FORM)


# Jhall5197 on 25 Apr 2006 12:17 PM:

thanks for this article.

im trying to convert unicode to code page 864(the only code page the printer supports) for printing through serial ports, and i keep getting those darn question marks.

Any suggestions on what could be done?

# Michael S. Kaplan on 25 Apr 2006 12:39 PM:

There is really not very much to be done here -- the code page is very inadequate at representing the language....

# Yuhong Bao on 18 Nov 2010 8:10 PM:

If you really need to use an OEM codepage for Arabic, there is codepage 720.

# Yuhong Bao on 2 Nov 2012 2:25 AM:

864 also maps the ASCII percent sign to U+066A. The percent sign is used, for example, as an escape character in URLs, which is why Mozilla wants to remove support for 864 from Firefox:

www.w3.org/.../show_bug.cgi

bugzilla.mozilla.org/show_bug.cgi


referenced by

2008/09/14 Johab to be kidding me!

2007/08/30 The main criteria in determing whether a code page sucks? Suckage, of course!
