Dial 911, code page 864 isn't breathing

by Michael S. Kaplan, published on 2006/04/22 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/22/580636.aspx

In the past I have had less than kind words to say about code pages 20127, 20269, and 1258. Well, with this post you will be able to add code page 864 to the list....

Over in the microsoft.public.platformsdk.mslayerforunicode newsgroup, shanab asked:

how i can encode from arabic(1256) to ibm(864) in c#.net?

i need help for this problem because i cannot encode and decode this.

Of course this question has nothing to do with MSLU, but it is common for people to see the word Unicode in the title and just post their question there. I think everyone is used to it. :-)

Now in theory, since both code pages exist on Windows you could just pivot them through Unicode -- a simple Encoding.GetEncoding(1256).GetString() to go from Windows code page 1256 to Unicode and a simple Encoding.GetEncoding(864).GetBytes() to go from Unicode to cp864.
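As a rough illustration of that pivot, here is a minimal Python sketch. Python's built-in "cp1256" and "cp864" codecs stand in for Encoding.GetEncoding(1256) and Encoding.GetEncoding(864); the transcode helper is a hypothetical name, and Python's tables are not guaranteed to match the Windows ones byte for byte.

```python
def transcode(data: bytes, src: str = "cp1256", dst: str = "cp864") -> bytes:
    """Pivot bytes from one code page to another through Unicode."""
    text = data.decode(src)  # source code page -> Unicode
    # Unicode -> target code page; anything the target lacks becomes '?'
    return text.encode(dst, errors="replace")


# Plain ASCII letters survive the trip unchanged.
print(transcode(b"abc"))

# 0xC8 is ARABIC LETTER BEH (U+0628) in cp1256.
beh = bytes([0xC8])
print(f"U+{ord(beh.decode('cp1256')):04x}")
print(transcode(beh))
```

With errors="replace", every character missing from the target code page turns into a question mark, so the interesting part is how many characters that affects.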

If we take a slightly modified version of the code from this post and run it to get the list of characters in cp864 (looking above 0x7f, of course!), though, this is what comes back:

U+009b 0x9b
U+009c 0x9c
U+009f 0x9f
U+00a0 0xa0
U+00a2 0xc0
U+00a3 0xa3
U+00a4 0xa4
U+00a6 0xdb
U+00ab 0x97
U+00ac 0xdc
U+00ad 0xa1
U+00b0 0x80
U+00b1 0x93
U+00b7 0x81
U+00bb 0x98
U+00bc 0x95
U+00bd 0x94
U+00d7 0xde
U+00f7 0xdd
U+03b2 0x90
U+03c6 0x92
U+060c 0xac
U+061b 0xbb
U+061f 0xbf
U+0640 0xe0
U+0651 0xf1
U+0660 0xb0
U+0661 0xb1
U+0662 0xb2
U+0663 0xb3
U+0664 0xb4
U+0665 0xb5
U+0666 0xb6
U+0667 0xb7
U+0668 0xb8
U+0669 0xb9
U+2219 0x82
U+221a 0x83
U+221e 0x91
U+2248 0x96
U+2500 0x85
U+2502 0x86
U+250c 0x8d
U+2510 0x8c
U+2514 0x8e
U+2518 0x8f
U+251c 0x8a
U+2524 0x88
U+252c 0x89
U+2534 0x8b
U+253c 0x87
U+2592 0x84
U+25a0 0xfe
U+f8be 0xa6
U+f8bf 0xa7
U+f8c0 0xff
U+fe7d 0xf0
U+fe80 0xc1
U+fe81 0xc2
U+fe82 0xa2
U+fe83 0xc3
U+fe84 0xa5
U+fe85 0xc4
U+fe8b 0xc6
U+fe8d 0xc7
U+fe8e 0xa8
U+fe8f 0xa9
U+fe91 0xc8
U+fe93 0xc9
U+fe95 0xaa
U+fe97 0xca
U+fe99 0xab
U+fe9b 0xcb
U+fe9d 0xad
U+fe9f 0xcc
U+fea1 0xae
U+fea3 0xcd
U+fea5 0xaf
U+fea7 0xce
U+fea9 0xcf
U+feab 0xd0
U+fead 0xd1
U+feaf 0xd2
U+feb1 0xbc
U+feb3 0xd3
U+feb5 0xbd
U+feb7 0xd4
U+feb9 0xbe
U+febb 0xd5
U+febd 0xeb
U+febf 0xd6
U+fec1 0xd7
U+fec5 0xd8
U+fec9 0xdf
U+feca 0xc5
U+fecb 0xd9
U+fecc 0xec
U+fecd 0xee
U+fece 0xed
U+fecf 0xda
U+fed0 0xf7
U+fed1 0xba
U+fed3 0xe1
U+fed5 0xf8
U+fed7 0xe2
U+fed9 0xfc
U+fedb 0xe3
U+fedd 0xfb
U+fedf 0xe4
U+fee1 0xef
U+fee3 0xe5
U+fee5 0xf2
U+fee7 0xe6
U+fee9 0xf3
U+feeb 0xe7
U+feec 0xf4
U+feed 0xe8
U+feef 0xe9
U+fef0 0xf5
U+fef1 0xfd
U+fef2 0xf6
U+fef3 0xea
U+fef5 0xf9
U+fef6 0xfa
U+fef7 0x99
U+fef8 0x9a
U+fefb 0x9d
U+fefc 0x9e
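The code that produced this listing is not shown here. A comparable listing could be generated with something like the following Python sketch, which assumes Python's "cp864" codec as a stand-in for the Windows table. The function name is hypothetical, and a few cells may differ (bytes that Python treats as unassigned are simply skipped).

```python
def cp864_high_half(codec: str = "cp864"):
    """Return sorted (code point, byte) pairs for bytes 0x80..0xFF that decode."""
    pairs = []
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the codec's table
        pairs.append((ord(ch), byte))
    return sorted(pairs)


for cp, byte in cp864_high_half():
    print(f"U+{cp:04x} 0x{byte:02x}")
```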

A quick glance at what cp864 supports explains why I described the ability to do this as something theoretical. :-)

The only code points from the regular Arabic block in Unicode are a handful of punctuation marks (U+060c, U+061b, U+061f), the tatweel (U+0640), the shadda (U+0651), and the digits (U+0660 to U+0669), the last of which are incidentally not in cp1256. Every Arabic letter in cp864 comes from the Arabic Presentation Forms, which are not characters you want to be using if you can help it, as I point out in "It does not always pay to be compatible".
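One way to check which Unicode blocks the upper half of cp864 draws on is to sort its code points by range. This is a sketch using Python's "cp864" codec as a stand-in for the Windows table; the helper names are made up for illustration.

```python
def high_half_code_points(codec: str = "cp864"):
    """Code points assigned to bytes 0x80..0xFF, skipping unassigned bytes."""
    cps = []
    for byte in range(0x80, 0x100):
        try:
            cps.append(ord(bytes([byte]).decode(codec)))
        except UnicodeDecodeError:
            pass
    return cps


def split_by_block(cps):
    """Separate the Arabic block from Arabic Presentation Forms-A and -B."""
    arabic = sorted(cp for cp in cps if 0x0600 <= cp <= 0x06FF)
    forms = sorted(cp for cp in cps
                   if 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF)
    return arabic, forms


arabic, forms = split_by_block(high_half_code_points())
print("Arabic block:", " ".join(f"U+{cp:04x}" for cp in arabic))
print("Presentation forms:", len(forms), "code points")
```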

Even worse, it does not support all four forms of even the basic Arabic letters (no fault of this code page in particular; any single-byte code page would have the same problem, since there is not enough room!).
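To see the gaps for a particular letter, one could probe each of its four presentation forms by name. The sketch below assumes Python's "cp864" codec and the Unicode character names of the form "ARABIC LETTER <name> <form> FORM"; forms_supported is a hypothetical helper.

```python
import unicodedata

FORMS = ("ISOLATED", "FINAL", "INITIAL", "MEDIAL")


def forms_supported(letter: str, codec: str = "cp864"):
    """Map each presentation form of a letter to whether the codec can encode it.

    Raises KeyError for letters that lack one of the four named forms in
    Unicode (ALEF, for instance, has no initial or medial form).
    """
    result = {}
    for form in FORMS:
        ch = unicodedata.lookup(f"ARABIC LETTER {letter} {form} FORM")
        try:
            ch.encode(codec)
            result[form] = True
        except UnicodeEncodeError:
            result[form] = False
    return result


print(forms_supported("BEH"))
```

For BEH this lines up with the listing above, which has U+fe8f and U+fe91 but neither U+fe90 nor U+fe92.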

So, the only real way to move between cp1256 and cp864 would be to write custom code that tries to move the text into the presentation forms and, any time a needed form is not supported, I suppose just puts in the wrong form. Which would be a lot of work to support something that does not work very well anyway....
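To give a feel for what such custom code involves, here is a deliberately small Python sketch of the fallback step only. It assumes the caller has already worked out which form each letter needs (the joining analysis is the hard part and is not attempted here). The table is built from Python's "cp864" codec, and the fallback order is an arbitrary choice for illustration, not an established convention.

```python
import unicodedata

# For each wanted form, the substitutes to try, in order (an assumption).
FALLBACKS = {
    "isolated": ("final", "initial", "medial"),
    "final": ("isolated", "medial", "initial"),
    "initial": ("medial", "isolated", "final"),
    "medial": ("initial", "final", "isolated"),
}


def build_form_table(codec: str = "cp864"):
    """Map base letter -> {form name: byte} for single-letter presentation forms."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue
        parts = unicodedata.decomposition(ch).split()
        # Keep entries like "<initial> 0628"; skip ligatures and non-forms.
        if len(parts) == 2 and parts[0].strip("<>") in FALLBACKS:
            base = chr(int(parts[1], 16))
            table.setdefault(base, {})[parts[0].strip("<>")] = byte
    return table


def encode_letter(table, base: str, wanted: str) -> int:
    """Return the byte for the wanted form, a substitute form, or '?'."""
    forms = table.get(base, {})
    for form in (wanted,) + FALLBACKS[wanted]:
        if form in forms:
            return forms[form]
    return ord("?")


table = build_form_table()
print(hex(encode_letter(table, "\u0628", "initial")))
print(hex(encode_letter(table, "\u0628", "final")))
```

The second call asks for a final BEH, which the code page lacks, so the helper hands back one of the forms that does exist: exactly the "wrong form" compromise described above.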

This post brought to you by "ﻱ" (U+fef1, a.k.a. ARABIC LETTER YEH ISOLATED FORM)

# Jhall5197 on 25 Apr 2006 12:17 PM:

thanks for this article.

im trying to convert unicode to code page 864(the only code page the printer supports) for printing through serial ports, and i keep getting those darn question marks.

Any suggestions on what could be done?

# Michael S. Kaplan on 25 Apr 2006 12:39 PM:

There is really not very much to be done here -- the code page is very inadequate at representing the language....

# Yuhong Bao on 18 Nov 2010 8:10 PM:

If you really need to use an OEM codepage for Arabic, there is codepage 720.

# Yuhong Bao on 2 Nov 2012 2:25 AM:

864 also maps the ASCII percent sign to U+066A. The percent sign is used, for example, as an escape character in URLs, which is why Mozilla wants to remove support for 864 from Firefox:

www.w3.org/.../show_bug.cgi

bugzilla.mozilla.org/show_bug.cgi


referenced by

2008/09/14 Johab to be kidding me!

2007/08/30 The main criteria in determing whether a code page sucks? Suckage, of course!
