Bytes over Western Europe

by Michael S. Kaplan, published on 2006/05/26 21:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/26/608497.aspx


Over on the BCLTeam's WebLog, Ryan Byington wrote a mostly excellent post entitled SerialPort Encoding.

At the end of the post he went a bit too far when he mentioned that

The only encoding that converts all characters with a value 0-255 to a single byte with the corresponding value and vice versa when converting bytes to characters is the “Western European (ISO)” encoding.

As regular readers here may know, the vast majority of the code pages supported in Windows and the .NET Framework are single-byte character sets (SBCS), in which every character is encoded as exactly one byte. These include 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 874, 437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 28591 (the one Ryan was thinking of), 28592, 28593, 28594, 28595, 28596, 28597, 28598, 28599, and 28605.
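To make the distinction concrete, here is a minimal Python sketch. It assumes Python's built-in codec tables are a fair stand-in for the Windows code pages named above. They agree for most byte values, but Python leaves a few bytes undefined in some tables (cp1252's 0x81, for example), so the list below sticks to pages whose Python tables cover all 256 bytes. The `check_sbcs` helper is hypothetical. It tests two separate properties: whether every byte decodes to one character and encodes back unchanged (what a serial port actually needs), and whether each byte maps to the code point with the same value (what the quoted sentence describes).

```python
# Hedged sketch: Python's codec tables stand in for the Windows code pages
# listed in the comments; they may differ from Windows for a few byte values.
CODECS = {
    "latin_1": 28591,
    "iso8859_2": 28592,
    "iso8859_5": 28595,
    "iso8859_15": 28605,
    "cp437": 437,
    "cp850": 850,
    "cp866": 866,
}

ALL_BYTES = bytes(range(256))


def check_sbcs(codec: str) -> tuple[bool, bool]:
    """Return (round_trips, is_identity) for one single-byte codec."""
    text = ALL_BYTES.decode(codec)  # one character per byte in an SBCS
    # Round trip: 256 characters that encode back to the original 256 bytes.
    round_trips = len(text) == 256 and text.encode(codec) == ALL_BYTES
    # Identity: byte value N becomes code point U+00NN.
    is_identity = all(ord(ch) == b for b, ch in zip(ALL_BYTES, text))
    return round_trips, is_identity


if __name__ == "__main__":
    for codec, cp in CODECS.items():
        rt, ident = check_sbcs(codec)
        print(f"{codec:<11} (~CP {cp}): round-trips={rt}, byte==code point={ident}")
```

Every entry round-trips, while only `latin_1` also maps each byte to the code point of the same value. So for moving raw bytes through a string, any of these pages would do.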

Plus there are all of the EBCDIC code pages, if a device happened to use one; they are SBCS as well.
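EBCDIC behaves the same way. A small sketch, assuming Python's built-in `cp037` codec (EBCDIC US/Canada) is a fair stand-in for Windows code page 037: all 256 byte values round-trip one-for-one, even though the letters sit at completely different byte values than in ASCII. The `ebcdic_round_trips` helper is hypothetical.

```python
# Hedged sketch: Python's "cp037" codec used as a stand-in for the
# Windows EBCDIC code page 037.
ALL_BYTES = bytes(range(256))


def ebcdic_round_trips(codec: str = "cp037") -> bool:
    """True if every byte 0-255 decodes to one character and encodes back unchanged."""
    text = ALL_BYTES.decode(codec)
    return len(text) == 256 and text.encode(codec) == ALL_BYTES


if __name__ == "__main__":
    print(ebcdic_round_trips())           # True
    print("HELLO".encode("cp037").hex())  # c8c5d3d3d6 (ASCII would be 48454c4c4f)
```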

And a few miscellaneous ones, too.

There are a lot of them....

The last code sample is also a little incomplete (it treats Encoding.GetEncoding like a property rather than a method), but we can treat it as pseudocode and focus on the big list of code pages instead. :-)

 

This post brought to you by "ਊ" (U+0a0a, a.k.a. GURMUKHI LETTER UU)


# Mihai on 26 May 2006 10:25 PM:

Not to mention UTF-8.
I mean, in the end a byte is just a byte :-)
