More on OEMCP vs. ACP

by Michael S. Kaplan, published on 2005/09/21 03:03 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/21/472075.aspx


Yesterday, someone asked:

I have a cmd line tool. On French Windows its output is correct in the cmd window but when I redirect it to a file it is no longer good. I noticed that if I set the output encoding on the Console class explicitly to the default encoding then redirecting to a file works fine:

     Console.OutputEncoding = Encoding.Default;

Does anyone know why the default encoding is not set by default? Is there any reason why I shouldn’t put the line above in my code?

The problem here is, of course, that the console defaults to the system's default OEM code page (CP_OEMCP), while the file, once opened in Notepad, will be interpreted as being in the default 'ANSI' code page (CP_ACP). There are a few cases where the two are the same (Chinese, Japanese, Korean, Thai, Vietnamese), but in most cases the two code pages assign different characters to the same byte values.

This will of course have the effect of corrupting data.
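To make the corruption concrete, here is a minimal sketch in Python (used purely as a neutral way to poke at the bytes, not as the .NET code from the question). It assumes the usual French Windows pairing of OEM code page 850 and 'ANSI' code page 1252; the function name is hypothetical.

```python
# Assumed code pages for French Windows: OEM = 850, 'ANSI' = 1252.
OEM_CP = "cp850"
ANSI_CP = "cp1252"


def redirect_then_open(text, writer_cp=OEM_CP, reader_cp=ANSI_CP):
    """Simulate a console tool writing bytes in one code page and
    an editor such as Notepad decoding them in another."""
    raw = text.encode(writer_cp)   # what ends up in the redirected file
    return raw.decode(reader_cp)   # what the editor shows


original = "déjà"
seen_in_notepad = redirect_then_open(original)

# The accented letters are stored as bytes 0x82 and 0x85 in code page 850,
# and those same bytes mean different characters in code page 1252.
print(repr(original), "->", repr(seen_in_notepad))
```

The unaccented letters survive because both code pages agree on the ASCII range; only the characters above 0x7F get mangled.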

Now the line of code above causes the console output encoding to match the 'ANSI' code page, so the text is written to the file in the same encoding that Notepad will eventually use to interpret it, and everything works.
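The effect of that one-line fix can be sketched the same way. This is an illustration in Python rather than the .NET call itself, again assuming code page 1252 as the 'ANSI' code page; the helper names are hypothetical.

```python
# Assumed 'ANSI' code page for French Windows.
ANSI_CP = "cp1252"


def write_output(text, output_cp):
    """Bytes a tool would emit with its output encoding set to output_cp."""
    return text.encode(output_cp)


def open_in_editor(raw, editor_cp=ANSI_CP):
    """Text an editor shows when it assumes the 'ANSI' code page."""
    return raw.decode(editor_cp)


text = "déjà"
raw_fixed = write_output(text, ANSI_CP)   # writer now matches the reader
shown = open_in_editor(raw_fixed)

print(raw_fixed, "->", repr(shown))
```

Once the writer and the reader agree on the code page, the bytes round-trip and the text comes back intact.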

Bill Poser recently discussed a similar one of these "write the file with one encoding, read with another" type problems, and he discussed it here and here.

And it can indeed be fascinating when you have to track down what the problem actually is in some of these cases. Well, fascinating or frustrating -- which one it is will depend on how much you enjoy this sort of torture. :-)

