Sometimes, it's Unicode or nothing...

by Michael S. Kaplan, published on 2012/04/09 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/04/09/10291883.aspx


The question was simple enough:

I have a cmd line tool. On French Windows its output is correct in the cmd window but when I redirect it to a file it is no longer good. I noticed that if I set the output encoding on the Console class explicitly to the default encoding then redirecting to a file works fine:

    Console.OutputEncoding = Encoding.Default;

Does anyone know why the default encoding is not set by default? Is there any reason why I shouldn’t put the line above in my code?

Thanks

Wow, if there was ever a property to avoid, Encoding.Default would be it.

As the MSDN topic explains, it is the default system ANSI code page - in other words the CP_ACP, the code page used by non-Unicode Windows applications and the programs that can read/write non-Unicode files.

Like the default .TXT file type handler, NOTEPAD.EXE.

Console applications, on the other hand, use CP_OEMCP by default, which is why the described problem is happening.

Because CP_ACP and CP_OEMCP are not the same code page for a lot of different default system locales.

And if there is anything that CP_ACP and CP_OEMCP have in common, it is that each one covers only a small subset of Unicode, and not the same subset.
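The mismatch is easy to see at the byte level. What follows is only a sketch, assuming the code pages typically found on French Windows (850 for CP_OEMCP, 1252 for CP_ACP) and a modern .NET runtime where the legacy code pages have to be registered first. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class CodePageMismatch
{
    // Encode text with one code page, then decode the same bytes with another.
    public static string Reinterpret(string text, int writtenAs, int readAs)
    {
        byte[] bytes = Encoding.GetEncoding(writtenAs).GetBytes(text);
        return Encoding.GetEncoding(readAs).GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where code pages such as 850 and 1252
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        const int Oem = 850;   // assumed CP_OEMCP on French Windows
        const int Ansi = 1252; // assumed CP_ACP on French Windows

        // The same letter gets a different byte in each code page.
        Console.WriteLine(BitConverter.ToString(Encoding.GetEncoding(Oem).GetBytes("\u00E9")));  // 82
        Console.WriteLine(BitConverter.ToString(Encoding.GetEncoding(Ansi).GetBytes("\u00E9"))); // E9

        // OEM bytes read as ANSI: U+00E9 (e with acute) turns into U+201A (low quotation mark).
        Console.WriteLine(Reinterpret("\u00E9", Oem, Ansi) == "\u201A"); // True
    }
}
```

The output is kept to ASCII on purpose, so that the demonstration does not itself depend on the console's code page.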

Now some console apps can support all of Unicode, in which case CP_UTF8 (Encoding.UTF8) is a better choice of encoding. Others cannot support any change, in which case using Encoding.Default means taking text that is already filtered through CP_OEMCP and then adding a CP_ACP filter as well....

In the end, you should go UP to UTF-8, or at least, if Unicode won't work, CP_OEMCP should be left alone....
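For the "go UP to UTF-8" option, a minimal sketch might look like the following. It assumes that whatever reads the output can handle UTF-8. The class and method names are hypothetical, and asking for no byte order mark is an assumption about what the reader of the output prefers.

```csharp
using System;
using System.Text;

public static class Utf8ConsoleOutput
{
    // UTF-8 without a byte order mark (an assumption; pass true if the
    // consumer of the output expects one).
    public static Encoding CreateOutputEncoding()
    {
        return new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    }

    public static void Main()
    {
        // Replaces the default console output encoding with UTF-8,
        // instead of swapping one legacy code page for another.
        Console.OutputEncoding = CreateOutputEncoding();
        Console.WriteLine("Caf\u00E9 cr\u00E8me");
    }
}
```

A file produced by redirecting this program's output then holds UTF-8 bytes, so it has to be opened as UTF-8 rather than as ANSI or OEM text.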


Claudia Lawrence on 9 Apr 2012 10:22 AM:

Thank you.  That really opened the kimono for me.

Michael S. Kaplan on 9 Apr 2012 6:51 PM:

Ha!

Random832 on 10 Apr 2012 1:04 PM:

Of course, this only works if you want to make a UTF-8 text file. If you want to make an OEM text file, leave it as CP_OEMCP. If you want to make an ANSI text file...

Of course, if your output is to a pipe to a tool that expects OEM text, such as, I don't know, "more", then you really do need it in CP_OEMCP. On balance it's probably best to have a specific option for making output files (in whatever encoding), use WriteConsoleW for console output, and just leave things alone if the output is a pipe.

