Can the CP_ACP be UTF-8?

by Michael S. Kaplan, published on 2006/07/14 11:04 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/07/14/665714.aspx


Just moments ago, Sergey asked in the Suggestion Box:

Hello, Michael!
Wouldn't it be great to be able to set UTF-8 as a multibyte code page in Windows? What do you think?

Well yes, I think it would be great. :-)

Of course (in the spirit of RAH) I think it would be great if the lion could lay down with the lamb. Though I'd lay odds that only one of them would be getting up later....

I hint at some of the problems in this post and then talk about it more directly in the comments in this one.

Short version -- it can't happen.

Sorry, Sergey.... :-(

 

This post brought to you by ড় (U+09dc, a.k.a. BENGALI LETTER RRA)


# Ben Cooke on 14 Jul 2006 1:26 PM:

I assume somewhere under here lies the reason why the command interpreter does a few odd things if you say "chcp 65001". In particular, it won't run batch files anymore.

Lots of things work just fine, of course. Any character that's available in the font you've got set for the console (Lucida Console in my case) will appear correctly if you "type" a UTF-8 document, and you can include Unicode characters in the arguments for most built-in commands.

# Michael S. Kaplan on 14 Jul 2006 1:31 PM:

Hi Ben,

That has more to do with the fact that only the output encoding is changed, while the underlying conversion is not. The mismatch usually will not matter, but the interpreter is the biggest exception.
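
To make that kind of mismatch concrete, here is a minimal Python sketch. It is not a model of what the command interpreter actually does internally. It only shows what happens when bytes written in one encoding are read back in another. The helper name `reinterpret` is hypothetical, and cp1252 is an assumed stand-in for whatever code page the unchanged conversion uses.

```python
# Hypothetical helper: encode text one way, then decode the same bytes another way.
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    return text.encode(written_as).decode(read_as, errors="replace")

ascii_line = "echo hello"
accented_line = "echo café"

# ASCII bytes are identical in UTF-8 and cp1252 (the assumed stand-in code page),
# so the mismatch is invisible here.
ascii_result = reinterpret(ascii_line, "utf-8", "cp1252")

# A non-ASCII character becomes a multi-byte UTF-8 sequence, which cp1252
# reads as several unrelated characters.
accented_result = reinterpret(accented_line, "utf-8", "cp1252")

print(ascii_result == ascii_line)        # the ASCII text survives
print(accented_result == accented_line)  # the accented text does not
```

For plain ASCII text the two encodings agree byte for byte, which is consistent with the mismatch usually not mattering. It only bites once the data contains characters outside that shared range.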

# Sergei on 14 Jul 2006 5:36 PM:

Wow! My question has been discussed on this blog just 10 days ago. I need to pay more attention to the comments ☺.


referenced by

2008/08/15 Yet another time that UTF-8 can't be the ACP

2007/01/03 UTF-8 and GB18030 are both 'NT' code pages, they just aren't 'ANSI' code pages

2006/10/11 Why can't the CP_ACP be UTF-8?
