chcp can't do everything

by Michael S. Kaplan, published on 2006/03/06 04:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/03/06/544251.aspx


The chcp.com utility is a simple little program sitting in the \WINDOWS\SYSTEM32 subdirectory. Running it with /? will give some helpful information about its purpose:

C:\WINDOWS\system32>chcp /?
Displays or sets the active code page number.

CHCP [nnn]

  nnn   Specifies a code page number.

Type CHCP without a parameter to display the active code page number.

There is also more information in the Windows XP documentation, which does hint at a problem in its small list of "supported" code pages:

Code page   Country/region or language
437         United States
850         Multilingual (Latin I)
852         Slavic (Latin II)
855         Cyrillic (Russian)
857         Turkish
860         Portuguese
861         Icelandic
863         Canadian-French
865         Nordic
866         Russian
869         Modern Greek

None of the ACP (ANSI code page) values are there, though I think this is a bit of social engineering -- to keep people thinking of chcp's setting as the OEM code page. The 125x series code pages also work well here.
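
To make the OEM/ACP distinction concrete, here is a minimal Python sketch. Python is not part of the original post; it is used only because its standard codecs include cp437 and cp1252. The helper name `decode_under` is made up for illustration. It shows that one and the same byte stands for different characters depending on which code page is used to interpret it.

```python
# Compare how a single byte is interpreted under an OEM code page (437)
# and a Windows ANSI code page (1252), using Python's built-in codecs.

def decode_under(byte_value, code_page):
    """Decode one byte under the given code page (hypothetical helper)."""
    return bytes([byte_value]).decode(f"cp{code_page}")

for byte_value in (0x94, 0xE9, 0xF6):
    oem = decode_under(byte_value, 437)
    acp = decode_under(byte_value, 1252)
    print(f"0x{byte_value:02X}: cp437={oem!r} cp1252={acp!r}")
```

Under these codecs, "ö" is byte 0x94 in code page 437 but byte 0xF6 in code page 1252, so text saved under one code page and read under the other shows the wrong characters.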

However, another set that is missing from the list is the ideographic code pages. You cannot use chcp to change to one of the ideographic code pages unless it is also the default system OEM code page.

Thus on a system with an 0x0409 (English, United States) default system locale, where the default OEM code page is 437:

C:\WINDOWS\system32>chcp 932
Invalid code page

C:\WINDOWS\system32>chcp 936
Invalid code page

C:\WINDOWS\system32>chcp 949
Invalid code page

C:\WINDOWS\system32>chcp 950
Invalid code page

This is a known and expected limitation for which there is no workaround....
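
What sets the four code pages in the transcript apart from the single-byte pages in the documentation list is that they are double-byte character sets. The Python sketch below only illustrates that difference in encoding. It does not reproduce the console limitation itself, and the helper name `encoded_length` and the sample strings are assumptions chosen for illustration.

```python
# Encode short strings under a single-byte OEM code page and under the
# four code pages from the transcript, using Python's built-in codecs.

SAMPLES = {
    932: "日本語",   # Japanese
    936: "中文",     # Simplified Chinese
    949: "한국어",   # Korean
    950: "中文",     # Traditional Chinese
}

def encoded_length(text, code_page):
    """Return the number of bytes text occupies in the given code page."""
    return len(text.encode(f"cp{code_page}"))

# ASCII text takes one byte per character everywhere.
print("abc in cp437:", encoded_length("abc", 437), "bytes")

# Ideographic and Hangul characters take two bytes each in these pages.
for code_page, text in SAMPLES.items():
    size = encoded_length(text, code_page)
    print(f"cp{code_page}: {len(text)} characters -> {size} bytes")
```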

 

This post brought to you by "Ā" (U+0100, a.k.a. LATIN CAPITAL LETTER A WITH MACRON)

 


# Ilya Konstantinov on 6 Mar 2006 10:11 AM:

While we're at it, why not try "chcp 65001" (65001 = CP_UTF8)? Amazing but it actually works... but only if you set the console to use a TrueType font (and of course, unless you switch to full-screen text mode).

Too bad Lucida Console doesn't contain Hebrew glyphs. Given that there are other monospaced TrueType fonts in my system (such as Courier New, which happens to contain Hebrew glyphs), what makes a TrueType font appear in the console's Properties | Font screen?

# Ben Cooke on 6 Mar 2006 1:42 PM:

When you enable codepage 65001, batch files and cmd scripts no longer run. No idea why, but it's a bit of a show-stopper for chcp 65001.

# Dean Harding on 6 Mar 2006 6:06 PM:

Ilya: See here: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q247815

# Mike Myers on 17 Mar 2006 12:42 AM:

"However, another set that is missing from the list is the ideographic code pages. You cannot use chcp to change to one of the ideographic code pages unless it is also the default system OEM code page."

I installed support for East Asian Languages into my English Windows XP Pro system, and have documents with filenames using ideograph characters. Is there really no way for me to work in the command shell (cmd.exe) with these files? Do I need to go buy a version of XP for the East Asian language I am interested in? I was hoping there was something easier I could do.

# Michael S. Kaplan on 17 Mar 2006 1:50 AM:

It means you cannot change to one of the CJK code pages, Mike. You can certainly try 'chcp 65001' and you can also try 'cmd /u' to see if you can work with them.

Or you can even change the default system locale and then the oemcp will match by default if you switch to the right one.

Lots of options....

# Mike Myers on 21 Mar 2006 12:19 PM:

Thanks for the suggestions Michael.

I have seen mention of code page 65001, but I don't see any effect by switching to it. I have created test data in Arabic, in Cyrillic, and in Japanese, and code page 65001 cannot display any of it correctly. At certain font sizes, if I choose Courier New as my console font I can work with Arabic and Cyrillic (on any code page, does not require a chcp).

Likewise, 'cmd /u' does not appear to affect the display, although it does a very good job of creating proper Unicode output. That is, I can do a 'dir' and still not *see* anything in the console, but if I do a 'dir > results.txt' then I get a Unicode text file (I believe) that is readable with Notepad and all of the characters display there correctly.

I will attempt to change the system locale but had hesitated on trying this because I was hoping to avoid system-wide changes requiring a reboot, whenever I needed to work with certain file data.
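
The redirected file described above can also be checked outside Notepad. The Python sketch below assumes that 'cmd /u' writes redirected output as UTF-16LE without a byte order mark; the function name `read_cmd_u_output` and the sample file name are hypothetical.

```python
import os
import tempfile

def read_cmd_u_output(path):
    """Read a file assumed to hold UTF-16LE text, as 'cmd /u' would write."""
    with open(path, "rb") as handle:
        raw = handle.read()
    # Tolerate a byte order mark in case another tool added one.
    if raw.startswith(b"\xff\xfe"):
        raw = raw[2:]
    return raw.decode("utf-16-le")

# Simulate a redirected 'dir > results.txt' listing with a Japanese name.
sample = "日本語.txt\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("utf-16-le"))

print(read_cmd_u_output(tmp.name) == sample)
os.remove(tmp.name)
```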

# Michael S. Kaplan on 21 Mar 2006 12:58 PM:

Hi Mike,

Well, perhaps moving out of the [legacy] console world might be the best solution, in that case? Unicode apps have a much easier time when they are not stuck there....

With that said, I had very little trouble converting console projects to Unicode in the past (I'll be blogging about this soon)....

# Michael S. Kaplan on 5 Apr 2006 10:25 AM:

Apologies to Stanislav Kniazev -- I removed the table, since it is really unreadable in that format. Better to just provide the link to the MSDN topic instead?

# Stanislav Kniazev on 9 May 2006 8:43 AM:

If you need Unicode output in a file, you must run CMD with the /U option.
If you need Unicode output in the MS console, you must use the following command:
chcp 65001 && <your_command>, where <your_command> is any command or batch file. The console font must, of course, be changed to "Lucida Console" in this case.
The following table lists the code pages for all charsets:
http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset4.asp

# Ran Ever-Hadani on 1 Feb 2008 6:43 PM:

I got here via a mutual friend (Google). I want to use the 1252 code page. Chcp 1252 seems to respond well (it says 1252 is now active), but when I type a file that contains upper ASCII, it still displays as one would expect under code page 437. I tried writing to the screen from a program, and I tried typing characters directly by holding down ALT and typing the decimal value; it still shows me the page 437 characters, not the 1252 characters. As far as I can tell, all CHCP does under XP is tell you that your page is active, and otherwise nothing. :( I would greatly appreciate any pointers. Thanks.

# Michael S. Kaplan on 18 Mar 2008 7:39 AM:

Change out of the raster font, perhaps? Move to Lucida Console....

# rbouman on 6 Oct 2008 1:17 AM:

interesting reading here, thank you.

I've had an occurrence of the path variable on xp pro being displayed correctly once only as human readable then as ascii chars only.

turned out the codepage for this machine was set to 850, if I then manually set the codepage to 437, the path variable remains human readable [that is; from a command prompt screen output].

the mystery is ; the machine is set to australian english in regional settings; there's no multi linguallity [is that a word?] other than that.

what else could possibly cause the path variable [and it's the only environment variable to be affected] to display ascii chars?

I'm also making the assumption that the ascii chars may cause some apps. to not read the path var. properly, right?

# Accsem3 on 31 Oct 2008 1:15 PM:

>If you need unicode output in file, you must use command CMD with option /U.

I've tried

CMD /U Tree /a > temp.txt

in line 1 of a batch file, which stops at this command, and nothing is actually executed.

temp.txt is not generated and the script does not continue.

Using a Japanese Windows XP, eastern language enabled of course

And for CHCP, the dos batch script will just close after running CHCP, anything below CHCP will not be executed as well.

# sslaith on 13 Nov 2008 9:53 PM:

Is there a replacement for chcp on a 64bit client, I was browsing through the system32 and found chcp,

but looking through sysWoW64,  I saw nothing that resembles a chcp setting.

# DevoMax on 21 Oct 2009 10:38 PM:

1. Running CHCP 65001 causes subsequent batch files not to be executed at all.

2. In my case, the offending character is "ö", which is contained in the 437 code page, so I don't understand why the batch file fails if it contains that character. Works fine if I copy/paste into the command prompt.

3. Running CMD /U means subsequent commands in the batch file will be ignored.

Mike

# Yuhong Bao on 30 Jul 2010 5:33 PM:

Are you editing the batch file using Notepad, which uses the ANSI code page not the OEM code page?

# Yuhong Bao on 11 Oct 2010 11:33 PM:

BTW, why does DBCS/CJK console support require that the OEMCP be set to the needed code page?

# Michael S. Kaplan on 11 Oct 2010 11:36 PM:

Um, because that is how they decide how to hook up some core plumbing the console uses to display text?

# yong on 7 Jan 2011 10:40 PM:

On an sv_SV OS, I set chcp 437, and it looks like the code page has changed to 437, but when I run the diskpart command, MS-DOS still outputs the localized language. It's amazing.....

# Yuhong Bao on 12 Jan 2011 6:14 PM:

This can be a pain on DBCS-localized OSes. When you chcp to 437 and run a command, all the DBCS chars turn into question marks. Is there a way to change the console language to English?

# Michael S. Kaplan on 12 Jan 2011 8:35 PM:

Mostly no. But chcp was not designed to do that either -- you are only getting what you requested.
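
The question marks mentioned above can be imitated in a few lines of Python. This is only a sketch: it assumes that the console conversion behaves like an encoder that substitutes "?" for every character the target code page lacks, and the helper name `convert_for_console` is hypothetical.

```python
# Round-trip text through a code page, replacing characters that the
# code page cannot represent with "?" (Python's errors="replace").

def convert_for_console(text, code_page):
    """Return text as it would survive a trip through the code page."""
    codec = f"cp{code_page}"
    return text.encode(codec, errors="replace").decode(codec)

name = "日本語.txt"
print("cp437:", convert_for_console(name, 437))
print("cp932:", convert_for_console(name, 932))
```

Code page 437 has no slots for the Japanese characters, so each one is replaced, while code page 932 carries them through unchanged.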

# PierreMic on 11 Jun 2012 12:30 PM:

Is it true that one can't add SimSun as a TrueType command box font on a Windows 7 Ultimate box unless the OS default locale is already Chinese!?

What is a *good* reason for this limitation?

# riceD on 26 Sep 2012 12:59 AM:

Used CHCP 950 successfully

thanks


referenced by

2012/06/13 Console limitations

2008/10/07 Even the prisoners of the motherland use cp1252 (though they escaped 037)
