Consoling people about their troubles with the console.

by Michael S. Kaplan, published on 2005/06/29 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/29/433669.aspx

This is not one of those fun posts where I get to talk about exciting new features. Instead I am going to answer some questions about CMD, the console (kŏn'sōl'), and now I am going to try and console (kən-sōl') the people with questions, since they will probably not care for the answers. :-(

Several hundred posts ago (back in the end of 2004), Per Bergland asked (in the Suggestion Box):

This may have been asked (and answered) before, but I find it such a shame that cmd.exe can't execute a bat/cmd file in unicode (UTF16). Since Notepad doesn't do "OEM", I find myself using the DOS EDIT text editor to fix up national characters such as our Swedish å,ä and ö.

Hey, cmd can *read* unicode and even *write* using the "/U" switch, so why can't it read & execute a file containing Unicode?

You wouldn't happen to know this, would you?

I can't answer this one with authority since I don't own CMD.EXE. In fact, I am not even sure who does these days. But I do know that it is not easy to get major feature work done in this area, in that codebase. The whole point of the Monad project (read more about it in posts here) is to get away from all of the backcompat issues that keep people from wanting to touch the code to make changes. The last time I checked it out, the plan was to support Unicode files, though.

For the legacy case, I have been in the habit of using Word and choosing the code page when saving a file as plain text as a way to get the files in the right format. I have tried to lobby the owners of Notepad to consider adding another "Save As..." option for the OEM code page, but I have not gotten much traction on that (or on my other request for that list, the UTF-8 without BOM choice). Though if I had to guess which was more likely to be seen in the future, I would guess that they would be quicker to add features to Notepad than to the console....
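Where neither Word nor a third-party editor is handy, the same conversion can be scripted. What follows is a minimal Python sketch, not anything CMD or Notepad provides. It assumes the batch file was saved from Notepad as "Unicode" (UTF-16 with a BOM) and that the console's OEM code page is 850. The function names are made up for illustration, and the right code page for a given machine is whatever chcp reports.

```python
def to_oem_bytes(utf16_data: bytes, oem_codec: str = "cp850") -> bytes:
    """Re-encode the contents of a UTF-16 batch file into an OEM code page.

    oem_codec is an assumption: cp850 is common on Western European
    systems and cp437 on US English ones.
    """
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = utf16_data.decode("utf-16")
    # errors="strict" raises UnicodeEncodeError for characters the OEM
    # code page cannot represent, instead of silently writing "?".
    return text.encode(oem_codec, errors="strict")


def convert_batch_file(src_path: str, dst_path: str, oem_codec: str = "cp850") -> None:
    """Read a UTF-16 file and write an OEM-encoded copy next to it."""
    # Binary mode keeps the CR LF line endings exactly as they were.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_oem_bytes(data, oem_codec))
```

Failing loudly on unrepresentable characters is a deliberate choice here: a batch file with a silently substituted character in a path is harder to debug than a conversion that refuses to run.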

Then, moving on into January, KJK::Hyperion asked (also in the Suggestion Box):

Console windows support Unicode, but they necessarily have a number of limitations, having to support the OEM charset and being limited to monospace fonts (which, I've seen, rules out composed characters and some special spacing characters). How is this handled internally? especially, how is Japanese handled, with its mixture of half-width and full-width characters? and how are valid fonts chosen?

See above for some answers. For wanting to have your own font choice, you can pick any monospace or essentially monospace font and then set one or both of the following registry values:

KEY == HKEY_CURRENT_USER\Console, ValueName == FaceName, Value == <whatever font you like>

KEY == HKEY_CURRENT_USER\Console, ValueName == FontFamily, Value == <hexadecimal: 0x50 for Decorative, 0x40 for Script, 0x30 for Modern, 0x20 for Swiss, or 0x10 for Roman>.
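For anyone who would rather not edit those values by hand, here is a rough Python sketch that builds the text of a .reg file for them. It makes several assumptions: that FaceName is stored as a string and FontFamily as a DWORD, that the family numbers are the hexadecimal FF_* font family constants from the Windows headers, and that the face name contains no quotes or backslashes (which would need escaping). The function name is hypothetical, and the sketch only produces text; it does not touch the registry.

```python
# Font family values as listed above, read as hexadecimal (assumption).
FONT_FAMILIES = {
    "roman": 0x10,
    "swiss": 0x20,
    "modern": 0x30,
    "script": 0x40,
    "decorative": 0x50,
}


def console_font_reg(face_name: str, family: str = "modern") -> str:
    """Return .reg file text that sets the console FaceName and FontFamily."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_CURRENT_USER\Console]",
        f'"FaceName"="{face_name}"',
        # DWORD values in .reg files are written as eight hex digits.
        f'"FontFamily"=dword:{FONT_FAMILIES[family]:08x}',
        "",
    ]
    return "\r\n".join(lines)
```

Calling console_font_reg("Lucida Console") gives text that can be saved to a file and reviewed before importing it with regedit.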

Now when I say essentially monospace above, the reason for that is that none of the CJK fonts are true monospaced fonts. They all (even the bitmap fonts) have the halfwidth characters taking up half as much space as the fullwidth ones, though.
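The half-as-wide rule can be approximated in a few lines of Python. This sketch uses the Unicode East Asian Width property as a stand-in for what the font does; that is an assumption for illustration, not a description of how the console itself decides, and it ignores combining characters entirely. The function name is made up.

```python
import unicodedata


def console_cells(text: str) -> int:
    """Estimate how many character cells a string occupies in a CJK console."""
    cells = 0
    for ch in text:
        # "W" (wide) and "F" (fullwidth) characters take two cells;
        # everything else, including halfwidth forms, is counted as one.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells += 2
        else:
            cells += 1
    return cells
```

With this model, three ASCII letters or three halfwidth katakana take three cells, while three kanji take six.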

Most recently, Denis Bider asked (first of Larry Osterman, then of me directly):

In our company, we observed the following apparent inconsistency in cmd.exe.

If you execute cmd /?, you get this help text:

/A - Causes the output of internal commands to a pipe or file to be ANSI

/U - Causes the output of internal commands to a pipe or file to be Unicode

But the fact is, the output of cmd /A is not actually ANSI. It is in the OEM code page. For example, if I try cmd /A echo csz > file.txt, and then try to open file.txt in Notepad (which uses ANSI), I get garbage.

Lots of other command line utilities (like those in Cygwin) actually use ANSI. So this is a problem - characters get corrupted across pipe boundaries; files get interpreted in incompatible ways.

From a user's perspective, it seems somewhat logical to expect that if the /A flag description says it will produce ANSI, it should produce ANSI; not OEM.

What do you think? Is this intentional or is it a problem?

Well, since it is behavior that has been around for several versions, I would hesitate to call it a bug. I will run it up the flagpole here, but I assume the "fix" will be to just fix up the text in that help. Which is really all that they could do, since changing the behavior of the flag would break who knows how many scripts (well, if we made the change, we would know -- from all of the people complaining about the behavior change!).
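The garbage Denis describes is easy to reproduce away from the console. Here is a small Python sketch, assuming a Western European setup where the OEM code page is 850 and the ANSI code page is 1252 (other systems will have other pairs). The function name is made up for illustration.

```python
def view_oem_as_ansi(text: str, oem: str = "cp850", ansi: str = "cp1252") -> str:
    """Show what OEM-encoded output looks like to a program that assumes ANSI."""
    oem_bytes = text.encode(oem)  # what cmd writes to the pipe or file
    # errors="replace" covers the few byte values cp1252 leaves undefined.
    return oem_bytes.decode(ansi, errors="replace")
```

Plain ASCII such as "csz" survives the round trip unchanged, since both code pages agree on the first 128 values. The Swedish "åäö" from the earlier question comes back as a dagger and two quotation marks, which is the kind of thing Notepad would show.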

In the meantime though, I can recommend chcp.com, a nice little utility that will either display the active OEM code page in the console (if run with no parameters), or allow you to change that code page. You can look at some documentation on it here. Note that when you run this utility, it reports back the code page as the "Active code page". Not a 100% solution, but as good as the console will really allow.
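A script that needs to know which code page it is dealing with could ask chcp and parse the answer. Here is a hedged Python sketch: the function names are hypothetical, the label text is localized on non-English systems (so the parser just takes the last run of digits), and the second function only works on Windows with chcp.com on the path.

```python
import re
import subprocess


def parse_chcp(output: str) -> int:
    """Pull the code page number out of text like 'Active code page: 437'."""
    numbers = re.findall(r"\d+", output)
    if not numbers:
        raise ValueError(f"no code page number in {output!r}")
    # The label wording varies by UI language; the number comes last.
    return int(numbers[-1])


def active_oem_codec() -> str:
    """Return a Python codec name such as 'cp850' (Windows only)."""
    result = subprocess.run(
        ["chcp.com"], capture_output=True, text=True, check=True
    )
    return f"cp{parse_chcp(result.stdout)}"
```

The codec name returned by active_oem_codec() is only a guess at a matching Python codec; not every Windows code page number has one.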

Did I mention that you may want to take a look at Monad? :-)

This post brought to you by "〷" (U+3037, a.k.a. IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL)

# Peter Ibbotson on 29 Jun 2005 7:59 AM:

One irritation with CHCP is that for raster fonts it only changes the fonts for full screen mode, not for windowed mode.
If you're using raster fonts the codepage is fixed at system install time and very hard to change, simply copying over the .FON files from a 437 installed system to an 850 installed system doesn't fix the problem (or at least last time I tried it didn't).

This is particularly noticeable for old DOS apps that have screens with
╟──╢
line drawing on them.

# Michael S. Kaplan on 29 Jun 2005 10:39 AM:

Hi Peter,

Any time you are doing anything vaguely international in the console, you should not use one of the raster, bitmap fonts. Ever.

Especially when you change the codepage. :-)

# Tod Nelson on 29 Jun 2005 11:11 AM:

I agree. Switching to Lucida Console is one of the first steps of every new machine I set up.

# David on 29 Jun 2005 1:12 PM:

There's a fairly nifty text editor, EmEditor (http://www.emeditor.com), that allows you to save text documents in any of a fairly large variety of encodings, and with or without BOMs in the case of Unicode. It also allows you to manually load (or reload) a text file in any of its various supported encodings. I find it very helpful. It costs like $40, but it's well worth every penny and then some the first time you *really* need it.
