Anyone who says the console can't do Unicode isn't as smart as they think they are

by Michael S. Kaplan, published on 2010/04/07 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx


Do you play the odds?

If you are a developer, the odds are that, all things being equal, you are not nearly as smart now, before reading this blog, as you will be once you have read it....

Back in the middle of February I mentioned in The real problem(s) with all of these console "fallback" discussions that, of the many people talking about the console these days, most of them are wrong.

Solving problems that don't exist, incorrectly impacting problems that do exist, and just generally making the situation worse overall....

But I didn't really finish the work there; that blog was merely armchair criticism of bugs, design flaws, mistaken assumptions spoken as fact, documentation problems, etc.

100% accurate, but not described in a way that can help you move to the next step (getting it done right, in either native or managed code).

Today's blog is going to change all that. :-)

All of this and much more will be covered in the upcoming training on the World-Ready Console, if you are on the Windows team....

In March of 2008, after STL showed me how, I showed that the console could be 100% Unicode, as I talked about in Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?. But there is still one piece of the puzzle missing.

I mean it is all well and good to show it in just a few lines of native code using the CRT.

But the truth is that this problem exists in managed code too (some of which actually uses the CRT and Win32), and also in native code that has no heavy CRT dependency or doesn't want to take one on.

Behind the scenes, the CRT does all the right work in those circumstances, e.g. calling WriteConsoleW or WriteFile depending on whether the console's output stream is redirected.

So anyone trying to do the same thing in native Win32 would have to do that same work.
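That work boils down to a small decision. Here is a minimal, portable C++ sketch of it, assuming nothing beyond the standard library. The hypothetical choose_target models the decision as a pure function: instead of calling Win32 itself, it takes what GetFileType, GetConsoleMode and GetLastError reported. It mirrors the checks in the C# WriteLineRight sample in this post, not the CRT's actual source, and the constant values are the ones that sample declares.

```cpp
#include <cstdint>

// Values as declared in the WriteLineRight sample (winbase.h / winerror.h).
constexpr std::uint32_t FILE_TYPE_UNKNOWN    = 0x0000;
constexpr std::uint32_t FILE_TYPE_CHAR       = 0x0002;
constexpr std::uint32_t FILE_TYPE_REMOTE     = 0x8000;
constexpr std::uint32_t ERROR_SUCCESS        = 0;
constexpr std::uint32_t ERROR_INVALID_HANDLE = 6;

enum class OutputTarget { Abort, Console, Redirected };

// Hypothetical helper: a pure model of the decision, so it runs anywhere.
// The arguments are what the real calls reported:
//   file_type       - GetFileType(handle)
//   file_type_error - GetLastError() right after GetFileType
//   mode_ok         - return value of GetConsoleMode(handle, &mode)
//   mode_error      - GetLastError() right after GetConsoleMode
constexpr OutputTarget choose_target(std::uint32_t file_type,
                                     std::uint32_t file_type_error,
                                     bool mode_ok,
                                     std::uint32_t mode_error) {
    // FILE_TYPE_UNKNOWN is only a failure when an error code came with it.
    if (file_type == FILE_TYPE_UNKNOWN && file_type_error != ERROR_SUCCESS)
        return OutputTarget::Abort;
    // Ignore the "remote" bit, as the sample does with filetype &= ~FILE_TYPE_REMOTE.
    file_type &= ~FILE_TYPE_REMOTE;
    // Disk files, pipes and "unknown but OK" handles are treated as redirected.
    if (file_type != FILE_TYPE_CHAR)
        return OutputTarget::Redirected;
    // A character device counts as a console unless GetConsoleMode fails
    // with ERROR_INVALID_HANDLE.
    if (!mode_ok && mode_error == ERROR_INVALID_HANDLE)
        return OutputTarget::Redirected;
    return OutputTarget::Console;
}
```

On Windows, Console would lead to WriteConsoleW and Redirected to WriteFile; Abort means GetFileType genuinely failed. Keeping the decision separate from the I/O makes every branch easy to exercise without a real console.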

And although the CRT and .NET are both developed in the same division of Microsoft, and .NET has its own internal CRT dependency (on .NET's own version of the CRT, even when .NET ships with the OS), the managed Console class does not use this CRT functionality. Nor does it do that work the hard way itself.

Now calling the CRT from VB.Net or C# (or other non-C++ languages) has some interesting challenges that I am not going to get into here (if someone wants to go that way they can). I thought instead I'd just give you the code really quick so you can do it in whatever language, without the version or CRT dependencies.

The WriteLineRight sample function below is C# code.

But it is pretty much Win32 code written in C#. So Win32 developers should have no trouble grokking it or what it is doing:

using System;
using System.Runtime.InteropServices;

public class Test {
    public static void Main() {
        string st = "\u0627\u0628\u0629 \u043a\u043e\u0448\u043a\u0430 \u65e5\u672c\u56fd\n\n";
        WriteLineRight(st);
    }

    internal static bool IsConsoleFontTrueType(IntPtr std) {
        CONSOLE_FONT_INFO_EX cfie = new CONSOLE_FONT_INFO_EX();
        cfie.cbSize = (uint)Marshal.SizeOf(cfie);
        if(GetCurrentConsoleFontEx(std, false, ref cfie)) {
            return(((cfie.FontFamily & TMPF_TRUETYPE) == TMPF_TRUETYPE));
        }
        return false;
    }

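    // Tracks whether the Unicode BOM has already been written to redirected output,
    // so repeated calls do not scatter U+FEFF characters through the file.
    private static bool bomWritten = false;

    public static void WriteLineRight(string st) {
        IntPtr stdout = GetStdHandle(STD_OUTPUT_HANDLE);
        // GetStdHandle returns NULL (not INVALID_HANDLE_VALUE) when there is no stdout at all
        if((stdout != INVALID_HANDLE_VALUE) && (stdout != IntPtr.Zero)) {
            uint filetype = GetFileType(stdout);
            // Stop only if FILE_TYPE_UNKNOWN came back together with an error;
            // FILE_TYPE_UNKNOWN with ERROR_SUCCESS is a valid result
            if(! ((filetype == FILE_TYPE_UNKNOWN) && (Marshal.GetLastWin32Error() != ERROR_SUCCESS))) {
                bool fConsole;
                uint mode;
                uint written;
                filetype &= ~(FILE_TYPE_REMOTE);
                if (filetype == FILE_TYPE_CHAR) {
                    // A character device is a console unless GetConsoleMode fails with ERROR_INVALID_HANDLE
                    bool retval = GetConsoleMode(stdout, out mode);
                    if ((retval == false) && (Marshal.GetLastWin32Error() == ERROR_INVALID_HANDLE)) {
                        fConsole = false;
                    } else {
                        fConsole = true;
                    }
                } else {
                    fConsole = false;
                }

                if (fConsole) {
                    if (IsConsoleFontTrueType(stdout)) {
                        WriteConsoleW(stdout, st, st.Length, out written, IntPtr.Zero);
                    } else {
                        //
                        // Not a TrueType font, so the output may have trouble here
                        // Need to check the codepage settings
                        // (As written, nothing is output in this case.)
                        //
                        // TODO: Add the old style GetConsoleFallbackUICulture code here!!!
                    }
                } else {
                    //
                    // Write out a Unicode BOM (once) to ensure proper processing by text readers
                    //
                    if (!bomWritten) {
                        WriteFile(stdout, BOM, 2, out written, IntPtr.Zero);
                        bomWritten = true;
                    }
                    // st.Length counts UTF-16 code units; WriteFile wants a byte count
                    WriteFile(stdout, st, st.Length * 2, out written, IntPtr.Zero);
                }
            }
        }
    }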

    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern bool WriteConsoleW(IntPtr hConsoleOutput,
                                              string lpBuffer,
                                              int nNumberOfCharsToWrite,
                                              out uint lpNumberOfCharsWritten,
                                              IntPtr lpReserved);

    // WriteFile has no A/W variants; CharSet.Unicode makes the string marshal as UTF-16
    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern bool WriteFile(IntPtr hFile,
                                          string lpBuffer,
                                          int nNumberOfBytesToWrite,
                                          out uint lpNumberOfBytesWritten,
                                          IntPtr lpOverlapped);

    [DllImport("kernel32.dll", ExactSpelling=true, SetLastError=true)]
    internal static extern bool GetConsoleMode(IntPtr hConsoleHandle, out uint lpMode);

    [DllImport("kernel32.dll", ExactSpelling=true)]
    internal static extern bool GetCurrentConsoleFontEx(IntPtr hConsoleOutput,
                                                        bool bMaximumWindow,
                                                        ref CONSOLE_FONT_INFO_EX lpConsoleCurrentFontEx);

    [DllImport("kernel32.dll", ExactSpelling=true, SetLastError=true)]
    internal static extern uint GetFileType(IntPtr hFile);

    [DllImport("kernel32.dll", ExactSpelling=true)]
    internal static extern IntPtr GetStdHandle(int nStdHandle);

    [StructLayout(LayoutKind.Sequential)]
    internal struct COORD {
        internal short X;
        internal short Y;
        internal COORD(short x, short y) { X = x; Y = y; }
    }

    // Marshaled size is 4 + 4 + 4 + 4 + 4 + (32 * 2) = 84 bytes, matching the native struct,
    // and no unsafe code (or /unsafe switch) is needed
    [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
    internal struct CONSOLE_FONT_INFO_EX {
        internal uint cbSize;
        internal uint nFont;
        internal COORD dwFontSize;
        internal int FontFamily;
        internal int FontWeight;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst=LF_FACESIZE)]
        internal string FaceName;
    }

    internal const int TMPF_TRUETYPE = 0x4;
    internal const int LF_FACESIZE = 32;
    internal const string BOM = "\uFEFF";
    internal const int STD_OUTPUT_HANDLE = -11; // Handle to the standard output device.
    internal const int ERROR_INVALID_HANDLE = 6;
    internal const int ERROR_SUCCESS = 0;
    internal const uint FILE_TYPE_UNKNOWN = 0x0000;
    internal const uint FILE_TYPE_DISK = 0x0001;
    internal const uint FILE_TYPE_CHAR = 0x0002;
    internal const uint FILE_TYPE_PIPE = 0x0003;
    internal const uint FILE_TYPE_REMOTE = 0x8000;
    internal static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
}

And there you go!
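The console branch hinges on IsConsoleFontTrueType, which tests a single bit of CONSOLE_FONT_INFO_EX.FontFamily. The four low-order bits of that field hold TMPF_* flags, and the font family sits in the four bits above them, so a C++ sketch of the same test is a single mask. Only TMPF_TRUETYPE comes from the sample. TMPF_VECTOR and FF_MODERN are the wingdi.h values, used here just to build example bit patterns, and is_truetype is a hypothetical name.

```cpp
// Flag values from wingdi.h; only TMPF_TRUETYPE appears in the C# sample.
constexpr int TMPF_VECTOR   = 0x02;
constexpr int TMPF_TRUETYPE = 0x04;
constexpr int FF_MODERN     = 0x30;  // family value lives in bits 4-7

// Same test as IsConsoleFontTrueType: is the TMPF_TRUETYPE bit set?
constexpr bool is_truetype(int font_family) {
    return (font_family & TMPF_TRUETYPE) == TMPF_TRUETYPE;
}

// Illustrative bit patterns, not values read from a real console.
static_assert(is_truetype(FF_MODERN | TMPF_VECTOR | TMPF_TRUETYPE), "TrueType bit set");
static_assert(!is_truetype(FF_MODERN), "no TrueType bit, as with a raster font");
```

When the bit is clear, the sample as written drops into its TODO branch and writes nothing at all.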

A few things to note here:

I know the last point because I used to say that, when I was not as smart as I am now.

In fact, as I said way back in the beginning, the odds are in favor of the fact that you yourself were not nearly as smart before you read this blog as you are now that you have read it! :-)

And now if you will excuse me, I have to start conversations with the owners of the gazillion console applications in Windows who routinely punt bugs about their lack of Unicode support....


Craig Peterson on 7 Apr 2010 1:14 PM:

Is the 'W' at the end of ReadFileW and WriteFileW a typo?  I can't find anything about it online or in the SDK headers.

In any case, thanks for the post.  I feel smarter already!

Michael S. Kaplan on 7 Apr 2010 1:17 PM:

Those are the Unicode versions of the functions I link to -- but you want to call the Unicode ones, whether by compiling with UNICODE or by calling the "W" versions explicitly (the sample does the latter).

Random832 on 7 Apr 2010 2:52 PM:

If you were to select a font which has the Arabic or CJK characters in it, would they appear correctly? I already noticed that the [double-width] CJK characters take up only a single column each. So much for no complex scripts or CJK in the console, indeed.

Brendan Elliott on 7 Apr 2010 3:01 PM:

Thank you so much for the console font trick! I now have a way to read Japanese console text on a Japanese machine with the system locale set to English (for compatibility reasons).  All my years of studying Japanese hadn't increased my ability to read a series of ASCII question marks, so a TrueType font plus copy & paste is a very useful workaround to know.

Michael S. Kaplan on 7 Apr 2010 6:15 PM:

@Random832 - if such a font is available (generally they aren't unless your system locale matches). But the redirect case works fine and the copy/paste works as well.....

@Brendan Elliott: Great! Glad to assist. :-)

Mike Dimmick on 8 Apr 2010 6:57 AM:

Craig, Mike: there's no ReadFileW or WriteFileW because the functions operate on binary data - therefore, not safe to convert anything. The documentation does not include the "Unicode and ANSI names" section for that reason. There is only ReadFile and WriteFile.

Mostly functions that have A and W versions have string parameters, or structure parameters (or pointer-to-structure) where the structure contains one or more string parameters.

ReadConsole and WriteConsole have A/W variants as they deal with string parameters even though the parameters are declared as VOID*. I'm not actually sure why this is, perhaps because the strings are not required to be null-terminated.
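Since WriteFile only ever sees bytes, the redirected branch of the sample commits to a specific byte layout: U+FEFF, then every UTF-16 code unit, low byte first (UTF-16LE, which is what the CharSet.Unicode string marshaling produces on little-endian Windows). Here is a minimal C++ sketch that builds those bytes explicitly. utf16le_with_bom is a hypothetical helper, part of neither the sample nor Win32.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: the bytes the sample's two WriteFile calls produce,
// i.e. a 2-byte BOM followed by every UTF-16 code unit in little-endian order.
std::vector<std::uint8_t> utf16le_with_bom(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(2 * (text.size() + 1));
    auto put = [&bytes](char16_t unit) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // then high byte
    };
    put(u'\uFEFF');                 // BOM, written as FF FE
    for (char16_t unit : text) put(unit);
    // Two bytes per code unit, which is where the sample's st.Length * 2 comes from.
    assert(bytes.size() == 2 * (text.size() + 1));
    return bytes;
}
```

For example, utf16le_with_bom(u"\u0627") yields FF FE 27 06. A character outside the BMP, such as U+1F600, is a surrogate pair and so costs four bytes. This agrees with the sample's st.Length * 2, because .NET's String.Length also counts UTF-16 code units.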

Michael S. Kaplan on 8 Apr 2010 8:00 AM:

And the weird thing is that I knew that (note the WriteFile p/invoke above!). :-)

Igor Tandetnik on 8 Apr 2010 10:14 AM:

This line

if(! (filetype == FILE_TYPE_UNKNOWN) && (Marshal.GetLastWin32Error() != ERROR_SUCCESS)) {

doesn't look right. Perhaps the closing paren after UNKNOWN and an opening one before Marshal shouldn't be there. Personally, I'd write

if(filetype != FILE_TYPE_UNKNOWN ||
   Marshal.GetLastWin32Error() == ERROR_SUCCESS) {

Michael S. Kaplan on 8 Apr 2010 12:15 PM:

Actually, the check is kind of right, believe it or not -- it is attempting to catch the case where it is unknown yet succeeded. Weird code behavior trying to key off weird function results....
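Both versions of the check can be compared mechanically. In this C++ sketch, unknown stands for "GetFileType returned FILE_TYPE_UNKNOWN" and error for "the last-error value is not ERROR_SUCCESS"; the function names are hypothetical. The first function is the form the sample now uses, with the ! covering the whole conjunction. The second is Igor's rewrite. By De Morgan's law they agree on all four inputs, and both stop only when the type is unknown and an error was reported.

```cpp
// unknown: GetFileType returned FILE_TYPE_UNKNOWN
// error:   the last-error value was not ERROR_SUCCESS

// Form used in WriteLineRight: !((unknown) && (error))
constexpr bool proceed_sample_form(bool unknown, bool error) { return !(unknown && error); }

// Igor's rewrite: filetype != FILE_TYPE_UNKNOWN || error == ERROR_SUCCESS
constexpr bool proceed_igor_form(bool unknown, bool error) { return !unknown || !error; }

// Check every combination of the two inputs (needs C++14 constexpr loops).
constexpr bool forms_agree() {
    for (int u = 0; u < 2; ++u) {
        for (int e = 0; e < 2; ++e) {
            if (proceed_sample_form(u == 1, e == 1) != proceed_igor_form(u == 1, e == 1))
                return false;
        }
    }
    return true;
}

static_assert(forms_agree(), "De Morgan: !(a && b) == (!a || !b)");
```

So Igor's version is a readability change, not a behavioral one.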

Pavanaja U B on 8 Apr 2010 9:24 PM:

You (I mean MS) are still putting me down by not allowing complex scripts (opentype fonts) in console.

-Pavanaja

Michael S. Kaplan on 8 Apr 2010 11:38 PM:

Well, putting down is a relative term....

Seth on 9 Apr 2010 11:02 AM:

Am I missing something? In the line

if(! (filetype == FILE_TYPE_UNKNOWN) && (Marshal.GetLastWin32Error() != ERROR_SUCCESS)) {

say filetype is FILE_TYPE_CHAR, then "filetype == FILE_TYPE_UNKNOWN" evaluates to false, and so "! (filetype == FILE_TYPE_UNKNOWN)" evaluates to true. Since there wasn't an error, GetLastError() returns ERROR_SUCCESS and GetLastError() != ERROR_SUCCESS evaluates to false, and the whole expression evaluates to false and the function exits without writing anything?

It seems like the condition you want to return on is if the type is unknown because there was an error. So if it's unknown but there is no error then you still continue on with the write. I think that's the same as just making sure there's no error, so should that check be replaced with the following?

if(GetLastError() != ERROR_SUCCESS) {
   return;
}

Also, why do we need to check that filetype is FILE_TYPE_CHAR? Isn't it enough to just check that out is a console (are all consoles FILE_TYPE_CHAR?), and does the following do that?

bool fConsole = (GetConsoleMode(out,&mode) || (GetLastError() != ERROR_INVALID_HANDLE));

So could it be right to do the following?

void WriteLineRight(std::string const &s) {
   //...
   HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
   if(out == INVALID_HANDLE_VALUE) {
      return;
   }
   // we don't directly check the filetype of output handle, we only check if it's a console
   DWORD mode;
   bool fConsole = (GetConsoleMode(out,&mode) || (GetLastError() != ERROR_INVALID_HANDLE));
   if(fConsole) {
      //... don't care about non-true-type consoles
      //... convert to wchar here
      WriteConsoleW(...)
   } else {
      WriteFile(out,&s[0],s.size(),&written,NULL);
   }
}

...

Michael S. Kaplan on 9 Apr 2010 1:22 PM:

In my opinion, you are missing something, yes. :-)

If you look at the docs for GetFileType, it is clear that:

You can distinguish between a "valid" return of FILE_TYPE_UNKNOWN and its return due to a calling error (for example, passing an invalid handle to GetFileType) by calling GetLastError.

If the function worked properly and FILE_TYPE_UNKNOWN was returned, a call to GetLastError will return NO_ERROR.

If the function returned FILE_TYPE_UNKNOWN due to an error in calling GetFileType, GetLastError will return the error code.

This sample code is distinguishing the two cases.

Seth on 9 Apr 2010 2:11 PM:

Okay, was something wrong with my analysis of the expression? When I ran the sample code it seemed to confirm my analysis by skipping over printing when my own version does do the printing.

My understanding of the requirements is that there are three possible cases:

1. file type is not unknown. therefore we know the call succeeded

2. file type is unknown, but the call succeeded

3. file type is unknown and the call failed

In case one we want to continue on with printing. In case two we also want to continue on with printing. In case three there was an error, and we cannot continue with printing and must return. This reduces down to just checking for success of the call, and checking if the type is unknown or not is unneeded.

However the sample code seems to only cause printing in a fourth, impossible case: when filetype is not unknown, but the call failed.

I think the expression contains a typo. ! has higher precedence than && right? so ! applies only to the left side, not the whole expression. If the ! were instead applied to the entire expression then it looks like it would be correct to me.
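Seth's precedence argument is easy to confirm. This C++ sketch (hypothetical names again) encodes the condition exactly as quoted in the comments, where unary ! binds tighter than && and so applies only to the first comparison, next to the fully parenthesized form, and evaluates both over his four cases.

```cpp
// unknown: GetFileType returned FILE_TYPE_UNKNOWN
// error:   the last-error value was not ERROR_SUCCESS

// As quoted: if(! (unknown) && (error)) parses as (!unknown) && error.
constexpr bool proceeds_as_quoted(bool unknown, bool error) { return !(unknown) && (error); }

// With the ! applied to the whole conjunction.
constexpr bool proceeds_parenthesized(bool unknown, bool error) { return !(unknown && error); }

// Seth's cases: 1 = known type, no error; 2 = unknown, no error;
// 3 = unknown with error; 4 = known type with error (should not happen).
constexpr bool kUnknown[] = {false, true, true, false};
constexpr bool kError[]   = {false, false, true, true};

// Bitmask of the cases (bit 0 = case 1) in which a form goes on to print.
constexpr int printing_cases(bool quoted) {
    int mask = 0;
    for (int i = 0; i < 4; ++i) {
        bool go = quoted ? proceeds_as_quoted(kUnknown[i], kError[i])
                         : proceeds_parenthesized(kUnknown[i], kError[i]);
        if (go) mask |= 1 << i;
    }
    return mask;
}

static_assert(printing_cases(true) == 0x8, "quoted form prints only in case 4");
static_assert(printing_cases(false) == 0xB, "parenthesized form prints in cases 1, 2 and 4");
```

Case 4 is the combination Seth calls impossible, so the quoted form would effectively never print, while the parenthesized form (the one now in the sample) prints in every case except a genuine GetFileType failure.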

Seth on 9 Apr 2010 2:16 PM:

Also I'm still curious about checking for FILE_TYPE_CHAR specifically. Can consoles be anything else? Isn't a successful call to GetConsoleMode enough to distinguish between when we need to do special things for console output vs. when we need to use WriteFile to write to the file that output's being redirected to?



referenced by

2010/09/23 A confluence of circumstances leaves a stone unturned...

2010/06/27 Bugs hidden in plain sight, and commented that way too ANSWERS

2010/06/18 Bugs hidden in plain sight, and commented that way too

2010/05/07 Cunningly conquering communicated console caveats. Comprende, mon Capitán?
