A confluence of circumstances leaves a stone unturned...

by Michael S. Kaplan, published on 2010/09/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/09/23/10066660.aspx

Thinking back to blog posts from earlier this year, like

It can be important to think back to an earlier one, Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?.

In it, I talked about how I was meeting with STL. What I didn't mention at the time was that we were meeting to talk about which features in the CRT were broken with regard to Unicode support.

There was a bunch that was broken, like the locale data only having a non-Unicode backing store, and all of these items were fixed in the version of the CRT that came out most recently.

However, as that blog points out, when STL was looking at one of my other complaints (the Unicode input/output problems), he found that someone had already done the work to add things like _O_U16TEXT and the like to get the right behavior when dealing with Unicode.

So he left that stuff alone (since it worked) and focused on the stuff that was insanely broken (like the locale stuff; considering all the conversion code he had to remove he might have had negative LOC numbers during that time!).

Now all of the blogs I list above do the work directly via the Win32 API (WriteConsoleW/WriteFile) and mention how you can get the input via ReadConsoleW/ReadFile. I had tested all of that, and it works just fine, and has for some time.

This begs the question: why would I do that extra work after starting it all years ago by pointing out that the comparatively easy code (just a few lines) using the CRT works well?

Well, it was a side effect of how I was doing the samples!

You see, in my samples, I was usually writing C code in C# as a way to be more universally accessible to developers who weren't as comfortable with straight C, and I could not figure out how to do the setmode work via the CRT in C#, so I just decided to do it the long way and go from there.

Unfortunately, there was a consequence of all of the previous history and this explicit choice of mine.

As it turns out, the work to support stuff like _O_U16TEXT et al., while properly handling stdout and stderr, has no proper support for stdin.

It was architect Dave Thaler who first found this out when he was doing some essentially unrelated work. Like all good developers, he went the extra mile to try all of the reasonable things he could think of, like these three variations:

    wprintf(L"> ");
    wscanf(L"%ls", Buffer);
    wprintf(L"stdin: %ls\n", Buffer);

    wprintf(L"> ");
    _setmode(_fileno(stdin), _O_U16TEXT);
    wscanf(L"%ls", Buffer);
    wprintf(L"stdin _O_U16TEXT: %ls\n", Buffer);

    wprintf(L"> ");
    _setmode(_fileno(stdin), _O_BINARY);
    wscanf(L"%ls", Buffer);
    wprintf(L"stdin _O_BINARY: %ls\n", Buffer);

but each of these, and more, were failing -- the first one was converting it in and out of Unicode (ick) and the second two were requiring the user to type ^Z (CTRL+Z) to end the input and then also corrupting the text.

The investigation showed that the stdin side of this fix for Unicode input/output was never done.

I was so used to thinking of STL as the guy who did the Unicode work that it was only after he reminded me that I realized that this was work someone else had done, at least four years prior to him even having a chance to look at the code!

Suddenly I felt a little bit relieved that I had been doing it the hard Win32 way all this time, because my code simply worked.

Though I felt bad that I hadn't figured out the "setmode from C#" problem and stayed with the "easier" CRT code, since then I would have seen the same bug that Dave ran into later as he was writing something that was using the CRT.

Anyway, the bug is in now so they can look at fixing it in the future, and in the meantime doing it all in Windows via functions such as ReadConsoleW/ReadFile will do the trick. And maybe someone from CSS could write a KB article about the CRT bug.

Or failing that, there is this blog you are reading.... :-)

John L Veazey on 15 Jul 2011 1:38 PM:

Is it possible to find out the current mode on a Pipe (from CreatePipe)?  I'm using the example code in the MS KB article "How to spawn console processes with redirected standard handles" (support.microsoft.com/.../190351).  So far I have figured out how to determine if a child process is writing out Unicode/Ansi.  But I haven't been able to figure out whether the child process _reads_ Unicode.  For example "cmd.exe /U /K" prints out Unicode but doesn't seem to accept it as input.

referenced by

2010/10/07 Myth busting in the console
