Sometimes, the shortcuts give better AND faster results

by Michael S. Kaplan, published on 2007/04/24 06:01 -04:00, original URI:

Just because you build up a huge architecture plan for a software project doesn't mean that you can't use a sensible shortcut or optimization from time to time.

For example, MSLU was set up to "speak Unicode" with its callers and "speak ANSI" with the Win9x operating system. This was a guiding architectural principle of the whole project.

But that didn't mean that, when calling a function that could sometimes accept Unicode, unicows.dll wouldn't take a little shortcut and just leave the string as Unicode. I mean, why go to all the trouble of converting it on the way in, and maybe even back on the way out, taking up the time and allocating the space, all to be a little bit less functional, right?
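To make the trade-off concrete, here is a minimal C# sketch. `ThunkString` is a hypothetical helper, not MSLU's actual code, and `Encoding.ASCII` stands in for an ANSI code page that lacks the characters being passed. The "by the architecture" path costs two conversions and two buffers and still comes back as question marks, while the shortcut costs nothing.

```csharp
using System;
using System.Text;

public static class ThunkSketch
{
    // Hypothetical thunk: if the underlying API can take Unicode, pass the
    // string straight through; otherwise convert to the ANSI code page on the
    // way in and back to Unicode on the way out.
    public static string ThunkString(string input, bool apiAcceptsUnicode, Encoding ansi)
    {
        if (apiAcceptsUnicode)
        {
            // The shortcut: no allocation, no conversion, no data loss.
            return input;
        }

        // The round trip: unrepresentable characters become '?' on the way in...
        byte[] narrow = ansi.GetBytes(input);
        // ...and stay '?' when converted back for the caller.
        return ansi.GetString(narrow);
    }

    public static void Main()
    {
        string st = "\u5f0c\u8a32";
        // Encoding.ASCII is only a stand-in for an ANSI code page lacking these characters.
        Console.WriteLine(ThunkString(st, true, Encoding.ASCII) == st);  // True
        Console.WriteLine(ThunkString(st, false, Encoding.ASCII));       // ??
    }
}
```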

Of course, other times people get so bogged down in the architecture plan that the shortcut doesn't occur to anyone.

Or maybe they just didn't know about it. :-)

Shawn asked me a question about someone else's bug that made me think about all this (just because I am on another team doesn't mean I don't answer questions!):

I learned that 弌訲 in a file name shows up OK if you do a dir in a cmd window, however those characters aren’t in CP950, so if you do a dir > out.txt, then they turn into ??.  How does the CMD window render these in the first place if they aren’t in CP950? 

So could I do a wprintf or something and see these?  (Like how does dir write them to the console?)

This is because of a .Net bug report.  .Net effectively does the conversion and writes that to the console.
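Whether a given string survives a particular code page can be checked by asking the encoder to throw instead of silently substituting `?`. This is a sketch: `CanRepresent` is an illustrative name, and `us-ascii` is used as a portable stand-in for CP950 because it is always available. On the .NET Framework, `Encoding.GetEncoding(950, ...)` accepts the same fallback arguments; on .NET Core and later, code page encodings such as 950 generally have to be registered first with `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Returns true if every character in s can be encoded in the named
    // code page without falling back to a replacement character such as '?'.
    public static bool CanRepresent(string s, string encodingName)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,   // throw instead of substituting '?'
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // "us-ascii" stands in for CP950 here; per Shawn's question, CP950 lacks these two characters too.
        Console.WriteLine(CanRepresent("dir", "us-ascii"));            // True
        Console.WriteLine(CanRepresent("\u5f0c\u8a32", "us-ascii"));   // False
    }
}
```

With the default replacement fallback instead of the exception fallback, the same encoder would quietly produce `??`, which is the kind of substitution that shows up in out.txt.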

This all kind of made sense to me.

I mean, sure, if you weren't using a TrueType font, then things might not work right.

But Windows 2000, XP, Server 2003, and Vista are all Unicode platforms. So in the cases where it wouldn't hurt anything, it makes sense to just send Unicode text right to a Unicode console (after all, even when CMD.EXE is started in ANSI mode, it was still compiled as a Unicode application!).

So I wrote a little managed console application to test out this little thought experiment:

```csharp
using System;
using System.Runtime.InteropServices;

public class Test {
    [DllImport("kernel32.dll", EntryPoint="WriteConsoleW", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern bool WriteConsole(IntPtr hConsoleOutput,
                                             string lpBuffer,
                                             int nNumberOfCharsToWrite,
                                             out uint lpNumberOfCharsWritten,
                                             IntPtr lpReserved);

    [DllImport("Kernel32.DLL", EntryPoint="GetStdHandle", CharSet=CharSet.Unicode, ExactSpelling=true)]
    internal static extern IntPtr GetStdHandle(int nStdHandle);

    internal const int STD_OUTPUT_HANDLE = -11; // Handle to the standard output device.

    public static void Main() {
        // U+5F0C U+8A32: the two ideographs from the question.
        string st = "\u5f0c\u8a32";
        uint NumberOfCharsWritten;

        // Hand the UTF-16 string straight to the console, with no code page conversion.
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), st, st.Length, out NumberOfCharsWritten, IntPtr.Zero);
    }
}
```

Sure enough, if you run it with any system locale and the default console settings, the output will be:


(well, you may see square boxes for the second string but you'll never see them converted to question marks!)

And then if you first run chcp 65001 and run it again, you'll see:


Maybe .NET should try directly calling WriteConsoleW once in a while instead of so aggressively converting to the console output codepage. Extension A would thank them for it!


This post brought to you by 弌 (U+5F0C, a CJK Unified Ideograph)

# Mike Dimmick on 24 Apr 2007 8:46 AM:

Digging around in Reflector and the Shared Source CLI 2.0 shows that .NET 2.0 is creating a StreamWriter using the Encoding corresponding to the console output codepage to wrap a special Stream (__ConsoleStream). That __ConsoleStream writes to the console using WriteFile, not WriteConsole. WriteFile takes void pointers, of course: there is no WriteFileW/WriteFileA pair as no conversion is done.

You could bump the task up to a special class derived from StreamWriter/TextWriter, but you'd have to take account of redirected handles because I don't think WriteConsole works on ordinary file handles (and you'd want your redirected output to be converted to the appropriate codepage, I think, so that pipelines still work properly).
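Mike's suggestion might look something like the following minimal sketch. `UnicodeAwareConsoleWriter` is a hypothetical class, not part of the .NET Framework. It assumes the caller has already determined whether standard output is a real console, and it takes two sinks: a delegate that accepts UTF-16 text (which in a real program would wrap `WriteConsoleW`) and an ordinary `TextWriter` that encodes to the output code page for redirected handles, so pipelines keep receiving bytes in the code page they expect. The fake sinks in `Main` let the logic run on any platform.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical writer: send UTF-16 text straight to a Unicode sink when output
// is a real console, and only encode to the code page when it is redirected.
public sealed class UnicodeAwareConsoleWriter : TextWriter
{
    private readonly bool _isConsole;
    private readonly Action<string> _unicodeSink;  // e.g. a WriteConsoleW wrapper
    private readonly TextWriter _encodedSink;      // e.g. a StreamWriter over stdout

    public UnicodeAwareConsoleWriter(bool isConsole, Action<string> unicodeSink, TextWriter encodedSink)
    {
        _isConsole = isConsole;
        _unicodeSink = unicodeSink;
        _encodedSink = encodedSink;
    }

    // Report the encoding the text actually ends up in.
    public override Encoding Encoding => _isConsole ? Encoding.Unicode : _encodedSink.Encoding;

    public override void Write(char value) => Write(value.ToString());

    public override void Write(char[] buffer, int index, int count) =>
        Write(new string(buffer, index, count));

    public override void Write(string value)
    {
        if (string.IsNullOrEmpty(value)) return;
        if (_isConsole)
            _unicodeSink(value);         // console: no conversion, nothing lost
        else
            _encodedSink.Write(value);   // redirected: convert so pipelines still work
    }

    public override void Flush() => _encodedSink.Flush();
}

public static class WriterDemo
{
    public static void Main()
    {
        var seen = new StringBuilder();
        var bytes = new MemoryStream();
        var redirected = new StreamWriter(bytes, Encoding.ASCII); // ASCII stands in for the output code page

        var toConsole = new UnicodeAwareConsoleWriter(true, s => seen.Append(s), redirected);
        toConsole.Write("\u5f0c\u8a32");
        Console.WriteLine(seen.ToString() == "\u5f0c\u8a32");      // True

        var toFile = new UnicodeAwareConsoleWriter(false, s => seen.Append(s), redirected);
        toFile.Write("\u5f0c\u8a32");
        toFile.Flush();
        Console.WriteLine(Encoding.ASCII.GetString(bytes.ToArray())); // ??
    }
}
```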

It seems bizarre that the .NET Framework still runs on Windows 9x and that it doesn't make use of Unicode even where that works on Windows 9x.

# Michael S. Kaplan on 24 Apr 2007 8:54 AM:

In the end, all one has to do is detect the additional redirection and decide whether to optimize this case or not.... which in a weird way is how the unmanaged console handles this particular scenario, itself....
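The detection step can be done with `GetConsoleMode`, which fails for handles that are not consoles (files and pipes). This is one common way to detect redirection, not necessarily what CMD.EXE or the CLR does internally; `IsRealConsole` and `WriteText` are illustrative names, and the sketch is Windows-only because it P/Invokes kernel32.dll.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ConsoleDetect
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr GetStdHandle(int nStdHandle);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetConsoleMode(IntPtr hConsoleHandle, out uint lpMode);

    [DllImport("kernel32.dll", EntryPoint = "WriteConsoleW", CharSet = CharSet.Unicode,
               ExactSpelling = true, SetLastError = true)]
    private static extern bool WriteConsole(IntPtr hConsoleOutput, string lpBuffer,
        int nNumberOfCharsToWrite, out uint lpNumberOfCharsWritten, IntPtr lpReserved);

    private const int STD_OUTPUT_HANDLE = -11;

    // GetConsoleMode only succeeds on console handles, so a failure here
    // means the handle refers to a file, a pipe, or nothing at all.
    public static bool IsRealConsole(IntPtr handle) => GetConsoleMode(handle, out _);

    // Write UTF-16 directly when stdout is a console; otherwise let Console.Out
    // convert to the output code page so files and pipes get the bytes they expect.
    public static void WriteText(string s)
    {
        IntPtr h = GetStdHandle(STD_OUTPUT_HANDLE);
        if (IsRealConsole(h))
            WriteConsole(h, s, s.Length, out _, IntPtr.Zero);
        else
            Console.Out.Write(s);
    }

    public static void Main()
    {
        WriteText("\u5f0c\u8a32" + Environment.NewLine);
    }
}
```

Run directly, this should show the characters (or boxes, depending on the font); run as `program > out.txt`, it falls back to the code page conversion, as a redirected `dir` does.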

referenced by

2008/03/19 Before you say "What's next?" you have to figure out the action items

2007/11/22 If it does [best] fit, it may be off a bit! (aka Parlez-ゔ japonais?)
