Handling [Unicode] input in the console

by Michael S. Kaplan, published on 2005/12/15 12:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/15/504092.aspx


It was about seven months ago when I pointed out a method for typing in random Unicode code points using the Unicode IME.

And then it was about five months ago that I revisited the issue in typing in random Unicode code points redux.

In the latter post, I talked about a cool utility that Andrew put together, and I pointed out the nice review he did on his web site (fileformat.info) of the various ways to enter Unicode text, in a page titled How to enter Unicode characters in Microsoft Windows.

This may be a site worth watching, since I swear he has added to it in these last few months!

I was actually playing around with the console keyboard functions (you know, like ReadConsoleInput), trying to see how a developer who had to deal with text input without the miracle of WM_CHAR, WM_IME_CHAR, and WM_UNICHAR might get their work done. Here is the program:

#include <windows.h>
#include <stdio.h>

int main()
{
  puts("ReadConsoleInput test");
  puts("Ctrl-D to quit.\n");

  HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
  int n = 0;
  puts("#      UC    u/d  VK   SC  State\n\n");
  for (;;) {
    INPUT_RECORD ir[1];
    DWORD cEventsRead;
    /* Blocks until at least one input record is available. */
    if (!ReadConsoleInputW(in, ir, 1, &cEventsRead)) {
      puts("ReadConsoleInput failed!");
      return 1;
    }
    for (DWORD i = 0; i < cEventsRead; ++i, ++n) {
      /* EventType holds a single event type, not a bit mask, so compare it. */
      if (ir[i].EventType == KEY_EVENT) {
        KEY_EVENT_RECORD ker = ir[i].Event.KeyEvent;
        printf("%3d: U+%04x %s %04x %04x %04lx\n",
            n,
            (unsigned)ker.uChar.UnicodeChar,
            ker.bKeyDown ? "down" : " up ",
            (unsigned)ker.wVirtualKeyCode,
            (unsigned)ker.wVirtualScanCode,
            (unsigned long)ker.dwControlKeyState);

        /* Quit on the key-up of Ctrl-D (U+0004). Since we called the W
           version, read UnicodeChar: AsciiChar only aliases the low byte of
           the union, so it would also match characters like U+0404. */
        if (!ker.bKeyDown && ker.uChar.UnicodeChar == ('D' - '@'))
          return 0;
      }
    }
  }
  return 0;
}

Now for simple cases, like typing the keys "ABCDEF" (unshifted, so the characters are lowercase) and then Ctrl-D with the US English keyboard, the output is predictable:

ReadConsoleInput test
Ctrl-D to quit.

#      UC    u/d  VK   SC  State


  0: U+0061 down 0041 001e 0000
  1: U+0061  up  0041 001e 0000
  2: U+0062 down 0042 0030 0000
  3: U+0062  up  0042 0030 0000
  4: U+0063 down 0043 002e 0000
  5: U+0063  up  0043 002e 0000
  6: U+0064 down 0044 0020 0000
  7: U+0064  up  0044 0020 0000
  8: U+0065 down 0045 0012 0000
  9: U+0065  up  0045 0012 0000
 10: U+0066 down 0046 0021 0000
 11: U+0066  up  0046 0021 0000
 12: U+0000 down 0011 001d 0008
 13: U+0004 down 0044 0020 0008
 14: U+0004  up  0044 0020 0008

And the same keystrokes with the Russian keyboard are just as predictable; the virtual-key and scan codes are identical, and only the characters change:

ReadConsoleInput test
Ctrl-D to quit.

#      UC    u/d  VK   SC  State


  0: U+0444 down 0041 001e 0000
  1: U+0444  up  0041 001e 0000
  2: U+0438 down 0042 0030 0000
  3: U+0438  up  0042 0030 0000
  4: U+0441 down 0043 002e 0000
  5: U+0441  up  0043 002e 0000
  6: U+0432 down 0044 0020 0000
  7: U+0432  up  0044 0020 0000
  8: U+0443 down 0045 0012 0000
  9: U+0443  up  0045 0012 0000
 10: U+0430 down 0046 0021 0000
 11: U+0430  up  0046 0021 0000
 12: U+0000 down 0011 001d 0008
 13: U+0004 down 0044 0020 0008
 14: U+0004  up  0044 0020 0008
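In both traces the last three records are the Ctrl-D that ends the program: first the CTRL key itself (VK 0x11), then the D key (VK 0x44) with dwControlKeyState 0x0008 (LEFT_CTRL_PRESSED) and the character U+0004, even under the Russian layout. U+0004 is D's position in the alphabet, which is what the ('D' - '@') test in the program computes. A minimal C sketch of that arithmetic, assuming the usual ASCII control-character convention (ctrl_char_for is a hypothetical helper, not a Windows API):

```c
#include <ctype.h>

/* Hypothetical helper: the control character that Ctrl+<letter> produces,
   assuming the ASCII convention 'A' -> 0x01 ... 'Z' -> 0x1A.
   Returns 0 for anything that is not a letter. */
unsigned short ctrl_char_for(char letter)
{
    int c = toupper((unsigned char)letter);
    if (c < 'A' || c > 'Z')
        return 0;
    return (unsigned short)(c - '@');   /* '@' is 0x40, one before 'A' */
}
```

So ctrl_char_for('D') is 0x0004, matching the last two records of each trace.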

Once you get into the <ALT>+Numpad stuff, things become harder to figure out. Typing <ALT>65:

ReadConsoleInput test
Ctrl-D to quit.

#      UC    u/d  VK   SC  State


  0: U+0000 down 0012 0038 0121
  1: U+0000 down 0066 004d 0021
  2: U+0000  up  0066 004d 0021
  3: U+0000 down 0065 004c 0021
  4: U+0000  up  0065 004c 0021
  5: U+0041  up  0012 0038 4000120
  6: U+0000 down 0011 001d 0028
  7: U+0004 down 0044 0020 0028
  8: U+0004  up  0044 0020 0028

Now I do not know what to make of that weird dwControlKeyState value that came out when I released the <ALT> key and U+0041 was output, especially since the ReadConsoleInput documentation claims that it will be one or more of the following values:

Flag                 Value   Meaning
CAPSLOCK_ON          0x0080  The CAPS LOCK light is on.
ENHANCED_KEY         0x0100  The key is enhanced.
LEFT_ALT_PRESSED     0x0002  The left ALT key is pressed.
LEFT_CTRL_PRESSED    0x0008  The left CTRL key is pressed.
NUMLOCK_ON           0x0020  The NUM LOCK light is on.
RIGHT_ALT_PRESSED    0x0001  The right ALT key is pressed.
RIGHT_CTRL_PRESSED   0x0004  The right CTRL key is pressed.
SCROLLLOCK_ON        0x0040  The SCROLL LOCK light is on.
SHIFT_PRESSED        0x0010  The SHIFT key is pressed.

Kind of makes you wonder what 0x4000120 is supposed to mean, doesn't it? :-)
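One way to see which part of that value the documentation does not account for is to peel off the documented flags and look at what is left over. A minimal C sketch, assuming only the nine values in the table above (copied into a local table so it compiles without <windows.h>; describe_control_key_state is a hypothetical name):

```c
#include <string.h>

/* The nine documented dwControlKeyState flags, values as in the table above. */
typedef struct { unsigned long bit; const char *name; } FlagName;

static const FlagName kFlags[] = {
    {0x0001, "RIGHT_ALT_PRESSED"},  {0x0002, "LEFT_ALT_PRESSED"},
    {0x0004, "RIGHT_CTRL_PRESSED"}, {0x0008, "LEFT_CTRL_PRESSED"},
    {0x0010, "SHIFT_PRESSED"},      {0x0020, "NUMLOCK_ON"},
    {0x0040, "SCROLLLOCK_ON"},      {0x0080, "CAPSLOCK_ON"},
    {0x0100, "ENHANCED_KEY"},
};

/* Writes the names of the documented flags set in `state` into `buf`
   (joined with '|'; bufsize must be at least 1) and returns the bits
   that none of the documented flags explain. */
unsigned long describe_control_key_state(unsigned long state,
                                         char *buf, size_t bufsize)
{
    size_t i;
    buf[0] = '\0';
    for (i = 0; i < sizeof kFlags / sizeof kFlags[0]; ++i) {
        if (state & kFlags[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", bufsize - strlen(buf) - 1);
            strncat(buf, kFlags[i].name, bufsize - strlen(buf) - 1);
            state &= ~kFlags[i].bit;   /* remove the bit we just named */
        }
    }
    return state;   /* leftover, undocumented bits */
}
```

For 0x4000120 this yields NUMLOCK_ON|ENHANCED_KEY with 0x4000000 left over, while the value 0x0121 from the <ALT> key-down decodes completely.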

That extra bit does not show up if you type <ALT>0065 instead:

ReadConsoleInput test
Ctrl-D to quit.

#      UC    u/d  VK   SC  State


  0: U+0000 down 0012 0038 0121
  1: U+0000 down 0060 0052 0021
  2: U+0000  up  0060 0052 0021
  3: U+0000 down 0060 0052 0021
  4: U+0000  up  0060 0052 0021
  5: U+0000 down 0066 004d 0021
  6: U+0000  up  0066 004d 0021
  7: U+0000 down 0065 004c 0021
  8: U+0000  up  0065 004c 0021
  9: U+0041  up  0012 0038 0120
 10: U+0000 down 0011 001d 0028
 11: U+0004 down 0044 0020 0028
 12: U+0004  up  0044 0020 0028

I'll have to dig into the source at some point to figure out what the extra 0x4000000 is supposed to mean. :-)
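Both traces do agree on one thing: the composed character (U+0041) arrives on the key-up record for the <ALT> key (VK 0x12), while ordinary keys deliver their character on key-down and repeat it on key-up. A minimal C sketch of pulling just the typed characters out of a stream of key records, based only on the traces above rather than on any documented rule (DemoKeyEvent is a hypothetical stand-in for the KEY_EVENT_RECORD fields, so the sketch compiles without <windows.h>):

```c
#include <stddef.h>

/* Hypothetical stand-in for the KEY_EVENT_RECORD fields used here. */
typedef struct {
    int bKeyDown;                    /* nonzero for key-down records */
    unsigned short wVirtualKeyCode;
    unsigned short UnicodeChar;      /* 0 when the record has no character */
} DemoKeyEvent;

#define DEMO_VK_MENU 0x12            /* the ALT key, as seen in the traces */

/* Returns the character a record contributes to the typed text, or 0.
   Assumption drawn from the traces: ordinary keys deliver their character
   on key-down, and an ALT+Numpad sequence delivers its composed character
   on the ALT key-up record. */
unsigned short char_from_key_event(const DemoKeyEvent *ev)
{
    if (ev->UnicodeChar == 0)
        return 0;
    if (ev->bKeyDown)
        return ev->UnicodeChar;
    if (ev->wVirtualKeyCode == DEMO_VK_MENU)
        return ev->UnicodeChar;      /* composed ALT+Numpad character */
    return 0;                        /* key-up echo of an ordinary key */
}

/* Copies the characters from `n` records into `out`; returns the count. */
size_t collect_chars(const DemoKeyEvent *evs, size_t n,
                     unsigned short *out, size_t outsize)
{
    size_t i, count = 0;
    for (i = 0; i < n && count < outsize; ++i) {
        unsigned short ch = char_from_key_event(&evs[i]);
        if (ch != 0)
            out[count++] = ch;
    }
    return count;
}
```

Fed the <ALT>65 records above (without the trailing Ctrl-D), collect_chars returns just U+0041; fed the "ABCDEF" records, it returns each letter once, because the key-up echoes are skipped.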

Now I should point out something you may discover yourself if you compile that sample code: typing the digits while holding down the <ALT> key is complicated, because unless you are used to doing it very fast, the held-down key will autorepeat and show up as additional key-down event records.

And the managed equivalent of this code (the new Console.ReadKey method, which returns a ConsoleKeyInfo object) has the same issue with the repeats.

Of course, on one level this makes sense, since the key really is being held down; on another level it makes it harder to know what is being typed. The cleaner separation in Windows applications between WM_KEYDOWN/WM_KEYUP and the various character messages is much easier to handle on the programming side for cases like the <ALT>+Numpad entry method....
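A console program can at least recognize the repeats itself by remembering which keys are already down: a key-down record for a key that has not been released since its last key-down is an autorepeat. A minimal C sketch (KeyRepeatFilter and its functions are hypothetical names; it assumes every key-down is eventually followed by a key-up, which may not hold if, for example, the window loses focus while a key is held):

```c
#include <string.h>

/* Hypothetical helper: tracks which virtual keys are currently down. */
typedef struct {
    unsigned char down[256];   /* indexed by virtual-key code */
} KeyRepeatFilter;

void krf_init(KeyRepeatFilter *f)
{
    memset(f->down, 0, sizeof f->down);
}

/* Returns 1 if this record is an autorepeat of a key that is already held
   down, 0 otherwise. Updates the tracked state either way. */
int krf_is_repeat(KeyRepeatFilter *f, unsigned short vk, int keyDown)
{
    unsigned char idx = (unsigned char)(vk & 0xFF);
    if (keyDown) {
        int repeat = f->down[idx];
        f->down[idx] = 1;
        return repeat;
    }
    f->down[idx] = 0;          /* key released: next key-down is fresh */
    return 0;
}
```

Calling krf_is_repeat for each KEY_EVENT record and skipping the ones where it returns 1 leaves a single down/up pair per physical keystroke, including the held <ALT> key.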

It is clear to me that reading this information in the console is more complicated than it is in Windows applications, with the messages they are sent.

 

This post brought to you by "р" (U+0440, a.k.a. CYRILLIC SMALL LETTER ER)


# Michael S. Kaplan on 15 Dec 2005 12:03 PM:

By the way, thanks to Ravi for the original inspiration on this one. :-)

referenced by

2008/06/27 How does it work? It cheats, that's how!

2008/06/27 Adding keys to your layout when you don't know what they are (aka Life with a 108-key keyboard layout)

2008/01/11 Getting your Control to Function, and putting your Function in Control

2007/11/28 Tracking down problems with the 102 key on an Acer Tablet with the UK Extended keyboard layout?

2006/09/26 In case you have a yen to extend your keyboard (or at least want a yen?)

2005/12/16 More on handling [Unicode] input in the console
