Getting all you can out of a keyboard layout, Part #3

by Michael S. Kaplan, published on 2006/03/24 04:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/03/24/559169.aspx


Previous posts in this series: Part 0, Part 1, and Part 2.

Ok, we are making some progress here, and we are at the very least no longer stomping on the user's own keyboard list.

But we are ignoring dead keys and ligatures. Which, once again, is quite lame.

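The key here is to have a little more respect for the return value of ToUnicodeEx. Right now we do nothing with the resulting string unless the return value is 1. But there are three other possibilities: a negative value (the key is a dead key), 0 (the key has no translation), and 2 or more (several characters were written to the buffer).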

Clearly the only case where we want to do nothing is when 0 is the return value; in all other cases we want to do something. So let's fix that....

(As before, the older code is gray, the new code is black)

using System;
using System.Text;
using System.Windows.Forms;
using System.Runtime.InteropServices;

namespace KeyboardLayouts {
    class Class1 {

        //  You'll want to insert that enumeration here!

        internal const uint KLF_NOTELLSHELL  = 0x00000080;

        [DllImport("user32.dll", CharSet=CharSet.Unicode, EntryPoint="MapVirtualKeyExW", ExactSpelling=true)]
        internal static extern uint MapVirtualKeyEx(
            uint uCode,
            uint uMapType,
            IntPtr dwhkl);

        [DllImport("user32.dll", CharSet=CharSet.Unicode, EntryPoint="LoadKeyboardLayoutW", ExactSpelling=true)]
        internal static extern IntPtr LoadKeyboardLayout(string pwszKLID, uint Flags);

        [DllImport("user32.dll", ExactSpelling=true)]
        internal static extern bool UnloadKeyboardLayout(IntPtr hkl);

        [DllImport("user32.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
        internal static extern int ToUnicodeEx(
            uint wVirtKey,
            uint wScanCode,
            KeysEx[] lpKeyState,
            StringBuilder pwszBuff,
            int cchBuff,
            uint wFlags,
            IntPtr dwhkl);

        [DllImport("user32.dll", ExactSpelling=true)]
        public static extern int GetKeyboardLayoutList(int nBuff, [Out, MarshalAs(UnmanagedType.LPArray)] IntPtr[] lpList);

        [STAThread]
        static void Main(string[] args) {
            int cKeyboards = GetKeyboardLayoutList(0, null);
            IntPtr[] rghkl = new IntPtr[cKeyboards];
            GetKeyboardLayoutList(cKeyboards, rghkl);
            IntPtr hkl = LoadKeyboardLayout(args[0], KLF_NOTELLSHELL);
            if(hkl == IntPtr.Zero) {
                Console.WriteLine("Sorry, that keyboard does not seem to be valid.");
            }
            else {
                KeysEx[] lpKeyState = new KeysEx[256];

                for(uint sc = 0x01; sc <= 0x7f; sc++) {
                    // Map type 1 (MAPVK_VSC_TO_VK) translates a scan code into a virtual key.
                    uint vk = MapVirtualKeyEx(sc, 1, hkl);
                    if(vk != 0) {
                        StringBuilder sb = new StringBuilder(10);
                        int rc = ToUnicodeEx(vk, sc, lpKeyState, sb, sb.Capacity, 0, hkl);
                        if(rc > 0) {
                            StringBuilder sbChar = new StringBuilder(5 * rc);
                            for(int ich = 0; ich < rc; ich++) {
                                sbChar.Append(((ushort)sb.ToString()[ich]).ToString("x4"));
                                sbChar.Append(' ');
                            }
                            Console.WriteLine("{0:x2}\t{1:x4}\t{2:x2}\t{3}\t{4}",
                                sc, 
                                sbChar.ToString(0, sbChar.Length - 1), 
                                vk, 
                                ((KeysEx)vk).ToString(), 
                                ((Keys)vk).ToString());
                        }
                        else if(rc < 0) {
                            Console.WriteLine("{0:x2}\t{1:x4}\t{2:x2}\t{3}\t{4}\t\t\tDEAD!!!",
                                sc, 
                                ((ushort)sb.ToString()[0]), 
                                vk, 
                                ((KeysEx)vk).ToString(), 
                                ((Keys)vk).ToString());

                            // It's a dead key; let's flush out what's stored in the keyboard state.
                            // Map type 0 (MAPVK_VK_TO_VSC) supplies the scan code for VK_SPACE.
                            ToUnicodeEx((uint)KeysEx.VK_SPACE, MapVirtualKeyEx((uint)KeysEx.VK_SPACE, 0, hkl), lpKeyState, sb, sb.Capacity, 0, hkl);
                        }
                    }
                }

                foreach(IntPtr i in rghkl) {
                    if(hkl == i) {
                        hkl = IntPtr.Zero;
                        break;
                    }
                }

                if(hkl != IntPtr.Zero) {
                    UnloadKeyboardLayout(hkl);
                }
 
            }
        }
    }
}

Now a few different things happened here. First, any time the return of ToUnicodeEx is greater than zero, all of the code points are dumped out.

Secondly, any time it is less than zero, the key is known to be a dead key, and, as I point out in another post, dead keys are always limited to a single UTF-16 code unit. So we grab that one code unit and use it.

Thirdly, in that dead key case a second call is made to clear out the buffer -- otherwise the next call will be contaminated by the dead key value and will return either a different character entirely or two separate characters. Neither of those situations is too terribly desirable, so the buffer is cleared out.

(In an upcoming post I will explain why I chose VK_SPACE as the character for clearing out the buffer.)
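The hex dump described in the first point can be pulled out into a small helper. What follows is only a sketch: the names CodeUnitDump and ToHex are made up for illustration and are not part of the sample above. It assumes the caller passes in the buffer contents together with the count that ToUnicodeEx returned, and it prints no more code units than that count.

```csharp
using System;

// Hypothetical helper (not in the original sample): formats the first
// `count` UTF-16 code units of `text` as space-separated 4-digit hex.
static class CodeUnitDump {
    public static string ToHex(string text, int count) {
        // Never read past the count reported by the API or past the string itself.
        int n = Math.Min(Math.Max(count, 0), text.Length);
        string[] parts = new string[n];
        for (int i = 0; i < n; i++) {
            parts[i] = ((ushort)text[i]).ToString("x4");
        }
        // Join avoids the trailing space that then has to be trimmed off.
        return string.Join(" ", parts);
    }
}
```

For a ligature key that produced U+0644 followed by U+0627, CodeUnitDump.ToHex(sb.ToString(), rc) with rc equal to 2 would yield "0644 0627".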

It is very important to pay attention to that return value and never read past the count it gives, because the string is not guaranteed to be null terminated. In fact, let's look at the return values table from the documentation:

Return value	Meaning
-1	The specified virtual key is a dead-key character (accent or diacritic). This value is returned regardless of the keyboard layout, even if several characters have been typed and are stored in the keyboard state. If possible, even with Unicode keyboard layouts, the function has written a spacing version of the dead-key character to the buffer specified by pwszBuff. For example, the function writes the character SPACING ACUTE (0x00B4), rather than the character NON_SPACING ACUTE (0x0301).
0	The specified virtual key has no translation for the current state of the keyboard. Nothing was written to the buffer specified by pwszBuff.
1	One character was written to the buffer specified by pwszBuff.
2 or more	Two or more characters were written to the buffer specified by pwszBuff. The most common cause for this is that a dead-key character (accent or diacritic) stored in the keyboard layout could not be combined with the specified virtual key to form a single character. However, the buffer may contain more characters than the return value specifies. When this happens, any extra characters are invalid and should be ignored.
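One way to keep those four cases straight in code is to classify the return value first and only then touch the buffer. This is a minimal sketch under my own assumptions: TranslationKind, ToUnicodeResult, Classify and ValidText are hypothetical names, not anything from user32 or from the sample above. It treats any negative value as a dead key (as the sample does) and trusts only as many code units as the return value vouches for.

```csharp
using System;
using System.Text;

// Hypothetical names for the four documented outcomes of ToUnicodeEx.
enum TranslationKind { DeadKey, NoTranslation, SingleCharacter, MultipleCharacters }

static class ToUnicodeResult {
    // Maps the raw return value onto one of the documented cases.
    public static TranslationKind Classify(int rc) {
        if (rc < 0) return TranslationKind.DeadKey;
        if (rc == 0) return TranslationKind.NoTranslation;
        return rc == 1 ? TranslationKind.SingleCharacter
                       : TranslationKind.MultipleCharacters;
    }

    // Returns only the part of the buffer the return value covers:
    // one code unit for a dead key, rc code units otherwise.
    // Anything beyond that is invalid and is ignored.
    public static string ValidText(StringBuilder buffer, int rc) {
        int count = rc < 0 ? 1 : rc;
        string text = buffer.ToString();
        return text.Substring(0, Math.Min(count, text.Length));
    }
}
```

The Math.Min guard is there because the marshaled buffer can turn out shorter than the count (for example, if nothing usable came back for a dead key), and indexing past the end would throw.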

Of course I am assuming people never mistype a dead key combination and thus tend to think of that "2 or more" case as being for ligatures -- certainly in the code provided the only case that applies is the ligature one (since it never combines with other characters!).

Ok, we are making progress now -- dead keys and ligatures. But we are still missing some important pieces like:

Now note that those last two go well beyond what even MSKLC supports, but that's okay; I am not limited in this sample by the same things that might limit functionality in MSKLC. :-)

Obviously we'll need something a bit smarter in the way of an algorithm for some of these; this will be happening too. Remember that the main point of this sample is to show off some of those lessons that can be gleaned from this stuff....

 

This post brought to you by "3" (U+0033, DIGIT THREE)
A Unicode character that is in the very small family of those whose VK value is the same as its code point!


# Phylyp on 24 Mar 2006 6:41 AM:

Wow, code samples + a series!   Looks like you're taking after Raymond :)

# Michael S. Kaplan on 24 Mar 2006 10:15 AM:

Hi Phylyp,

That's high praise! Though I never think of myself as someone quite in Raymond's league. :-)

# Martin Bohring on 24 Mar 2006 12:23 PM:

Hello Michael,
hey, this series really provides good insight into the keyboard APIs.
I only wish you had written this half a year ago.
It would have saved me from a lot of hassles and dead ends (not keys).

But still those bits about harder shift states and death key chains sound interesting

# Michael S. Kaplan on 24 Mar 2006 12:48 PM:

Hi Martin!

Ah, I suppose I could have dated them back a year ago and then people who discover the blog for the first time would kick themselves for not finding it when they needed it? :-)

Stay tuned, as I delve in deeper into the under-documented!
