Off by one what, exactly?

by Michael S. Kaplan, published on 2010/10/08 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/10/08/10073124.aspx


It's a funny thing about off-by-one errors, you know.

Looking at the subject in Wikipedia:

An off-by-one error (OBOE) is a logical error involving the discrete equivalent of a boundary condition. It often occurs in computer programming when an iterative loop iterates one time too many or too few. Usually this problem arises when a programmer fails to take into account that a sequence starts at zero rather than one (as with array indices in many languages), or makes mistakes such as using "is less than or equal to" where "is less than" should have been used in a comparison.
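For anyone who has not been bitten by the traditional kind lately, here is a minimal sketch of the "is less than or equal to" mistake that definition mentions. The function names are hypothetical and exist only for this illustration:

```python
def count_iterations_le(n):
    """Loop with 'i <= n': runs one time too many for an n-item sequence."""
    count = 0
    i = 0
    while i <= n:  # the classic slip: should have been 'i < n'
        count += 1
        i += 1
    return count


def count_iterations_lt(n):
    """Loop with 'i < n': runs exactly n times."""
    count = 0
    i = 0
    while i < n:
        count += 1
        i += 1
    return count
```

The two functions differ by a single character, and their results always differ by exactly one.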

I found another case yesterday.

By which I mean today, since it is today that I am writing it, but by the time you read it, either you hacked my account on the blog server or it is tomorrow. Or later still.

Anyway, the error.

So if you look at keyboard hardware, each key has a scan code that it sends to the computer when you hit it.

Here is a typical layout, shown via MSKLC:

Now the key with the tilde on it has a scan code of 29 (that is hexadecimal, as are all the scan codes here).

And then the number keys 1234567890 have the scan codes 02 to 0b.
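The relationship between the digit keys and their scan codes can be sketched in a few lines. This is only an illustration of the numbering described above (scan codes 02 to 0b for the keys 1 through 9 and then 0); the function names are made up for the example:

```python
def digit_for_scan_code(scan_code):
    """Return the digit printed on the key with the given scan code."""
    if not 0x02 <= scan_code <= 0x0B:
        raise ValueError("not a digit-row scan code: %#04x" % scan_code)
    # 0x02..0x0a are the keys 1..9; 0x0b wraps around to the 0 key.
    return (scan_code - 1) % 10


def scan_code_for_digit(digit):
    """Return the scan code of the key with the given digit printed on it."""
    if not 0 <= digit <= 9:
        raise ValueError("not a digit: %r" % (digit,))
    # The 0 key sits after 9, so it gets the last scan code in the run.
    return 0x0B if digit == 0 else digit + 1
```

So the scan code for a digit key is one more than the digit on it, except for the 0 key at the end of the row.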

You may know where I am going with this one....

If you look at the keyboards of India that ship on Windows, most of them have one thing in common: they put U+200d (ZERO WIDTH JOINER) and U+200c (ZERO WIDTH NON-JOINER) in the CTRL+SHIFT shift state.

This was very important at the time when Windows 2000 was being developed, back when Unicode had the grand plan for how to use these two characters I showed in Why don't all the half forms sort right? -- before fonts started widely doing something different as I discussed in Which form to use if the form keeps changing?.

So because of this fact, they were put on most of the keyboards.

Here we come to the problem, though.

You see, all the keyboard layouts we ship (e.g. the Hindi Traditional keyboard) put these two control characters on the CTRL+SHIFT+1 and CTRL+SHIFT+2 keys.

But a colleague of mine who had created updated versions of some of the layouts had put them on the CTRL+SHIFT+2 and CTRL+SHIFT+3 keys instead.

I pointed this out to her, and she admitted she did the work in the .KLC file rather than in MSKLC itself. And when looking at the following rows in the file:

02  1   0  09e7   0021  -1   -1
03  2   0  09e8   0040  -1   200d
04  3   0  09e9   %%    -1   200c

The confusion of the scan codes (02, 03, 04) right next to the VK_* values (1, 2, 3) and her knowledge that the scan codes were one off from the numbers caused her to put them in the wrong spot.
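A quick way to avoid eyeballing those two adjacent columns is to let a script report which key each joiner landed on. The sketch below parses only the three rows quoted above; it assumes the first column is the scan code, the second is the VK_* value, and the last column is the CTRL+SHIFT shift state (which is what the text implies for these rows, not a general statement about every .KLC file). The names are hypothetical:

```python
# The three rows exactly as quoted from the .KLC file.
ROWS = """\
02  1   0  09e7   0021  -1   -1
03  2   0  09e8   0040  -1   200d
04  3   0  09e9   %%    -1   200c
"""

JOINERS = {
    "200d": "ZERO WIDTH JOINER",
    "200c": "ZERO WIDTH NON-JOINER",
}


def find_joiners(text):
    """Map each joiner name to the (scan code, VK) of the row carrying it."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a key row
        scan_code, vk = fields[0], fields[1]
        # Assumption: the last column is the CTRL+SHIFT shift state.
        last = fields[-1].lower()
        if last in JOINERS:
            found[JOINERS[last]] = (scan_code, vk)
    return found


for name, (scan_code, vk) in find_joiners(ROWS).items():
    print("%s: scan code %s, key %s" % (name, scan_code, vk))
```

Printing the key by its VK_* value rather than its scan code makes it obvious that these rows put the two characters on the 2 and 3 keys.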

And thus the decision long ago that kept the scan codes from lining up with these digits when they could have, combined with an incorrect compensation for that off-by-one assignment (intentionally being one off from the scan codes), led the characters to end up in the wrong slot.

Clearly this was not due to the traditional "off-by-one" error due to 0-based vs. 1-based counting that the Wikipedia article was referring to.

But the fact that the scan codes were off-by-one from the numbers atop them due to the way they were assigned and the potential confusion thereof made it easy for her to introduce an off-by-one error of her own!

The story has a happy ending, though. Her keyboards will now be correct when she uses them, and she was only a little embarrassed by the fact that I told her I'd be writing the blog post you are reading (though she admitted that it wouldn't make sense not to!). She even bought me a beer for looking at her layouts. And thus everybody wins....


Random832 on 12 Oct 2010 8:53 AM:

Why doesn't a trackback show up here from blogs.msdn.com/.../10074411.aspx ?

Michael S. Kaplan on 12 Oct 2010 9:06 AM:

Trackbacks are broken at the moment due to a bug.


referenced by

2010/11/03 Y can't Z Undo, exactly?
