Ā was unexpected at this time.

by Michael S. Kaplan, published on 2006/01/30 17:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/30/519760.aspx


(Special thanks to Dave Poole for pointing this one out!)

It was not too long ago that I got mail from Dave about a strange error he was getting in some automated tests of various SQL Server command line tools. An error that occurred with a command line including LATIN CAPITAL LETTER A WITH MACRON (Ā).

I am used to getting random mails of a particular class from people (such as Unicode characters not working in the console), so I started him down the troubleshooting road with chcp and default system locale and so on.

But he persevered, and when he proved that he was entering things correctly and I looked at the problem, I saw that he was right. Typing an 'Ā' always had this particular problem (the error in the title of this post), while typing anything else did not.

Since Dave is on the SQL Server team and I am on Windows, I figured I should go find out who owns CMD.EXE.

It turns out that is none other than Mark Zbikowski, Architect and man whose initials appear in every binary that is not one of those .COM files.

By many accounts, he is the third most "senior" employee working at Microsoft today.

Even though I had actually conversed with him before (about those casing table and NTFS issues!), I did not want to waste his time or take advantage of the fact that he is by all reports and in my personal experience a really nice guy.

So there was a brief delay while I tried to look into the bug a bit before sending mail to someone who had been here approximately ten times longer than me. :-)

I found a constant with the value 0x0100 in the code, but I could not follow the lexer and parser well enough to tell whether that value overlapped with anything they handle. So I finally sent Mark some mail and asked him.

As it turns out, Mark and I have at least one thing in common -- he is also used to getting random mails of a particular class from people (such as specific characters not working in the console). So he started me down the road of troubleshooting when text is munged before it ever gets to CMD's parser.

But I persevered and when I proved that I was entering things correctly and he looked at the problem, he saw that I was right. Typing an 'Ā' always had this particular problem.

(to Mark's credit he realized all of this much more quickly than I did, but then he is much smarter than I am and has been here almost ten times longer than me so I think that would be expected!)

This is not a regression, and has been around even longer than Unicode support in the console has been (perhaps even as far back as the OS/2 days) -- back when 0x0100 would have been a very sensible mark that could be used to indicate something that is not a character (since no characters were above 0xFF!).
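To illustrate the kind of collision described above, here is a minimal Python sketch. It is not the actual CMD.EXE code; the names `NOT_A_CHAR`, `next_token`, and `count_chars` are made up for the illustration. The only assumption is that a lexer hands back character codes as integers and reserves 0x0100 to mean "not a character".

```python
# Hypothetical sketch: NOT the real CMD.EXE lexer, just the general pattern.
# Sentinel picked back when every real character fit in 0x00..0xFF.
NOT_A_CHAR = 0x0100


def next_token(text, pos):
    """Return the code of the character at pos, or NOT_A_CHAR past the end."""
    if pos >= len(text):
        return NOT_A_CHAR
    return ord(text[pos])


def count_chars(text):
    """Count characters by reading tokens until the sentinel shows up."""
    count = 0
    while next_token(text, count) != NOT_A_CHAR:
        count += 1
    return count


print(count_chars("echo A"))       # 6: plain ASCII never collides with the sentinel
print(count_chars("echo \u0100"))  # 5: U+0100 is mistaken for the sentinel
```

With 8-bit input the sentinel can never match a real character. Once the input is Unicode, U+0100 lands exactly on it, and the code misreads a perfectly valid letter as something else.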

Good proof that there is always a small but measurable difference to be had between a nearly total rewrite and an actual rewrite. A difference that testers could make their living on if they needed to!

But in any case it was a rather cool bug, if you ask me. I have no idea why it has never been reported before, but it is reported now!

 

This post brought to you by "Ā" (U+0100, a.k.a. LATIN CAPITAL LETTER A WITH MACRON)


# Gabe on 31 Jan 2006 2:14 AM:

So where was the bug? Was it in cmd.exe (the command prompt) or csrss.exe (the subsystem that includes the console)? Or elsewhere?

I should also add that I concur that Mark Z. is a nice guy. Back in 1995 my roommate (Paul) worked for him, but had no idea who he was. When Paul mentioned his boss' name, I simply had to meet him. Mark seemed impressed that I knew him as the guy who created the EXE format. At the time Mark was working on the never-to-be-released OFS, and readily answered all of my questions about the content indexing (which did ship).

# Michael S. Kaplan on 31 Jan 2006 2:22 AM:

The bug was definitely in cmd.exe, not csrss....

# Jerry Pisk on 31 Jan 2006 6:24 PM:

Isn't this just one case of the "magic" values and the problems associated with them? I'm just curious why there was a need to mix a command string with a value that was not a valid input at the time the code was written. Reminds you of all the mainframe programmers being too lazy to express a missing year value and choosing the magic value of 99 to do that.

# Yuhong Bao on 6 Sep 2011 2:24 PM:

I wonder why it took more than 10 years to discover that ONE Unicode char was not working properly?

# Michael S. Kaplan on 6 Sep 2011 2:36 PM:

I guess no one was using that letter.



