by Michael S. Kaplan, published on 2006/01/30 17:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/30/519760.aspx
(Special thanks to Dave Poole for pointing this one out!)
It was not too long ago that I got mail from Dave about a strange error he was getting in some automated tests of various SQL Server command line tools. An error that occurred with a command line including LATIN CAPITAL LETTER A WITH MACRON (Ā).
I am used to getting random mails of a particular class from people (such as Unicode characters not working in the console), so I started him down the troubleshooting road with chcp and default system locale and so on.
But he persevered and when he proved that he was entering things correctly and I looked at the problem, I saw that he was right. Typing an 'Ā' always had this particular problem:
while typing anything else did not:
Since Dave is on the SQL Server team and I am on Windows, I figured I should go find out who owns CMD.EXE.
It turns out that it is none other than Mark Zbikowski, Architect and the man whose initials appear in every binary that is not one of those .COM files.
By many accounts, he is the third most "senior" employee working at Microsoft today.
Even though I had actually conversed with him before (about those casing table and NTFS issues!), I did not want to waste his time or take advantage of the fact that he is by all reports and in my personal experience a really nice guy.
So there was a brief delay while I tried to look into the bug a bit before sending mail to someone who had been here approximately ten times longer than me. :-)
I found a constant with the value 0x0100 in the code, but I could not follow the lexer and parser well enough to see how that constant might overlap with the character. So I finally sent Mark some mail and asked him.
As it turns out, Mark and I have at least one thing in common -- he is also used to getting random mails of a particular class from people (such as specific characters not working in the console). So he started me down the road of troubleshooting when text is munged before it ever gets to CMD's parser.
But I persevered and when I proved that I was entering things correctly and he looked at the problem, he saw that I was right. Typing an 'Ā' always had this particular problem.
(to Mark's credit he realized all of this much more quickly than I did, but then he is much smarter than I am and has been here almost ten times longer than me so I think that would be expected!)
This is not a regression, and has been around even longer than Unicode support in the console has been (perhaps even as far back as the OS/2 days) -- back when 0x0100 would have been a very sensible mark that could be used to indicate something that is not a character (since no characters were above 0xFF!).
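The post does not show any of CMD's code, so what follows is only a hypothetical sketch of the general failure mode: a lexer that marks token boundaries with the in-band value 0x0100, which was a safe "not a character" value only while every character fit in 0x00-0xFF. The function names and marker constants are assumptions for illustration, not CMD.EXE's actual implementation. (Python strings hold code points rather than UTF-16 code units, but U+0100 is in the BMP, so the value is the same either way.)

```python
# Hypothetical sketch of an in-band sentinel colliding with U+0100.
# None of these names come from CMD.EXE; they are illustrative only.

BYTE_ERA_MARKER = 0x0100   # "impossible" when all characters were <= 0xFF
OUT_OF_BAND_MARKER = -1    # can never be a valid code point (like C's EOF)


def mark_tokens(text, marker):
    """Convert text to code points, replacing spaces with a boundary marker."""
    units = [marker if ch == " " else ord(ch) for ch in text]
    units.append(marker)  # mark the end of the last token
    return units


def split_tokens(units, marker):
    """Rebuild tokens by splitting wherever the marker value appears."""
    tokens, current = [], []
    for unit in units:
        if unit == marker:
            if current:
                tokens.append("".join(chr(u) for u in current))
                current = []
        else:
            current.append(unit)
    return tokens


if __name__ == "__main__":
    # Ordinary input is fine with the byte-era marker.
    print(split_tokens(mark_tokens("echo hello", BYTE_ERA_MARKER), BYTE_ERA_MARKER))
    # -> ['echo', 'hello']

    # 'Ā' is U+0100, so it is indistinguishable from the marker and vanishes.
    print(split_tokens(mark_tokens("echo Ābc", BYTE_ERA_MARKER), BYTE_ERA_MARKER))
    # -> ['echo', 'bc']

    # An out-of-band marker keeps every real character intact.
    print(split_tokens(mark_tokens("echo Ābc", OUT_OF_BAND_MARKER), OUT_OF_BAND_MARKER))
    # -> ['echo', 'Ābc']
```

The design point is simply that a marker sharing the value space of real characters stops being a marker as soon as that value space grows, which is what moving from 8-bit text to Unicode did.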
Good proof that there is always a small but measurable difference to be had between a nearly total rewrite and an actual rewrite. A difference that testers could make their living on if they needed to!
But in any case it was a rather cool bug, if you ask me. I have no idea why it has never been reported before, but it is reported now!
This post brought to you by "Ā" (U+0100, a.k.a. LATIN CAPITAL LETTER A WITH MACRON)
Yuhong Bao on 6 Sep 2011 2:24 PM:
I wonder why it took more than 10 years to discover that ONE Unicode char was not working properly?
Michael S. Kaplan on 6 Sep 2011 2:36 PM:
I guess no one was using that letter.