Bugs hidden in plain sight, and commented that way too

by Michael S. Kaplan, published on 2010/06/18 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/06/18/10024976.aspx

In my time working in windows, I have seen a lot of different code.

As begets any project of this size written over this much time by this many developers, there is a lot there.

Some parts of it are ordinary.

Other parts of it are quite brilliant.

Still other parts are just wrong.

Every once in a while the wrongness even includes a comment that vmakes the wrongness so entirely clear to anyone who read the comment and went so far as to think about what it said that one has to conclude that no one was reading the comment and thinking about whether it might be true.

An example of this phenomenon can be seen in blogs of mine like The Bug(s) Spotted, aka Design flaws are worse than bugs, just to give one example.

Today I am going to talk about another!

Now before I get into that, let me backpedal a bit and mention a small point of obviation, or at least of mitigation.

When code has been around for a while (like multiple versions) it is treated as if it is a bit more stable. More likely to be right.

As a consequence, comments associated with that code are sometimes given that same bit of extra authority. And people who see it in passing who are not thinking about bugs or code reviews can easily let that authority stand, subconsciously.

It is of course still just as wrong, mind you.

But otherwise smart people can go so far as to quote the code with its comment when making a tangantial point, not even seeing the problem in either one!

Such as this bit from a piece of mail the other day, one that came up in a conversation about console applications (I have been involved in several such email threads/meetings after blogs like Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT? and The real problem(s) with all of these console "fallback" discussions and Anyone who says the console can't do Unicode isn't as smart as they think they are and Cunningly conquering communicated console caveats. Comprende, mon Capitán?, as you might expect).

Product details removed for hopefully obvious reasons:

The current console output behavior for <redacted> and <redacted> was painfully established through years of international testing.
That’s not to say there are no issues – just that we need to be cautious about making changes.

// if redirected to a pipe or a file, don't use WriteConsole;
// it drops redirected output on the floor
// if going to a file, we should not use console codepage, we
// should use ANSI codepage OR write unicode & a unicode
// filemarker at beginning of file.
// Since we don't know if this is the first thing in the file,
// let's go the ANSI route.


See the problems?

 Let's make it interesting, in a Tosh.0 sort of way....

How many problems do you see, in any of the following:

Make Daniel proud....how many problems can you find in 20 seconds?

Ready? Set? Go!

Daniel on 18 Jun 2010 8:55 AM:

No semicolon in C code? ;)

Michael S. Kaplan on 18 Jun 2010 2:42 PM:

I said algorithm implied; the pseudo-code complier doesn't require semicolons!

Robert on 18 Jun 2010 11:48 PM:

This seems to be largely consistent with the remarks in the documentation of the WriteConsole function


Though using ANSI just because a BOM is too hard to think about seems like a bad excuse. They should probably be using Unicode without BOM. That doesn't specify the encoding, but neither do ANSI text files, given the console code page as another reasonable choice for redirected console output (cmd.exe does that, for example, in "echo abçøü > redirected.txt").

Ralph Trickey on 21 Jun 2010 5:25 PM:

There isn't 'a' ansi codepage, he probably meant 1252, the latin-1 Western european code page. If this were to run on any non european based language, it's got a reasonable chance to not do what's expected, and almost guaranteed to not be readable on any other system.

The comment says that they don't want to use the console codepage, but getacp returns the system codepage, which is also the console codepage.

The email also implies that there are issues, which makes for a very bad smell to this code.

That's one mistake per section, did I miss any?


Yuhong Bao on 21 Jun 2010 11:07 PM:

Is it the ANSI vs OEM code page?

Random832 on 22 Jun 2010 6:13 AM:

"The comment says that they don't want to use the console codepage, but getacp returns the system codepage, which is also the console codepage."

Not on 'european based languages' it's not, by default - in fact, pretty much not on anything except Chinese, Japanese, Korean, Vietnamese, and Thai. (and Unicode-only locales, I suppose - what will this code do on those?)

(Incidentally, the difference between the console codepage and the OEMCP is that one of them can be affected by the chcp command)

But, yeah, considering that cmd itself (/u) writes no-BOM unicode data to redirected output, there's really no excuse. And if they really think one is necessary, and their program always writes output, they could just write one at the beginning of the program, rather than trying to detect whether this is the first time their function is being called.

referenced by

2010/06/27 Bugs hidden in plain sight, and commented that way too ANSWERS

go to newer or older post, or back to index or month or day