The SOP for STX vs. SOH vs. SOT: excitabat enim fluctus in simpulo!

by Michael S. Kaplan, published on 2010/05/26 07:01 -04:00, original URI:

If your Latin is a little rusty, the title has a bit from Cicero in it. Literally translated, it would be something like "He was stirring up billows in a ladle", so think of it as me very pretentiously referring to "a tempest in a teapot"!

Sometimes Microsoft makes mistakes.

I know, major newsflash there. Stop the presses, for sure....

How quickly they are noticed, and how quickly they are fixed? Both points are subject to interpretation and bias. I tend to not be so biased in most cases (being unafraid to call a bone-headed thing bone-headed). But that might be a story for another day....

For now, I'll talk about one of the mistakes.

One that can be traced back quite a ways, though it is not entirely clear how far back.

You may have had your eyes pass over this bug yourself, once or perhaps many times.

If you know what I mean.

Perhaps if I explained what I was talking about it would be easier for you, huh?

I have a feeling this will go much faster if I just start talking about what I am talking about.

It started with a customer report of a problem on one of the web pages on goglobal (this one, the one with code page 1251 on it):

* Name: Юрий
* URL:
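* Feedback Details: Здравствуйте! Мне кажется что символы с кодами 01 и 02 перепутаны местами, и символ SOT должен называться SOH (Start Of Heading) [English: "Hello! It seems to me that the characters with codes 01 and 02 have been swapped, and the SOT character should be called SOH (Start Of Heading)."]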

Oh wait, some of you might feel like your Russian skills are a tad rusty. I know mine are barely enough for this one myself.

Put simply, Yuri (Юрий) was suggesting that the character codes for 0x01 and 0x02 were reversed, and that SOT should be SOH.

In this table:

Every single code page reference on the goglobal site, and the original GlobalDev site tables they were migrated from, has the same layout for these two slots. Including, ironically enough, the ISO code pages.

This is one of those weird cases though. I mean, nothing is reversed in regard to the numbers (0x01 goes with U+0001 and 0x02 goes with U+0002 -- so the data in the table is right).

But if you look at info on these C0 control codes, you'll see that Yuri is right about the symbols the table uses to describe the slots; the symbols should actually look more like they looked in the original edition of Developing International Software for Windows 95 and Windows NT.
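To make the distinction between the data (right) and the labels (wrong) concrete, here is a minimal Python sketch, assuming Python's built-in `cp1251` codec as a stand-in for code page 1251. It decodes bytes 0x01 and 0x02 and labels each result with its ISO/IEC 6429 abbreviation and name. The `C0_NAMES` dictionary and the `describe_slot` helper are hypothetical, typed in by hand for illustration, and cover only the first few C0 controls.

```python
# A minimal sketch: decode bytes 0x01 and 0x02 in code page 1251 and label
# each resulting control code with its ISO/IEC 6429 abbreviation and name.
# C0_NAMES is a hypothetical, hand-typed table (first few C0 controls only);
# it is not taken from any Microsoft data file.

C0_NAMES = {
    0x00: ("NUL", "NULL"),
    0x01: ("SOH", "START OF HEADING"),
    0x02: ("STX", "START OF TEXT"),
    0x03: ("ETX", "END OF TEXT"),
    0x04: ("EOT", "END OF TRANSMISSION"),
}


def describe_slot(byte_value: int, codepage: str = "cp1251") -> str:
    """Decode one byte in the given code page and label the resulting control code."""
    char = bytes([byte_value]).decode(codepage)  # 0x00-0x7F map straight through
    abbrev, name = C0_NAMES[ord(char)]
    return f"0x{byte_value:02X} -> U+{ord(char):04X} {abbrev} ({name})"


if __name__ == "__main__":
    for b in (0x01, 0x02):
        print(describe_slot(b))
```

Run as a script, it prints `0x01 -> U+0001 SOH (START OF HEADING)` and `0x02 -> U+0002 STX (START OF TEXT)`: the numbers line up exactly as the table has them, and the labels are the SOH/STX pair Yuri was pointing to.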

Most of the code pages don't have these tables in them, but grab your copy of the book and check out the beginning of code page 932, right where Appendix G starts.

Go on, look now.

I'll wait.

Or, if your copy of the book isn't handy (maybe it is at the office and you are at home, or vice versa), you can check it out online here:

See what I mean?


Now unless I'm mistaken, the tables for the book were provided by the same person who provided the tables for the website (most likely one of two people: either the person whose lapse I would totally excuse, or the other person about whom I'd shrug and say "that figures"). But either way, how this particular mistake came about is not entirely clear.

The names themselves are also odd in that they are not the names according to Unicode; instead they are defined in places like ISO/IEC 6429:1992, a standard that I actually have a copy of from a few years back and which quite clearly matches the book and not the globaldev/goglobal sites.

Now this error is not a recent problem -- according to the Internet Archive it has existed at least since 1999 when GlobalDev went live (as the WayBack Machine link proves pretty conclusively!).

And if you consider every code page table that has this wrong, and the fact that they have been online for over a decade without anyone ever reporting it before (quite possibly INCLUDING YOU), it tends to put the words of the person who forwarded it on to someone who forwarded it on to someone who forwarded it on to someone who forwarded it on to someone who forwarded it on to someone who asked me to comment about it:

Seems like one of our customers found a huge, huge issue with our GoGlobal center. The issue happens across all locales.

in rather sharp relief.

I mean, is it really that huge?

The text tables below have it right, the names are correct, and there is no programmatic process in Windows that uses those graphical representations; it is just the two symbols in the table that are wrong.

Now there are some people out there who, when they see that Microsoft is wrong about something, alert Digg or SlashDot or Mary Foley or whoever and make it a big thing. But this small (admittedly x 35) graphical error, which has been around for over 10 years without anyone noticing, and which would never have made that kind of impact even if someone had, does not seem like such a big deal.

Maybe it is worth fixing (though I don't happen to think so, and if it were my call I wouldn't bother). Perhaps it is worth a KB article; I mean, if we can put up ones like my old favorite KB172653 (the one which made me want to look into working for Product Support for a bit!) then certainly we could consider writing one up about this web site graphics issue that has made it through over 10 years and almost ten versions of Windows. :-)

It just occurred to me that (in this blog) I assumed that you, as a reader here, know both Latin and Russian, have a copy of the 1st edition of Developing International Software, and have seen those code page tables at least once and maybe many times, just because I have linked to them many times. I hope at least some of that was true, and if not, that you were not too distracted thereby!
