by Michael S. Kaplan, published on 2009/09/11 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/09/11/9894019.aspx
Microsoft is a company based in Redmond, Washington, in the United States of America.
Yes it is a world wide company.
Yes almost 60% of its products are sold to customers who are not in the US.
Yes there are development centers around the world and in many of them code that is written there ultimately can end up in Microsoft products.
But ultimately, that original fact is inescapable:
Microsoft is a company based in Redmond, Washington, in the United States of America.
All it takes is something like the DST 2007 snafu to get people to see it: a bug affects users throughout the world (including in the USA, in places like Indiana) for over a decade with minimal help or work from Microsoft. Yet as soon as it affects Redmond too, the push to fix problems and help users even has vice presidents, general managers, and directors of Microsoft logging phone hours to help users. Afterward there are numerous presentations about how each team dealt with the problem, all of which ignore the fact that they had been ignoring the problem for a decade.
I could give countless other examples but many are less well known and some might violate my NDA so for now you can trust me that there are other examples.
Now there is no shame in being a company based in Redmond, Washington, in the United States of America.
And I would not want to imply otherwise either in a blog or in person.
Though there probably ought to be some shame involved in not realizing the pain one causes others (e.g. those other countries dealing with time zone issues for a decade, something I even not-too-gently but not-too-harshly chastised a couple directors about when that DST 2007 thing was winding down!).
Anyway, take the above as valid; if you don't, then you may as well skip the rest of this blog and maybe even this Blog (since no relationship can really stay healthy when there is no trust!).
Did you know that any developer who is enlisted in the full sources for Windows (sources that include the compiler, linker, header files, and LIBs as well as source) can build Windows?
It is true.
There are in fact developers in many parts of the world who work on Windows who have to do that very thing either occasionally or regularly. Or both.
Many people inside Microsoft have even given presentations about the strengths of such distributed development models and the advantages of being a company so large as to offer the opportunity of such models.
Now, for the other shoe to drop.
To build the full Windows product, all sources, you really must have a default system locale that will cause your default system code page to be 1252.
Such as US English.
The reason for this is that there are some source files that contain characters that are legal in cp1252 but in other code pages are either interpreted differently (incorrectly) or that will cause the build of those files to fail.
I ran across many of these as I was looking at code all over Windows and in most cases was not allowed to "fix" the problem as no one really saw it as a problem.
In almost every case I saw it was the same character (see "Dumb quotes... or maybe they are just smart-ass quotes" for which character it was) and the problem was in a comment.
A comment that was clearly created in an email written in Outlook using Word as the mail editor and then copied/pasted into the source.
Of course it is not a bug to make this mistake since it is not a bug to make a file unable to compile on another system locale.
Being a company based in Redmond, Washington in the United States of America, that just isn't a priority....
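To make the code page dependence concrete, here is a minimal sketch that decodes one comment line under three code pages. It assumes the offending characters are the cp1252 curly double quotes (bytes 0x93 and 0x94); the sample line and the helper name try_decode are illustrative, and Python's codecs stand in for whatever conversion the real build tools perform.

```python
# Sketch: decode the same bytes under several code pages.
# Assumption: the comment was saved as cp1252, where 0x93 and 0x94
# are the left and right curly double quotes.
SAMPLE = b"// \x93are these pidls logically the same\x94. This implies"


def try_decode(data: bytes, codec: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as err:
        return f"<error at byte offset {err.start}: {err.reason}>"


# cp1252: the intended reading.
# cp437:  decodes, but the quote bytes become different characters.
# cp932:  Japanese; 0x93 and 0x94 are lead bytes of two-byte characters.
results = {
    codec: try_decode(SAMPLE, codec)
    for codec in ("cp1252", "cp437", "cp932")
}

for codec, text in results.items():
    # !a escapes non-ASCII output so this prints on any console.
    print(f"{codec:>7}: {text!a}")
```

Under cp1252 the line reads as intended, under cp437 it decodes to different characters, and under cp932 the decode fails outright, which matches the two failure modes described above: interpreted differently, or not buildable at all.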
Now this is all well and good and is generally an internal issue at Microsoft that never impacts a customer in a way they would realize.
But if you look at recent versions of the Windows SDK (formerly known as the Platform SDK), you may see an exception to this generalization.
First we'll look at the older version of the file in question, shobjidl.idl.
This one compiles everywhere.
The non-offending bit of the file, if you scroll down a bit, is:
// IShellFolder::CompareIDs lParam flags
//
// SHCIDS_ALLFIELDS is a mask for lParam indicating that the shell folder
// should first compare on the lParam column, and if that proves equal,
// then perform a full comparison on all fields. This flag is supported
// if the IShellFolder supports IShellFolder2.
//
// SHCIDS_CANONICALONLY is a mask for lParam indicating that the shell folder
// that the caller doesn't care about proper sort order -- only equality matters.
// (Most CompareIDs test for equality first, and in the case of inequality do
// a UI sort. This bit allows for a more efficient sort in the inequality case.)
Ok, see the problem?
That was a trick question, there is no problem.
Fast forward to a much newer version, like the one in the 6.1 and 7.0 SDK:
// IShellFolder::CompareIDs lParam flags
// *these should only be used if the folder supports IShellFolder2*
//
// SHCIDS_ALLFIELDS
//
// only be used in conjunction with SHCIDS_CANONCALONLY or column 0.
// This flag requests that the folder test for *pidl identity*, that is
// “are these pidls logically the same”. This implies that cached fields
// in the pidl that would distinguish them should be tested.
// Without this flag, you are comparing the *object* s the pidls refer to.
//
// SHCIDS_CANONICALONLY
//
// This indicates that the sort should be *the most efficient sort possible*, the implication
// being that the result will not be displayed to the UI: the SHCIDS_COLUMNMASK portion
// of the lParam can be ignored. (Before we had SHCIDS_CANONICALONLY
// we assumed column 0 was the "efficient" sort column.)
//
//
Ok, now we have a party.
We have a couple of those quote characters that don't exist on all code pages and in fact for Japanese represent a byte that is illegal to have by itself, which means it will not compile.
The long and short of it is if you have a Japanese system locale you can't use this .IDL file unless you munge the file to remove the bogus quotes.
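For anyone who does need to munge the file, the following is a rough sketch of one way to do it, not an official fix. It works on raw bytes, assumes the file is otherwise plain ASCII and that any stray bytes are cp1252 smart quotes, and the names SMART_TO_ASCII, find_non_ascii and munge are made up for this example.

```python
# Sketch: locate bytes above 0x7F and replace the cp1252 smart quotes
# with their ASCII equivalents. Works on bytes, so it behaves the same
# regardless of the system locale it runs under.
SMART_TO_ASCII = {
    0x91: b"'",  # left single quotation mark in cp1252
    0x92: b"'",  # right single quotation mark in cp1252
    0x93: b'"',  # left double quotation mark in cp1252
    0x94: b'"',  # right double quotation mark in cp1252
}


def find_non_ascii(data: bytes):
    """Yield (line_number, column, byte_value) for every byte above 0x7F."""
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value > 0x7F:
                yield line_no, col, value


def munge(data: bytes) -> bytes:
    """Return a copy with the known smart-quote bytes replaced."""
    out = bytearray()
    for value in data:
        # Bytes not in the table are copied through unchanged.
        out += SMART_TO_ASCII.get(value, bytes([value]))
    return bytes(out)


sample = b"// line one\n// \x93are these pidls logically the same\x94.\n"
hits = list(find_non_ascii(sample))
cleaned = munge(sample)

for line_no, col, value in hits:
    print(f"line {line_no}, column {col}: byte 0x{value:02X}")
```

To apply it to a real file, one would read the file in binary mode, pass the bytes through munge, and write the result to a copy rather than editing the SDK header in place. Running find_non_ascii on the copy afterward shows whether anything other than the four known bytes was present.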
Now I don't know of any devs who write either code or comments in Word, but getting an email containing an "updated comment to better explain this bit" seems pretty obvious and not at all uncommon to see (if you ignore the relative uncommonality of such updates).
Oops.
This oops is in a couple of Windows SDK editions, including some that shipped in products like Visual Studio, and in the not-yet-shipped VS 2010.
In fact, I don't think it will be fixed for VS 2010 since they ship an already shipped PSDK and there won't be an update they pick up before they ship.
Oops again.
Anyway, they're on it now, and this will get fixed at some point.
That fix will eventually end up everywhere.
If you hit this problem, maybe you will feel somewhat less unhappy knowing that people like me can hit this problem a bunch of times in a night if I do a full Windows build. So that I share your pain....
And we are still a company in Redmond, Washington, in the United States of America.
Robert on 11 Sep 2009 12:29 PM:
Maybe the problem is that the compiler is trying to make sense of something that it is supposed to skip?
John Cowan on 11 Sep 2009 7:38 PM:
"Now there is no shame in being a company based in Redmond, Washington, in the United States of America."
No. But there is shame in being Microsoft.
(Sorry, Lord. The man tempted me, and I did snark.)
Michael Everson on 11 Sep 2009 8:37 PM:
Erm. Um. Well. OK, one might just suggest you switch to the Mac OS. At least there's joy.
Henry Skoglund on 12 Sep 2009 5:17 PM:
I remember a project long ago where developers had to edit their code using the Terminal font when a release was imminent, just to catch these kinds of errors (because some of the files were shared with mainframes, i.e. EBCDIC).
Still valuable advice, it seems.
Rgrds Henry
Cheong on 13 Sep 2009 2:45 PM:
Robert: almost wanted to second your comment, but seems XML documentation style comments are inside comment. So the compiler cannot ignore them completely.
And it's always easier to write code to check the whole file for a particular code page than to write code that skips comments, especially considering both // and /* */ are legal comments.
Not that it's rocket science to write something to process it like that, but the priority isn't high enough to make them push the fix.
Erzengel on 14 Sep 2009 2:45 AM:
Is there any particular reason that these files can't be marked with a UTF-8 BOM? (That's not sarcasm, I'm genuinely curious)
chavakiran on 14 Sep 2009 4:49 AM:
:)
There are many more internal tools with such behavior ; better stick to default locale than fixing them :P
Michael S. Kaplan on 14 Sep 2009 8:25 AM:
One can postulate many ways to fix the problem, but when no one wants to change a file at all, the options are much more limited!
Random832 on 14 Sep 2009 1:30 PM:
Why don't the tools just use a substitute character? (for cp932 it would be U+30FB KATAKANA MIDDLE DOT, covering both the offending byte and the one after it)
Or better yet, restrict to ASCII in all cases so that the error can be found immediately.
Or have a command-line flag for what codepage to use.
Michael S. Kaplan on 14 Sep 2009 7:47 PM:
Given that MultiByteToWideChar is used today, the first two would be a huge behavior change.
The last idea already exists, but changing per file would be just as much of a change to Windows as anything else would in terms of behavior -- and no one else sees it as a bug here....
Making every customer change how they compile would be even more difficult, all to make up for an MS bug? :-(
Pacific Reader on 6 Oct 2009 8:18 AM:
Speaking of DST snafus, we're at GMT+13 here and you mightn't be surprised by the poor software support in general. Programmers just assume there's nothing beyond GMT+12!
Yuhong Bao on 16 Oct 2009 1:32 PM:
"One can postulate many ways to fix the problem, but when no one wants to change a file at all, the options are much more limited!"
What about just ignoring invalid characters inside comments? After all, comments are just that, they are not compiled at all.
Michael S. Kaplan on 17 Oct 2009 12:25 PM:
And the folks who use the current behavior as a way to make sure projects *are* compiled under certain environments? We'd be breaking them!
Ignoring that, implementation simply converts the whole file, it does not parse before conversion (parsing first is FRAUGHT with problems of the decidedly non-trivial type!)....
Yuhong Bao on 26 Dec 2011 12:37 AM:
Another example is how RTL layout in IE is less tested and thus more buggy than LTR layout.