by Michael S. Kaplan, published on 2006/07/25 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/07/25/676295.aspx
Apologies for the small George Carlin riff in italics below, it is based on the Civil War bit he did during his New Jersey HBO special back in the early 90s. I lack the budget to have Mr. Carlin do a Podcast saying this bit, so please use your imagination to get the full effect!
The first version of Windows (1.0) shipped back in 1985, and it didn't have all that much in the way of impressively compelling international support. There were other good reasons for nobody to buy it, so most people probably didn't notice the lack.
Anyway, about seven years after the first version was released, seven years later, Windows NT 3.1 was shown at the PDC in San Francisco. And it supported Unicode.
Not so you'd really notice it, of course.
Just sort of 'on paper.'
Of course now, fourteen years later, and Microsoft is planning on shipping Windows Vista, a fully Unicode operating system.
But not so you'd really notice it.
Because we still have these components that don't support Unicode.
Components who figure a code page is a really keen way to encode.
And the developers study the encoding carefully, and they try to improve on the strategies and the tactics to increase the component's utility. In case we have to go through writing new non-Unicode support some time. [sarcasm]
In fact, some of these components actually get used in top of the line applications and they go out and shoot for the moon with the features they provide.
You know what I say? Use live ammunition, would you please?
That was fun. :-)
Anyway, let's get down to it.
One of those components, which I mentioned briefly in this post, is wininet.dll.
It came up because we had an interest in changing the defaults for the NtfsAllowExtendedCharacterIn8dot3Name setting, documented as:
Specifies whether the characters from the extended character set, including diacritic characters, can be used in short file names using the 8.3 naming convention on NTFS volumes.
0: On NTFS volumes, file names using the 8.3 naming convention are limited to the standard ASCII character set (minus any reserved values).
1: On NTFS volumes, file names using the 8.3 naming convention may use extended characters.
This entry does not exist in the registry by default. You can add it by using the registry editor Regedit.exe.
Of course what is not mentioned in that informational topic is that years ago it was decided that this value should be set anytime the default system locale was Chinese, Japanese, or Korean (and unset anytime it wasn't).
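The historical default described above boils down to a simple rule keyed on the system locale's primary language. Here is a minimal Python sketch of that rule, assuming the system locale is given as a Windows LCID. The function names are hypothetical, and this is an illustration of the policy rather than the setup code Windows actually ran; the primary language IDs and the PRIMARYLANGID masking follow the Windows SDK headers.

```python
# Primary language IDs as defined in the Windows SDK (winnt.h).
LANG_CHINESE = 0x04
LANG_JAPANESE = 0x11
LANG_KOREAN = 0x12

CJK_LANGS = {LANG_CHINESE, LANG_JAPANESE, LANG_KOREAN}


def primary_lang_id(lcid: int) -> int:
    """Python equivalent of PRIMARYLANGID(LANGIDFROMLCID(lcid))."""
    langid = lcid & 0xFFFF  # LANGIDFROMLCID: low 16 bits of the LCID
    return langid & 0x3FF   # PRIMARYLANGID: low 10 bits of the LANGID


def legacy_extended_8dot3_default(system_lcid: int) -> int:
    """Hypothetical helper: the NtfsAllowExtendedCharacterIn8dot3Name value
    implied by the old rule.

    1 (extended characters allowed) when the default system locale is
    Chinese, Japanese, or Korean; otherwise 0 (unset, ASCII-only short names).
    """
    return 1 if primary_lang_id(system_lcid) in CJK_LANGS else 0


if __name__ == "__main__":
    # ja-JP, ko-KR, zh-CN, zh-TW, en-US
    for lcid in (0x0411, 0x0412, 0x0804, 0x0404, 0x0409):
        print(hex(lcid), legacy_extended_8dot3_default(lcid))
```

The Vista Beta 2 change described next amounts to replacing this function with one that always returns 1.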
There are several problems here --
After talking with various partners and knowledgeable people in the file system and the various markets, we tried just setting it always and being done with it. Unicode had been around for some time, maybe it was "time to cut the cord" (the exact words of one of the file system architects).
In fact, if you have Beta 2 of Vista then that is what you have on your install.
Everything was going great until we found out that one several-year-old baby still had the cord attached. :-(
The wininet cache (which caches basically everything that various processes, including IE, retrieve from the internet) does not support Unicode, because wininet.dll doesn't. Wininet does expose a Unicode interface, but it just converts whatever you throw at it to the default system code page, and that is more or less it.
For a page on the web this would not be too noticeable; after all, if a cached item cannot be reached, it simply doesn't get used and you go straight to the internet. Unfortunately, if your user name isn't on your default system code page, then the path to the cache itself (which lives under your user profile) is broken. So you fail even trying to get to the cache in order to fail, and basically you lose Internet Explorer.
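To make the failure concrete, here is a minimal Python sketch of what a lossy Unicode-to-code-page conversion does to a profile path. It is not wininet's code. The helper names `to_ansi` and `round_trips`, the example paths, and the use of Python's `cp932` (Japanese) and `cp1252` codecs as stand-ins for the default system code page are all assumptions for illustration. Like Windows' own conversion, it substitutes a default character (`?`) for anything the code page cannot represent.

```python
def to_ansi(path: str, codepage: str) -> bytes:
    """Mimic a non-Unicode component's view of a path: convert it to the
    code page, replacing unmappable characters with '?'."""
    return path.encode(codepage, errors="replace")


def round_trips(path: str, codepage: str) -> bool:
    """A path is usable by a code-page-only component only if converting it
    to the code page and back yields the original string."""
    return to_ansi(path, codepage).decode(codepage) == path


if __name__ == "__main__":
    # Hypothetical profile paths; the user names are the only thing that differ.
    japanese_user = "C:\\Users\\\u5c71\u7530\\AppData\\Local"  # 山田
    sinhala_user = "C:\\Users\\\u0da4\\AppData\\Local"         # ඤ
    print(round_trips(japanese_user, "cp932"))
    print(round_trips(sinhala_user, "cp932"))
    print(to_ansi(sinhala_user, "cp932"))
```

A user named 山田 survives the trip through code page 932, but a user named ඤ comes back as `?`, and no directory by that name exists for the cache to live in.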
Anyway, no worries, even though no beta customers had reported the problem, there was no sense waiting for a report -- clearly this was a big enough regression that it had to be fixed.
The change has been reverted for future builds, so wininet.dll's lack of Unicode support (and, incidentally, the Windows non-Unicode heritage!) is preserved for another version.
Though I suppose it means that there aren't a whole lot of Windows user names off the default system code page on CJK system locale machines. Or if there are, those customers probably don't try to use IE much, since they are as broken in the prior versions as they will be in the new one.
And of course the people who use that NtfsDisable8dot3NameCreation setting to block the creation of short file names are probably not going to be too happy either if they have long user names or names with characters off the default system code page, for roughly the same reason.
In the end I am not really too worried about it since both ANSI support and short file name support on NTFS are there for backwards compatibility. So I suppose the overlap is consistent enough that people are not hitting this particular bug much.
But it is a story that I have been shaking my head about since the problem was identified....
This post brought to you by ඤ (U+0da4, a.k.a. SINHALA LETTER TAALUJA NAASIKYAYA)
# Random Reader on 25 Jul 2006 4:09 AM:
# Michael S. Kaplan on 25 Jul 2006 7:51 AM:
# Ben Cooke on 25 Jul 2006 1:58 PM:
# Michael S. Kaplan on 25 Jul 2006 2:09 PM: