by Michael S. Kaplan, published on 2012/01/04 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/01/04/10252916.aspx
At Microsoft, we support Unicode.
Particularly in Office and Windows - Unicode, Unicode, Unicode.
Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode.
Did I mention that we support Unicode?
Well, occasionally we do silly stuff in Office, like that "J" smiley emoticon.
You know, the one I talk about in "Maybe they had a really great experience on J-Date, or really liked Jay Leno".
Silly, but not the end of the world.
I'll tell you a secret, though.
There is one place in Windows that our support of Unicode bites. And bites huge.
It is in a feature we added in XP, known as Compressed Folders.
As Wikipedia says about it in its Zip (file format) article:
Versions of Microsoft Windows have included support for zip compression in Explorer since the Plus! pack was released for Windows 98. Microsoft calls this feature "Compressed Folders". Not all zip features are supported by the Windows Compressed Folders capability. For example, AES Encryption, split or spanned archives, and Unicode entry encoding are not known to be readable or writable by the Compressed Folders feature in Windows XP or Windows Vista.
I've talked about this problem off and on over the years, e.g. in blogs like "Zipping up Unicode file names", "Zipping up Unicode file PATHs", and "WinZip, the [long awaited] Unicode edition!!!".
If I were asked to summarize these three blogs, the time-line would go something like this:
Here we are in 2012, so four years after both PKZIP and WinZip have proven ways to support Unicode in ZIP, we're still hiding behind some ancient code we licensed over a decade ago.
We have the source, we've even fixed bugs in it. But we never fixed this bug, and we never found other (better) code to do it right.
The only hope is to install WinZip, which will disconnect the [broken] compressed folders file association, and replace it with one that works.
And the next time someone from Windows goes on about our Unicode support, you can (with an ironic intonation) just tell them to ZIP it! :-)
David on 4 Jan 2012 7:47 AM:
There are still people out there still using WinZip?
7-zip is such an overall improvement it is hard to consider anything else.
Simon Buchan on 4 Jan 2012 7:52 AM:
Other than suggesting the more flexible and lightweight (and free) 7-zip over WinZip, I'd have to ask who thought licensing Zip file code was a good idea? I wrote a (rather trivial and barebones, I admit) read/write SDK in a day, it's not like it's a difficult format (unless you *require* supporting recovery, and even then it's not that complex). Supporting Unicode is, at least according to the specs, a case of en/decoding as UTF-8 rather than some DOS codepage when a bit is set, so it's not like it's a huge code investment there either. In short - from a cowboy coder perspective, that this is even an issue seems confusing!
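For readers who want to see what "en/decoding as UTF-8 rather than some DOS codepage when a bit is set" looks like, here is a minimal sketch in Python. It assumes the bit in question is bit 11 (0x800) of the entry's general purpose bit flag, and that the fallback DOS code page is CP437. The function names and the legacy_codepage parameter are made up for illustration; this is not the code Windows, WinZip, or PKZIP actually use.

```python
from typing import Tuple

# Assumption: bit 11 of the general purpose bit flag marks a UTF-8 entry name.
UTF8_FLAG = 0x800


def decode_entry_name(raw_name: bytes, flag_bits: int,
                      legacy_codepage: str = "cp437") -> str:
    """Decode a ZIP entry name (hypothetical helper).

    If the UTF-8 bit is set, the bytes are UTF-8; otherwise they are
    interpreted in a legacy DOS code page (CP437 assumed here).
    """
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    return raw_name.decode(legacy_codepage)


def encode_entry_name(name: str) -> Tuple[bytes, int]:
    """Encode an entry name and return (bytes, flag bits to set).

    Pure ASCII names are the same in every encoding involved, so they
    need no flag; anything else is written as UTF-8 with the bit set.
    """
    try:
        return name.encode("ascii"), 0
    except UnicodeEncodeError:
        return name.encode("utf-8"), UTF8_FLAG
```

A reader built this way would round-trip a name such as "日本語.txt" when the writer sets the bit, and would still read older archives through the code page branch.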
Michael S. Kaplan on 4 Jan 2012 11:52 AM:
Well, there is a dearth of cowboy programmers, for one thing!
mpz on 4 Jan 2012 2:16 PM:
Since .zip archives created on Linux (where UTF-8 has been the default character set for years now) will not have the WinZip/PK specified "Unicode" bit set, you'll still end up with some incompatibilities.
A way around that would be to simply check whether the filename is a valid UTF-8 string, and if it is, treat it as such. Otherwise decode it according to the current legacy code page.
This is what IRC clients have been doing for a couple of years now. It works surprisingly well, since natural languages encoded in legacy character sets (like ISO-8859-1) rarely have sequences of characters that are also valid UTF-8.
But yeah, this is a major annoyance. And even if you fixed it today, there are still billions of ZIP files out there with filenames that do not conform to Unicode :-( This should have been fixed at the introduction of Windows XP really..
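The heuristic described in this comment (treat the name as UTF-8 if it validates, otherwise fall back to a legacy code page) can be sketched in a few lines of Python. The function name guess_decode and the cp1252 default are assumptions for illustration only; a real tool would pick the fallback from the system's current legacy code page.

```python
def guess_decode(raw: bytes, legacy_codepage: str = "cp1252") -> str:
    """Decode raw name bytes with a UTF-8-first heuristic (hypothetical helper).

    Strict UTF-8 decoding rejects most text written in a legacy 8-bit
    code page, so a successful decode is taken as evidence of UTF-8.
    """
    try:
        return raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy code page. Bytes that are
        # undefined in that code page become U+FFFD instead of raising.
        return raw.decode(legacy_codepage, errors="replace")
```

For example, the Latin-1 bytes for "naïve.txt" contain 0xEF followed by an ASCII letter, which is not a valid UTF-8 sequence, so the function falls through to the legacy branch. The heuristic can still guess wrong for the rare legacy string that happens to be valid UTF-8.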
Aaron.E on 4 Jan 2012 2:33 PM:
So, what dll do we need to hotpatch to get this working properly? :-)
Yuhong Bao on 4 Jan 2012 3:10 PM:
"It is in a feature we added in XP,"
Funny when there is a quote from Wikipedia below saying this is incorrect. In fact, Me was the first version of Windows with it built in.
Michael S. Kaplan on 4 Jan 2012 3:13 PM:
The Plus! pack is not an official OS release....
Yuhong Bao on 4 Jan 2012 3:14 PM:
But WinMe is.
Michael S. Kaplan on 4 Jan 2012 7:36 PM:
Doesn't matter -- the quote is literally accurate.
And nitpicker proof. :-)
jon on 5 Jan 2012 12:27 PM:
We licenced the same zip library Microsoft did (with source), and found ourselves in the same position (no Unicode support) - the difference in our case was we invested the two or three days it took to add support for it.
If you'd like to discuss licencing our changes, yell out :)
cron22 on 10 Jan 2012 10:02 PM:
Okay that's hilarious.