by Michael S. Kaplan, published on 2006/04/22 13:48 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/22/581356.aspx
I have talked about the limitations in ZIP before in the post Zipping up Unicode file names, but Heath has pointed out a new and interesting wrinkle in the problem in his post Update for the Palm Treo 700w Available, with Problems.
Now Heath may seem to some to be some kind of lightning rod for Unicode Lame List stories, but he isn't -- he is just a smart developer who is finding himself thrown into bad software situations that he did not design....
In this case we see the biggest problem with not using Unicode -- the basic problem of deciding what code page to use. It is probably not so much that zipfldr.dll is specifically using cp437 and cp1252; it is that it is using CP_OEMCP and CP_ACP, which happen to resolve to those two code pages on a US English system.
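To make the mismatch concrete, here is a minimal Python sketch. It assumes the ANSI code page is cp1252 and the OEM code page is cp437, and the file name is made up for illustration (it is not the actual name from Palm's update). It only shows what happens when bytes written under one code page are read back under the other; it does not claim to reproduce what zipfldr.dll does internally.

```python
# Hypothetical file name containing TRADE MARK SIGN and REGISTERED SIGN.
name = "Treo\u2122 Update\u00ae.exe"

# One side stores the name using the ANSI code page (assumed here: cp1252).
stored = name.encode("cp1252")

# The other side reads the very same bytes using the OEM code page
# (assumed here: cp437), so 0x99 and 0xAE turn into different characters.
misread = stored.decode("cp437")

# Going the other way cannot even start: cp437 has no encoding
# for either of the two characters.
try:
    name.encode("cp437")
    oem_can_encode = True
except UnicodeEncodeError:
    oem_can_encode = False

print(stored)          # raw bytes as written under cp1252
print(misread)         # the name as seen through cp437
print(oem_can_encode)  # whether cp437 could represent the original name
```

Neither side is "wrong" by its own rules, which is exactly why this kind of bug is easy to ship: each half of the round trip looks fine in isolation.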
What causes such a mistake to not get noticed, though? I mean, it is pretty unnatural to be using both constants, isn't it?
As luck (or unluck) would have it, it is not. The problem starts with the Shell folks, who are using funky macros wrapped around funky shlwapi wrappers like SHAnsiToUnicode and SHUnicodeToAnsi. I call them funky because they are. They are also quite consistent in their underlying use of CP_ACP, always.
And as for the rest of the problem, it looks like the CP_OEMCP is coming from the fact that a console app is running things, so some of the translations are happening in that different context....
How smart is Palm feeling for putting ™ and ® in the filename, at this point? No wonder they took the update down. :-)
Clearly we'll need to see people using ASCII file names until people move up to Unicode. Code pages are just too damn confusing!
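If you want to enforce that advice before packaging files, a check along these lines would do. This is only a sketch; the function names are hypothetical and not part of any tool mentioned here.

```python
def is_ascii_name(name):
    """Return True if every character in the file name is in U+0000..U+007F."""
    return all(ord(ch) < 0x80 for ch in name)


def non_ascii_chars(name):
    """List the offending characters as U+XXXX strings, in order of appearance."""
    return ["U+%04X" % ord(ch) for ch in name if ord(ch) >= 0x80]


# Hypothetical file names, for illustration only.
for candidate in ["update.exe", "Treo\u2122 Update\u00ae.exe"]:
    print(candidate.encode("unicode_escape").decode("ascii"),
          is_ascii_name(candidate),
          non_ascii_chars(candidate))
```

A name that passes this check has the same bytes in cp437, cp1252, and UTF-8, so it survives no matter which code page each side of the round trip picks.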
This post brought to you by "®" and "™" (U+00AE and U+2122, a.k.a. REGISTERED SIGN and TRADE MARK SIGN)
# Heath Stewart on 22 Apr 2006 4:20 PM:
# Heath Stewart on 22 Apr 2006 6:57 PM:
# Michael S. Kaplan on 23 Apr 2006 1:56 AM:
# Mihai on 24 Apr 2006 12:56 PM:
# Yuhong Bao on 12 Mar 2009 9:04 PM:
"All I know is that the DLL does not ever use the OEMCP"
Except it does, I looked at the zipfldr.dll imports and it imports OemToCharBuffA and CharToOemA.
referenced by
2008/05/13 WinZip, the [long awaited] Unicode edition!!!
2006/04/30 Sometimes, you have to keep it in ASCII