by Michael S. Kaplan, published on 2005/09/28 06:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/28/474756.aspx
It all seemed so simple -- that whole 'uppercase and binary comparison' semantic. Used by NTFS, by Windows in so many places like named pipes, mutexes, environment variables, and so on.
But then there is FAT and FAT32. :-(
Take the two characters ㄱ (U+3131) and ᄀ (U+1100).
Note that they are the compatibility and combining forms of the same Jamo (see this post for more info on the purposes of each).
Stick them both in filenames in the same directory in a FAT or FAT32 drive (ㄱ.txt and ᄀ.txt).
Works just fine.
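A minimal Python sketch of why both names can coexist under an "uppercase and binary comparison" rule. It rests on assumptions: `str.upper()` is only a stand-in for the file system's own uppercase table (the two mappings are not identical), and `upcase_binary_equal` is a hypothetical helper name, not a Windows API.

```python
import unicodedata

COMPAT = "\u3131"    # ㄱ HANGUL LETTER KIYEOK (compatibility Jamo)
CHOSEONG = "\u1100"  # ᄀ HANGUL CHOSEONG KIYEOK (combining Jamo)


def upcase_binary_equal(a: str, b: str) -> bool:
    """Hypothetical helper: uppercase both names, then compare UTF-16 code units.

    str.upper() is an assumption standing in for the file system's
    uppercase table; it is not the same mapping.
    """
    return a.upper().encode("utf-16-le") == b.upper().encode("utf-16-le")


if __name__ == "__main__":
    for ch in (COMPAT, CHOSEONG):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

    # The compatibility form decomposes to the combining form under NFKD,
    # so the two are related, but they remain distinct code points.
    print(unicodedata.normalize("NFKD", COMPAT) == CHOSEONG)

    # Case differences collapse under the rule; the two Jamo do not.
    print(upcase_binary_equal("readme.TXT", "README.txt"))
    print(upcase_binary_equal(COMPAT + ".txt", CHOSEONG + ".txt"))
```

Neither Jamo has a case mapping, so uppercasing leaves both untouched and the binary comparison sees two different names.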
Now change the default system locale to Korean and reboot.
If you just try to create the same files in a new directory, then it will give you an error:
There are all kinds of weirdnesses though:
Now if you look at code page 949, U+3131 is there and can roundtrip (it is 0xa4a1), but U+1100 has a best fit mapping to the same character. At first I would have thought that this had something to do with the problem (it is certainly the cause of problems on Windows 98 and Me!), but in Win2000 I can create filenames in Unicode-only languages (where everything would map to ? on the code page) on these drives and I have no problems at all. So this is not just a simple code page issue.
It is also not a simple case of a collation being used incorrectly, since collation does not consider these two characters equal, either.
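The roundtrip half of that code page observation can be checked with a short Python sketch. One caveat, stated as an assumption: Python's `cp949` codec is a strict mapping and does not reproduce the Windows best-fit behavior, so it is expected to reject U+1100 rather than fold it onto 0xa4a1. The helper name `encode_cp949_strict` is hypothetical.

```python
from typing import Optional


def encode_cp949_strict(ch: str) -> Optional[bytes]:
    """Return the code page 949 bytes for ch, or None if there is no exact mapping.

    Assumption: Python's codec has no best-fit table, unlike the Windows
    conversion described in the text.
    """
    try:
        return ch.encode("cp949")  # strict: no best fit, no '?' substitution
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    compat = encode_cp949_strict("\u3131")
    print(compat.hex() if compat else None)                       # a4a1
    print(compat.decode("cp949") == "\u3131" if compat else None)
    print(encode_cp949_strict("\u1100"))
```

With only the exact mappings in play, the two characters never collide; a collision needs the best-fit step to be applied somewhere along the path.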
Luckily it also does not repro on Windows XP, Server 2003, or Vista. So whatever is going on here, they fixed it.
But it does keep the filesystem thing from being simple, especially since the newer machines will have the same behavior if you access those Win2000 drives over the network....
(Special thanks to Gregg Miskelley and Dylan Lingelbach for pointing out some of the anomalies here!)
This post brought to you by "ㄱ" and "ᄀ" (U+3131 and U+1100, a.k.a. HANGUL LETTER KIYEOK and HANGUL CHOSEONG KIYEOK)
2005/10/17 Comparing Unicode file names the right way