FAT/FAT32 oddness on Win2000

by Michael S. Kaplan, published on 2005/09/28 06:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/28/474756.aspx


It all seemed so simple -- that whole 'uppercase and binary comparison' semantic. Used by NTFS, by Windows in so many places like named pipes, mutexes, environment variables, and so on.

But then there is FAT and FAT32. :-(

Take the two characters: ㄱ (U+3131) and ᄀ (U+1100).

Note that they are the compatibility and combining forms of the same Jamo (see this post for more info on the purposes of each).
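The relationship between the two characters can be checked against the Unicode character database. What follows is only a small sketch using Python's standard unicodedata module; the variable names are made up for illustration, and nothing here is meant to reflect what the file system itself does.

```python
import unicodedata

compat = "\u3131"    # HANGUL LETTER KIYEOK (compatibility jamo)
choseong = "\u1100"  # HANGUL CHOSEONG KIYEOK (combining/conjoining jamo)

# Print the code point and official character name of each.
for ch in (compat, choseong):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The compatibility form carries a <compat> decomposition to the combining form.
compat_decomposition = unicodedata.decomposition(compat)

# Compatibility normalization (NFKC) folds U+3131 to U+1100;
# canonical normalization (NFC) leaves the two distinct.
same_after_nfkc = unicodedata.normalize("NFKC", compat) == choseong
same_after_nfc = unicodedata.normalize("NFC", compat) == choseong

print(compat_decomposition, same_after_nfkc, same_after_nfc)
```

So the two are equivalent only under compatibility normalization, which is not something a plain binary comparison would ever apply.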

Stick them both in filenames in the same directory on a FAT or FAT32 drive (ㄱ.txt and ᄀ.txt).

Works just fine.
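One way to see why both names can coexist under the 'uppercase and binary comparison' semantic is the rough sketch below. It is an approximation under stated assumptions: same_name is a hypothetical helper, and Python's str.upper() applies Unicode full case mappings rather than whatever uppercase table Windows actually uses, so treat it as an illustration and not as the file system's real algorithm.

```python
def same_name(a: str, b: str) -> bool:
    """Approximate 'uppercase, then compare code units' name matching."""
    # Assumption: str.upper() stands in for the OS uppercase table.
    return a.upper() == b.upper()

# Names differing only in case collide...
print(same_name("readme.txt", "README.TXT"))

# ...but the two jamo have no case and are different code points,
# so the two file names stay distinct.
print(same_name("\u3131.txt", "\u1100.txt"))
```

Under that semantic there is nothing that would make the two files clash, which matches the behavior before the locale change.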

Now change the default system locale to Korean and reboot.

If you just try to create the same files in a new directory, then it will give you an error:

There are all kinds of weirdnesses though:

Now if you look at code page 949, U+3131 is there and can round-trip (it is 0xa4a1), but U+1100 has a best-fit mapping to the same character. At first I would have thought that this had something to do with the problem (it is certainly the cause of problems on Windows 98 and Me!), but on Win2000 I can create filenames in Unicode-only languages (where everything would map to ? on the code page) on these drives and have no problems at all. So this is not just a simple code page issue.
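The code page side of this can be sketched with Python's built-in cp949 codec. Two assumptions are worth flagging loudly: Python's codec is strict and does no best-fit mapping, so the one-entry BEST_FIT table below is a hypothetical stand-in for the Windows behavior described above, and both helper functions are invented for this illustration.

```python
KIYEOK_COMPAT = "\u3131"    # present in code page 949
KIYEOK_CHOSEONG = "\u1100"  # not present; Windows best-fits it

def to_cp949_strict(ch: str):
    """Encode with no best fit; return None if the character is unmappable."""
    try:
        return ch.encode("cp949")
    except UnicodeEncodeError:
        return None

# Hypothetical one-entry table imitating the best-fit mapping.
BEST_FIT = {KIYEOK_CHOSEONG: KIYEOK_COMPAT}

def to_cp949_best_fit(ch: str) -> bytes:
    """Apply the stand-in best-fit table, then encode ('?' if still unmappable)."""
    return BEST_FIT.get(ch, ch).encode("cp949", errors="replace")

print(to_cp949_strict(KIYEOK_COMPAT))       # round-trips as 0xa4a1
print(to_cp949_strict(KIYEOK_CHOSEONG))     # unmappable without best fit
print(to_cp949_best_fit(KIYEOK_CHOSEONG))   # collides with U+3131
print(to_cp949_best_fit("\u10d0"))          # a Georgian letter becomes '?'
```

Once best fit is applied, the two names turn into identical bytes, which is the kind of collision that would explain the error if the code page were involved. The Georgian case shows why that cannot be the whole story: names that collapse to ? still work.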

It is also not a simple case of collation being used incorrectly, since collation does not consider these two characters equal, either.

Luckily it also does not repro on Windows XP, Server 2003, or Vista. So whatever is going on here, they fixed it.

But it does keep the filesystem thing from being simple, especially since the newer machines will have the same behavior if you access those Win2000 drives over the network....

(Special thanks to Gregg Miskelley and Dylan Lingelbach for pointing out some of the anomalies here!)

 

This post brought to you by "ㄱ" and "ᄀ" (U+3131 and U+1100, a.k.a. HANGUL LETTER KIYEOK and HANGUL CHOSEONG KIYEOK)


# Jonathan on 29 Sep 2005 2:44 AM:

I remember (from hacking disks with Norton DiskEdit, in Win95 days) that short file names are saved in ANSI (presumably in the system locale's codepage), while LFNs are saved in Unicode (UTF-16, don't know about surrogate support though).

# Michael S. Kaplan on 29 Sep 2005 10:26 AM:

This must be the LFN though, or things like Georgian would never work, right?

