by Michael S. Kaplan, published on 2007/10/24 10:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/10/24/5641619.aspx
People often don't understand the consequences of the case insensitive file system provided by NTFS.
So today I will show an extreme example of a [im]possibly complicated situation....
So, we'll start with our characters:
Then we'll take two jump drives that are 500 MB each, and format them both with NTFS (one in XP and one in Vista -- the one with the updated casing table)....
Now we will take those ten characters above and combine them in various five character blocks, creating files on the NTFS 5.1 partition:
Note that this is on a Vista platform....
Of course the innocent NTFS 6.0 jump drive is sitting there, empty and anxious:
What happens if we try and copy our thirty files to this other drive?
Well, the first one will work (any one of them). But any attempt to copy a second file will give a nice prompt:
So how can the world of file systems survive the changes here?
Well, for the most part because the whole scenario is NASTY.
Who on earth but a tester or an ornery language person like me would put those thirty files on a drive, just to prove he could, just to write a blog post?
Back on planet earth, people are disappointed if case pairs are not recognized; this is not a feature they hate.
But there is another aspect here too; if you have a built-in drive on the machine that has a casing table unlike the one on the machine, at boot time you will often see a CHKDSK run replacing the casing table on the drive with the one on the OS itself (in fact if you dual boot between XP and Vista you can even see each of them "fixing up" the drive this way at boot time). Imagine the consequences if you keep an MRU list and then try to use it after a reboot -- you could easily be selecting a different file than you did prior to that reboot!
(Shades of what to do with "Getting the preserved case of a file wHeN tHe uSeRs tYpE wHaTeVeR tHe hElL tHeY wAnT tO" in the above, right?)
But even if that drive with the thirty files is updated to the NTFS 6.0 casing table, no files are lost. They are all right there.
Even though you could not create that bunch of files yourself on the drive, you will see something similar to that FAT/FAT32 oddness on Win2000, where the files are fine and you can access and use them, even though you can't create them in a neighboring directory yourself.
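To make that concrete, here is a minimal sketch of a toy directory that preserves names exactly and refuses a new name only when it matches an existing one under the current casing table. Everything in it is an assumption for illustration: the `Directory` class, the `upcase` helper, and both tables are made up, and the one extra pair (U+027D and U+2C64) is just a convenient example, not a claim about what any real `$UpCase` table contains.

```python
# Hypothetical upcase tables: the "old" one only knows ASCII letters,
# the "new" one also pairs U+027D with U+2C64. Illustrative only; these
# are not the real contents of any NTFS casing table.
OLD_TABLE = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}
NEW_TABLE = {**OLD_TABLE, "\u027d": "\u2c64"}


def upcase(name, table):
    """Upcase a name one character at a time using the given table."""
    return "".join(table.get(ch, ch) for ch in name)


class Directory:
    """Toy directory: stores exact names, rejects case-insensitive duplicates."""

    def __init__(self, table, names=()):
        self.table = table
        self.names = list(names)  # existing entries are kept exactly as given

    def create(self, name):
        key = upcase(name, self.table)
        if any(upcase(existing, self.table) == key for existing in self.names):
            return False  # a case-insensitive match already exists
        self.names.append(name)
        return True


# Two names that differ only by the pair the old table does not know about.
files = ["\u027d.txt", "\u2c64.txt"]

# On the drive with the old table, both names can be created.
old_drive = Directory(OLD_TABLE)
created_on_old = [old_drive.create(n) for n in files]

# Swapping the table under the existing entries deletes nothing.
upgraded = Directory(NEW_TABLE, old_drive.names)

# Copying to an empty drive that uses the new table: only the first lands.
new_drive = Directory(NEW_TABLE)
created_on_new = [new_drive.create(n) for n in files]

print(created_on_old, upgraded.names == files, created_on_new)
```

The point of the sketch is only that the stored names never change; what changes is which pairs of names the comparison considers equal.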
It is why I harp on issues like I do with "In any CASE, it is somewhat INSENSITIVE to point out to someone how well PRESERVED they are" and "If you don't always preserve case, you don't always preserve meaning" and "What happens when people don't consistently preserve case? They cause bugs!".
Because NTFS is intrinsically case sensitive underneath it all, and the illusion of case insensitivity that is there is a layer atop the drive.
It is why so many of the IFS File System Runtime Library Routines take an optional pointer to an UpcaseTable -- because in this world where those things can change, it is important for the OS to accept those changes and respect that they can (and have, and will) happen.
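A rough Python analog of that calling pattern might look like the following. It is a sketch under stated assumptions, not the signature of any actual routine: `names_equal`, `make_table`, and `SYSTEM_TABLE` are hypothetical names, and the table contents are invented for the example. The only idea borrowed from the real world is the shape, where the caller may pass a specific table and a default is used otherwise.

```python
def make_table(extra_pairs=()):
    """Build a toy upcase table: ASCII letters plus any extra (lower, upper) pairs."""
    table = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}
    table.update(extra_pairs)
    return table


# Stand-in for the table of the running OS (hypothetical contents).
SYSTEM_TABLE = make_table([("\u027d", "\u2c64")])


def names_equal(a, b, ignore_case=True, upcase_table=None):
    """Compare two names; fall back to the system table if none is supplied."""
    if not ignore_case:
        return a == b
    table = SYSTEM_TABLE if upcase_table is None else upcase_table

    def up(s):
        return "".join(table.get(ch, ch) for ch in s)

    return up(a) == up(b)


# An older volume whose table lacks the extra pair (again, made up).
volume_table = make_table()

a, b = "\u027d.txt", "\u2c64.TXT"
same_by_system = names_equal(a, b)                             # uses SYSTEM_TABLE
same_by_volume = names_equal(a, b, upcase_table=volume_table)  # uses the volume's table
exact_match = names_equal(a, b, ignore_case=False)             # plain comparison
print(same_by_system, same_by_volume, exact_match)
```

With these toy tables the same two names are equal according to the system and different according to the volume, which is exactly the kind of disagreement that passing the table explicitly lets the caller resolve on purpose rather than by accident.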
The illusion of case insensitivity (ref: KB100625 and How Case Sensitive Filenames are Handled in the POSIX Subsystem under NTFS) is one that is quite hard to maintain, and it is one that you can easily destroy in an application that tries to duplicate a behavior at higher levels without fully understanding what is happening at lower levels....
Now as I pointed out in "Getting the preserved case of a file wHeN tHe uSeRs tYpE wHaTeVeR tHe hElL tHeY wAnT tO", the rules for URLs are completely different, and I will talk about them soon as many people find them complicated and in some cases genuinely disturbing!
This post brought to you by those ten letters I mentioned above....
# Jeff on 25 Oct 2007 6:18 PM:
Holy crap. Wouldn't it be just easier (and more useful) to be case sensitive?
I've had it invisibly destroy data, where the files happened to be the same characters, but different case.
Why be a throwback to the FORTRAN days of 5-bit characters?
# Michael S. Kaplan on 25 Oct 2007 7:19 PM:
Short answer?
Because Windows has to for your mom, or someone's mom, a typical mom. One who does not understand people who claim that "A" is not "a" and so on.
More complicated? Sure. But that's why they don't pay us minimum wage to do it....
# Cristian Secară on 27 Oct 2007 5:46 AM:
> Who on earth but a tester or an ornery language person [...]
Cannot remember how, but some time ago I noticed that on a WinXP system, when I used the correct Romanian characters for some filenames. That's why I avoid them today in filenames :)
I mentioned the case sensitive strange thing in my post here http://blogs.msdn.com/michkap/archive/2006/12/07/1232365.aspx#1236058 (strange how the RAR archiver can also handle them, while at the same time the WinZIP archiver cannot, as far as I remember)
Cristi
# Jan Kučera on 3 Dec 2007 7:58 AM:
Although this scenario might seem quite NASTY, why does Vista do the same thing when copying folders/files beginning/ending with a space?
Whether to allow users to create such folders/files is one question, but not being able to copy them when they are already there is another one, I think (and copying them is something I feel should be supported).
I came to this just when I wanted to copy "Folder" and " Folder" from XP HDD to Vista's one. XP had no problems copying this by dragging in Explorer, but Vista refused. What was the reason for this change?
I am aware this is not an NTFS versioning related problem, though (sorry) (it can be worked around by copying using the command line).
# Rosyna on 8 Dec 2007 2:35 AM:
So what happens when you have a bunch of ignorable characters fighting it out?
http://sailor月.com/imgs/ignorable.png
Screenshot from a disk image I made.
# Michael S. Kaplan on 8 Dec 2007 5:24 PM:
I am unable to see the screenshot -- the URL doesn't resolve....
# Rosyna on 8 Dec 2007 5:52 PM:
Hm, resolves without issue on Safari. But if the browser you use cannot handle unicode domain names:
http://xn--sailor-183m.com/imgs/ignorable.png
# Michael S. Kaplan on 8 Dec 2007 5:58 PM:
Okay, I see it now (and on my other machine!).
Well, they aren't really ignorable in the filesystem sense, on Windows or on the Mac. So they aren't really fighting it out at all, really!
You can do the same thing on Windows, too -- I was just focusing on the case insensitivity across versions issue....
# Rosyna on 8 Dec 2007 7:08 PM:
Oh, perhaps I should have clarified. This is only possible on Mac OS X if you have a case-sensitive file system. If you have a case-insensitive file system (HFS+), ignorables are ignored and basically case-folded (same end result as case-folding, at least). This makes it impossible to have these files in the same directory. You can't even make them at the lowest-level.
The screenshot I posted was from a case-sensitive file system (HFS+, Case-sensitive) where ignorables aren't ignored.
Ignorables not being ignored becomes a slight issue when round-tripping APIs. Some Web apps (like AIM, it seems) and servers will strip ignorables for security reasons and then give you back the string without the ignorables in them.
# Michael S. Kaplan on 8 Dec 2007 7:19 PM:
Yep, we have those problems on Windows too if you call the higher level functions expecting the lower level fidelity....
referenced by
2007/12/27 No upproblems with $UpCase when you do a upVista upgrade
2007/12/08 Social engineering in Windows Explorer....
2007/10/26 All things being equal, your mom probably has an easier time with case insensitivity than not
2007/10/24 Getting the preserved case of a file wHeN tHe uSeRs tYpE wHaTeVeR tHe hElL tHeY wAnT tO