Intuitivosity (intuitivality?) of case insensitivity

by Michael S. Kaplan, published on 2006/06/04 08:52 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2006/06/04/616904.aspx


Over on Language Log, Geoffrey K. Pullum's post "ASCII, Mac OS-X, and the 128 names of DormAid" describes an issue that folks running on Windows understand very well: a file system that is case insensitive when identifying and validating names, yet case preserving in how it stores them.

Names are retained in the same form you saved them in, even though all of the case variants refer to the same file.

A few edges Windows [apparently] has here:

Now this behavior certainly can cause headaches, especially for the more technically minded: developers, people coming from UNIX/Linux, and others.

But tell the average person that file.txt and FILE.TXT should represent two different things and they will have trouble with the idea. That was true even before Windows took this natural user expectation and made it such a strong conceptual building block of most of its object and file namespaces.

In any case, it is nice to see that other platforms make similar efforts in the name of usability, although, as Geoff's post points out, building such efforts on top of a platform that does not share the approach is fraught with problems....

 

This post brought to you by Ϧ and ϧ (U+03E6 and U+03E7, a.k.a. COPTIC CAPITAL LETTER KHEI and COPTIC SMALL LETTER KHEI)


# Geoff Wozniak on Sunday, June 04, 2006 1:20 PM:

I moved to OS X last year and I didn't even notice that the file system was case insensitive.  It's probably because I now give longer and more descriptive names to files.

# Rosyna on Sunday, June 04, 2006 3:34 PM:

Often, the only people fighting for case-sensitivity are long-time Linux/Unix/BSD users who grew up with it and expect it. It is *not* the best for users, and it is in fact generally a huge usability issue to allow two files in the same directory whose names differ only in case.

In fact, the entire reason the aforementioned operating systems support case-sensitivity is laziness. It wasn't an active "feature". It's far easier to sort based on the ASCII value of a character than it is to do case-insensitive compares on everything, especially Unicode file names.

And both of these are well documented. See http://developer.apple.com/technotes/tn/tn1150.html#HFSPlusNames and http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties

I thought FATNN wasn't case-preserving and required a different attribute for the cased name?
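To make the contrast Rosyna draws between sorting on raw character values and doing case-insensitive compares concrete, here is a minimal Python sketch. It is only an illustration built on Python's `sorted()` and `str.casefold()`, not how any real file system orders or matches names.

```python
def ordinal_sorted(names):
    # Plain code-point order: every uppercase ASCII letter sorts before any lowercase one.
    return sorted(names)

def case_insensitive_sorted(names):
    # Fold case before comparing; sorted() is stable, so names that fold
    # to the same key keep their original relative order.
    return sorted(names, key=str.casefold)

def same_name_ignoring_case(a, b):
    return a.casefold() == b.casefold()

names = ["apple", "Banana", "cherry"]
print(ordinal_sorted(names))                          # ['Banana', 'apple', 'cherry']
print(case_insensitive_sorted(names))                 # ['apple', 'Banana', 'cherry']
print("DormAid" == "DorMaid")                         # False
print(same_name_ignoring_case("DormAid", "DorMaid"))  # True
```

The folded comparison has to consult case data for every character, which is the extra work the comment is talking about.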

# Rosyna on Sunday, June 04, 2006 3:37 PM:

Oh yeah, given the HFS Plus technote's example of é, which can be either U+00E9 or U+0065 U+0301, I wonder whether volume formats that have no concept of case-insensitivity will allow two files named filé and filé (one precomposed, one decomposed) in the same directory, depending on the API that created the file names.
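The precomposed versus decomposed é that Rosyna describes can be demonstrated with Python's standard `unicodedata` module. This is a minimal sketch of what a normalization-aware name comparison looks like; `same_name_normalized` is a hypothetical helper, and the sketch makes no claim about what any particular volume format actually does.

```python
import unicodedata

precomposed = "fil\u00e9"   # é as the single code point U+00E9
decomposed = "file\u0301"   # e (U+0065) followed by COMBINING ACUTE ACCENT (U+0301)

# Both render as "filé", but a code-point-by-code-point comparison sees two names.
print(precomposed == decomposed)  # False
print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))

def same_name_normalized(a, b, form="NFC"):
    # Normalize both names to the same Unicode normalization form before comparing.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_name_normalized(precomposed, decomposed))  # True
```

A file system that compares names without normalizing them would treat these as two distinct files.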

# Gabe on Sunday, June 04, 2006 4:54 PM:

Aside from Unix and its brethren, are there any other operating systems that would consider DormAid and DorMaid to be distinct files?

# Sebastian Redl on Sunday, June 04, 2006 6:39 PM:

Well, if we're being historical, I might point out that case-insensitivity is partially a heritage of DOS FAT, which stored all file names in upper case, because the character set for them was so limited.

On the flip side, Windows' case insensitivity sometimes causes problems when programs or programmers assume the environment is guaranteed to be case insensitive.
Case in point: a very large percentage of Windows C/C++ programs are probably written to include <windows.h>. MFC, at least, uses this casing. The Platform SDK, interestingly enough, does not: the file is actually Windows.h.
Chances of this working on a case-sensitive system: 0. (Not that windows.h will be used much outside Windows. WINE names its file windows.h. But that in turn may cause problems when an overeager programmer decides to be "correct" and includes <Windows.h>.)
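One common workaround when code written for a case-insensitive system has to run on a case-sensitive one is to resolve each requested name against the actual directory listing. The Python sketch below is hypothetical (`resolve_name` and `resolve_in_directory` are illustrative names, not part of WINE or any other project): it prefers an exact match and otherwise accepts exactly one case-insensitive match.

```python
import os

def resolve_name(entries, requested):
    """Return the directory entry that `requested` should refer to, or None.

    An exact match wins. Otherwise exactly one case-insensitive match is
    accepted; zero matches, or several (e.g. both windows.h and Windows.h on
    a case-sensitive file system), return None so the caller can decide.
    """
    entries = list(entries)
    if requested in entries:
        return requested
    folded = requested.casefold()
    matches = [entry for entry in entries if entry.casefold() == folded]
    return matches[0] if len(matches) == 1 else None

def resolve_in_directory(directory, requested):
    # Hypothetical convenience wrapper that reads the real directory listing.
    return resolve_name(os.listdir(directory), requested)

print(resolve_name(["Windows.h", "winbase.h"], "windows.h"))  # Windows.h
print(resolve_name(["Windows.h", "windows.h"], "WINDOWS.H"))  # None (ambiguous)
```

Returning None for the ambiguous case is a design choice: on a case-sensitive volume two entries really can differ only in case, and silently picking one would hide that.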

# Dean Harding on Sunday, June 04, 2006 8:18 PM:

I thought the case-insensitive nature of Windows was more because DOS was all-uppercase (and converted all filenames to uppercase before doing any operations on them), so making Windows case-insensitive was more of a back-compat thing than any conscious usability decision (the fact that it is also more user-friendly is a happy coincidence).

Anyway, I was bitten by Linux's case sensitivity when developing an ASP.NET app. We tried to port it to Mono to see whether Linux+Apache+Mono performed any better than ASP.NET+IIS, but it was going to take so long to convert our code to be case-sensitive that we didn't bother. It's surprising how often you reference filenames in an ASP.NET application (in Response.Redirect calls, in Control.LoadControl calls, in <%@ Reference %> directives, etc.).

I mean, case sensitivity is fine for a programming language where you have IntelliSense to help you get the names right (though I wouldn't complain if C# had been case-insensitive either), but in a file system you have no such helpers...

# Michael S. Kaplan on Sunday, June 04, 2006 8:52 PM:

Hi Dean,

For short file names, sure, but LFNs went through explicit design decisions about how case sensitivity would work, both for FAT and for NTFS.

I agree with you that tools should work as well as CMD does with filename autocompletion; that would be a very cool extension to IntelliSense, if you ask me.....

# Rosyna on Monday, June 05, 2006 1:05 AM:

Not sure if it is relevant, but we're keeping statistics on users' volume formats for the very support reason Dean mentions: to judge, if any issues come up, whether they are worth fixing or whether we should just tell the user it isn't a supported configuration. Oddly, a much higher percentage than I would have believed are using a case-sensitive FS. I wonder if this small percentage even knows what it means, or if they're just choosing it "Because it has more words, therefore it must be better!"

http://www.unsanity.com/goto/swup/
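Rather than relying on the name of the volume format, a tool can probe whether a given directory behaves case-sensitively. This Python sketch assumes write access to the directory. `is_case_sensitive` is a hypothetical helper, not anything from the software linked above.

```python
import os
import tempfile

def is_case_sensitive(directory):
    # Create a uniquely named, all-lowercase temporary file, then check whether
    # the same name in uppercase also resolves to a file. On a case-insensitive
    # file system (e.g. NTFS as normally accessed from Windows, or HFS+ in its
    # default configuration) it does.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as tmp:
        head, tail = os.path.split(tmp.name)
        return not os.path.exists(os.path.join(head, tail.upper()))

print(is_case_sensitive(tempfile.gettempdir()))
```

The probe could be fooled only if an unrelated file with exactly the uppercased random name already existed, which is very unlikely.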

# Nick Lamb on Monday, June 05, 2006 3:10 AM:

Ignoring for a moment the issue of whether it's more or less intuitive to confuse JanesExports with JaneSexPorts, the larger problem is that "are those two names the same?" is a locale question, and thus not amenable to a solution that is both consistent and pleasing for all your users around the planet, or perhaps even those sharing a single machine or (networked?) filesystem.

If you choose an arbitrary system-wide locale, or try to synthesise one, which thus has rules some of your users will be surprised by, you now have a complicated system that's no more intuitive than doing nothing.

The only place we stand a chance is in the user interface, and sure enough Windows and Unix both attempt this, providing a fully case-sensitive filesystem (yes, NTFS is case sensitive) and offering a GUI with locale-based sorting and searching that understands, e.g., that "Church.xls" belongs next to "church.doc" and "CHURCH.ZIP" in US English.

Obviously (as previous posts indicate) I think Windows does a poor job of delivering this, but on an NTFS system the essential design is the same as Unix. The core filesystem is agnostic, and the UI tries to deliver what the user expects. The "insensitive" OS X filesystem is a kludge left over from Classic Mac OS and should be abolished sooner rather than later.
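Nick's point that "are those two names the same?" can depend on locale is easiest to see with the Turkish dotted and dotless i. Python's `str.upper()` is not locale-aware, so the sketch below hand-writes a hypothetical `turkish_upper` that covers only the i/ı pair. It is an illustration, not a complete Turkish casing implementation.

```python
def invariant_upper(s):
    # Python's built-in, locale-independent uppercase mapping: i -> I.
    return s.upper()

def turkish_upper(s):
    # Hypothetical, minimal Turkish rule: i -> İ (U+0130) and ı (U+0131) -> I;
    # every other character falls back to the default mapping.
    return "".join(
        "\u0130" if c == "i" else "I" if c == "\u0131" else c.upper() for c in s
    )

def same_name(a, b, upper):
    return upper(a) == upper(b)

print(same_name("file.txt", "FILE.TXT", invariant_upper))  # True
print(same_name("file.txt", "FILE.TXT", turkish_upper))    # False: "FİLE.TXT" vs "FILE.TXT"
```

Under the invariant mapping the two names collide; under the Turkish rule they do not.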

# Michael S. Kaplan on Monday, June 05, 2006 3:25 AM:

Actually, Nick, you are mistaken about the locale side of things here: the case sensitivity of NTFS is entirely independent of locale, for very good reasons.

Windows does a much better job here overall, with just a few exceptions:

1) No normalization support: for most customers, this is mitigated by the lack of Form D input languages.

2) Casing tables out of date: fixed in Vista.

Most users DO expect case insensitivity here. Anyone who does not want it is definitely in the minority, not the majority....
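A locale-independent comparison like the one Michael describes can be sketched as a single fixed uppercase table applied one character at a time. NTFS keeps such a table on the volume (the `$UpCase` metafile). The Python sketch below only builds an analogous table from Python's own Unicode data for the Basic Multilingual Plane, so its contents are an assumption and will not match any particular Windows version's table.

```python
def build_upcase_table():
    # One entry per BMP code point whose uppercase form is a single code point.
    # One-to-many mappings such as "ß" -> "SS" are skipped, so those characters
    # compare as themselves; characters outside the BMP are left unchanged.
    table = {}
    for cp in range(0x10000):
        upper = chr(cp).upper()
        if len(upper) == 1 and ord(upper) != cp:
            table[cp] = ord(upper)
    return table

UPCASE = build_upcase_table()

def upcase_key(name):
    # The same table for every user and locale: no Turkish or other locale rules.
    return name.translate(UPCASE)

def same_name_upcase(a, b):
    return upcase_key(a) == upcase_key(b)

print(same_name_upcase("DormAid", "DORMAID"))  # True
print(same_name_upcase("straße", "STRASSE"))   # False: "ß" has no single-character uppercase
```

Because the table is fixed per volume rather than per user, every user sees the same answer to "are these two names the same?", which is the trade-off the reply is defending.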

# Mike Dunn on Monday, June 05, 2006 4:04 AM:

From the linked blog post:
<quote>
Don Porges tells me you get the same behavior doing DOS commands on a Windows system. I'm not really surprised.... I would have expected Windows to faithfully replicate every feature of other operating systems that really sucks, and then add stupid features of its own, plus bugs. And apparently that's right.
</quote>

This sounds like the author thinks DOS, its file systems, and its case-insensitivity copied (and thus were written after) OS X.

# Rosyna on Monday, June 05, 2006 4:08 AM:

Nick Lamb, you fell into a common logical flaw of people who advocate case-sensitivity. Locale matters not. At all. Unlike sentences, paragraphs, and articles, file names are short names with no mixing context.

A common example given is German, in which case does matter for sentences. But it doesn't matter for file names: you'd never confuse the two words in the file system, so locale doesn't come into play.

Case-sensitivity exists as a legacy of lazy developers who didn't want to bother adding case-insensitive compares to their sorting algorithms. And now many long-time Unix users label this laziness a "feature".

The part about extensions differing is simply a legacy kludge from DOS, where the extension was "separate" from the name. Metadata should never dictate the format of the data. It's also a GUI mistake, in my opinion, to hide extensions when two files in a directory are named church.xls and church.doc; the GUI should automatically unhide the extensions for those files.
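Rosyna's suggestion that the GUI automatically unhide extensions when two files would otherwise display the same name could look something like the following. `names_needing_extension` is a hypothetical helper, and treating the text after the last dot as the extension (via `os.path.splitext`) is an assumption made for this sketch.

```python
import os
from collections import defaultdict

def names_needing_extension(names):
    # Group files by base name, folded for case the way a case-insensitive
    # file system would compare them. Any group with more than one member
    # would look identical if extensions were hidden.
    groups = defaultdict(list)
    for name in names:
        stem, _ext = os.path.splitext(name)
        groups[stem.casefold()].append(name)
    return {name for members in groups.values() if len(members) > 1 for name in members}

print(sorted(names_needing_extension(["church.xls", "church.doc", "notes.txt"])))
# ['church.doc', 'church.xls']
```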

# Rosyna on Monday, June 05, 2006 4:09 AM:

Mike Dunn, as stated here before somewhere, DOS didn't actually support case at the FS level, so it's hard to say it was case-insensitive since everything was the same case.

# Maurits on Tuesday, June 06, 2006 7:17 PM:

I'm guessing that it's not OS X which is case-insensitive, but rather the HFS+ file system.

http://en.wikipedia.org/wiki/HFS+
... yup. HFS+ was case-insensitive only until 10.3, which added a case-sensitive variant (HFSX).

referenced by

2007/09/05 Head checks containing either comparison or case validation BITE

2006/06/05 More on case insensitivity and its intuitivality
