Why I am not in favor of a FileSystemInfo.Compare method

by Michael S. Kaplan, published on 2006/04/24 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/24/581628.aspx


Back in January of this year, reader J. Daniel Smith asked:

Did I miss your other blog entry on this topic?
-----
re: Comparing Unicode file names the right way

Tuesday, October 18, 2005 3:51 PM by Michael S. Kaplan
Ah, the reasons I am resistant to *that* particular path are the subject of another blog entry, coming soon! :-)

My sense of temporal logistics is pretty skewed compared to everyone else's. :-)

He was referring to this post, which was from last October, and the question that he was asking:

In <3 weeks, there's not going to be the big gulf between managed & unmanaged code (I'm talking about VS2005 & C++/CLI).

So if you show us the unofficial 100% correct "static int Compare(FileSystemInfo a, FileSystemInfo b)" in managed code, it will soon be fairly straight-forward to use it from unmanaged code.

(Of course, this assumes that you're OK doing such a thing...there may be other reasons for staying completely in unmanaged code).

Yes, I realize Whidbey is frozen harder than the Antarctic ice pack…I thought the "static" function taking two arguments made things clear; I guess not. Sorry.

Show us the exact C# code for your FileNameCompare() utility function; that way there can be no confusion as to the proper technique. My preference would be to take stronger-typed FileSystemInfo parameters (rather than just strings)…and also to indicate that eventually such code should perhaps be part of that class.

I probably would have taken after Triumph the Insult Comic Dog and said that Whidbey was frozen colder than an Iditarod husky's nutsack, but that is neither here nor there; he was right in any case. :-)

The reason why I am troubled about the development architecture path that leads to a

static int Compare(FileSystemInfo a, FileSystemInfo b)

is that the FileSystemInfo class is of course just a base class for the FileInfo and DirectoryInfo classes, which contains a lot of information:

Given the level of complexity inherent in these objects, the exact definition of a meaningful comparison between two FileSystemInfo objects is far from clear -- in any given situation, a developer could clearly be expecting it to mean any of these things, and not all of those varied and sometimes conflicting definitions are very farfetched!
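To make that ambiguity concrete, here is a minimal sketch of three equally plausible "comparisons" between two FileSystemInfo objects. The class and method names are hypothetical and are not from the post or the BCL; only the FileSystemInfo properties (Name, FullName, LastWriteTimeUtc) are real.

```csharp
using System;
using System.IO;

// Hypothetical helper class; none of these method names come from the post.
public static class FileSystemInfoComparisons
{
    // Compare only the leaf names, ignoring case.
    public static int ByName(FileSystemInfo a, FileSystemInfo b)
    {
        return string.Compare(a.Name, b.Name, StringComparison.OrdinalIgnoreCase);
    }

    // Compare the fully qualified paths, ignoring case.
    public static int ByFullName(FileSystemInfo a, FileSystemInfo b)
    {
        return string.Compare(a.FullName, b.FullName, StringComparison.OrdinalIgnoreCase);
    }

    // Compare last-write timestamps, ignoring the names entirely.
    public static int ByLastWriteTime(FileSystemInfo a, FileSystemInfo b)
    {
        return DateTime.Compare(a.LastWriteTimeUtc, b.LastWriteTimeUtc);
    }
}
```

Two objects for a file called readme.txt in two different directories compare equal under ByName and unequal under ByFullName, so a single Compare method would have to pick one answer and surprise everyone who expected the other.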

However, the fact that the OrdinalIgnoreCase semantic is a 100% match for what many of the methods and functions that make use of the NTFS/FAT filesystems do is itself clear (to anyone who reads this blog? <grin>), and it really is comparing the file names (which are in fact strings), not FileSystemInfo objects, which are something else entirely.
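As a minimal sketch of what that string-based comparison looks like, assuming (as argued above) that OrdinalIgnoreCase is the right semantic for NTFS/FAT names: the class and method names here are hypothetical, and the only BCL calls used are string.Compare and string.Equals with StringComparison.OrdinalIgnoreCase.

```csharp
using System;

// Hypothetical helper; the names are illustrative and not from the post.
public static class FileNameComparison
{
    // Orders two file names by ordinal value, ignoring case.
    public static int CompareFileNames(string a, string b)
    {
        return string.Compare(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // True when two file names differ at most by case.
    public static bool FileNamesEqual(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that the parameters are plain strings: the method says nothing about where the files live, what they contain, or which file system they are on, which is exactly why it can be unambiguous.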

Which is not to say that no one would ever add such a method; it happens all the time over in the Windows Shell -- it is why the Shell Lightweight API has so many wrappers around our stuff!

But this leads (unfortunately) to another, more insidious, problem:

And there are theoretical benefits to such a method if the various underlying filesystems exposed their actual comparison semantics such that one could wrap up the case sensitive nature of CDFS and the case insensitive nature of NTFS/FAT32/FAT and all of the other various differences.

The problem there is that the information is simply not exposed, and until it is, the only way to do such tests with 100% accuracy is to create the file, which is not always possible for a myriad of reasons related to permissions and media.

And what happens when you have multiple media forms connected in a single path via symbolic links? In such cases support in the underlying file systems is crucial -- since there is NO ONE TEST that would suffice.

I had high hopes that I could convince Kevin Phaup to require such a comparison method when he was still at Microsoft and that he could convince the people in charge of the various drivers. But now he is no longer at Microsoft and there is no one on that side of the company who I can say I have gone to a party with, or seen their house. Which for some reason dashes my hopes a bit.... :-(

But anyway, these are but a few of the reasons why the direction of a FileSystemInfo wrapper to handle comparison is something that I am highly resistant to.

 

This post brought to you by "𐃺" (U+100fa, a.k.a. LINEAR B IDEOGRAM VESSEL B305)


# J. Daniel Smith on 27 Apr 2006 1:56 PM:

To some extent, strings have become what ints were in C.  In C, everything was an "int": bools, pointers, enums, etc. (yes, I'm exaggerating some).

We now use strings the same way: file system paths, a person's name, URLs, guids, etc.  Some parts of the BCL recognize that some strings have additional semantics and provide classes like System.Guid and System.Uri.

As another example, a string which represents a person's name should probably always compare case-insensitive and titles might be ignored: "Mr. Dan Smith" is often the same as "DAN SMITH".

Given that, I would suggest that FileInfo (or FileSystemInfo) can be used as a more strongly typed string.  Although, as you mention, there's more to FileSystemInfo than System.Guid.

# Michael S. Kaplan on 8 May 2006 10:27 AM:

Well, think of it another way. Almost everyone would expect (or at least want!) the following paths to be considered equal in such a method:

\\machine\share\directory\filename.ext
\\machine\d$\directory\filename.ext
d:\directory\filename.ext

Though of course the filenames are not equal....
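A rough sketch of why a string comparison cannot deliver that: the three path strings above are pairwise unequal under OrdinalIgnoreCase, even if all of them happen to name the same file. The class and method names are hypothetical, and the sketch only compares strings; it does not touch any file system.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class PathStringDemo
{
    // The three paths from the comment above, verbatim.
    public static readonly string[] Paths = new string[]
    {
        @"\\machine\share\directory\filename.ext",
        @"\\machine\d$\directory\filename.ext",
        @"d:\directory\filename.ext"
    };

    // Counts how many distinct pairs compare equal as strings, ignoring case.
    public static int CountEqualPairs()
    {
        int equalPairs = 0;
        for (int i = 0; i < Paths.Length; i++)
        {
            for (int j = i + 1; j < Paths.Length; j++)
            {
                if (string.Equals(Paths[i], Paths[j], StringComparison.OrdinalIgnoreCase))
                {
                    equalPairs++;
                }
            }
        }
        return equalPairs;
    }
}
```

CountEqualPairs() returns 0 here; deciding that the three strings refer to one file would need knowledge of the share and drive mappings on the machine, which no comparison of the strings alone can supply.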
