by Michael S. Kaplan, published on 2005/06/12 21:15 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/12/428429.aspx
It was a little over a month ago that I pointed out that Similar descriptions does not mean similar methodologies, and I spent a little time comparing many of the Win32, Shell, Shlwapi, CRT, and Kernel methods of doing case-insensitive comparisons. And of course some people looked at that topic and saw it as a proof that managed code is the way to go to avoid the confusion over which method to use.
But as I hinted at in this post, things are not so simple in the managed world that you can really count on all of that confusion going away. So you can think of this post as the managed version of that issue.
First there is the core method, the managed equivalent of CompareString, the CompareInfo class and its Compare method. One of the overrides for that method takes a CompareOptions enumeration member that lets you get at the gamut of insensitive operations for case, nonspacing mark, symbol, kana, width, etc., as well as getting to Ordinal (and as of Whidbey post Beta 2, OrdinalIgnoreCase). Since you can do it off of any culture, you have access to using the invariant culture as well.
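As a rough illustration of that core method, here is a minimal sketch that calls CompareInfo.Compare off the invariant culture with a few different CompareOptions values. The class and helper names (CompareInfoSketch, InvariantCompare) are made up for this example and are not part of the framework.

```csharp
using System;
using System.Globalization;

public static class CompareInfoSketch
{
    // Hypothetical helper: compare two strings using the invariant
    // culture's CompareInfo and whatever options the caller picks.
    public static int InvariantCompare(string a, string b, CompareOptions options)
    {
        CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
        return ci.Compare(a, b, options);
    }

    public static void Main()
    {
        // Linguistic comparison that ignores case.
        Console.WriteLine(InvariantCompare("file", "FILE", CompareOptions.IgnoreCase) == 0);        // True
        // Ordinal comparison: code point by code point, so case matters.
        Console.WriteLine(InvariantCompare("file", "FILE", CompareOptions.Ordinal) == 0);           // False
        // Ordinal comparison that ignores case.
        Console.WriteLine(InvariantCompare("file", "FILE", CompareOptions.OrdinalIgnoreCase) == 0); // True
    }
}
```

Swapping CultureInfo.InvariantCulture for any other CultureInfo gives you that culture's CompareInfo instead.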
Then there is the String.Compare method, which supports a subset of those operations, but some of the methods take a CultureInfo object and others take a StringComparison enumeration member (many of which give access to the same things a CultureInfo would, like InvariantCulture or CurrentCulture, or an OrdinalIgnoreCase comparison).
Of course there is also a String.CompareOrdinal method which does the same thing that the StringComparison enumeration with the Ordinal comparison would do.
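To make the overlap concrete, the sketch below puts the two String.Compare flavors next to String.CompareOrdinal. The class and helper names (StringCompareSketch, SameSignAsCompareOrdinal) are invented for illustration; only the sign of each result is checked, since the exact integer returned is not something to rely on.

```csharp
using System;
using System.Globalization;

public static class StringCompareSketch
{
    // Hypothetical helper: does String.Compare with StringComparison.Ordinal
    // order two strings the same way String.CompareOrdinal does?
    public static bool SameSignAsCompareOrdinal(string a, string b)
    {
        return Math.Sign(string.Compare(a, b, StringComparison.Ordinal))
            == Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        // The CultureInfo flavor: (strA, strB, ignoreCase, culture).
        Console.WriteLine(string.Compare("file", "FILE", true, CultureInfo.InvariantCulture) == 0); // True
        // The StringComparison flavor.
        Console.WriteLine(string.Compare("file", "FILE", StringComparison.OrdinalIgnoreCase) == 0); // True
        // Two spellings of the same ordinal comparison.
        Console.WriteLine(SameSignAsCompareOrdinal("apple", "Apple"));                              // True
    }
}
```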
I would be remiss if I did not point out the String.ToUpper/String.ToUpperInvariant methods, especially since the first of them has an override that takes a CultureInfo which makes the second one not entirely necessary. Those extra invariant methods were added in Whidbey; I would not have strenuously objected if they had been taken out prior to shipping. :-)
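A small sketch of why the invariant method is arguably redundant: passing CultureInfo.InvariantCulture to the CultureInfo flavor of ToUpper should produce the same string. The class and helper names here (ToUpperSketch, UpperInvariantMatches) are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ToUpperSketch
{
    // Hypothetical helper: does ToUpper(CultureInfo.InvariantCulture)
    // agree with ToUpperInvariant() for this string?
    public static bool UpperInvariantMatches(string s)
    {
        return s.ToUpper(CultureInfo.InvariantCulture) == s.ToUpperInvariant();
    }

    public static void Main()
    {
        Console.WriteLine("michkap".ToUpperInvariant());            // MICHKAP
        Console.WriteLine(UpperInvariantMatches("michkap"));        // True
        Console.WriteLine(UpperInvariantMatches("Sorting it out")); // True
    }
}
```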
There is also the new StringComparer class, which has some interesting remarks in it:
You might be confused about how to use the System.StringComparer properties because of a seeming contradiction. The value of each System.StringComparer property is a System.StringComparer object. However, the System.StringComparer class is declared abstract (MustInherit in Visual Basic), which means its members can only be invoked on an object of a class derived from the System.StringComparer class, but each property is declared static (Shared in Visual Basic), which means the property can be invoked without first creating a derived class. This appears to be a contradiction.
The reason you can call a System.StringComparer property directly is because each property actually returns an instance of an anonymous class that is derived from the System.StringComparer class. Consequently, the type of each property value is the base class of the anonymous class, not the type of the anonymous class itself.
I think I can parse that. But for what it is worth, a StringComparer (which includes properties to get at CurrentCulture, CurrentCultureIgnoreCase, InvariantCulture, InvariantCultureIgnoreCase, Ordinal, and OrdinalIgnoreCase flavors of itself), could also have been covered by a CompareInfo -- maybe we should have made CompareInfo inherit the IComparer and IEqualityComparer interfaces that the StringComparer brings to the mix? :-)
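For what those interfaces buy you in practice, here is a minimal sketch, assuming a dictionary keyed by strings that should match regardless of case. The class and helper names (StringComparerSketch, NewCaseInsensitiveMap) are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class StringComparerSketch
{
    // Hypothetical helper: a dictionary whose keys are matched with
    // ordinal, case-insensitive comparison.
    public static Dictionary<string, int> NewCaseInsensitiveMap()
    {
        // The static property hands back a ready-made StringComparer instance.
        return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Dictionary<string, int> map = NewCaseInsensitiveMap();
        map["Path"] = 1;
        Console.WriteLine(map.ContainsKey("PATH"));          // True

        // The same object can be used directly as a comparer.
        StringComparer comparer = StringComparer.OrdinalIgnoreCase;
        Console.WriteLine(comparer.Equals("file", "FILE"));  // True
        Console.WriteLine(comparer.Compare("file", "FILE")); // 0
    }
}
```

Note that the variable is typed as StringComparer, the abstract base class, which is exactly the situation the quoted remarks are trying to explain.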
And every time something takes a CultureInfo for comparisons, it is actually pretty much using that CultureInfo's associated CompareInfo. Which you cannot pass in there most of the time, for reasons of type safety.
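A quick way to see that relationship is to ask the CultureInfo for its CompareInfo yourself and check that the two routes agree. This is only a sketch; the class and helper names (CultureCompareInfoSketch, SameAsCompareInfo) are hypothetical, and the comparison is limited to the sign of the result.

```csharp
using System;
using System.Globalization;

public static class CultureCompareInfoSketch
{
    // Hypothetical helper: does String.Compare with a CultureInfo agree with
    // calling that culture's CompareInfo directly with IgnoreCase?
    public static bool SameAsCompareInfo(string a, string b, CultureInfo culture)
    {
        int viaString = string.Compare(a, b, true, culture);
        int viaCompareInfo = culture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase);
        return Math.Sign(viaString) == Math.Sign(viaCompareInfo);
    }

    public static void Main()
    {
        CultureInfo culture = CultureInfo.InvariantCulture;
        Console.WriteLine(SameAsCompareInfo("file", "FILE", culture));   // True
        Console.WriteLine(SameAsCompareInfo("apple", "banana", culture)); // True
    }
}
```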
Ok, the above will hopefully free people of the illusion of simplicity in managed code. And I did not even get into all of the Hashcode providers, dictionaries, hash tables, and lists that would use these various comparison objects, all of which have to be created in particular ways. I will work on trying to sort some of these out another time, in another post.
For now I will point people at my post about the new string recommendations, and suggest that everyone take that one simple piece of advice I gave:
Use appropriate comparison methods.
Use appropriate comparison methods.
Use appropriate comparison methods.
Use appropriate comparison methods.
The easy (well, easier at least, I think) principles:
If you can follow those three rules, you will seldom if ever go wrong on using appropriate comparison methods.
This post brought to you by "¢" and "£" (U+00a2 and U+00a3, a.k.a. CENT SIGN and POUND SIGN)
(As the saying goes, in for a penny, in for a pound....)
# TheMuuj on 13 Jun 2005 6:35 PM:
# Michael S. Kaplan on 13 Jun 2005 6:51 PM:
# TheMuuj on 13 Jun 2005 9:36 PM:
# Anonymous on 5 Jul 2005 12:51 PM:
# Andy Bantly on 21 Jul 2005 11:23 AM:
# Michael S. Kaplan on 21 Jul 2005 11:40 AM:
# Tanveer Badar on 21 Dec 2007 2:46 PM:
Each use of override is incorrect, it should be overload and use IStemmer for derivatives. :)
referenced by
2007/08/31 Elegant? Beyond compare...
2005/07/05 'Need more input, Stephanie!'