Compare ≠ Equals, Invariant ≠ Ordinal, readers of this blog ≠ everyone else, and so on....

by Michael S. Kaplan, published on 2011/05/06 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/05/06/10161793.aspx

Sometimes the questions seem like they never actually change, mainly because all the new people often find themselves saying the same things the formerly new people did.

I just noticed that it seems like people have had a variety of different preferred methods for checking string equality. I was wondering if anyone knows if there are any measurable performance or memory differences between the different ways of doing case insensitive string equality checks.

    stringA.ToLowerInvariant() == stringB.ToLowerInvariant();
    stringA.ToUpperInvariant() == stringB.ToUpperInvariant();
    string.Equals(stringA, stringB, StringComparison.OrdinalIgnoreCase);
    string.Compare(stringA, stringB, StringComparison.OrdinalIgnoreCase) == 0;
    stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase);

Now many people pointed out that perhaps the question was being asked the wrong way around.

After all, assuming all five methods are doing exactly the same thing would be a big mistake, since they aren't.
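One way to check that claim rather than take it on faith is to run all five expressions over the same inputs and look for rows where they disagree. The following is a minimal sketch; the class name `FiveWays`, the method `Evaluate`, and the sample pairs are made up for illustration. The Greek pair is included because, in Unicode, final sigma (U+03C2) and small sigma (U+03C3) share one uppercase form but are distinct lowercase letters, which makes it a natural candidate for lower-casing and upper-casing to disagree; what a given runtime and platform actually prints is the thing to verify.

```csharp
using System;

public static class FiveWays
{
    // Evaluates the five expressions from the question, in the order listed.
    // Assumes non-null arguments: expressions 1 and 2 throw
    // NullReferenceException if either string is null, and expression 5
    // throws if stringA is null. The static string.Equals and
    // string.Compare (expressions 3 and 4) accept null.
    public static bool[] Evaluate(string stringA, string stringB)
    {
        return new[]
        {
            stringA.ToLowerInvariant() == stringB.ToLowerInvariant(),
            stringA.ToUpperInvariant() == stringB.ToUpperInvariant(),
            string.Equals(stringA, stringB, StringComparison.OrdinalIgnoreCase),
            string.Compare(stringA, stringB, StringComparison.OrdinalIgnoreCase) == 0,
            stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase),
        };
    }

    public static void Main()
    {
        // Illustrative inputs only; extend with data from your own scenario.
        var pairs = new[]
        {
            ("Hello", "HELLO"),     // plain ASCII
            ("\u03C2", "\u03C3"),   // Greek final sigma vs. small sigma
        };

        foreach (var (a, b) in pairs)
        {
            // One row per pair, one column per expression.
            Console.WriteLine(string.Join(", ", Evaluate(a, b)));
        }
    }
}
```

Any row that is not uniformly True or uniformly False is a case where the five techniques are not interchangeable. The null behavior noted in the comments is a further difference that never shows up in a timing comparison.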

And obviously it is much more important for one to first get the right answer; once that happens then if there are performance issues one can start searching for the fast answer.

It felt good to see such responses since I remember when people didn't always jump in here as fast. It's nice to know I was a little bit of a part of that....

Do we have any guarantee that all five of those techniques give the same results when given the same input strings? I see no reason why all five of those ought to give the same answers. Are they documented as doing so? I don’t believe they are.

If not, then the question you should be asking first is “which is correct?” not “which is faster?”

The next thing I’d consider before thinking about which is faster is “which uses the right tool for the job?”

Consider the third and fourth options, for example. Clearly the purpose of “equals” is to determine equality. Clearly the purpose of “compare” is to determine order. You’ve stated that you are seeking equality. Why then would you ever avoid using a method clearly documented as doing what you want, that is easy to read and understand the semantics of the code, and instead use a method designed to do some slightly different task? When I see code that has “Compare” in it, I assume that it is being used for determining order, not equality.

Once you know which is correct, and once the code is written clearly, then you can easily determine which is faster in the common scenarios you face by using a profiler against realistic data. Remember, different string algorithms may be optimized for different scenarios.

The reason I found Eric's "for example" so interesting is that it rests on the notion that Equals is not the same as Compare, a conclusion that virtually none of the developers and testers I talked to over the last few years had reached on their own.

In fact, it is often reported as a bug when the two techniques return different results.
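A minimal sketch of how such a "bug" report can arise, assuming the overloads that take no comparison argument: `string.Equals(a, b)` is an ordinal equality test, while `string.Compare(a, b)` is a comparison under the current culture. The class name, helper names, and sample strings below are made up for illustration. The two spellings of "é" are canonically equivalent in Unicode, so a culture-sensitive comparison will typically treat them as equal on a runtime with full globalization data, whereas the ordinal test never will; under invariant-globalization settings the culture-sensitive call may behave ordinally too.

```csharp
using System;
using System.Collections.Generic;

public static class EqualsVersusCompare
{
    // An equality question: do these two keys match, ignoring case?
    public static bool SameKey(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // An ordering question: put these names in a case-insensitive ordinal order.
    public static List<string> SortIgnoringCase(IEnumerable<string> names)
    {
        var sorted = new List<string>(names);
        sorted.Sort((x, y) => string.Compare(x, y, StringComparison.OrdinalIgnoreCase));
        return sorted;
    }

    public static void Main()
    {
        Console.WriteLine(SameKey("Content-Type", "content-type"));
        Console.WriteLine(string.Join(", ",
            SortIgnoringCase(new[] { "delta", "Alpha", "charlie", "Bravo" })));

        string precomposed = "\u00E9";   // U+00E9, e with acute as one code point
        string decomposed = "e\u0301";   // e followed by U+0301 combining acute

        // Ordinal equality: compares UTF-16 code units.
        Console.WriteLine(string.Equals(precomposed, decomposed));

        // Current-culture comparison: the result depends on culture data.
        Console.WriteLine(string.Compare(precomposed, decomposed) == 0);
    }
}
```

When the last two lines print different values, neither method is broken; they were asked different questions.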

Furthermore, even very experienced developers often use the wrong function, either for this issue of the lexicographic vs. linguistic issue, the invariant vs. ordinal issue, or the invariant vs. culture-specific issue.
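Those distinctions can be probed directly. The sketch below uses a made-up class `ComparisonKinds` and a made-up helper `Sign`, and assumes the Turkish culture `tr-TR` is available on the machine; it may not be when the runtime is configured for invariant globalization, so that lookup is guarded. Only the ordinal result is fixed by definition ('Z' is code unit 0x5A and 'a' is 0x61); the linguistic and culture-specific results come from culture data and are worth printing rather than assuming.

```csharp
using System;
using System.Globalization;

public static class ComparisonKinds
{
    // Returns -1, 0, or 1 for the comparison of a to b under the given rules.
    public static int Sign(string a, string b, StringComparison kind)
    {
        return Math.Sign(string.Compare(a, b, kind));
    }

    public static void Main()
    {
        // Lexicographic (ordinal) vs. linguistic: ordinal compares code units,
        // so "Z" sorts before "a"; a linguistic comparison usually does not.
        Console.WriteLine(Sign("Z", "a", StringComparison.Ordinal));
        Console.WriteLine(Sign("Z", "a", StringComparison.InvariantCulture));

        // Invariant vs. culture-specific: in Turkish, the uppercase of "i"
        // is dotted capital I (U+0130), so "i" and "I" are different letters.
        bool invariantMatch = string.Compare(
            "i", "I", CultureInfo.InvariantCulture, CompareOptions.IgnoreCase) == 0;
        Console.WriteLine(invariantMatch);

        try
        {
            CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");
            bool turkishMatch = string.Compare(
                "i", "I", turkish, CompareOptions.IgnoreCase) == 0;
            Console.WriteLine(turkishMatch);
        }
        catch (CultureNotFoundException)
        {
            Console.WriteLine("tr-TR culture data is not available here");
        }
    }
}
```

Code that compares identifiers, file names, or security tokens generally wants the ordinal answers; code that sorts text for a person to read generally wants the culture-specific ones.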

And it is astounding how often even old-timers will not only get the wrong results but will assert equivalences between these various techniques, with consequences for correctness, performance, or security!

Simple searches in this blog will reveal several dozen articles on various aspects of those points.

The tendency of many developers to assume they are right is something we take for granted, and it only seems incongruous when they are wrong....

Just recently I was in a heated email conversation that was almost tending toward violence over someone's insistence that these differences did not exist, that these issues weren't issues.

For what it's worth, that person is now a regular reader who has asked many interesting follow-up questions by email. I think we both will end up a lot smarter at the other end of this "collaboration"....

I knew Eric was smart before this. So it wasn't specifically him saying these things that caught my interest, exactly.

But now I find myself hoping that everyone can pick up this knowledge too, and that these questions will eventually seem as intuitive as the "easier" questions seem to be now.

And like Dogbert's New Ruling Class, my readers will have a head start.... :-)

It is funny that you bring the subject up. In my native language (Dutch), there is no verb I can think of that has the same meaning as 'to equal'. Rather, we use 'is the same as'. However, 'to compare' does have its own verb in 'vergelijken'.

But, if I am to discuss the expression (a == b) in casual speech, I would likely use the same sort of speech I would use for (a < b) or (a > b): 'you would do that by comparing a to b in an if-statement.' Sure, it is a very trivial example, but I feel the similarity exists even in English. :) Another example that works for both languages is the concept of product and price comparison websites - no such concept of order there for the user.

Either way, this brings along the interesting fact that 'vergelijken' has very little of an ordering context to it, and far more of an 'equalizing' one. As such, what may be common sense to the former newbies such as yourself and the esteemed Eric Lippert may not at all be that to others, even if they are as fluent in English as I am. (And although I'd rather not go into the embarrassing subject: I find myself using English-to-Dutch dictionaries when a Dutch word escapes me. Go figure.)

[quote]Furthermore, even very experienced developers often use the wrong function, either for this issue of the lexicographic vs. linguistic issue, the invariant vs. ordinal issue, or the invariant vs. culture-specific issue.[/quote]

So, while I am sure you may have discussed it previously on this blog (and I apologize for not coming across it when I looked), I think I and others might benefit from a down-to-earth description of the differences in this list of things that do not equal each other. (Do not compare sounds so much better in that last sentence, don't you think?) There is MSDN I could look at, but in my experience this is not the sort of thing it excels at explaining. Besides, you are the perfect person given your experience with languages, character sets and encodings; an expert in your domain, whereas documentation writers usually tend to be the sort that need to know a little bit of everything and as such do not always do the subject justice.

I hope you'll keep on blogging for a long time - you are a treasure trove of interesting information. :)

Okay, as a silent reader of this blog, can someone refer me to some post about compare(a,b)==0 != equals(a,b)? Because I personally always thought of that as a quite logical invariant (and made sure to not violate it in my own code) and don't see a case where this wouldn't work out correctly.

@JW: Quite similar to German. "vergleichen" in my opinion also doesn't necessarily carry an ordering context with it. Those implicit assumptions about word connotations are often quite surprising for non-native speakers.

Your distinction between Compare and Equals seems a bit of a stretch to me (as a native en_US speaker). Compare is what you do, equals is one of the possible results of the comparison (the others being less than, greater than, and possibly null or error).

Consider std::set. We're talking about equality there, because it has to determine the uniqueness of the elements, but the interface is defined in terms of comparison (specifically less-than comparison) rather than equality. Perhaps a good topic for a future blog would be how to write a locale-dependent, case-insensitive comparator for std::set.