What it means to be case insensitive

by Michael S. Kaplan, published on 2005/06/16 02:30 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/16/429667.aspx


Brad Eck asked (in the microsoft.public.dotnet.internationalization newsgroup):

I am calling the following code in a button click event. Every click causes
the result set to 'toggle'. In otherwords, the original click will display
'...,lIMES,LIMES,limes,...' and then the next click will display
'...,limes,LIMES,lIMES,...'. Any ideas as to why?


string [] myValues = {"strawberries", "PEARS", "lIMES", "LIMES", "lIMES",
"Apple",
"Æpple", "BERRIES", "grapes", "olives", "cantaloupe"};

Array.Sort(myValues, new CaseInsensitiveComparer(Application.CurrentCulture));
string output = "";
for (int itemCount = 0; itemCount < myValues.Length; ++itemCount)
output += myValues[itemCount]+",";
MessageBox.Show(output);

Lets take a look at the sort keys for those three strings, with and without the case insensitive comparer:

Case Sensitive

limes 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 01 01 00
lIMES 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 02 12 12 12 12 01 01 00
LIMES 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 12 12 12 12 12 01 01 00

Case Insensitive

limes 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 01 01 00
lIMES 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 01 01 00
LIMES 0E 48 0E 32 0E 51 0E 21 0E 91 01 01 01 01 00

Suddenly, it is all clear as to why the strings might be moving around with each click -- because when they are case-insensitive, they are all equal! And when they are equal, you are at the mercy of the sorting algorithm, whatever it may be. For example in my experience, quicksorts I have written have always been very good at (for example) swapping the second and third items if you put these three in a list, and other sort algorithms will do other things in this case. But the case sensitive sort will be deterministic and will always sort in the order I give above, though all three strings will always be together.

This is of course not true of all sorting algorithms, but the quicksort, while great for most cases, is not the best possible sort for when strings are already properly ordered in scenarios like this one (it is fair to say that strings being equal are already sorted!). Your mileage will vary even with slightly different quicksort imeplementations, let alone with other types of sorts....

 

This post brought to you by "L" (U+004c, a.k.a. LATIN CAPITAL LETTER L)

 


# Maurits [MSFT] on 16 Jun 2005 2:33 AM:

Any sort can be made into a stable sort...
Break ties by comparing the initial positions of the tied elements.

# Michael S. Kaplan on 16 Jun 2005 2:46 AM:

I don't disagree -- but most implementations are not stable in that sense....

Which was the cause of the problem neing reported....

# TheMuuj on 16 Jun 2005 4:09 AM:

For sorting, wouldn't a culture-aware, case-sensitive comparer be good enough? Wouldn't "ABCD" sort next to "abcd", unlike in an Ordinal comparison? I mean, with the exception that there *is* a defined difference between the two cases (the result != 0).

In other words, aren't case-insensitive comparisons only useful for equality comparison?

One possible workaround comes to mind. You could write a comparer with a definite sorting order (one that never returns 0). If you are sorting objects by a .Name property, and there are two names with the *exact* same .Name, you could always fall back on the object's hash code or something just to get some kind of predictable result.

But I'm really surprised that there isn't a version of Array.Sort in the framework that is stable. That's a pretty big hole, if you ask me.

# Michael S. Kaplan on 16 Jun 2005 7:54 AM:

Well, I am not saying that there isn't a stable sort in there somewhere -- there might be. I was just explaining what he was seeing...

But I am a huge fan of using case- (and everything else-) sensitive sorts whenever possible, because they find bugs. Especially in SQL Server but really everywhere. Mishmashing stuff together hides bugs and sometimes (like when dealing with Korean language stuff, and Indic languages, and others) even cause new bugs of their own.

Erica on 24 Dec 2008 9:19 AM:

Does anyone know how to explain case sensitive for the game NHL 2006? Im trying to explain it to my brother, but Its hard to explain, my definition is a bit wierd for that game, but it is a definition. Can someone tell me? My e-mail is espinisgreat1@yahoo.com . Someone please e-mail me soon! Thanks


referenced by

2006/03/02 CompareString ignores case by lowercasing....

go to newer or older post, or back to index or month or day