Being [case] insensitive without being inconsistent?

by Michael S. Kaplan, published on 2005/03/23 12:51 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/23/401060.aspx


Bob Burger asked, in the suggestion box:

CaseInsensitiveHashCodeProvider and CaseInsensitiveComparer are not synchronized, despite what the documentation claims. The problem is that the CaseInsensitiveHashCodeProvider does not take equivalent character sequences into account, whereas the CaseInsensitiveComparer does.

For example, the character “Á” can be represented by a single code point, 0x00C1, and the character “á” can be represented by two code points, 0x0061 0x0301. Consequently, the expression

CaseInsensitiveComparer.DefaultInvariant.Compare("\u00C1", "\u0061\u0301")

returns zero. However, the expressions

CaseInsensitiveHashCodeProvider.DefaultInvariant.GetHashCode("\u00C1")
CaseInsensitiveHashCodeProvider.DefaultInvariant.GetHashCode("\u0061\u0301")

return 177508 and 5861733, respectively.
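The same mismatch can be reproduced outside .NET. The Python sketch below is only an analogy, not the .NET implementation. `caseless_equal` is a hypothetical comparer that, like the CaseInsensitiveComparer in the report, treats canonically equivalent sequences as equal. `naive_hash_key` only case-folds, so the precomposed and decomposed forms produce different keys, and a dictionary lookup on one misses the other.

```python
import unicodedata

def canonical_caseless_key(s: str) -> str:
    # Unicode canonical caseless matching: NFD(casefold(NFD(s)))
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

def caseless_equal(a: str, b: str) -> bool:
    # Hypothetical comparer: equal if canonically equivalent, ignoring case
    return canonical_caseless_key(a) == canonical_caseless_key(b)

def naive_hash_key(s: str) -> str:
    # Hypothetical hash basis: case folding only, no normalization
    return s.casefold()

precomposed = "\u00C1"       # LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "\u0061\u0301"  # a + COMBINING ACUTE ACCENT

print(caseless_equal(precomposed, decomposed))                    # True
print(naive_hash_key(precomposed) == naive_hash_key(decomposed))  # False

# A table keyed on the naive hash basis cannot find the "equal" string
table = {naive_hash_key(precomposed): "found"}
print(table.get(naive_hash_key(decomposed), "missing"))           # missing
```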

(I did some minor edits to make sure the code would work and that it matched the intent of the message)

I think there may be a little confusion on this point. The GetHashCode() method on CaseInsensitiveHashCodeProvider does not try to return a culturally sensitive value that meets the guideline of returning equal hash codes whenever Compare() would consider two strings equal. That functionality did not even exist until Whidbey, a fact I know because I wrote it for Whidbey! The actual hash table that uses the provider (as shown in the help examples) does show the two behaving in lockstep when things work....
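The guideline is easiest to satisfy by deriving both the comparison and the hash from a single key. The guideline says that strings Compare() considers equal must receive equal hash codes. Below is a minimal Python sketch that assumes canonical caseless matching as the equality rule. The `CanonicalCaselessComparer` class is hypothetical and is not the Whidbey API.

```python
import unicodedata

class CanonicalCaselessComparer:
    """Hypothetical comparer whose compare() and hash() share one key."""

    @staticmethod
    def key(s: str) -> str:
        # NFD(casefold(NFD(s))): Unicode canonical caseless matching
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())

    def compare(self, a: str, b: str) -> int:
        # Ordinal comparison of the keys; returns -1, 0 or 1
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)

    def hash(self, s: str) -> int:
        # Equal keys always give equal hashes, so compare() == 0 implies equal hashes
        return hash(self.key(s))

comparer = CanonicalCaselessComparer()
a, b = "\u00C1", "\u0061\u0301"
print(comparer.compare(a, b))                # 0
print(comparer.hash(a) == comparer.hash(b))  # True
```

The ordering here is ordinal on the normalized key rather than linguistic. A culture-sensitive sort would need a real collation implementation. The sketch only demonstrates the contract between the two operations.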

 

This post brought to you by "Х" (U+0425, CYRILLIC CAPITAL LETTER HA)


# AT on 23 Mar 2005 1:10 PM:

Why didn't you link to the source at http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=FDBK23107 (Opened Date: 2005-03-22 13:21:01)?

# Michael Kaplan on 23 Mar 2005 3:47 PM:

:-)

I was unaware this existed. Of course the code does not work as it stands in this bug....

# mihailik on 24 Mar 2005 3:27 AM:

Do you plan to fix this behavior? It can produce hard-to-debug errors.

# Michael Kaplan on 24 Mar 2005 6:45 AM:

It is on the list and actually *can* be fixed now (since there is a way to get such a hash code), so the odds are good....

# Bob Burger on 24 Mar 2005 9:54 AM:

Similarly, performing case-sensitive comparisons using Comparer.Default or Comparer.DefaultInvariant and the default hash-code provider (the GetHashCode method) results in the same problem.

I don't see a CaseSensitiveHashCodeProvider in the .NET framework, but it would be useful if strings equivalent from the Comparer.Default point of view are to be used in a hash table.
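A case-sensitive counterpart, like the CaseSensitiveHashCodeProvider described above, would normalize without case folding. Below is a Python sketch with hypothetical names. It covers only canonical equivalence. A culture-sensitive comparer such as Comparer.Default may treat further sequences as equal, and a real provider would have to account for those as well.

```python
import unicodedata

def canonical_key(s: str) -> str:
    # Canonical equivalence only: normalize to NFC but keep case distinctions
    return unicodedata.normalize("NFC", s)

def canonical_equal(a: str, b: str) -> bool:
    return canonical_key(a) == canonical_key(b)

def canonical_hash(s: str) -> int:
    # Hash the same key the comparison uses, so the two stay consistent
    return hash(canonical_key(s))

print(canonical_equal("\u00C1", "\u0041\u0301"))  # True: same letter, two encodings
print(canonical_equal("\u00C1", "\u0061\u0301"))  # False: case differs
print(canonical_hash("\u00C1") == canonical_hash("\u0041\u0301"))  # True
```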

# Michael Kaplan on 24 Mar 2005 10:02 AM:

You may want to add that as a comment to the MSDN Feedback Center issue? :-)

# Bob Burger on 24 Mar 2005 10:02 AM:

It's also interesting to note that System.Collections.Specialized.HybridDictionary gets around this bug by using its own private comparer, SymbolEqualComparer, that does not take equivalent character sequences into account.
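The workaround stays consistent because neither side normalizes. Below is a Python sketch of that trade-off. It is not the HybridDictionary source, and SymbolEqualComparer's exact rules are not reproduced here. The keys are only case-folded, so "Á" and "á" collapse into one entry, while the decomposed "a" + U+0301 remains a separate key.

```python
def fold_key(s: str) -> str:
    # No normalization: comparison and hashing both see the same folded code points
    return s.casefold()

table = {}
for s in ["\u00C1", "\u00E1", "\u0061\u0301"]:
    table.setdefault(fold_key(s), []).append(s)

print(len(table))  # 2: precomposed forms share a key, decomposed form does not
```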
