When methods use collation to 'disturb the peace' we charge them with being 'out of sorts'

by Michael S. Kaplan, published on 2007/04/10 08:05 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/04/10/2071471.aspx


You know how I talk about best practices here sometimes? And the worst ways to misuse various globalization/internationalization methods and functions?

Well, believe it or not, sometimes even the Microsoft code gets it wrong.

(Gasp!)

Like the other day, when Balsu asked:

Hi

When I call System.Messaging.MessageQueue.Exists("ઽEFGH"), I am getting an ArgumentException saying 'Path syntax is invalid'. Looking into the MessageQueue.Exists() method, I find that the String.LastIndexOf method is causing the issue.

int index1 = ".\\PRIVATE$\\ઽEFGH".LastIndexOf("\\PRIVATE$\\",StringComparison.CurrentCultureIgnoreCase);
int index2 = ".\\PRIVATE$\\EFGH".LastIndexOf("\\PRIVATE$\\", StringComparison.CurrentCultureIgnoreCase);

When I execute the above statements in the en-US culture, I am getting index1 as -1, where it should be 1.

I am getting index2 as 1 as expected.

Has anyone faced similar issue? Any work around?

The problem in the example can be traced to that ઽ, which is U+0ABD (a.k.a. GUJARATI SIGN AVAGRAHA). It just so happens that linguistic comparisons treat it as a combining character, given its tendency to make the previous character a little bit heavier.

Regular readers might immediately spot the problem here and remember Put in on my Tab, please from last September. The only real difference is that in this case the character being weighed down is a REVERSE SOLIDUS instead of a TAB.

In both cases, a character that is not really a combining character is treated as if it were one in collation, in order to get a specific linguistically appropriate result. That is not really a bug in itself, though it does end up causing one.
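One way to see the mismatch is to ask the Unicode character data directly. The sketch below (the class and method names are made up for illustration) uses CharUnicodeInfo.GetUnicodeCategory to show that U+0ABD is classified as a letter (general category Lo, OtherLetter), not as a combining mark. For contrast, U+0ABC GUJARATI SIGN NUKTA is a genuine nonspacing mark. This only shows the Unicode property; it makes no claim about how the collation tables themselves are built.

```csharp
using System;
using System.Globalization;

public static class AvagrahaCategory
{
    // U+0ABD GUJARATI SIGN AVAGRAHA, the character from the repro above.
    public const char Avagraha = '\u0ABD';

    // U+0ABC GUJARATI SIGN NUKTA, a real combining mark, used for contrast.
    public const char Nukta = '\u0ABC';

    // True when Unicode classifies the character as a combining mark (Mn, Mc, or Me).
    public static bool IsCombiningMark(char c)
    {
        UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
        return cat == UnicodeCategory.NonSpacingMark
            || cat == UnicodeCategory.SpacingCombiningMark
            || cat == UnicodeCategory.EnclosingMark;
    }

    public static void Main()
    {
        foreach (char c in new[] { Avagraha, Nukta })
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            Console.WriteLine($"U+{(int)c:X4}: {cat}, combining mark: {IsCombiningMark(c)}");
        }
    }
}
```

So whatever the collation does with it, the avagraha is a letter as far as Unicode is concerned. That is exactly why treating it as a combining character is a deliberate linguistic choice, not a property of the character.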

(I'll talk more about THAT issue another day!)

If we stay focused and do a bit of Root Cause Analysis on the problem in System.Messaging.MessageQueue.Exists, we are just getting started....

There is of course the fact that Collation != Case (a.k.a. Collation <> Case).

And the misuse of CurrentCulture, of course, since one would never want the behavior to change based on user settings.

But even more important is the fact that when one is dealing with the file system as they are here, one should never be using a linguistic comparison method (something I first pointed out back in Comparison confusion: INVARIANT vs. ORDINAL). This scenario simply screams for the use of OrdinalIgnoreCase, to help match the behavior of the file system.

So the fix here would be (in that String.LastIndexOf(String, StringComparison) call) to use StringComparison.OrdinalIgnoreCase, rather than StringComparison.CurrentCultureIgnoreCase....
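Here is a minimal sketch of the difference, assuming only the two strings from Balsu's question. QueuePathCheck, FindLinguistic, and FindOrdinal are hypothetical names, and this is not the actual MessageQueue source. The linguistic result depends on the runtime's collation data and the current culture, so the sketch prints it rather than predicting it. The ordinal search compares character values and ignores only case, so the avagraha cannot attach itself to the preceding backslash.

```csharp
using System;

public static class QueuePathCheck
{
    // The private-queue marker searched for in the repro.
    private const string PrivateMarker = "\\PRIVATE$\\";

    // Culture-sensitive, linguistic search: the choice that caused the bug.
    public static int FindLinguistic(string path) =>
        path.LastIndexOf(PrivateMarker, StringComparison.CurrentCultureIgnoreCase);

    // Ordinal search that ignores only case: the suggested fix.
    public static int FindOrdinal(string path) =>
        path.LastIndexOf(PrivateMarker, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        // "\u0ABD" is exactly four hex digits, so "EFGH" stays ordinary text.
        string[] paths = { ".\\PRIVATE$\\\u0ABDEFGH", ".\\PRIVATE$\\EFGH" };
        foreach (string p in paths)
        {
            // The linguistic value may vary by runtime and culture; the ordinal value is 1 for both.
            Console.WriteLine($"{p}: linguistic={FindLinguistic(p)}, ordinal={FindOrdinal(p)}");
        }
    }
}
```

Ordinal comparison also makes the check independent of the user's culture settings, which addresses the CurrentCulture misuse at the same time.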


This post brought to you by ઽ (U+0ABD, a.k.a. GUJARATI SIGN AVAGRAHA)


# Mihai on 10 Apr 2007 6:21 PM:

<<use of OrdinalIgnoreCase, to help match the behavior of the file system>>

Is this really the case?

Does OrdinalIgnoreCase use the NTFS case mapping table?

Because if it does not, then there is no way it will "match the behavior of the file system" :-)

And even if the NTFS table is used, then for what volume? What if my system volume was formatted under Win 2003 but the data volume was formatted under NT 4.0?

# Michael S. Kaplan on 10 Apr 2007 6:31 PM:

Well, it is either a 100% match, or a reboot away from a 100% match....

# Dean Harding on 10 Apr 2007 6:38 PM:

> So the fix here would be [... snip]

Not that that helps Balsu at all! :-) I guess all he can do is cut'n'paste the code from Reflector into his own file, fix the bug and use that version instead. At least, until a "real" fix is provided (but even then, he'd probably want to keep his cut'n'pasted version around in case he ever finds his code is running on an older version of the runtime).

# Michael S. Kaplan on 10 Apr 2007 7:02 PM:

Yep, it is sad to have to admit it -- but sometimes even Microsoft code has the odd bug in it. :-(

# Dean Harding on 10 Apr 2007 7:57 PM:

> Yep, it is sad to have to admit it -- but sometimes even Microsoft code has the odd bug in it. :-(

Sure, but at least with .NET, you can always work around the problem with a bit of Reflector'ing -- in the unmanaged world, it's usually a lot harder ;)


referenced by

2007/04/11 Microsoft is not uncaron^H^Hing about the issue!
