I know I said 'µ' but I didn't really mean 'µ'. I meant 'μ', you know?

by Michael S. Kaplan, published on 2012/04/25 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/04/25/10297456.aspx


So I recently got an email:

We recently had a bug filed against our team because on a PS-PS machine we were unable to do a proper search with a Greek character. It turned out that the issue was caused because some Greek lowercase characters do not compare correctly against their uppercase counterparts (and vice versa). The issue is actually a .NET bug. The attached bug is specifically for a RegEx check but it also fails when using .NET’s String.Compare function.

Example:

‘µ’.ToUpper() = ‘Μ’

Theoretically we would then expect that these two characters should compare true against each other when you do “IgnoreCase”. However they do not.

Ah yes, this is something I had seen before.

They were looking for µ, aka U+00b5 aka MICRO SIGN.

And unhappy that regular expressions that were uppercasing the text couldn't find the character again later.

Of course they were assuming it was μ, aka U+03bc, aka GREEK SMALL LETTER MU.
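
To make the mix-up concrete, here is a minimal sketch in Python rather than .NET. It assumes only that Python's str.upper() and str.lower() follow the Unicode default case mappings; .NET's own casing and comparison results depend on culture and version, so treat this as an illustration of the character data, not a reproduction of the reported bug. The helper name round_trips_through_upper is made up for this example.

```python
# The three characters involved, spelled as escapes so they cannot be confused.
MICRO_SIGN = "\u00b5"        # µ MICRO SIGN
GREEK_SMALL_MU = "\u03bc"    # μ GREEK SMALL LETTER MU
GREEK_CAPITAL_MU = "\u039c"  # Μ GREEK CAPITAL LETTER MU


def round_trips_through_upper(ch: str) -> bool:
    """Hypothetical helper: True if uppercasing then lowercasing returns ch."""
    return ch.upper().lower() == ch


# MICRO SIGN uppercases to the Greek capital letter...
upper = MICRO_SIGN.upper()
print(upper == GREEK_CAPITAL_MU)                  # True

# ...but lowercasing that capital yields GREEK SMALL LETTER MU, not MICRO SIGN.
print(round_trips_through_upper(MICRO_SIGN))      # False
print(round_trips_through_upper(GREEK_SMALL_MU))  # True
```

So text that contained U+00b5 before being uppercased no longer contains it after a trip through upper and lower case, which is the kind of surprise described in the email.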

Unfortunately, several factors conspire to make things not work:

Now on the whole, pseudo is pretty cool.

It lets you find bugs that you usually wouldn't find until much later during the development cycle.

It does have one downside though - one that makes pseudo pretty annoying.

When you substitute characters for kinda-lookalike characters with different properties and attributes, then you're going to get unexpected results sometimes....

Like this time!

 

1 - One can only speculate why the MICRO SIGN is treated so differently than other similar symbols, e.g. Ω (U+2126, aka OHM SIGN), K (U+212a aka KELVIN SIGN) and Å (U+212b, aka ANGSTROM SIGN). I only know that it has always been done this way. There is one workaround for those troubled by the discontinuity: Unicode normalization....
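
A rough sketch of that difference and of the normalization workaround, using Python's standard unicodedata module. The footnote does not say which normalization form is meant; the assumption here is that it is a compatibility form (NFKC), since the canonical form NFC leaves MICRO SIGN untouched. The names SYMBOLS and normalized_names are invented for this example.

```python
import unicodedata

# The four look-alike symbols mentioned in the footnote.
SYMBOLS = {
    "MICRO SIGN": "\u00b5",
    "OHM SIGN": "\u2126",
    "KELVIN SIGN": "\u212a",
    "ANGSTROM SIGN": "\u212b",
}


def normalized_names(form: str) -> dict:
    """Map each symbol's name to the name of its normalized character."""
    return {
        name: unicodedata.name(unicodedata.normalize(form, ch))
        for name, ch in SYMBOLS.items()
    }


# Canonical normalization folds OHM, KELVIN and ANGSTROM into ordinary
# letters, but leaves MICRO SIGN as it is.
for name, result in normalized_names("NFC").items():
    print(f"NFC  {name:14} -> {result}")

# Compatibility normalization (assumed to be the workaround) also maps
# MICRO SIGN to GREEK SMALL LETTER MU.
for name, result in normalized_names("NFKC").items():
    print(f"NFKC {name:14} -> {result}")
```

Note that compatibility normalization changes many other characters as well (ligatures, full-width forms and so on), so it is worth checking whether that is acceptable before applying it to everything you compare.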


Joshua on 26 Apr 2012 11:10 AM:

And those two are actually different glyphs on my screen.

Simon on 26 Apr 2012 2:07 PM:

This calls for a corny friction joke!

If you toss two kittens on the roof, which will come down first?

The one with the smaller μ.

Jochen on 22 Jan 2013 3:39 AM:

It seems as if part of the .NET string comparison (nlssorting.dll) went into the Windows 8 / Windows 2012 kernel. Is a potential bug fixed in .NET not making its way to the new OS? We found that ::CompareStringEx(LOCALE_NAME_INVARIANT, NORM_LINGUISTIC_CASING, L"aä", 2, L"Aä", 2, NULL, NULL, NULL) comes out to be CSTR_LESS_THAN (1) under Windows 2008 R2 but reports CSTR_EQUAL (2) under the newest OS.

Feels strange!

