I know I said 'µ' but I didn't really mean 'µ'. I meant 'μ', you know?

by Michael S. Kaplan, published on 2012/04/25 16:02 +02:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/04/25/10297456.aspx


So I recently got an email:

We recently had a bug filed against our team because on a PS-PS machine we were unable to do a proper search with a Greek character. It turned out that the issue was caused because some Greek lowercase characters do not compare correctly against their uppercase counterparts (and vice versa). The issue is actually a .NET bug. The attached bug is specifically for a RegEx check but it also fails when using .NET’s String.Compare function.

Example:

‘µ’.ToUpper() = ‘Μ’

Theoretically we would then expect that these two characters should compare true against each other when you do “IgnoreCase”. However they do not.

Ah yes, this is something I had seen before.

They were looking for µ, aka U+00b5 aka MICRO SIGN.

And unhappy that regular expressions that were uppercasing the text couldn't find the character again later.

Of course they were assuming it was μ, aka U+03bc, aka GREEK SMALL LETTER MU.
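
To make the mix-up concrete, here is a minimal sketch in Python (not .NET, so it is not a reproduction of the reported bug). It relies only on Python's built-in str methods and the standard unicodedata module, and it assumes that Python's casing follows the Unicode Character Database; .NET's results can depend on version and culture. The names MICRO, SMALL_MU and describe are made up for this illustration.

```python
import unicodedata

MICRO = "\u00b5"     # MICRO SIGN, the character actually in the text
SMALL_MU = "\u03bc"  # GREEK SMALL LETTER MU, the character people assume it is


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Uppercasing the micro sign lands on the Greek capital letter...
upper = MICRO.upper()
# ...and lowercasing that capital lands on the Greek small letter,
# not back on the micro sign we started with.
round_trip = upper.lower()

print(describe(MICRO))       # U+00B5 MICRO SIGN
print(describe(upper))       # U+039C GREEK CAPITAL LETTER MU
print(describe(round_trip))  # U+03BC GREEK SMALL LETTER MU

# The two lookalikes are different code points, so an ordinal
# comparison after the round trip fails.
print(round_trip == MICRO)   # False

# Case folding (rather than upper/lower round-tripping) maps both
# characters to the same folded form.
print(MICRO.casefold() == SMALL_MU.casefold())  # True
```

So the uppercase mapping is one-way: once the text has been uppercased, nothing in it remembers that the original was the micro sign rather than the Greek letter.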

Unfortunately, several factors conspire to make things not work:

Now on the whole, pseudo is pretty cool.

It lets you find bugs that you usually wouldn't find until much later during the development cycle.

It does have one downside though - one that makes pseudo pretty annoying.

When you substitute characters for kinda-lookalike characters with different properties and attributes, then you're going to get unexpected results sometimes....

Like this time!


1 - One can only speculate why the MICRO SIGN is treated so differently than other similar symbols, e.g. Ω (U+2126, aka OHM SIGN), K (U+212a aka KELVIN SIGN) and Å (U+212b, aka ANGSTROM SIGN). I only know that it has always been done this way. There is one workaround for those troubled by the discontinuity: Unicode normalization....
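
The normalization workaround from the footnote can be sketched as follows, again in Python with the standard unicodedata module rather than in .NET. The difference it shows comes from the Unicode data: the OHM, KELVIN and ANGSTROM signs have canonical decompositions, so NFC already replaces them with the ordinary letters, while the MICRO SIGN only has a compatibility decomposition, so it takes NFKC (or NFKD) to turn it into the Greek letter. The SIGNS table and the normalize_all helper are names invented for this illustration.

```python
import unicodedata

# Symbol characters that look like ordinary letters.
SIGNS = {
    "MICRO SIGN": "\u00b5",
    "OHM SIGN": "\u2126",
    "KELVIN SIGN": "\u212a",
    "ANGSTROM SIGN": "\u212b",
}


def normalize_all(form: str) -> dict:
    """Map each sign's name to the code point it normalizes to under `form`."""
    return {
        name: f"U+{ord(unicodedata.normalize(form, ch)):04X}"
        for name, ch in SIGNS.items()
    }


nfc = normalize_all("NFC")
nfkc = normalize_all("NFKC")

for name in SIGNS:
    print(f"{name:14} NFC -> {nfc[name]}   NFKC -> {nfkc[name]}")

# NFC leaves the micro sign alone but replaces the other three:
#   MICRO SIGN     stays U+00B5
#   OHM SIGN       becomes U+03A9 (GREEK CAPITAL LETTER OMEGA)
#   KELVIN SIGN    becomes U+004B (LATIN CAPITAL LETTER K)
#   ANGSTROM SIGN  becomes U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE)
# NFKC additionally turns the micro sign into U+03BC (GREEK SMALL LETTER MU).
```

If you go this route, the search pattern and the text being searched both need the same normalization form before comparing, and keep in mind that NFKC also rewrites plenty of other compatibility characters, which may or may not be what you want.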


