Similar descriptions does not mean similar methodologies

by Michael S. Kaplan, published on 2005/05/08 13:30 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/05/08/415522.aspx


The other day, I had to take a look at the various unmanaged case-insensitive string comparison functions. I thought I would post the comparison/contrast information.

First the locale sensitive functions:

And now the locale insensitive functions:

A few interesting points about these functions:

1) According to comments in the SHLWAPI source, many of them were initially added because the CRT and user32 counterparts were not supported on earlier versions of Win9x. Kind of ironic when you note the small behavior differences between them all, huh?

2) Given the Georgian casing issue, it is a little sad that almost all of these functions that convert prior to comparison use a lowercasing operation when so much of the core OS uses uppercasing. Especially given how often people use the functions to emulate the OS behavior for tidier validation messages. Luckily, the amount of data in Khutsuri is small so the inconsistency is not often noticed.

3) Am I the only person who thinks it is weird that _wcsicmp and _wcsnicmp have locale-specific behaviors, especially such really weird ones? They document this a bit, I guess, but until I looked at the code I would never have guessed.

4) CompareStringW is definitely the king of the linguistic comparison -- everyone else is either (a) calling our function, (b) doing the job wrong, or (c) both!
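To make point 2 concrete, here is a minimal sketch of how folding to lowercase and folding to uppercase before comparing can disagree. It does not use the Georgian case mentioned above; it swaps in the Greek sigmas instead, because under the Unicode simple case mappings both σ (U+03C3) and ς (U+03C2) uppercase to Σ (U+03A3), while Σ lowercases only to σ. The tiny tables and the names fold_upper, fold_lower, and compare_folded are hypothetical and exist only for illustration; they are not the casing tables of any of the functions discussed here.

```c
/* Hypothetical, deliberately tiny casing tables: ASCII letters plus the
   three Greek sigma code points. Real casing tables are far larger. */
unsigned fold_upper(unsigned c) {
    if (c >= 'a' && c <= 'z') return c - ('a' - 'A');
    if (c == 0x03C3 || c == 0x03C2) return 0x03A3; /* sigma, final sigma -> capital sigma */
    return c;
}

unsigned fold_lower(unsigned c) {
    if (c >= 'A' && c <= 'Z') return c + ('a' - 'A');
    if (c == 0x03A3) return 0x03C3; /* capital sigma -> sigma; final sigma is left alone */
    return c;
}

typedef unsigned (*fold_fn)(unsigned);

/* Compare two zero-terminated arrays of code points after folding each
   element with the given function. Returns -1, 0, or 1. */
int compare_folded(const unsigned *a, const unsigned *b, fold_fn fold) {
    for (;; ++a, ++b) {
        unsigned ca = fold(*a);
        unsigned cb = fold(*b);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0; /* both strings ended together */
    }
}
```

With these tables, a string containing ς and the same string containing σ compare as equal when folded with fold_upper and as different when folded with fold_lower, which is the kind of mismatch that shows up when a lowercasing comparison is used to predict what an uppercasing component will do.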

Now there is no king (nor good heir apparent) for the non-linguistic comparison right now in unmanaged code, like I talk about here.

Yes, I am still thinking about it. :-) 

The situation is kind of like when you have a vacancy in management and a lot of "wannabe" replacements (like these other functions), none of whom really fit the bill and none of whom can get the job done themselves. If you know what I mean....

 

This post brought to you by "ς" (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA)


# Sriram on Sunday, May 08, 2005 2:40 PM:

All for the sake of comparing 2 strings. Whatever happened to good old strcmp? :-)

# Michael S. Kaplan on Sunday, May 08, 2005 2:51 PM:

Ah, remember my criteria -- Unicode, case insensitive. The strcmp function (intrinsic or CRT) is none of those. :-)

# Michael S. Kaplan on Sunday, May 08, 2005 3:37 PM:

Don't worry, I'll point out the explosion of methods and overrides in managed code soon. I hinted at them in http://blogs.msdn.com/michkap/archive/2005/04/14/408116.aspx

:-)

# Dean Harding on Sunday, May 08, 2005 7:04 PM:

Well, to be honest, I prefer lots of overloads to lots of differently-named functions. At least with overloads you can look in the same place for all the documentation whereas with differently-named functions, you've got to rely on the documentation to include pointers to all the other possible variants.

Still, one function that can do it all would be best of all, even if I have to write my own little wrappers for my own special cases. At least then I can follow my own standards, rather than trying to remember the difference between RtlCompareUnicodeString, StrCmpNIW and lstrcmpiW for example...

# Michael S. Kaplan on Sunday, May 08, 2005 7:17 PM:

Well, me too.

But I prefer fewer functions with fewer overrides best of all -- with lots of intuitive enumerations, which IntelliSense also helps with....

# Someone passing by on Monday, May 09, 2005 5:56 PM:

# StrCmpLogicalW (shlwapi.dll) -- does linguistic comparisons using the thread locale (falling back to the system locale on failure), cleverly wrapping multiple calls to CompareStringW to upport treating the 0123456789 digits as numbers.
^

to support;)

# Michael S. Kaplan on Monday, May 09, 2005 9:26 PM:

Good catch -- fixed now. :-)

# Nazgul on Tuesday, July 05, 2005 9:16 AM:

Hi. I'm trying to use CompareStringW to compare some WideStrings and I need to compare them case-sensitively. However, I always get them compared case-insensitively. I did NOT set the "NORM_IGNORECASE" flag on.
So, when I sort strings "France", "Portugal" and "other", I want the result to be either

France
Portugal
other

or

other
France
Portugal

but what I get is

France
other
Portugal

cuz when I compare "France" and "Portugal", the result is 1 (this is correct), comparing "other" and "Portugal" gives 1 (that's correct, too), but comparing "France" and "other" also gives 1 (incorrect, should be 3).
It's interesting that when I call CompareStringW on "portugal" and "Portugal" the result I get is not 2, but 1. It looks like this function does a case-insensitive comparison, and only if the compared strings don't differ (case-insensitively) does it look at the case.
Is there a way to make the CompareStringW function not ignore the case?
I am using locale MAKELCID(MAKELANGID(LANG_CZECH, SUBLANG_DEFAULT), SORT_DEFAULT), but it behaves exactly in the same way even if I set it to MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT), SORT_DEFAULT).

# Michael S. Kaplan on Tuesday, July 05, 2005 10:32 AM:

Hi Nazgul, see my post 'What it means to be case insensitive' at http://blogs.msdn.com/michkap/archive/2005/06/16/429667.aspx to understand what is meant here. There is no NLS function that does what you want here, and it would certainly not be an 'ignore case' since that is the opposite of what you are doing -- you are not only *not* ignoring case, you are going out of your way to pay attention to it in non-intuitive ways! :-)
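The behavior described in this exchange can be modeled with a small sketch. It is a simplified, ASCII-only illustration and not the CompareStringW implementation: the function name compare_two_level is hypothetical, and the only assumptions taken from the thread are that letter differences outrank case differences and that lowercase sorts first when case is the only difference (the reported result of 1, which is CSTR_LESS_THAN, for "portugal" versus "Portugal").

```c
#include <ctype.h>

/* Simplified two-level model, ASCII only. Pass 1 ignores case entirely;
   case is consulted only when the strings are otherwise identical.
   Returns -1, 0, or 1. Not the actual CompareStringW algorithm. */
int compare_two_level(const char *a, const char *b) {
    const char *pa = a;
    const char *pb = b;

    /* Pass 1: compare with case folded away. */
    for (;; ++pa, ++pb) {
        int ca = tolower((unsigned char)*pa);
        int cb = tolower((unsigned char)*pb);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) break; /* equal apart from case, so equal length */
    }

    /* Pass 2: the first case difference decides, lowercase first. */
    for (; *a != '\0'; ++a, ++b) {
        if (*a != *b) return islower((unsigned char)*a) ? -1 : 1;
    }
    return 0;
}
```

Under this model "France" sorts before "other", "other" sorts before "Portugal", and "portugal" sorts before "Portugal", which matches the results reported in the question. A comparison that puts all uppercase words ahead of all lowercase ones (or the reverse) is a different ordering, closer to an ordinal comparison than to a linguistic one.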

