How do I feel about lstrcmpi? I think it blows....

by Michael S. Kaplan, published on 2007/09/14 03:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/09/14/4900107.aspx


(Negative assessment word (blows) chosen via a magic eight ball and the info in this post)

Benski asked:

Michael -

Ran into a weird problem in our product. I have no idea about what caused it, but I found the solution. In a lot of places in a cross-platform area of our code, we have a function that maps to lstrcmpiW on Win32, wcscasecmp on Unix, and some CFString functions on Mac OS X.

Changing lstrcmpiW back to wcsicmp resolved our problem. This particular part of the code is doing mostly string table lookups, so locale-aware comparisons were probably inappropriate (a better function to call might be CompareStringW with a "neutral" LCID. wcsicmp is probably just working because it's using the default "C" locale).

Any idea what actually caused the problem? The code is too massive for me to trace what's going on. I realize that you have no context to work from at all here :) But I thought maybe you'd have some insight on any "gotchas" that are particular to these locales but not others.

-Benski

Indeed, that old lstrcmpi function has been around for a long time. In the past (like in this post), I have praised it for taking the position that the default function people are likely to call should respect the user's defaults.

I have decided that I was (to put it quite simply) completely and utterly wrong.

Not because respecting user preferences isn't a good thing; it is, truly.

But the most common ways people use this function basically amount to cases like Benski's, where what was being looked for was a C-runtime-like function. And since they get one with locale-specific behavior instead, they can run into problems.

You want to know why we never fixed the Turkic I bug by default in CompareString/lstrcmpi? Simple! Because to this day you can find code like this hanging around in our docs and unfortunately in our own internal codebase, far too often for me to be comfortable (example from the Getting the System Version (Windows) topic):

         if ( lstrcmpi( TEXT("WINNT"), szProductType) == 0 )
            printf( "Workstation " );
         if ( lstrcmpi( TEXT("LANMANNT"), szProductType) == 0 )
            printf( "Server " );
         if ( lstrcmpi( TEXT("SERVERNT"), szProductType) == 0 )
            printf( "Advanced Server " );
         printf( "%d.%d ", osvi.dwMajorVersion, osvi.dwMinorVersion );
      }

      // Display service pack (if any) and build number.
      if( osvi.dwMajorVersion == 4 &&
          lstrcmpi( osvi.szCSDVersion, TEXT("Service Pack 6") ) == 0 )
      {

Now the calls themselves may be fine here (though some of them aren't, since who knows what will come back from the registry to fill in szProductType), but even if they were okay in this case, it remains an example that encourages people to copy and paste it, which just propagates the behavior.
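To make the Turkic I point concrete, here is a minimal sketch in portable C. It is a toy, not what CompareString or lstrcmpi actually do internally: the helper names (upper_ascii, upper_turkic, equal_folded) are made up for illustration, and the only casing facts assumed are the well-known Turkic ones, where i uppercases to U+0130 and U+0131 uppercases to I.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: uppercase one character using plain ASCII rules. */
wchar_t upper_ascii(wchar_t c)
{
    return (c >= L'a' && c <= L'z') ? (wchar_t)(c - (L'a' - L'A')) : c;
}

/* Hypothetical helper: same, but with the Turkic rule for the letter i:
   'i' uppercases to U+0130 (dotted capital I) and
   U+0131 (dotless small i) uppercases to 'I'. */
wchar_t upper_turkic(wchar_t c)
{
    if (c == L'i') return (wchar_t)0x0130;
    if (c == (wchar_t)0x0131) return L'I';
    return upper_ascii(c);
}

/* Case-insensitive equality using whichever uppercase rule is passed in.
   Returns 1 when the strings match under that rule, 0 otherwise. */
int equal_folded(const wchar_t *a, const wchar_t *b, wchar_t (*upper)(wchar_t))
{
    assert(a != NULL && b != NULL && upper != NULL);
    while (*a && *b) {
        if (upper(*a) != upper(*b)) return 0;
        ++a;
        ++b;
    }
    return *a == *b; /* equal only if both strings ended together */
}
```

With these helpers, equal_folded(L"WINNT", L"winnt", upper_ascii) reports a match, while the same call with upper_turkic does not, because the i in "winnt" no longer folds to the I in "WINNT". That is the kind of breakage a default Turkic I fix would risk for code like the sample above.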

Now some people do misuse the invariant locale for what should be ordinal comparisons, but by and large the people who call lstrcmpi seem way more likely to be making a mistake. Given that, I think it easier to just say it blows and move on. :-)

Remember that when checking for equality, especially on an item like a registry value where OS semantics are involved, the best answer is CompareStringOrdinal. For anything that has to run pre-Vista, fall back to RtlCompareUnicodeString or, even better, RtlEqualUnicodeString, or, if you absolutely must, wcsicmp (with awareness that there is one character it can be wrong about).
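As a rough sketch of that advice, the product type checks from the sample could go through one small wrapper. The wrapper and the product_label function are hypothetical names invented here; the only real API used is CompareStringOrdinal, which needs Vista or later (and a build that targets it). The non-Windows branch is purely an assumption so the sketch compiles elsewhere: it folds ASCII only and is not equivalent to the OS uppercase table for other characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical fallback helper: ASCII-only uppercase. */
wchar_t ascii_upper(wchar_t c)
{
    return (c >= L'a' && c <= L'z') ? (wchar_t)(c - (L'a' - L'A')) : c;
}

/* Hypothetical wrapper: nonzero if a and b are equal, ignoring case,
   under ordinal (non-linguistic) rules. */
int equal_ordinal_ignore_case(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
#ifdef _WIN32
    /* -1 means "null-terminated"; TRUE asks for a case-insensitive
       ordinal comparison. CSTR_EQUAL signals equality. */
    return CompareStringOrdinal(a, -1, b, -1, TRUE) == CSTR_EQUAL;
#else
    /* Assumption for non-Windows builds: ASCII-only stand-in. */
    for (; *a && *b; ++a, ++b) {
        if (ascii_upper(*a) != ascii_upper(*b)) return 0;
    }
    return *a == *b;
#endif
}

/* Maps a product type string (as read from the registry) to the label
   the sample prints, or NULL when the value is not recognized. */
const wchar_t *product_label(const wchar_t *productType)
{
    if (equal_ordinal_ignore_case(L"WINNT", productType))    return L"Workstation";
    if (equal_ordinal_ignore_case(L"LANMANNT", productType)) return L"Server";
    if (equal_ordinal_ignore_case(L"SERVERNT", productType)) return L"Advanced Server";
    return NULL;
}
```

The point of the wrapper is that the choice of comparison lives in one place, so the user's locale never gets a vote on what counts as "WINNT".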

(more on why RtlEqualUnicodeString is best here)

Now when Benski mentioned the CFString stuff on the Mac, I have to admit I took a quick look and found some of it fascinating. I'll share my thoughts on this soon....

 

This post brought to you by ς (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA)


# DavRis on 14 Sep 2007 3:57 PM:

Very useful post -- thanks!  I was just last week considering submitting a question of which string comparison to use pre-Vista for registry and file-system-like case insensitive string comparisons.


referenced by

2008/08/05 On describing poorly (and on not listening)

2008/03/07 Some armchair root cause analysis of the suckage of lstrcmpi

2007/10/15 We're Lost. Maybe the map is wrong. Maybe all maps are wrong?

2007/09/23 Docs can whet SiaO's appetite, but where's the blog?
