When features collide (aka Your LCID sucks, but sometimes the bug sucks more)

by Michael S. Kaplan, published on 2008/11/14 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/11/14/9068578.aspx


Regular readers might recall a long-ago blog entitled "New in Vista: What's your name? Who's your daddy?", which talked about the new name-based NLS API functions, intended to wean people off of their use of LCIDs. Because let's face it, LCIDs suck.

Anyway, it turns out that in one case at least, bugs suck more.

Maybe people recall the even earlier blog entitled "New in Vista Beta 1: more use of the word 'linguistic'", which described (among other things) the NORM_LINGUISTIC_CASING flag -- a flag to do proper casing for Turkic languages.

Turns out there is a problem getting these two features to work together properly....

The bug?

Well, take the following code in C# (it is a Win32 bug not a C# bug, but this lets us look at the managed case and the native one, which is sometimes relevant; plus in this case, more people can test it out themselves!):

using System;
using System.Globalization;
using System.Runtime.InteropServices;

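// 0x08000010 = NORM_LINGUISTIC_CASING (0x08000000) | LINGUISTIC_IGNORECASE (0x00000010).
// The native calls return 1 (less than), 2 (equal), or 3 (greater than); 0 means failure.
class test {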
    static unsafe void Main() {
        Console.WriteLine("Turkish by name (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringEx("tr-TR", 0x08000010, "\u0131", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("130, \u0069 = " + CompareStringEx("tr-TR", 0x08000010, "\u0130", -1, "\u0069", -1, null, null, 0));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringEx("tr-TR", 0x08000010, "\u0069", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("English by name (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringEx("en-US", 0x08000010, "\u0131", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine("130, \u0069 = " + CompareStringEx("en-US", 0x08000010, "\u0130", -1, "\u0069", -1, null, null, 0));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringEx("en-US", 0x08000010, "\u0069", -1, "\u0049", -1, null, null, 0));
        Console.WriteLine();
        Console.WriteLine("Turkish by LCID (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringW(0x041f, 0x08000010, "\u0131", -1, "\u0049", -1));
        Console.WriteLine("130, \u0069 = " + CompareStringW(0x041f, 0x08000010, "\u0130", -1, "\u0069", -1));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringW(0x041f, 0x08000010, "\u0069", -1, "\u0049", -1));
        Console.WriteLine("English by LCID (native)");
        Console.WriteLine("131, \u0049 = " + CompareStringW(0x0409, 0x08000010, "\u0131", -1, "\u0049", -1));
        Console.WriteLine("130, \u0069 = " + CompareStringW(0x0409, 0x08000010, "\u0130", -1, "\u0069", -1));
        Console.WriteLine("\u0069, \u0049 = " + CompareStringW(0x0409, 0x08000010, "\u0069", -1, "\u0049", -1));
        Console.WriteLine();
        Console.WriteLine("CultureInfo Turkey");
        CultureInfo ci;
        ci = new CultureInfo("tr-TR");
        Console.WriteLine(ci.CompareInfo.Name);
        Console.WriteLine(ci.CompareInfo.Compare("\u0131", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0130", "\u0069", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0069", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine("CI en-US");
        ci = new CultureInfo("en-US");
        Console.WriteLine(ci.CompareInfo.Name);
        Console.WriteLine(ci.CompareInfo.Compare("\u0131", "\u0049", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0130", "\u0069", CompareOptions.IgnoreCase));
        Console.WriteLine(ci.CompareInfo.Compare("\u0069", "\u0049", CompareOptions.IgnoreCase));
    }

    // lpVersionInformation, lpReserved and lParam are passed as null, null and 0 above.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static unsafe extern int CompareStringEx(string strLocale, uint dwCmpFlags, string str1, int count1, string str2, int count2,
        char* version, char* reserved, int param);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static extern int CompareStringW(uint Locale, uint dwCmpFlags, string lpString1,
        int cchCount1, string lpString2, int cchCount2);
}

The results?

Turkish by name (native)
131, I = 3
130, i = 3
i, I = 2
English by name (native)
131, I = 3
130, i = 3
i, I = 2

Turkish by LCID (native)
131, I = 2
130, i = 2
i, I = 3
English by LCID (native)
131, I = 3
130, i = 3
i, I = 2

CultureInfo Turkey
tr-TR
0
0
1
CI en-US
en-US
1
1
0

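The results that are the bug (in red in the original post) are the three under "Turkish by name (native)": they match the English results rather than the "Turkish by LCID" ones.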
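For anyone squinting at the numbers: the native functions report 1, 2, or 3 (less than, equal, greater than), while CompareInfo.Compare reports a negative value, zero, or a positive value. Subtracting 2 from a nonzero native result lines the two up. A minimal sketch of that mapping follows; the CompareResult class and its ToManaged helper are hypothetical names used only for illustration, not part of any API.

```csharp
using System;

static class CompareResult
{
    // Documented return values of CompareStringW / CompareStringEx.
    public const int CSTR_LESS_THAN = 1;
    public const int CSTR_EQUAL = 2;
    public const int CSTR_GREATER_THAN = 3;

    // Hypothetical helper: map a native result to the -1/0/1 convention.
    public static int ToManaged(int cstr)
    {
        // A return value of 0 means the native call failed, so reject it.
        if (cstr < CSTR_LESS_THAN || cstr > CSTR_GREATER_THAN)
            throw new ArgumentOutOfRangeException(nameof(cstr), "not a CSTR_* value");
        return cstr - CSTR_EQUAL;
    }
}
```

Read that way, the "Turkish by LCID" rows (2, 2, 3) say the same thing as the "CultureInfo Turkey" rows (0, 0, 1).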

Basically, the NORM_LINGUISTIC_CASING flag feature added in Vista does not work if you use the name-based NLS collation API functions added in Vista.
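Since the LCID-based calls in the listing do honor the flag, one workaround a Vista caller might try is to resolve the locale name to an LCID and call CompareStringW instead. The sketch below assumes that approach; it is not something prescribed here, and the LinguisticCompare class and CompareIgnoreCase method are hypothetical names. LocaleNameToLCID and CompareStringW are real kernel32 exports, and the flag values are the same ones used in the test program.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class LinguisticCompare
{
    const uint NORM_LINGUISTIC_CASING = 0x08000000;
    const uint LINGUISTIC_IGNORECASE = 0x00000010;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern uint LocaleNameToLCID(string lpName, uint dwFlags);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern int CompareStringW(uint Locale, uint dwCmpFlags, string lpString1,
        int cchCount1, string lpString2, int cchCount2);

    // Hypothetical helper: case-insensitive compare via the LCID-based API.
    // Returns a negative value, zero, or a positive value.
    public static int CompareIgnoreCase(string localeName, string a, string b)
    {
        // Resolve the name to an LCID; 0 signals failure.
        uint lcid = LocaleNameToLCID(localeName, 0);
        if (lcid == 0)
            throw new ArgumentException("unknown locale name", nameof(localeName));

        int result = CompareStringW(lcid, NORM_LINGUISTIC_CASING | LINGUISTIC_IGNORECASE,
            a, -1, b, -1);
        if (result == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Subtract 2 to turn CSTR_* (1, 2, 3) into -1, 0, 1.
        return result - 2;
    }
}
```

Going by the results above, CompareIgnoreCase("tr-TR", "\u0131", "\u0049") would be expected to come back 0 on Vista, matching the "Turkish by LCID" rows. The catch is the obvious one: this only helps for locales that actually have a distinct LCID, which is exactly the dependency the name-based functions were supposed to remove.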

Not as bad as the whole "IsSortable() == false? Well, sometimes it may be lying...." situation, since in this case at least it was two different people.

However, that just lets the two people feel a little better; it doesn't really do anything for someone hit by the bug.

Thus on a scale of 1 to LAME, as mitigations go, this one is kinda lame. :-)

This is fixed in Windows 7, so I guess that's why we have new versions.

And I mentioned it here, so I guess that's why we have blogs. :-)


This post brought to you by İ (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)

