A difference that makes no difference makes a blog

by Michael S. Kaplan, published on 2011/05/18 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/05/18/10165599.aspx


One of the most interesting things about digit substitution is the weird cases.

Take a look at the relevant fields you get from GetLocaleInfo or GetLocaleInfoEx:

LOCALE_SNATIVEDIGITS:

Native equivalents of ASCII 0 through 9. The maximum number of characters allowed for this string is eleven, including a terminating null character. For example, Arabic uses "٠١٢٣٤٥٦٧٨٩". See also LOCALE_IDIGITSUBSTITUTION.

LOCALE_IDIGITSUBSTITUTION:

Value  Meaning
0      Context-based substitution. Digits are displayed based on the previous text in the same output. European digits follow Latin scripts, Arabic-Indic digits follow Arabic text, and other national digits follow text written in various other scripts. When there is no preceding text, the locale and the displayed reading order determine digit substitution, as shown in the following table.
       Locale      Reading order   Digits used
       Arabic      Right-to-left   Arabic-Indic
       Thai        Left-to-right   Thai digits
       All others  Any             No substitution used
1      No substitution used. Full Unicode compatibility.
2      Native digit substitution. National shapes are displayed according to LOCALE_SNATIVEDIGITS.

So basically LOCALE_SNATIVEDIGITS can be some native set of digits.

And LOCALE_IDIGITSUBSTITUTION decides whether to always use 0123456789 (which happens when the value is 1), to always use LOCALE_SNATIVEDIGITS (which happens when the value is 2), or to sometimes use one and sometimes the other (which happens when the value is 0, for some locales).
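To make the three values concrete, here is a minimal sketch of that decision as a plain function. It is not how Windows implements it. The class name DigitChoice, the method Resolve, and the contextWantsNative flag (a stand-in for the whole context table above) are all made up for illustration.

```csharp
using System;

public static class DigitChoice
{
    public const string European = "0123456789";

    // mode mirrors the LOCALE_IDIGITSUBSTITUTION values: 0 = context, 1 = none, 2 = native.
    // contextWantsNative is a hypothetical stand-in for the context rules
    // (preceding text, locale, reading order); the real rules are not modeled here.
    public static string Resolve(int mode, string nativeDigits, bool contextWantsNative)
    {
        switch (mode)
        {
            case 1: return European;                                     // never substitute
            case 2: return nativeDigits;                                 // always substitute
            case 0: return contextWantsNative ? nativeDigits : European; // depends on context
            default: throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

With the Arabic-Indic digits as nativeDigits, Resolve(1, ...) always hands back 0123456789, Resolve(2, ...) always hands back the native set, and Resolve(0, ...) can hand back either one.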

Of course, these settings fall down any time LOCALE_SNATIVEDIGITS is "0123456789" and LOCALE_IDIGITSUBSTITUTION is 0 or 2, since these settings basically ask the system to replace 0123456789 with 0123456789.

Oops.

Now of course you can set it this way yourself in Regional and Language Options.

And every version of Windows has shipped with data like this in anywhere from 2 to 6 locales.

Here's some managed code that builds a list:


using System;
using System.Text;
using System.Globalization;
using System.Runtime.InteropServices;

public class Test {
    public static void Main() {
        foreach(CultureInfo ci in CultureInfo.GetCultures(CultureTypes.SpecificCultures)) {
            uint uDS;
            // With LOCALE_RETURN_NUMBER the value comes back as a DWORD, so cchData is its size in WCHARs (2).
            if(GetLocaleInfoW((uint)ci.LCID, LOCALE_IDIGITSUBSTITUTION | LOCALE_RETURN_NUMBER, out uDS, 2) == 0) {
                continue;   // the call failed for this locale, so skip it
            }
            // Only 0 (context) and 2 (native) ever ask for substitution.
            if(uDS==0 || uDS==2) {
                // Ten digits plus the terminating null.
                StringBuilder sb = new StringBuilder(11);
                if(GetLocaleInfoW((uint)ci.LCID, LOCALE_SNATIVEDIGITS, sb, 11) > 0 && sb.ToString().Equals("0123456789")) {
                    Console.WriteLine("{0}\tIDIGITSUBSTITUTION=={1}\t SNATIVEDIGITS=={2}", ci.Name, uDS, sb.ToString());
                }
            }
        }
    }

    const uint LOCALE_RETURN_NUMBER = 0x20000000;
    const uint LOCALE_IDIGITSUBSTITUTION = 0x1014;
    const uint LOCALE_SNATIVEDIGITS = 0x0013;

    // String overload, used for LOCALE_SNATIVEDIGITS.
    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true, CallingConvention=CallingConvention.StdCall)]
    private static extern int GetLocaleInfoW(uint Locale, uint LCType, StringBuilder lpLCData, int cchData);

    // Numeric overload, used together with LOCALE_RETURN_NUMBER.
    [DllImport("kernel32.dll", CharSet=CharSet.Unicode, ExactSpelling=true, CallingConvention=CallingConvention.StdCall)]
    private static extern int GetLocaleInfoW(uint Locale, uint LCType, out uint lpLCData, int cchData);
}
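For comparison, roughly the same check can be sketched without P/Invoke, since NumberFormatInfo exposes DigitSubstitution and NativeDigits. Treat this as an approximation: the framework may get its data from somewhere other than GetLocaleInfo, so the list it prints is not guaranteed to match the one above. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class ManagedDigitCheck
{
    // True when the settings ask for substitution (Context or NativeNational)
    // but the "native" digits are just the ASCII digits.
    public static bool AsksForNoOp(NumberFormatInfo nfi)
    {
        bool substitutes = nfi.DigitSubstitution == DigitShapes.Context
                        || nfi.DigitSubstitution == DigitShapes.NativeNational;
        // NativeDigits is a string[] with one entry per digit.
        return substitutes && string.Concat(nfi.NativeDigits) == "0123456789";
    }

    public static void Main()
    {
        foreach (CultureInfo ci in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
        {
            if (AsksForNoOp(ci.NumberFormat))
                Console.WriteLine("{0}\t{1}", ci.Name, ci.NumberFormat.DigitSubstitution);
        }
    }
}
```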


You can run this on various versions of Windows.

Like in XP SP2, the list is ky-KG, mn-MN, ar-LY, ar-DZ, ar-MA, and ar-TN.

Or in Windows Server 2008, where the improved list is ky-KG and mn-MN.

Or in Windows 7, where after a small backslide the list is en-US, ar-LY, ar-DZ, ar-MA, and ar-TN.

Ultimately there are two problems here -- one to do with theoretical data purity (the data just seems wrong), and the other to do with performance (asking the system to do processing that isn't necessary can have a performance impact).

Though in practice, since a user can set it the same way in Regional and Language Options, I'd rather that the system just determined when the operation would be a no-op (this scenario) and just stopped processing. That way everyone benefits, including any user with the wrong settings, any custom locale with the wrong settings, and any future data with the wrong settings (the latter is a reasonable supposition, since every version has been wrong in at least a few cases so far!).
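The kind of early exit I have in mind could look something like the sketch below. It is an illustration only, not the actual Windows code path: the class name, the method name, and the idea of passing in an already-resolved ten-character digit string are all assumptions.

```csharp
using System;
using System.Text;

public static class NoOpAwareSubstitution
{
    private const string European = "0123456789";

    // targetDigits is the ten-character digit set that the settings resolved to.
    public static string Substitute(string text, string targetDigits)
    {
        if (targetDigits == null || targetDigits.Length != 10)
            throw new ArgumentException("Expected exactly ten digits.", nameof(targetDigits));

        // The no-op case: "replacing" 0-9 with 0-9. Stop before touching the text.
        if (string.Equals(targetDigits, European, StringComparison.Ordinal))
            return text;

        // Otherwise map each ASCII digit to its counterpart and copy everything else.
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(c >= '0' && c <= '9' ? targetDigits[c - '0'] : c);
        return sb.ToString();
    }
}
```

The check is one ordinal string comparison per call, and whoever has the "wrong" settings never pays for the character-by-character pass.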

Even if this optimization is not happening (it may be!) and even if it never happens, the "wrong" data doesn't lead to wrong results.

Thus my conclusion:

You see, as any cat will tell you, curiosity never killed anything other than a few hours.

And a difference that makes no difference? It makes a blog.



