by Michael S. Kaplan, published on 2006/12/21 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/21/1331650.aspx
Yesterday in the post For the [locale] explorer in you...., I mentioned that there was a bug. Francois is actually the person who saw it, and he came and asked me about it....
The bug can be seen in the picture of the Uighur (PRC) culture (ug-CN):
Can you see it?
The side effect of the bug is the way that the parentheses are screwed up in the Native Name at the top. But the actual bug is in the purple Text section, where it claims that Bidirectional is False for a culture for which it should clearly be True.
As you can tell by looking in the source for Culture Explorer 2.0, Francois is simply using the TextInfo.IsRightToLeft property both to fill in that purple item and to set the TextBox.RightToLeft property of the controls containing native text with lines like:
NameNative.RightToLeft = ci.TextInfo.IsRightToLeft ? RightToLeft.Yes : RightToLeft.No;
So, there is a bug either in the Windows locale data for Uighur, or in the .NET Framework code that synthesizes a Windows Only culture from the Windows data.
My psychic powers suggested to me that the Windows data was correct. Locale data can have mistakes on occasion, but the data for a specific locale is more likely to have been reviewed than a generic process that shipped before Vista was widely available and so may not have been tested across every possible culture. It could have gone either way; I guess it was just a judgment thing.
To prove the acuity of my psychic powers, I suppose I could just ask you to run the How To [NOT] detect that a locale is bidi or even the How To detect that a locale is bidi code, or I could make you look at the binary FONTSIGNATURE for the locale, which in WCHAR values returned by GetLocaleInfo looks like this:
\x2000\x0000\x0000\x8000\x0008\x0000\x0000\x8800\x0000\x0000\x0000\x0000\x0000\x0000\x0000\x0000
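For those who would rather not read the bits by eye, here is a minimal sketch that decodes the values quoted above. It assumes that each pair of WCHAR values forms one little-endian DWORD and that the first four DWORDs are the 128-bit Unicode subset bitfield; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class FontSignatureBits {
    // The 16 WCHAR values quoted above for ug-CN.
    static readonly ushort[] UgCn = {
        0x2000, 0x0000, 0x0000, 0x8000, 0x0008, 0x0000, 0x0000, 0x8800,
        0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
    };

    // Returns the numbers of the bits that are set in the first 128 bits.
    // Assumption: two consecutive WCHARs make one little-endian DWORD.
    public static List<int> SetBits(ushort[] wchars) {
        List<int> bits = new List<int>();
        for (int dword = 0; dword < 4; dword++) {
            uint value = (uint)wchars[2 * dword] | ((uint)wchars[2 * dword + 1] << 16);
            for (int bit = 0; bit < 32; bit++) {
                if ((value & (1u << bit)) != 0) {
                    bits.Add(dword * 32 + bit);
                }
            }
        }
        return bits;
    }

    static void Main() {
        List<int> bits = SetBits(UgCn);
        Console.WriteLine("Bits set: " + string.Join(", ", bits));
        Console.WriteLine("Right-to-left (bit 123): " + bits.Contains(123));
    }
}
```

Under those assumptions it reports bits 13, 63, 67, 123 and 127 as set. Bit 123 is the right-to-left layout bit; bit 127 is not discussed in this post.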
But for those who do not find such a view to be too comfortable and who wanted more than just a rerun of blog posts past, let's take the following managed code instead:
using System;
using System.Globalization;

namespace Testing {
    class LdmlDump {
        [STAThread]
        static void Main(string[] args) {
            CultureInfo ci;
            string stCulture;

            // First figure out the name
            if(args.Length > 0) {
                stCulture = args[0];
            } else {
                stCulture = CultureInfo.CurrentCulture.Name;
            }

            // Create the culture and say what it is
            ci = new CultureInfo(stCulture, false);
            Console.WriteLine("\r\nUsing the following culture: '{0}' ({1})\r\n", ci.DisplayName, ci.Name);

            // Create the replacement and fill it
            CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(stCulture, CultureAndRegionModifiers.Replacement);
            carib.LoadDataFromCultureInfo(ci);
            carib.LoadDataFromRegionInfo(new RegionInfo(stCulture));
            carib.Save(stCulture + ".ldml");
        }
    }
}
Stick it in a file called DumpLdml.cs and compile it with the following from CMD:
csc DumpLdml.cs /r:sysglobl.dll
Now you can run it on any culture on the machine. This code may come in handy in future posts, too. :-)
We'll try both ar-SA and ug-CN, with mn-Mong-CN for luck:
E:\Users\michkap>DumpLdml.exe ar-SA
Using the following culture: 'Arabic (Saudi Arabia)' (ar-SA)
E:\Users\michkap>DumpLdml.exe ug-CN
Using the following culture: 'Uighur (PRC)' (ug-CN)
E:\Users\michkap>DumpLdml.exe mn-Mong-CN
Using the following culture: 'Mongolian (Traditional Mongolian, PRC)' (mn-Mong-CN)
Now looking at the LDML for each, one finds some interesting info. Both ar-SA and ug-CN have the following in them for the font signature:
<msLocale:fontSignature>
    <msLocale:unicodeRanges>
        <msLocale:range type="13" />
        <msLocale:range type="63" />
        <msLocale:range type="67" />
        <msLocale:layoutProgress type="horizontalRightToLeft" />
    </msLocale:unicodeRanges>
while mn-Mong-CN has:
<msLocale:fontSignature>
    <msLocale:unicodeRanges>
        <msLocale:range type="81" />
        <msLocale:layoutProgress type="verticalBeforeHorizontal" />
    </msLocale:unicodeRanges>
The layoutProgress is referring to the bits I talked about previously in How To [NOT] detect that a locale is bidi -- the following bits in the Unicode subset bitfields:
123 | Windows 2000 or later: Layout progress, horizontal from right to left |
124 | Windows 2000 or later: Layout progress, vertical before horizontal |
125 | Windows 2000 or later: Layout progress, vertical bottom to top |
You can kind of tell where the language in the LDML comes from, huh? :-)
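The three layout bits can also be sketched as a flags enum. This assumes the bitfield is stored as four DWORDs, so that bits 123, 124 and 125 are bits 27, 28 and 29 of the fourth one. The enum and member names are hypothetical and only echo the LDML wording. The value 0x88000000 is the fourth DWORD assembled from the ug-CN WCHAR values quoted earlier; the second value is invented to show bit 124 on its own.

```csharp
using System;

// Hypothetical names; the bit positions assume four little-endian DWORDs.
[Flags]
enum LayoutProgress : uint {
    None = 0,
    HorizontalRightToLeft = 1u << 27,    // bit 123
    VerticalBeforeHorizontal = 1u << 28, // bit 124
    VerticalBottomToTop = 1u << 29       // bit 125
}

static class LayoutProgressDemo {
    // Keeps only bits 27-29 of the fourth DWORD (bits 123-125 overall).
    const uint LayoutMask = 0x38000000;

    public static LayoutProgress FromHighDword(uint dword3) {
        return (LayoutProgress)(dword3 & LayoutMask);
    }

    static void Main() {
        // Fourth DWORD from the ug-CN font signature quoted earlier.
        Console.WriteLine(FromHighDword(0x88000000));
        // Invented value with only bit 124 set, for illustration.
        Console.WriteLine(FromHighDword(0x10000000));
    }
}
```

With these inputs it prints HorizontalRightToLeft and then VerticalBeforeHorizontal. Because the enum is marked [Flags], a value with more than one layout bit set prints as a comma-separated combination.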
Anyway, it is clear that ug-CN has these bits set correctly, so the bug has to be that the .NET Framework code that synthesizes the Windows Only culture is not using this information. Perhaps understandable given how obscure it is though -- further proof that we need our own LCTYPE containing the information in a more easily digested form? :-)
By the way Francois, I verified that this bug has already been reported in the .NET Framework, so no need to put a new bug in. Though you could bump the number of occurrences if you wanted to. :-)
This post brought to you by ת (U+05ea, a.k.a. HEBREW LETTER TAV)
# Andrew West on 21 Dec 2006 5:09 AM:
I'm wondering what exactly verticalBeforeHorizontal used in mn-Mong-CN means, as the MSDN documentation at http://msdn2.microsoft.com/en-us/ms404373.aspx doesn't say anything about it. Chinese written vertically progresses top-to-bottom in columns running right-to-left, but Mongolian progresses top-to-bottom in columns running left-to-right. Which, if either, of these layouts does verticalBeforeHorizontal imply, and is it possible to distinguish the two vertical layouts?
# Michael S. Kaplan on 21 Dec 2006 5:34 AM:
Look at http://msdn.microsoft.com/library/intl/unicode_63ub.asp for better info -- the text is exactly matching bit 124....
So it looks like it is claiming that the text preferentially flows vertically, in a left to right direction. Which is actually what you just said, right? :-)
# Andrew West on 21 Dec 2006 10:07 AM:
I still don't think that the documentation is very clear, but having now reread it for the third time my interpretation is that you can combine bits 123, 124 and 125 in order to specify almost any layout, so that for example if bits 123, 124 and 125 are all set then text should be laid out in vertical columns reading bottom-to-top with columns progressing right-to-left across the page (a possible Ogham layout); and with only bit 124 set then text should be laid out in vertical columns reading top-to-bottom with columns progressing left-to-right across the page (as for Mongolian and Phags-pa). Is that right?
# Michael S. Kaplan on 21 Dec 2006 11:10 AM:
Yes, that is correct.
Now I never claimed that it was exactly intuitive -- only that the text descriptions came directly from the fontsignature bits related to layout.... :-)
# Andrew West on 21 Dec 2006 12:04 PM:
OK, thanks. Just one more question. I know that the Unicode range bits are set in the OS/2 table of the font, but what about bits 123-125 -- are these also derived from the UnicodeRange field of the OS/2 table in the font?
# Tim Chen on 9 Jul 2008 8:38 PM:
This is the year 2008, and the problem is still there.
And the culture ps-AF is suffering from the same problem.
Guess it can be classified as an immortal bug now.