What the hell is wrong with TranslateCharsetInfo, anyway?

by Michael S. Kaplan, published on 2006/08/28 09:49 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/08/28/728336.aspx


The other day, Colin had a customer with a question about some unexpected results from the Win32 function TranslateCharsetInfo:

My customer is having issues with the TranslateCharsetInfo API. They are using the TCI_SRCLOCALE flag to use the LCID as the source:

fResult = TranslateCharsetInfo((DWORD*)(langid), &csi, TCI_SRCLOCALE);

What they find is that the results of this call do not always match the values listed in http://www.microsoft.com/globaldev/nlsweb/default.mspx

Specifically:

  • LCID 0x0C04 – Chinese (Hong Kong S.A.R.) - returns codepage 936 instead of the expected codepage 950
  • LCID 0x0004 - Chinese (Simplified) – returns codepage 950 instead of the expected 936

Is what I’m seeing expected, if not is there a better way to achieve this? 

We did also try:

GetLocaleInfo (langid, LOCALE_IDEFAULTANSICODEPAGE, szLocaleData, sizeof(szLocaleData)/sizeof(char)) ;

However, from this the 0x0C04 LCID returns the expected 950 codepage (getting better), but the 0x0004 LCID still returns 950 instead of the 936 that the web documentation suggests it should return.

Or am I missing something?

Many thanks,

Colin

The results of the two LCID values are actually caused by two entirely different issues.

In both cases, GDI is relying on the info that GetLocaleInfo with the LOCALE_FONTSIGNATURE value for the LCTYPE returns. In other words, GDI is depending on NLS for the info on what to do here.

In one case (LCID 0x0004), the result can be considered a legitimate design flaw in NLS that has only been fixed in Vista (via its ConvertDefaultLocale function, whose logic all NLS API functions that take an LCID go through).

In other words, that bug no longer repros on Vista. :-)
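
To see why 0x0004 is the odd one out, it helps to pull the two LCIDs apart. The following is a minimal portable sketch, not the NLS implementation: the helper names are hypothetical stand-ins for the winnt.h macros PRIMARYLANGID, SUBLANGID and MAKELANGID, and the idea that a neutral LCID gets SUBLANG_DEFAULT substituted is an assumption about the pre-Vista behavior rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins (hypothetical names) for the winnt.h macros
   PRIMARYLANGID, SUBLANGID and MAKELANGID. The low 10 bits of the
   language ID are the primary language, the next 6 the sublanguage. */
static uint16_t primary_lang_id(uint32_t lcid)
{
    return (uint16_t)(lcid & 0x3FFu);
}

static uint16_t sub_lang_id(uint32_t lcid)
{
    return (uint16_t)((lcid & 0xFFFFu) >> 10);
}

static uint32_t make_lang_id(uint16_t primary, uint16_t sub)
{
    assert(primary <= 0x3FFu && sub <= 0x3Fu);
    return ((uint32_t)sub << 10) | primary;
}

/* A neutral LCID has sublanguage 0 (SUBLANG_NEUTRAL). */
int is_neutral(uint32_t lcid)
{
    return sub_lang_id(lcid) == 0;
}

/* ASSUMPTION: models a neutral LCID being resolved by substituting
   SUBLANG_DEFAULT (1). Non-neutral LCIDs are returned unchanged. */
uint32_t with_default_sublang(uint32_t lcid)
{
    if (!is_neutral(lcid))
        return lcid;
    return (lcid & 0xFFFF0000u) | make_lang_id(primary_lang_id(lcid), 1);
}
```

With these helpers, 0x0C04 splits into primary language 0x04 (Chinese) and sublanguage 3 (Hong Kong S.A.R.), while 0x0004 has sublanguage 0 and is therefore neutral. Under the assumption above, with_default_sublang(0x0004) gives 0x0404, which is Chinese (Traditional, Taiwan), a code page 950 locale, and that would be consistent with the 950 the customer saw.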

Now for the other half of the question, let's use the CultureAndRegionInfoBuilder's secret LOCALESIGNATURE parsing with a simple bit of code like the following:

using System;
using System.Globalization;

namespace LDML {
    class LDML {
        [STAThread]
        static void Main(string[] args) {
            CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(args[0], CultureAndRegionModifiers.Replacement);
            carib.LoadDataFromCultureInfo(new CultureInfo(args[0], false));
            carib.LoadDataFromRegionInfo(new RegionInfo(args[0]));
            carib.Save(args[0] + ".ldml");
        }
    }
}

Then, after you save this as ldml.cs and compile it:

csc ldml.cs /r:sysglobl.dll

then you can save out the LDML that zh-HK (0x0c04) uses, and look at the markup afterward:

      <msLocale:fontSignature>
        <msLocale:unicodeRanges>
          <msLocale:range type="0" />
          <msLocale:range type="1" />
          <msLocale:range type="2" />
          <msLocale:range type="3" />
          <msLocale:range type="5" />
          <msLocale:range type="7" />
          <msLocale:range type="9" />
          <msLocale:range type="31" />
          <msLocale:range type="35" />
          <msLocale:range type="36" />
          <msLocale:range type="37" />
          <msLocale:range type="38" />
          <msLocale:range type="39" />
          <msLocale:range type="42" />
          <msLocale:range type="43" />
          <msLocale:range type="45" />
          <msLocale:range type="46" />
          <msLocale:range type="48" />
          <msLocale:range type="49" />
          <msLocale:range type="50" />
          <msLocale:range type="51" />
          <msLocale:range type="54" />
          <msLocale:range type="59" />
          <msLocale:range type="60" />
          <msLocale:range type="68" />
        </msLocale:unicodeRanges>
        <msLocale:defaultCodePages>
          <msLocale:ansiCodePage />
          <msLocale:ansiOemCodePage>
            <msLocale:codePage type="936" />
          </msLocale:ansiOemCodePage>
          <msLocale:oemCodePage />
        </msLocale:defaultCodePages>
        <msLocale:codePages>
          <msLocale:ansiCodePage />
          <msLocale:ansiOemCodePage>
            <msLocale:codePage type="936" />
          </msLocale:ansiOemCodePage>
          <msLocale:oemCodePage />
        </msLocale:codePages>
      </msLocale:fontSignature>

Note especially the code page that the LOCALESIGNATURE has (936, in both msLocale:ansiOemCodePage elements above). So technically it is not GDI's fault for returning the wrong information, since it is simply relying on NLS.
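
For anyone wondering how a code page ends up in a signature at all: the code page portion is a bitfield, with one bit per code page. Here is a small portable sketch of decoding the first DWORD of that bitfield. The function and table names are hypothetical; the bit assignments follow the documented FONTSIGNATURE code-page bitfield (bit 18 is 936, bit 20 is 950), and nothing here calls into Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per defined bit in the first code-page DWORD of a
   FONTSIGNATURE-style bitfield, with the code page it stands for. */
typedef struct {
    unsigned bit;
    unsigned code_page;
} CodePageBit;

static const CodePageBit kCodePageBits[] = {
    { 0, 1252 }, { 1, 1250 }, { 2, 1251 }, { 3, 1253 }, { 4, 1254 },
    { 5, 1255 }, { 6, 1256 }, { 7, 1257 }, { 8, 1258 },
    { 16, 874 }, { 17, 932 }, { 18, 936 }, { 19, 949 }, { 20, 950 },
    { 21, 1361 },
};

/* Writes the code pages whose bits are set in csb0 into out, in
   ascending bit order, storing at most cap entries. Returns the
   number of entries written. */
size_t decode_code_pages(uint32_t csb0, unsigned *out, size_t cap)
{
    size_t n = 0;
    size_t count = sizeof kCodePageBits / sizeof kCodePageBits[0];

    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < count; ++i) {
        uint32_t mask = UINT32_C(1) << kCodePageBits[i].bit;
        if ((csb0 & mask) != 0 && n < cap)
            out[n++] = kCodePageBits[i].code_page;
    }
    return n;
}
```

A value with only bit 18 set decodes to the single code page 936, which is what the zh-HK data above amounts to. A value with bits 18 and 20 both set decodes to 936 and 950.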

Though technically you cannot blame NLS either, since all of the LOCALESIGNATURE data is provided to us by the typography team. Eventually we may want an update from them on this so that GetLocaleInfo can return consistent results between LOCALE_IDEFAULTANSICODEPAGE, LOCALE_IDEFAULTCODEPAGE, and LOCALE_FONTSIGNATURE....

How to fix it is an interesting question, though.

In theory, the increased usage of Simplified Chinese in Hong Kong in recent years would make it interesting to change the LOCALESIGNATURE's default code page to 950 but to list both 936 and 950 in the code page section. And from a descriptive standpoint that might make a lot of sense.

In practice, however, the LOCALESIGNATURE's raison d'être is to provide information for creating a sensible default font to use for the locale. And generally speaking one does not have fonts that would support both -- the fact that either one may be used does not take into account that it really is an either/or proposition. So the best fix is likely just to make the LOCALESIGNATURE match the locale it sits in....

 

This post brought to you by ញ (U+1789, KHMER LETTER NYO)


# SDiZ on 31 Aug 2006 10:02 AM:

Chinese (Hong Kong S.A.R.) and CP950 are so special.

Most Chinese users in Hong Kong use BIG5-HKSCS encoding, which is an extension to the BIG5 encoding.

Microsoft provides a variant of BIG5 as CP950. After installing a patch (available at http://www.microsoft.com/hk/hkscs/ ), CP950 becomes BIG5-HKSCS…… my CP950 may not be the same as your CP950……

# Michael S. Kaplan on 31 Aug 2006 10:26 AM:

Since there is no charset specific to HK, TranslateCharsetInfo will not help here.

Though note that Vista has a different solution to the problem than the hack/patch thing you are referring to. :-)

