Getting a locale from a code page?

by Michael S. Kaplan, published on 2006/03/12 03:11 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/03/12/549746.aspx


Late last month, Kranthi asked in the microsoft.public.win32.programmer.international newsgroup:

Is there any way Code Page can be related to LCID? I mean, if I get Code Page, can I get LCID from it?

Well, the problem here is, of course, that this mapping does not really exist anywhere in Windows. Folks piled on to explain this....

Now the other way around is easy: GetLocaleInfo with the LOCALE_IDEFAULTANSICODEPAGE LCType, or the TranslateCharsetInfo function with the TCI_SRCLOCALE flag, can get you a code page (among other things, in the case of that latter function).

But the fact is that there are just a handful of code pages and nearly 200 locales, so most code pages are shared by many locales (code page 1252 alone serves English, French, German, Spanish, and many more). Any single locale you got back is therefore unlikely to be meaningful for most of the actual work you might want to do with locales.
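To make the many-to-one problem concrete, here is a minimal C++ sketch. It assumes a small, hand-copied table of default ANSI code pages (the values GetLocaleInfo reports with LOCALE_IDEFAULTANSICODEPAGE for these locales), not the full list Windows ships. The table and the function name localesForAnsiCodePage are hypothetical illustrations. The forward lookup always has exactly one answer. The reverse lookup can only honestly return a list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, hand-copied subset of (LCID, default ANSI code page) pairs.
// On a real system these would come from GetLocaleInfo with
// LOCALE_IDEFAULTANSICODEPAGE; this table is only for illustration.
struct LocaleAnsiCp {
    std::uint32_t lcid;
    unsigned ansiCodePage;
};

const LocaleAnsiCp kLocales[] = {
    {0x0409, 1252},  // en-US
    {0x0407, 1252},  // de-DE
    {0x040C, 1252},  // fr-FR
    {0x0C0A, 1252},  // es-ES
    {0x0419, 1251},  // ru-RU
    {0x0402, 1251},  // bg-BG
    {0x0411, 932},   // ja-JP
};

// Reverse direction: collect every locale whose default ANSI code page
// matches. The result may be empty, a single LCID, or many LCIDs.
std::vector<std::uint32_t> localesForAnsiCodePage(unsigned codePage) {
    std::vector<std::uint32_t> matches;
    for (const auto& entry : kLocales) {
        if (entry.ansiCodePage == codePage) {
            matches.push_back(entry.lcid);
        }
    }
    return matches;
}
```

Even in this seven-entry table, asking about 1252 returns four different LCIDs. Returning any single one of them is a guess.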

To be honest, the only functionality in Windows that gives a locale when the source is a code page is the nLangID member of the DetectEncodingInfo structure that is returned by the IMultiLanguage2::DetectInputCodepage and IMultiLanguage2::DetectCodepageInIStream methods.

Now, although nLangID is documented as containing "Microsoft Win32-defined primary language identifier that has been detected for the source string", the truth is that it is little better than a "Code page 1252? That must be English, so let's return 0x0409" sort of method. Which is frankly just a little bit embarrassing, if you ask me.

If you absolutely need to get a locale from a code page, then you can enumerate the locales with EnumSystemLocales and for each locale use GetLocaleInfo with the LOCALE_IDEFAULTANSICODEPAGE, LOCALE_IDEFAULTCODEPAGE, LOCALE_IDEFAULTMACCODEPAGE, and LOCALE_IDEFAULTEBCDICCODEPAGE LCType values until you find a match. But the results are simply not going to be all that useful, so the best thing to do here may be to reconsider whatever decision led to wanting the functionality....
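The enumeration described here can be sketched in portable C++. Instead of calling the Windows-only EnumSystemLocales and GetLocaleInfo, this sketch scans a small hypothetical table. Its four fields stand in for the LOCALE_IDEFAULTANSICODEPAGE, LOCALE_IDEFAULTCODEPAGE (OEM), LOCALE_IDEFAULTMACCODEPAGE, and LOCALE_IDEFAULTEBCDICCODEPAGE results. The values were copied by hand for four locales and should be checked against GetLocaleInfo on a real system. The names LocaleCodePages and findLocalesUsingCodePage are hypothetical. Unlike a loop that stops at the first match, this one records every hit so that the ambiguity stays visible.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Which of the four per-locale default code pages matched.
enum class CodePageKind { Ansi, Oem, Mac, Ebcdic };

// Hypothetical record standing in for four GetLocaleInfo calls per locale.
struct LocaleCodePages {
    std::uint32_t lcid;
    unsigned ansi, oem, mac, ebcdic;
};

// Hand-copied sample values; verify against GetLocaleInfo before relying on them.
const LocaleCodePages kLocales[] = {
    {0x0409, 1252, 437, 10000, 37},     // en-US
    {0x0407, 1252, 850, 10000, 20273},  // de-DE
    {0x0419, 1251, 866, 10007, 20880},  // ru-RU
    {0x0411, 932,  932, 10001, 20290},  // ja-JP
};

struct Match {
    std::uint32_t lcid;
    CodePageKind kind;
};

// Plays the role of the EnumSystemLocales callback loop: check all four
// default code pages of every locale and keep every match in table order.
std::vector<Match> findLocalesUsingCodePage(unsigned codePage) {
    std::vector<Match> matches;
    for (const auto& loc : kLocales) {
        if (loc.ansi == codePage)   matches.push_back({loc.lcid, CodePageKind::Ansi});
        if (loc.oem == codePage)    matches.push_back({loc.lcid, CodePageKind::Oem});
        if (loc.mac == codePage)    matches.push_back({loc.lcid, CodePageKind::Mac});
        if (loc.ebcdic == codePage) matches.push_back({loc.lcid, CodePageKind::Ebcdic});
    }
    return matches;
}
```

With this table, 932 matches the Japanese locale twice (as both its ANSI and its OEM code page). 10000 matches two different locales through their Mac code page. Neither answer says anything about which language the text is actually in.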


This post brought to you by "￨" (U+ffe8, a.k.a. HALFWIDTH FORMS LIGHT VERTICAL)


Martin on 15 Mar 2011 4:55 AM:

Hi Michael,

I need to compare two Unicode strings (case-insensitively), and I only have the CodePage.

I am only interested in knowing whether or not the strings match, i.e. I am not going to sort them.

I just want to know whether (wcsicmp() == 0).

My plan is to:

1. Obtain the locale from the CodePage (as you described above).

2. Set the locale and run the comparison.

What do you think of my scenario? Will the obtained locale be useful for this type of operation?

If not, is there a better approach?

Martin on 15 Mar 2011 5:51 AM:

Actually, after a little more research, my code looks like this:

static bool MYWCSICMP(const wchar_t* lpwszOne,const wchar_t* lpwszTwo,ULONG ulCodePage)
{
  char szLocale[128] = {0};
  std::string sCP = setlocale(LC_CTYPE,NULL);
  sprintf(szLocale,".%u",ulCodePage);
  setlocale(LC_CTYPE,szLocale);
  bool bRet = _wcsicmp(lpwszOne,lpwszTwo);
  setlocale(LC_CTYPE,sCP.c_str());
  return bRet;
}

Do you see anything wrong with this?
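A few things in Martin's snippet are worth flagging:

* _wcsicmp returns 0 when the strings match, so storing its result directly in a bool yields true exactly when the strings differ.
* The "%u" format does not match a ULONG (unsigned long) argument; "%lu" does.
* setlocale changes process-wide state, so swapping it around a comparison can race with other threads.
* The inputs are already wchar_t strings. The code page only describes how some narrow text was encoded and tells the comparison nothing useful, which is the same problem the post describes.

The following portable C++ sketch drops the code page parameter and compares character by character using std::towlower under the current C locale. The function name equalsIgnoreCase is hypothetical. Simple per-character lowercasing is an assumption here, not full Unicode case folding.

```cpp
#include <cassert>
#include <cwchar>
#include <cwctype>

// Case-insensitive equality for wide strings: returns true only when the
// strings match. Assumption: per-character std::towlower under the current
// C locale is adequate; this is not full Unicode case folding.
bool equalsIgnoreCase(const wchar_t* a, const wchar_t* b) {
    // Walk both strings in lockstep until either one ends.
    while (*a != L'\0' && *b != L'\0') {
        if (std::towlower(static_cast<std::wint_t>(*a)) !=
            std::towlower(static_cast<std::wint_t>(*b))) {
            return false;
        }
        ++a;
        ++b;
    }
    // Equal only if both strings ended at the same position.
    return *a == L'\0' && *b == L'\0';
}
```

On Windows Vista and later, CompareStringOrdinal with bIgnoreCase set to TRUE performs an ordinal, case-insensitive comparison with no locale involved at all. It may be a better fit for a "do these two strings match?" question.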

