Nothing stinks worse than the thread locale, other than the thread code page

by Michael S. Kaplan, published on 2007/05/29 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/28/2960411.aspx


The piece of mail I got (via the Contact link) from Ken was:

Hi Michael,
I have run into what I believe is a bug in MultiByteToWideChar() and WideCharToMultiByte() when the code page parameter is set to CP_THREAD_ACP and the 'default language for non-Unicode applications' has been set to Hebrew. This is seen when using the utility macros in atlconv.h, such as T2WC.

I've created a simple test app that shows unexpected results on some systems. The code page inferred from CP_THREAD_ACP is not the same as GetACP(). I have reproduced this on two different systems set to use Hebrew, but not on two other systems set to use Traditional Chinese, one of which was set to the full Traditional Chinese localized UI. The source is part of a default .NET 2003 generated console project.

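#include "stdafx.h"
#include <windows.h>   // GetACP, GetCPInfoEx, CPINFOEX
#include <atlconv.h>   // ATL::_AtlGetConversionACP
#include <iostream>    // std::cout
int _tmain(int argc, _TCHAR* argv[])
{
    // The system ANSI code page (what CP_ACP means)
    std::cout << "default code page is " << GetACP() << std::endl;
    // The code page constant the ATL conversion macros pass to the API
    std::cout << "_AtlGetConversionACP code page is " << ATL::_AtlGetConversionACP() << std::endl;

    // Ask the system what that constant actually resolves to on this thread
    CPINFOEX cpinfo = {};
    GetCPInfoEx(ATL::_AtlGetConversionACP(), 0, &cpinfo);

    std::cout << "Thread code page is " << cpinfo.CodePage << std::endl;
    return 0;
}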

My results are:
default code page is 1255
_AtlGetConversionACP code page is 3
Thread code page is 1252

When my application calls T2WC, the results are incorrect: the code points are extended to 16 bits, but not converted to their Hebrew code points. We are getting around this by using _CONVERSION_DONT_USE_THREAD_LOCALE, but I wondered whether others had heard of this problem before.

Thanks for your time,
Ken 

Regular readers may recall when I pointed out Why I think the thread locale really stinks.

(In fact, I was asked not too long ago to help clean up some of the bad uses of the thread locale in parts of Windows such as shell32.dll and shlwapi.dll, something I will probably be working on shortly!)

Anyway, since Ken found that defining _CONVERSION_DONT_USE_THREAD_LOCALE works around the problem, it seems pretty obvious that CP_THREAD_ACP is none other than the LOCALE_IDEFAULTANSICODEPAGE value that GetLocaleInfo returns when it is given the LCID returned by GetThreadLocale.
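As a rough illustration of that relationship, here is a small portable C++ model (it makes no Win32 calls) that reproduces the three numbers in Ken's output. It assumes, purely for illustration, that the system locale is Hebrew while the thread locale is English (United States); one way that could happen is if the user locale, which a new thread's locale normally starts from, differs from the system locale. The table, the variables, and the Model/Resolve function names are hypothetical; only the constant values 0 and 3 mirror CP_ACP and CP_THREAD_ACP.

```cpp
// Portable MODEL of how CP_ACP and CP_THREAD_ACP get resolved. Not the real
// Win32 implementation: the table and variables below are stand-ins for the
// OS state that GetACP(), GetThreadLocale() and GetLocaleInfo() consult.
#include <cassert>
#include <map>
#include <stdexcept>

using Lcid = unsigned long;
constexpr unsigned kCP_ACP = 0;         // same value as CP_ACP
constexpr unsigned kCP_THREAD_ACP = 3;  // same value as CP_THREAD_ACP

// Tiny subset of LOCALE_IDEFAULTANSICODEPAGE values.
unsigned DefaultAnsiCodePage(Lcid lcid) {
    static const std::map<Lcid, unsigned> table = {
        {0x0409, 1252},  // English (United States)
        {0x040D, 1255},  // Hebrew (Israel)
        {0x0404, 950},   // Chinese (Taiwan), Traditional
    };
    auto it = table.find(lcid);
    if (it == table.end()) throw std::out_of_range("LCID not in this model");
    return it->second;
}

// Hypothetical machine and thread state.
Lcid g_systemLocale = 0x040D;               // "language for non-Unicode programs"
thread_local Lcid t_threadLocale = 0x0409;  // assumed: thread still on an English locale

unsigned ModelGetACP() { return DefaultAnsiCodePage(g_systemLocale); }

// What a converter ends up using when handed a code page constant.
unsigned ResolveCodePage(unsigned cp) {
    if (cp == kCP_ACP) return ModelGetACP();
    if (cp == kCP_THREAD_ACP) return DefaultAnsiCodePage(t_threadLocale);
    return cp;  // an explicit code page such as 1255 is used as-is
}
```

With these assumptions ModelGetACP() gives 1255, the constant ATL hands over is 3, and ResolveCodePage(3) gives 1252, matching Ken's three lines. If the thread locale is switched to Hebrew (0x040D), CP_THREAD_ACP resolves to 1255 too, which would be consistent with machines whose thread and system locales agree never showing the problem.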

Now the thread code page is a pretty shaky thing, and not only for the reasons that make me feel like the thread locale stinks. Imagine basing code page conversions on something that any code running in the thread can change at any time. Yuck!
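To see why that is so unpleasant, consider this portable sketch of Ken's symptom. The bytes F9 EC E5 ED spell שלום in code page 1255; widened through code page 1252 instead, each byte simply becomes the same value as a 16-bit code unit, which is the "extended to 16 bits, but not converted" result Ken describes. The thread-local variable, SomeLibraryCall, and the partial code page tables are assumptions of the model, not Windows internals.

```cpp
// Portable sketch (no Win32 calls): the same bytes decode differently once
// any code on the thread changes the thread's code page. Only the ranges
// needed here are modeled: ASCII, the Hebrew letters of code page 1255
// (0xE0-0xFA -> U+05D0-U+05EA), and 0xA0-0xFF of code page 1252 (= Latin-1).
#include <cassert>
#include <stdexcept>
#include <string>

thread_local unsigned t_threadCodePage = 1255;  // hypothetical thread state

std::u16string Widen(const std::string& bytes, unsigned codePage) {
    std::u16string out;
    for (char c : bytes) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(b);  // ASCII is the same in both code pages
        } else if (codePage == 1255 && b >= 0xE0 && b <= 0xFA) {
            out.push_back(static_cast<char16_t>(0x05D0 + (b - 0xE0)));
        } else if (codePage == 1252 && b >= 0xA0) {
            out.push_back(static_cast<char16_t>(b));  // byte value kept as-is
        } else {
            throw std::out_of_range("byte not covered by this model");
        }
    }
    return out;
}

// Hypothetical stand-in for a conversion that uses CP_THREAD_ACP.
std::u16string WidenWithThreadCodePage(const std::string& bytes) {
    return Widen(bytes, t_threadCodePage);
}

// Hypothetical library routine that changes the thread locale as a side effect.
void SomeLibraryCall() { t_threadCodePage = 1252; }
```

Calling WidenWithThreadCodePage before SomeLibraryCall yields U+05E9 U+05DC U+05D5 U+05DD; calling it afterwards yields U+00F9 U+00EC U+00E5 U+00ED from the very same input. Passing the code page explicitly, as in Widen(bytes, 1255), is the only form here whose result does not depend on who touched the thread last.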

In fact, it is downright nasty that ATL and MFC made a breaking change in version 7.0 in this area (as described here):

String Conversions

In versions of ATL up to and including ATL 3.0 in Visual C++ 6.0, string conversions using the macros in atlconv.h were always performed using the ANSI code page of the system (CP_ACP). Starting with ATL 7.0 in Visual C++ .NET, string conversions are performed using the default ANSI code page of the current thread, unless _CONVERSION_DONT_USE_THREAD_LOCALE is defined, in which case the ANSI code page of the system is used as before.

Note that the string conversion classes, such as CW2AEX, allow you to pass a code page to use for the conversion to their constructors. If a code page is not specified, the classes use the same code page as the macros.

For more information, see ATL and MFC String Conversion Macros.

Yuck. I hate breaking changes that are bad. And this is definitely one of them. :-(
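The selection rule in that documentation can be modeled in a few lines. This is a hypothetical sketch, not ATL's source: the function names are made up, and the constants only mirror the values of CP_ACP and CP_THREAD_ACP.

```cpp
// Portable model of the ATL 7.0 code page selection rule quoted above.
// Function names are hypothetical; constants mirror CP_ACP / CP_THREAD_ACP.
#include <cassert>
#include <optional>

constexpr unsigned kCP_ACP = 0;
constexpr unsigned kCP_THREAD_ACP = 3;

// Macros such as T2WC: the system code page only if the opt-out is defined.
constexpr unsigned MacroCodePage(bool dontUseThreadLocale) {
    return dontUseThreadLocale ? kCP_ACP : kCP_THREAD_ACP;
}

// Classes such as CW2AEX: an explicit constructor argument wins; otherwise
// they fall back to whatever the macros would use.
constexpr unsigned ClassCodePage(std::optional<unsigned> requested,
                                 bool dontUseThreadLocale) {
    return requested ? *requested : MacroCodePage(dontUseThreadLocale);
}
```

MacroCodePage(false) returns 3, which is the "_AtlGetConversionACP code page is 3" line in Ken's output. In real ATL code the explicit route looks like CW2A narrow(wide, CP_ACP); the other route is defining _CONVERSION_DONT_USE_THREAD_LOCALE before atlconv.h is included. Since that define is a compile-time switch, it presumably cannot reach conversions compiled into a prebuilt DLL such as mfc80.dll, which fits what John reports in the comments.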

Sorry, Ken, the strange differences here are more or less by [bad] design. Your workaround is actually the fix: it undoes what I consider a breaking change, one that broke a little bit of ATL.

In the end, my best advice is to NEVER use either the thread locale or the thread code page. For anything. Ever....

 

This post brought to you by װ (U+05f0, a.k.a. HEBREW LIGATURE YIDDISH DOUBLE VAV)


# Bob Smith on 5 Jul 2007 9:19 AM:

Thanks for your explanation and warning. We fell into this hole in a big way, which is how we found your materials.

Do you know why the ATL & MFC team did this?

It would be interesting to know, since breaking changes are normally not approved lightly, and this one does not seem to be appreciated or to produce the desired result.

# Michael S. Kaplan on 5 Jul 2007 11:16 AM:

Well, perhaps they had gotten feedback about how bad it was for some people that CP_ACP cannot be updated from within an app. I'm not entirely sure, though I am also curious about it....

# John Swartzentruber on 11 Jul 2007 11:19 AM:

_CONVERSION_DONT_USE_THREAD_LOCALE didn't work for me. We're using MFC from mfc80.dll, so it appears that the CString::LoadString() code always uses the thread locale code page. Should this work? Are there any other work-arounds?

# Michael S. Kaplan on 11 Jul 2007 11:43 AM:

Workaround for CString::LoadString()?

Use Unicode and then no code page gets used or mis-used. :-)

I'll probably blog about why this case fails though, it is interesting....

# jswartzen on 12 Jul 2007 12:54 PM:

I've recommended the Unicode "workaround", and that's what we will use in the next release. Meanwhile, I'm stuck trying to make sure the current release works. It sucks because I know that whatever I do now will all be wasted effort when everything is Unicode.

# Michael S. Kaplan on 12 Jul 2007 12:58 PM:

Well, you can load the string yourself via LoadStringW and not use the method, right?

# jswartzen on 12 Jul 2007 2:06 PM:

I can. And I probably will, but we're talking about over 1600 calls in about 300 different files. And then probably putting it all back again when we're in Unicode.

# Michael S. Kaplan on 13 Jul 2007 9:51 PM:

You could just override the one method, though, couldn't you? Or you could even change the thread locale they are using to one that will work better for you?

There are definitely cheaper options....

# jswartzen on 16 Jul 2007 4:44 PM:

It turns out we're going to solve this problem by ignoring it. Our next release won't be translated, so the code page issue won't be a factor. Our next release after that will be in Unicode, so the problem will go away for real then. Thanks for your help.

# Ståle L. Hansen on 3 Dec 2007 10:22 AM:

You can solve this by using SetThreadLocale(LOCALE_SYSTEM_DEFAULT), and then never touch the thread locale again. But be careful with setting CurrentThread.CurrentCulture, because that also changes the native thread locale.
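Ståle's suggestion, and Mac's later question about how often to make the call, both hinge on the thread locale being per-thread state. This portable model (hypothetical names and LCID choices, no Win32 calls) shows the consequence: setting the locale on one thread does nothing for a thread created afterwards, so each thread that performs conversions would need its own call. That every new thread starts from the user default locale is an assumption built into the model.

```cpp
// Portable model: the thread locale is per-thread state, and a new thread
// starts from a default rather than inheriting what its creator set.
#include <cassert>
#include <thread>

using Lcid = unsigned long;
constexpr Lcid kUserDefault = 0x0409;    // assumed: English (United States) user locale
constexpr Lcid kSystemDefault = 0x0804;  // assumed: Chinese (PRC) system locale

thread_local Lcid t_threadLocale = kUserDefault;  // every thread starts here

// Hypothetical stand-ins for SetThreadLocale / GetThreadLocale.
void ModelSetThreadLocale(Lcid lcid) { t_threadLocale = lcid; }
Lcid ModelGetThreadLocale() { return t_threadLocale; }

// Returns the locale a freshly created worker thread observes.
Lcid LocaleSeenByNewThread() {
    Lcid seen = 0;
    std::thread worker([&seen] { seen = ModelGetThreadLocale(); });
    worker.join();
    return seen;
}
```

After ModelSetThreadLocale(kSystemDefault) on the main thread, LocaleSeenByNewThread() still reports kUserDefault, which is why a one-time call at startup would not cover threads created later by other DLLs.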

# Mike on 30 Apr 2008 10:47 PM:

Setting the thread code page would be really useful for us.

We have a specific set of threads that handle communication with various external systems, which must use UTF-8. The rest of our system uses the system/user code page and/or UCS-2. It would be really, really handy if I could set our communications threads' code pages to UTF-8 (65001), so that all those implicit conversions between MBCS and UCS-2 done by _bstr_t would 'just work' in the comms threads... saving me from converting a lot of legacy (non-MBCS-aware) messaging code to handle it explicitly.

In fact, that's how I reached this page (googling for a cheap way to do this).  You tease.

# mac on 14 May 2008 7:29 AM:

Hi,

We have a non-Unicode application written in VC5. Recently we migrated it to VC8. This application uses MFC, ATL, C++, and C.

Once I change the language for non-Unicode programs to 'Chinese (PRC)' on Windows XP and run the application, the user is able to enter Chinese characters in the text field. This text is stored in and retrieved from the database.

This works fine in VC5. After migration to VC8 it started failing: the user input 我耳鼻喉科 is being converted to ÎÒ¶ú±Çºí¿Æ.

Research shows that CString to BSTR string conversion causes this problem.

CString strTemp = _T("我耳鼻喉科");
BSTR bstrTemp = strTemp.AllocSysString();

MSDN says -

In versions of ATL up to and including ATL 3.0 in Visual C++ 6.0, string conversions using the macros in atlconv.h were always performed using the ANSI code page of the system (CP_ACP). Starting with ATL 7.0 in Visual C++ .NET, string conversions are performed using the default ANSI code page of the current thread.

I tried calling SetThreadLocale(LOCALE_SYSTEM_DEFAULT), which resolved my problem.

In our application, many DLLs create several threads.

1) How many times do I need to call this API, and where?

2) Does it have any side effects if the system locale is switched back to English?

Testing will be cumbersome, as my product contains 15 EXEs and 100 DLLs (ATL/COM, C++).

-Mac
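For what it's worth, the garbled string above is exactly what results if the GBK (code page 936) bytes of 我耳鼻喉科 are widened through code page 1252 instead: every one of those bytes is in the 0xA0-0xFF range, where code page 1252 matches Latin-1, so each byte simply becomes U+00xx. This sketch checks that; it models only that byte range, and the function name is hypothetical.

```cpp
// Reproduces the garbling above, assuming GBK bytes are widened through
// code page 1252. For bytes 0xA0-0xFF, code page 1252 equals Latin-1,
// so each byte maps to the code point with the same value.
#include <cassert>
#include <stdexcept>
#include <string>

std::u16string WidenAs1252HighHalf(const std::string& bytes) {
    std::u16string out;
    for (char c : bytes) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0xA0) throw std::out_of_range("outside the range this sketch models");
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// GBK encoding of U+6211 U+8033 U+9F3B U+5589 U+79D1 (two bytes each).
const std::string kGbkBytes = "\xCE\xD2\xB6\xFA\xB1\xC7\xBA\xED\xBF\xC6";
```

The ten resulting code units are U+00CE U+00D2 U+00B6 U+00FA U+00B1 U+00C7 U+00BA U+00ED U+00BF U+00C6, that is, ÎÒ¶ú±Çºí¿Æ: five Chinese characters turned into ten Latin-1 characters, consistent with AllocSysString converting through a 1252 thread code page rather than the 936 system code page.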


referenced by

2010/07/25 Which code page to use? The right one, of course!

2008/06/19 How do[es what] the common controls [call ]convert between ANSI and Unicode?
