by Michael S. Kaplan, published on 2008/06/19 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/06/19/8620349.aspx
The other day, Raymond Chen blogged about "How do the common controls convert between ANSI and Unicode?" in response to a question in his suggestion box:
In the context of an ansi (not unicode) app: How do the common controls (listview for example) decide which code page to use when translating multibyte to widestring?
I had to debug an ansi app that was displaying corrupt strings on a traditional chinese system because the dialog font was causing the listview to use a codepage other than the system ACP when translating multibyte to widechar.
Although I would seldom if ever disagree with Raymond about anything that builds out of the Shell depot, in this particular case I know of two specific exceptions to the CP_ACP rule one generally sees. The differences may have less of a direct relationship to the Shell/comctl32 code, meaning he might still be right within his domain. :-)
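The two other behaviors I have run across in various versions of the common controls are, first, conversions that follow the thread code page rather than CP_ACP (see note 1 below) and, second, ANSI text that is handed to GDI to be interpreted according to the charset of the font selected into the DC.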
I honestly don't know much about the first one. I do remember reports of bugs where changing the thread locale (which changes the thread code page) would change the behavior here, and the pre-6.0 controls in particular had a real ANSI-Plus thing going on that tried to move beyond CP_ACP. So while I had no proof it was true, I suspected it might be.
The second one I have more insight into, since I had to debug it on a few occasions. Basically, the text would not always be converted to Unicode at all; instead, the ANSI text was sent to GDI with a DC containing a font set to use a charset most associated with some other code page. GDI would then do its job and render, making the choices it was kind of asked to make, in a bizarre and not well understood sense.
As a rule, any time GDI tries to get into NLS stuff, the results are predictable -- buggered, every time. Thus we have problems like the ones I pointed out in "What the hell is wrong with TranslateCharsetInfo, anyway?". Between problems like that and the ones discussed in "Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!)" and "Sometimes when you say 'the fix is in' you mean it in a good way", one thing is clear: the GDI folks should consider taking a trip over to the NLS team and giving them all atomic wedgies.
Just kidding, but you know what I mean.
For the Common Controls, when I was doing MSLU work I ran across many cases where having the latest updates on Win9x would give a lot of GDI-influenced support of text. Adding MSLU and a CP_ACP mechanism broke test applications until I changed the code to do something more like this to get the code page to convert with:
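// Returns the code page that goes with the charset of the font selected into hdc.
UINT CpgFromHdc(HDC hdc) {
    CHARSETINFO csi;
    int chs = GetTextCharset(hdc);

    // With TCI_SRCCHARSET, the charset value itself travels in the pointer
    // parameter (it is not the address of a variable holding the value).
    if (TranslateCharsetInfo((DWORD *)(DWORD_PTR)chs, &csi, TCI_SRCCHARSET))
        return csi.ciACP;

    // g_acp is a global defined elsewhere, presumably the cached system ACP.
    return g_acp;
}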
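To make the charset-to-code-page step a little more concrete, here is a portable sketch of the kind of lookup that sits behind it. It is not the Windows implementation. The table only repeats the well-known pairings between font charset values (as defined in wingdi.h) and Windows code pages. The names `CharsetMapEntry`, `kCharsetMap`, and `CodePageFromCharset` are made up for this illustration, and the `fallbackAcp` parameter stands in for whatever the caller would otherwise use (the role `g_acp` plays above).

```c
#include <stddef.h>

/* One row of the illustrative table: a font charset value and the
   Windows code page usually associated with it. */
typedef struct {
    int      charset;
    unsigned codePage;
} CharsetMapEntry;

/* Charset numbers match the *_CHARSET constants in wingdi.h; they are
   written as literals so the sketch builds without the Windows headers. */
static const CharsetMapEntry kCharsetMap[] = {
    {   0, 1252 },  /* ANSI_CHARSET        */
    { 128,  932 },  /* SHIFTJIS_CHARSET    */
    { 129,  949 },  /* HANGUL_CHARSET      */
    { 134,  936 },  /* GB2312_CHARSET      */
    { 136,  950 },  /* CHINESEBIG5_CHARSET */
    { 161, 1253 },  /* GREEK_CHARSET       */
    { 162, 1254 },  /* TURKISH_CHARSET     */
    { 177, 1255 },  /* HEBREW_CHARSET      */
    { 178, 1256 },  /* ARABIC_CHARSET      */
    { 186, 1257 },  /* BALTIC_CHARSET      */
    { 204, 1251 },  /* RUSSIAN_CHARSET     */
    { 222,  874 },  /* THAI_CHARSET        */
    { 238, 1250 }   /* EASTEUROPE_CHARSET  */
};

/* Hypothetical helper: returns the code page paired with `charset`,
   or `fallbackAcp` when the charset is not in the table. */
unsigned CodePageFromCharset(int charset, unsigned fallbackAcp) {
    size_t i;
    for (i = 0; i < sizeof(kCharsetMap) / sizeof(kCharsetMap[0]); ++i) {
        if (kCharsetMap[i].charset == charset)
            return kCharsetMap[i].codePage;
    }
    return fallbackAcp;
}
```

Under these assumptions, a font with ANSI_CHARSET on a system whose ACP is 950 would give `CodePageFromCharset(0, 950) == 1252`, which is the sort of mismatch the question at the top describes: the bytes are valid in the system code page but get interpreted under the code page that goes with the font.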
So anyway, the CP_ACP rule should be the only rule, but there are way too many pieces of Windows that assume they know better what to use....
On the other hand, so do I -- UNICODE! :-)
1 - Now you know how I feel about this one if you've ever seen "Nothing stinks worse than the thread locale, other than the thread code page". I think I was fairly unsubtle about my feelings.
This blog brought to you by ਸ਼ (U+0a36, aka GURMUKHI LETTER SHA)