How do[es what] the common controls [call ]convert between ANSI and Unicode?

by Michael S. Kaplan, published on 2008/06/19 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/06/19/8620349.aspx

The other day, Raymond Chen blogged about How do the common controls convert between ANSI and Unicode?, in response to a question in his suggestion box:

In the context of an ansi (not unicode) app: How do the common controls (listview for example) decide which code page to use when translating multibyte to widestring?

I had to debug an ansi app that was displaying corrupt strings on a traditional chinese system because the dialog font was causing the listview to use a codepage other than the system ACP when translating multibyte to widechar.

Although I would seldom if ever disagree with about anything that builds out of the Shell depot, in this particular case I know of two specific exceptions to the CP_ACP rule one generally sees, though the differences may have less of a direct relationship to the Shell/comctl32 code, meaning he might still be right within his domain. :-)

The two other behaviors I have run across in various versions of the common controls:

Use of the thread code page (CP_THREAD_ACP)¹.
Use of the code page associated with the font charset selected in a device context.

I honestly don't know much about the first one, but I remember reports of bugs where changing the thread locale (which changes the thread code page) would change the behavior here, and particularly on the pre-6.0 controls there was a real ANSI-Plus thing going on here that tried to move beyond CP_ACP, so while I had no proof it was true I suspected it might be.

The second one, I have more insight into since I had to debug it on a few occasions -- basically the text would not always be converted to Unicode at all; and the ANSI text is sent to GDI with a DC containing a font set to use a charset most associated with some other code page. GDI would then do its job to render and make choices there that it was kind of asked to, in a bizarre and not well understood sense.

As a rule, any time GDI tries to get into NLS stuff, the results are predictable -- buggered, every time. Thus we have problems like the ones I pointed out in What the hell is wrong with TranslateCharsetInfo, anyway?. Between problems like that and the one discussed in Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!) and Sometimes when you say 'the fix is in' you mean it in a good way, one thing is clear: the GDI folks should consider taking a trip over to the NLS team and giving them all atomic wedgies.

Just kidding, but you know what I mean.

For the Common Controls, when I was doing MSLU work I ran across many cases where having the latest updates on Win9x would give a lot of GDI-influenced support of text where adding MSLU and a CP_ACP mechanism broke test applications until I changed the code to do something more like this to get the code page to convert with:

UINT CpgFromHdc(HDC hdc) {
    int chs;
    CHARSETINFO csi;

    chs = GetTextCharset(hdc);
    if(TranslateCharsetInfo(&(DWORD)chs, &csi, TCI_SRCCHARSET))
        return(csi.ciACP);
    else
        return(g_acp);
}

So anyway, the CP_ACP rule should be the only rule. but there are way too many pieces of Windows that assume they know better what to use....

On the other hand, so do I -- UNICODE! :-)

1 - Now you know how I feel about this one if you've ever seen Nothing stinks worse than the thread locale, other than the thread code page. I think I was fairly unsubtle on my feelings.

This blog brought to you by ਸ਼ (U+0a36, aka GURMUKHI LETTER SHA)

no comments

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

go to newer or older post, or back to index or month or day