GB18030 isn't an ACP, either

by Michael S. Kaplan, published on 2007/07/11 23:45 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/11/3823291.aspx


The question went something like this:

I'm trying to display GB18030 text (say, the Unicode character U+3400) using WideCharToMultiByte and DrawTextA. I am using the code page for GB18030, which is 54936.

Why doesn't this work? Originally, I thought it had to do with font linking.

Thanks for the help.

You can see what is going on here: the general assumption is that the non-Unicode Win32 API will handle any code page that isn't Unicode. We know that isn't true from the many times UTF-8 support in "A" functions has been discussed (if you look at Raymond's recent post that lists so many of the times I have talked about it, you can see the subject has come up far too many times!).

I once covered UTF-8 and GB18030 together, in UTF-8 and GB18030 are both 'NT' code pages, they just aren't 'ANSI' code pages.

Now, GDI is fundamentally Unicode internally, and it was even back in Windows 95, mainly because most of the plumbing is Unicode anyway.

Which code page gets used does not have a simple answer like CP_ACP, as I pointed out in What code page does MSLU convert with? The Microsoft Layer for Unicode was designed to match what the OS does in many cases, including the GDI ones that are largely based on the charset of a device context.
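To make the charset-to-code-page step concrete, here is a minimal sketch in C. The table and helper names (`kCharsetMap`, `codepage_for_charset`, `charset_exists_for_codepage`) are hypothetical and hand-written for illustration; the pairs roughly mirror the ciACP values that TranslateCharsetInfo reports for common GDI charsets with TCI_SRCCHARSET. This is not MSLU's or GDI's actual code.

```c
#include <stddef.h>

/* Hypothetical hand-written table: a GDI charset value and the Windows
   code page associated with it. Every entry is a single-byte or
   double-byte ANSI code page. */
typedef struct {
    unsigned charset;   /* e.g. GB2312_CHARSET is 134 */
    unsigned codepage;  /* e.g. 936 */
} CharsetCodePage;

static const CharsetCodePage kCharsetMap[] = {
    {0,   1252}, /* ANSI_CHARSET */
    {128, 932},  /* SHIFTJIS_CHARSET */
    {129, 949},  /* HANGUL_CHARSET */
    {134, 936},  /* GB2312_CHARSET */
    {136, 950},  /* CHINESEBIG5_CHARSET */
    {161, 1253}, /* GREEK_CHARSET */
    {162, 1254}, /* TURKISH_CHARSET */
    {163, 1258}, /* VIETNAMESE_CHARSET */
    {177, 1255}, /* HEBREW_CHARSET */
    {178, 1256}, /* ARABIC_CHARSET */
    {186, 1257}, /* BALTIC_CHARSET */
    {204, 1251}, /* RUSSIAN_CHARSET */
    {222, 874},  /* THAI_CHARSET */
    {238, 1250}, /* EASTEUROPE_CHARSET */
};

#define CHARSET_MAP_COUNT (sizeof kCharsetMap / sizeof kCharsetMap[0])

/* Returns the code page for a charset, or 0 if the charset is not listed. */
static unsigned codepage_for_charset(unsigned charset) {
    for (size_t i = 0; i < CHARSET_MAP_COUNT; ++i) {
        if (kCharsetMap[i].charset == charset) return kCharsetMap[i].codepage;
    }
    return 0;
}

/* Returns 1 if some charset in the table leads to the given code page. */
static int charset_exists_for_codepage(unsigned codepage) {
    for (size_t i = 0; i < CHARSET_MAP_COUNT; ++i) {
        if (kCharsetMap[i].codepage == codepage) return 1;
    }
    return 0;
}
```

A lookup like `codepage_for_charset(134)` yields 936, while neither 54936 (GB18030) nor 65001 (UTF-8) is reachable from any charset in the table.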

But all of the underlying code pages that the charset values map to are ACPs, and GB18030 cannot be an ACP, for much the same reasons that UTF-8 cannot.
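The "much the same reasons" largely come down to character width. A minimal sketch in C, assuming the usual explanation that Windows ANSI code pages are single-byte or double-byte (at most two bytes per character): the hypothetical helpers below report how many bytes a GB18030 sequence, or the UTF-8 form of a code point, occupies. The GB18030 helper checks only the byte-range structure (1, 2 or 4 bytes); it does not check whether a sequence is actually assigned.

```c
#include <stddef.h>

/* Length in bytes of the GB18030 sequence starting at s (n bytes available),
   or 0 if the bytes do not have a valid GB18030 shape. */
static size_t gb18030_seq_len(const unsigned char *s, size_t n) {
    if (n == 0) return 0;
    if (s[0] <= 0x7F) return 1;                 /* single byte (ASCII) */
    if (s[0] < 0x81 || s[0] > 0xFE) return 0;   /* 0x80 and 0xFF are not lead bytes */
    if (n < 2) return 0;
    if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
        return 2;                               /* two-byte, GBK-compatible */
    if (s[1] >= 0x30 && s[1] <= 0x39 && n >= 4 &&
        s[2] >= 0x81 && s[2] <= 0xFE &&
        s[3] >= 0x30 && s[3] <= 0x39)
        return 4;                               /* four-byte form */
    return 0;
}

/* Length in bytes of the UTF-8 encoding of code point cp, or 0 if invalid. */
static size_t utf8_len(unsigned long cp) {
    if (cp <= 0x7F) return 1;
    if (cp <= 0x7FF) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not encodable */
    if (cp <= 0xFFFF) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}
```

With these, the GB18030 bytes 81 39 EE 39 for U+3400 get a length of 4, and the UTF-8 form of U+3400 (E3 90 80) a length of 3. Both exceed the two bytes per character that the "A" code paths are built around.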

The quick answer, obviously, is to call DrawTextW with the original Unicode text, unconverted, rather than converting it and then being unable to display the data that DrawTextA won't recognize.


This post brought to you by 㐀 (U+3400, the first CJK ideograph in CJK Extension A)

