by Michael S. Kaplan, published on 2006/04/27 03:00 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/27/584859.aspx
Regular reader Mike Lippert asked the Suggestion Box:
Hi Michael,
Your blog is great and I really appreciate all you've written.
I just ran into some odd behavior I was wondering if you could explain.
Our app uses the Symbol font to display certain characters. We recently converted the app from ANSI to Unicode.
While testing, QA set the system codepage to Russian (charset Cyrillic 1251). Now many of the characters drawn using the Symbol font show as square boxes.
One of the characters displaying as a square box is at 0xD9, which in the Symbol font is a logical and (U+2227). In the 1252 charset, that position has the same codepoint as in Unicode, U+00D9 (Latin capital letter U with grave). In the 1251 charset, that position contains the Unicode U+0429 (Cyrillic capital letter shcha).
Since our app is now Unicode the MFC CDC TextOut maps to TextOutW.
Here's the odd behavior: when TextOut is called with the symbol font selected in the DC and a string consisting of the single Unicode character U+00D9, a square box is displayed. When it is called with the character U+0429, the "logical and" glyph is displayed.
So what seems to be happening when drawing with the symbol font selected is that the Unicode string is converted to the current system codepage and the resulting codepoints are drawn.
Is that really what's going on? I couldn't find any documentation to that effect...
Thanks,
Mike
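The code page arithmetic in Mike's report can be checked with a short sketch. It assumes that Python's `cp1252` and `cp1251` codecs are a fair stand-in for the Windows code pages; the helper name `symbol_byte` is made up for illustration, and nothing here calls GDI or MFC. Python raises an error where Windows might substitute a default or best-fit character, but in both cases the result is not the byte 0xD9.

```python
# Sketch of the mapping described in the letter: a Unicode code point is
# narrowed to a single byte through an ANSI code page, and that byte is
# what would select a glyph slot in the symbol font.
# Assumption: Python's codecs approximate the Windows code pages.

def symbol_byte(ch, codepage):
    """Return the byte that `ch` maps to in `codepage`, or None if unmapped."""
    try:
        return ch.encode(codepage)[0]
    except UnicodeEncodeError:
        return None

for ch in ("\u00D9", "\u0429"):
    for codepage in ("cp1252", "cp1251"):
        b = symbol_byte(ch, codepage)
        shown = "unmapped" if b is None else f"0x{b:02X}"
        print(f"U+{ord(ch):04X} via {codepage}: {shown}")
```

Under these assumptions, only U+0429 reaches byte 0xD9 on a 1251 system, which is consistent with the "logical and" glyph appearing for that character and a box appearing for U+00D9.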
He then followed up the next day with:
Michael,
If you've got a sec to look at the topic I just posted above I'd appreciate it as I'm trying to figure out how to work around that behavior now.
If you can't I totally understand, and I'll come up with something.
Thanks,
Mike Lippert
ps I understand if you want to delete this comment as it isn't really a topic request, but was the best way I could think of to communicate w/ you.
Apparently Mike thought he would make me his personal support line representative. I decided to get my revenge by pointing this out. :-)
Now I am the first person to tell people to move to Unicode, believe me. But symbol fonts aren't really Unicode. In fact, like I pointed out in More than you ever wanted to know about CP_SYMBOL, GDI and NLS can't even agree on how to try and fit them into the "character" metaphor.
In this case, it is clear from the behavior that Mike is seeing that the claim folks in GDI made that "GDI maps by a different scheme and will accept U+0020 - U+00ff" is not always going to be true -- especially (as in this case) if you are calling an MFC method that maps the bytes to Unicode for you before the call to TextOut/ExtTextOut happens....
So although I am a fan of Unicode, these symbols aren't Unicode in their current form -- so making an app Unicode but passing on symbol bytes like this will cause them to be mapped using CP_ACP. Which is pretty much guaranteed to be wrong.
To fix? Well, if you make sure to pass the symbols as the appropriate Unicode characters -- either by OR you could just not do this one piece with Unicode at all -- symbols are just as happy not having to be in Unicode. So the advice in the title of this post can help a lot!
Then (with any of those three methods) you should be able to see the symbols from that symbol font.
Now at this point I will apologize to Mike Lippert for teasing him, hopefully he won't be too angry (and I doubt I have enough readers to start scaring away the ones who aren't insufferably rude!). Though I will say to everyone that you may want to look at the text in the Contacting Michael link about looking here for Product Support. I might have to add that the punishment for violations may be a tiny bit of good natured ridicule.... :-)
This post brought to you by "∑" (U+2211, a.k.a. N-ARY SUMMATION)
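Why mapping symbol bytes through CP_ACP is so fragile can be sketched in a few lines. This is an illustration only: it assumes Python's `cp125x` codecs approximate the Windows ANSI code pages, and the helper name `widen` is hypothetical rather than anything from MFC or GDI.

```python
# Sketch: the same symbol-font byte, widened to Unicode through different
# ANSI code pages, lands on a different code point each time.
# Assumption: Python's codecs approximate the Windows code pages.

def widen(symbol_byte, codepage):
    """Decode one byte through `codepage`, as an ANSI-to-Unicode step would."""
    return bytes([symbol_byte]).decode(codepage, errors="replace")

SYMBOL_BYTE = 0xD9  # slot of the "logical and" glyph, per the letter above

results = {cp: widen(SYMBOL_BYTE, cp) for cp in ("cp1252", "cp1251", "cp1253")}
for cp, ch in results.items():
    print(f"{cp}: 0x{SYMBOL_BYTE:02X} -> U+{ord(ch):04X}")
```

Since the Unicode value depends on whichever code page the system happens to use, a hard-coded Unicode string built this way only round-trips to the intended glyph on a machine with the same code page.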
# mlippert on 27 Apr 2006 1:25 PM:
# mlippert on 27 Apr 2006 2:33 PM:
# Gale on 5 Jan 2009 1:04 PM:
Thank you for this post. Quite helpful to me - especially the links.