'Doctor, it hurts when I do this.' Well, don't do that!

by Michael S. Kaplan, published on 2006/04/27 03:00 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/27/584859.aspx


Regular reader Mike Lippert asked the Suggestion Box:

Hi Michael,
Your blog is great and I really appreciate all you've written.

I just ran into some odd behavior I was wondering if you could explain.

Our app uses the Symbol font to display certain characters. We recently converted the app from ANSI to Unicode.

While testing, QA set the system codepage to Russian (charset Cyrillic 1251). Now many of the characters drawn using the Symbol font show as square boxes.

One of the characters displaying as a square box is at 0xD9, which in the Symbol font is a logical and (U+2227). In the 1252 charset, byte 0xD9 has the same codepoint as in Unicode, U+00D9 (Latin capital letter U with grave). In the 1251 charset that position contains the Unicode U+0429 (Cyrillic capital letter shcha).

Since our app is now Unicode the MFC CDC TextOut maps to TextOutW.

Here's the odd behavior: when TextOut is called with the symbol font selected in the DC and a string consisting of the single Unicode character U+00D9, a square box is displayed. When it is called with the character U+0429, the "logical and" glyph is displayed.

So what seems to be happening when drawing with the symbol font selected, is that the Unicode string is converted to the current system codepage and those codepoints are drawn.

Is that really what's going on? I couldn't find any documentation to that effect...

Thanks,
Mike

He then followed up the next day with:

Michael,
If you've got a sec to look at the topic I just posted above I'd appreciate it as I'm trying to figure out how to work around that behavior now.
If you can't I totally understand, and I'll come up with something.

Thanks,
Mike Lippert

ps I understand if you want to delete this comment as it isn't really a topic request, but was the best way I could think of to communicate w/ you.

Apparently Mike thought he would make me his personal support line representative. I decided to get my revenge by pointing this out.

:-)

Now I am the first person to tell people to move to Unicode, believe me.

But symbol fonts aren't really Unicode. In fact, like I pointed out in More than you ever wanted to know about CP_SYMBOL, GDI and NLS can't even agree on how to try and fit them into the "character" metaphor.

In this case, it is clear from the behavior that Mike is seeing that the claim folks in GDI made that "GDI maps by a different scheme and will accept U+0020 - U+00ff" is not always going to be true -- especially (as in this case) if you are calling an MFC method that maps the bytes to Unicode for you before the call to TextOut/ExtTextOut happens....

So although I am a fan of Unicode, these symbols aren't Unicode in their current form -- so making an app Unicode but passing on symbol bytes like this will cause them to be mapped using CP_ACP. Which is pretty much guaranteed to be wrong.
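The mapping Mike ran into can be reproduced without GDI at all. The sketch below uses Python's built-in cp1251 and cp1252 codecs as a stand-in for the conversion to the system code page. The helper name to_system_codepage is made up for illustration, and the '?' substitution is Python's behavior, which may differ from what Windows substitutes for an unmappable character.

```python
# Illustration only: Python codecs stand in for the CP_ACP conversion.
# to_system_codepage is a hypothetical helper, not a Windows API.

def to_system_codepage(text: str, codepage: str) -> bytes:
    """Convert text to bytes, using '?' where the code page has no slot."""
    return text.encode(codepage, errors="replace")


# Byte position of the "logical and" glyph in the Symbol font.
SYMBOL_LOGICAL_AND = 0xD9

# System code page 1251: U+0429 converts to byte 0xD9, so the glyph shows.
print(to_system_codepage("\u0429", "cp1251") == bytes([SYMBOL_LOGICAL_AND]))

# System code page 1251: U+00D9 has no slot, so it cannot become 0xD9.
print(to_system_codepage("\u00d9", "cp1251") == bytes([SYMBOL_LOGICAL_AND]))

# System code page 1252: U+00D9 converts to byte 0xD9, so the glyph shows.
print(to_system_codepage("\u00d9", "cp1252") == bytes([SYMBOL_LOGICAL_AND]))
```

This is why the same Unicode string worked on a 1252 machine and turned into a box on a 1251 machine: the result depends on the system code page rather than on the string.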

To fix? Well, if you make sure to pass the symbols as the appropriate Unicode characters -- either by

  • converting the bytes you have directly with a MultiByteToWideChar call with the CP_SYMBOL code page, or
  • making sure to put them in that U+0000 - U+001f, U+f020 - U+f0ff range that GDI will recognize as being symbol font stuff

OR you could just not do this one piece with Unicode at all -- symbols are just as happy not having to be in Unicode. So the advice in the title of this post can help a lot!

Then (with any of those three methods) you should be able to see the symbols from that symbol font.
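The second option is simple arithmetic, so here is a minimal sketch of it in Python. The function names are hypothetical, and the mapping is just the range named above (bytes 0x00 - 0x1f stay put, bytes 0x20 - 0xff move to U+f020 - U+f0ff). In a real app the same offset would be applied in your own code before the string reaches TextOut.

```python
# Hypothetical helpers illustrating the symbol range described above.

def symbol_bytes_to_pua(data: bytes) -> str:
    """Map Symbol-font bytes to U+0000-U+001F and U+F020-U+F0FF."""
    return "".join(chr(b) if b < 0x20 else chr(0xF000 + b) for b in data)


def pua_to_symbol_bytes(text: str) -> bytes:
    """Inverse mapping; raises ValueError for anything outside the range."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(cp)
        elif 0xF020 <= cp <= 0xF0FF:
            out.append(cp - 0xF000)
        else:
            raise ValueError(f"U+{cp:04X} is not in the symbol range")
    return bytes(out)


# The "logical and" glyph at byte 0xD9 becomes U+F0D9.
logical_and = symbol_bytes_to_pua(b"\xd9")
print(f"U+{ord(logical_and):04X}")  # prints "U+F0D9"
```

Because the resulting characters sit outside every ANSI code page, they no longer depend on which system code page the machine happens to use.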

Now at this point I will apologize to Mike Lippert for teasing him; hopefully he won't be too angry (and I doubt I have enough readers to start scaring away the ones who aren't insufferably rude!).

Though I will say to everyone that you may want to look at the text in the Contacting Michael link about looking here for Product Support. I might have to add that the punishment for violations may be a tiny bit of good-natured ridicule.... :-)

 

This post brought to you by "∑" (U+2211, a.k.a. N-ARY SUMMATION)


# mlippert on 27 Apr 2006 1:25 PM:

Hehe, I cheerfully accept all the teasing you want to dish out, since it meant I'm also getting an answer.

Thank you, I know you've got many other real responsibilities, but when my web search turned up no info, I figured it was a valid topic others might be interested in as well, and I was likely to get a better answer from you, if you responded, than from anyone else (and I was right :-).

I wish I had found your other article, I was obviously using the wrong search terms.

I was considering the 1st option you mentioned, but it seemed like it would be fraught with problems, and implementing it was going to be a little painful. Knowing that GDI will translate the 0xf020-0xf0ff range appropriately should do the trick.

Thanks again,
Mike

# mlippert on 27 Apr 2006 2:33 PM:

I thought I'd add that I want to change the code to stop using the Symbol font for these characters (they all have representations in Unicode), but that will also require finding a font that contains those characters and testing to make sure the new font still looks as expected (for example when drawing integral signs using the top [U+2320], extender [U+23AE] and bottom [U+2321] characters).

Mike

# Gale on 5 Jan 2009 1:04 PM:

Thank you for this post. Quite helpful to me - especially the links.


