The time has come to fix a bug that has been in Windows more than twice as long as I have...

by Michael S. Kaplan, published on 2013/02/13 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2013/02/13/10393199.aspx

Anyway, I was reminded when a colleague asked me whatever happened to the bug (it was too late in the cycle to fix at that time):

May I ask what was the decision on bug described in your WideCharToMultiBytes vs. DrawTextW blog post? Was a bug logged for this issue?

Why I'm asking: From time to time we receive questions about spacy Cyrillic in Visual Studio on EA OSes. Though we're telling them that there's nothing wrong with their code and that there's a workaround in VS (in VS settings, set the font to Segoe UI instead of Automatic, which forces the font to be inherited from the OS; more on that here), it would be great to provide more details to those who ask. A reference to your blog post is good, but a bug # and your (Windows) final decision in addition would be even better.

I thought maybe I should go follow up on that bug in DrawText, since we have time.

The bug is in using WideCharToMultiByte to detect "wide" characters by converting each one through a code page and treating anything that comes back as two bytes as wide.

That test is unreliable because every CJK code page has some characters that are not double width but are nevertheless situated in one of the two-byte ranges inside the code page (Cyrillic is one case; other characters that can hit this problem include random symbols and such).

Clearly to fix this in any version would require a different, more reliable test for full width characters!

For example, a simple call to GetStringTypeW(CT_CTYPE3, ...) checking for the C3_FULLWIDTH character type flag -- the non-locale-specific code that should have been there all along....
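A minimal sketch of what such a check might look like in C. The helper names `ctype3_has_fullwidth` and `is_fullwidth_by_ctype3` are made up for illustration and are not taken from DrawText; the flag value is repeated from winnls.h only so that the flag test itself also builds off Windows. Whether C3_FULLWIDTH alone covers every character the original test was meant to catch is an open question that this sketch does not settle.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Value as defined in winnls.h, repeated so the flag test builds off Windows. */
#define C3_FULLWIDTH 0x0080
#endif

/* Hypothetical helper: true if a CT_CTYPE3 flag word has C3_FULLWIDTH set. */
static bool ctype3_has_fullwidth(unsigned short ctype3)
{
    return (ctype3 & C3_FULLWIDTH) != 0;
}

#ifdef _WIN32
/* Hypothetical helper: classify one UTF-16 code unit with no code page involved. */
static bool is_fullwidth_by_ctype3(WCHAR ch)
{
    WORD ctype3 = 0;

    /* Ask for the CT_CTYPE3 flags of a single character. */
    if (!GetStringTypeW(CT_CTYPE3, &ch, 1, &ctype3))
        return false; /* assumption: treat an API failure as "not full width" */

    return ctype3_has_fullwidth(ctype3);
}
#endif
```

The point of the design is that nothing here depends on the system locale or on any CJK code page, so the answer for a given character is the same on every machine.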

The fact that a partner team noticed the bug and saw my blog explaining how easy the fix is won't hurt my case.

!!!WARNING!!!: Ideographs (and Hiragana), which are probably what the original test is _meant_ to detect (since it's deciding what to treat as "a word"), _do not_ set C3_FULLWIDTH. I detected this in writing a test program to look at my question from last time about what it returns for Cyrillic.

Also - the assertion "The various non-full-width CJK fonts don't give the Cyrillic characters full width visually" is NOT true of MS PGothic or PMingLiU. And I don't know where you could possibly get the idea that "you'd just get fat Cyrillic", those fonts give the characters the extra width by making more white space, not by stretching the glyphs. Did you only check Meiryo, or did you only look at the glyphs in isolation without putting them next to each other?

And why would it be adding spaces just because it passes this [however broken or not] test? It doesn't do so for the Ideographs, which also presumably set cChars = 2.

Incidentally, here's the output of a test program I wrote to see what flags are set by what characters:

Char     Val   F  H  ----
U+0061:  8040  0  1  0000  (Latin)
U+03b1:  8000  0  0  0000  (Greek)
U+0410:  8000  0  0  0000  (Cyrillic)
U+ff41:  8080  1  0  0000  (Latin FW)
U+4e9c:  8100  0  0  0100  (Ideograph)
U+3042:  8020  0  0  0020  (Hiragana)
U+30a2:  8090  1  0  0010  (Katakana)
U+ff71:  8050  0  1  0010  (Katakana HW)
U+0000:  0000  0  0  0000

(The last column is the value with C3_ALPHA|C3_FULLWIDTH|C3_HALFWIDTH masked out - nothing unexpected there, just ideograph, hiragana, and katakana)
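As a cross-check, the Val column can be decoded by hand. Here is a minimal C sketch, assuming the CT_CTYPE3 flag values from winnls.h and covering only the flags that show up in the output; `describe_ctype3` is a made-up helper name, not a Windows API.

```c
#include <stdio.h>
#include <string.h>

/* CT_CTYPE3 flag values as defined in winnls.h; skipped if <windows.h>
   has already supplied them. */
#ifndef C3_ALPHA
#define C3_KATAKANA  0x0010
#define C3_HIRAGANA  0x0020
#define C3_HALFWIDTH 0x0040
#define C3_FULLWIDTH 0x0080
#define C3_IDEOGRAPH 0x0100
#define C3_ALPHA     0x8000
#endif

/* Flags listed from the highest bit to the lowest. */
static const struct { unsigned short bit; const char *name; } k_flags[] = {
    { C3_ALPHA,     "C3_ALPHA"     },
    { C3_IDEOGRAPH, "C3_IDEOGRAPH" },
    { C3_FULLWIDTH, "C3_FULLWIDTH" },
    { C3_HALFWIDTH, "C3_HALFWIDTH" },
    { C3_HIRAGANA,  "C3_HIRAGANA"  },
    { C3_KATAKANA,  "C3_KATAKANA"  },
};

/* Hypothetical helper: write the names of the set flags into buf,
   separated by '|', and return buf. An empty string means none set. */
static char *describe_ctype3(unsigned short val, char *buf, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof k_flags / sizeof k_flags[0]; i++) {
        if (!(val & k_flags[i].bit))
            continue;
        int n = snprintf(buf + used, cap - used, "%s%s",
                         used ? "|" : "", k_flags[i].name);
        if (n < 0 || (size_t)n >= cap - used)
            break; /* out of room: stop, the text may be truncated */
        used += (size_t)n;
    }
    return buf;
}
```

With a 64-byte buffer, `describe_ctype3(0x8090, buf, sizeof buf)` gives `C3_ALPHA|C3_FULLWIDTH|C3_KATAKANA`, while `0x8000` (the value reported for Greek and Cyrillic) gives just `C3_ALPHA`.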