Can I show the whole string or can't I?

by Michael S. Kaplan, published on 2009/09/24 08:31 -04:00, original URI:

Jim's question from a few weeks ago was:

I’m being given a Unicode string and I need to determine if it will render cleanly using the system font (not displaying any blocks or “non-supported-glyph” symbols). I’ve tried using ScriptGetCMap() and GetGlyphIndices(), but both of these flag a character like 0x0C60 as not having a glyph – although it’s actually composed of multiple glyphs and it does render properly.

Our product allows an administrator to push policy to client machines, which includes a custom message to show in a notification balloon. The administrator can enter the text on a console and might include Japanese characters, for example, and that text gets pushed to a bunch of clients, some of which can’t display those characters. The client software is supposed to display the custom message if possible (no blocks displayed) and fall back to a built-in message if the custom message won’t display correctly.

Any pointers to APIs or sample code that will accurately determine if a string can be drawn?

Sound familiar?

Well, it should!

At first glance, it is the same problem discussed in Is that character in the font or isn't it?, and that is a blog that is chock full of potential solutions!

Unfortunately, Jim's question adds one element to the problem, one new wrinkle.

And that is to also try and figure out the problem for any fonts that the system might map to via linking/fallback/substitution, etc.

And no API for that exists.

To be honest, it isn't actually a problem that is worth trying to solve. As Michael Warning pointed out in that thread:

This unfortunately is a really hard problem. And the answer will be different depending on the text stack you’re using (GDI, GDI+, DWrite). The problem is that each stack has a different set of rules for font fallback – how it automatically changes fonts around when it encounters a character that isn’t supported in the font you asked for.

Now Michael is thinking about the macro problem -- the complexity of all of the different models and trying to deal with how improbable it would be to capture all of these differences in code.

But to be honest the micro problem (looking at any one of these technologies) is still pretty complicated -- the kind of project where one will almost certainly fail, in the end.

So what can we do?

Well, the answer I would suggest will have to wait for the next blog.... :-)

# Henry Skoglund on 24 Sep 2009 6:33 PM:

(Couldn't resist) Isn't it possible to solve by comparing bitmaps, I mean pixels?

What he wants is really a function that detects if the rendered output contains one or more "block" characters instead of the real McCoy, right?

One approach might be then:

1. Render to an offscreen bitmap one occurrence of the "block" character. (This assumes of course it is possible to always get hold of it in a reproducible manner, maybe there is a Unicode entrance for that.)

2. Render the client Unicode string to another, similar bitmap. This one will normally be much wider and possibly a bit taller.

3. Now step through bitmap #2 using BitBlts that AND bitmap #1 and #2 with each other, yielding a third bitmap. If that third bitmap is identical to bitmap #1 we have detected an occurrence of the "block" character by comparing pixels.

Note that we have to step through all x positions in bitmap #2: if, say, bitmap #1 is 10 pixels wide and bitmap #2 is 1000 pixels wide, then we have to perform 991 BitBlts.

Finally, if bitmap #2 is taller than bitmap #1, we have to repeat those 991 BitBlts for every y position.
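The stepping described above can be sketched without any GDI at all. The following is only a rough illustration in Python, assuming 1-bit bitmaps stored as lists of 0/1 rows; the names `find_block`, `and_matches`, `window_count` and `paste`, and the two tiny glyph shapes, are all made up for the example and stand in for real rendered output.

```python
# Sketch of the sliding AND comparison on 1-bit bitmaps.
# A bitmap is a list of rows; each row is a list of 0/1 pixels (1 = ink).
# Everything here is a stand-in for real rendered bitmaps and BitBlt calls.

def and_matches(block, page, x, y):
    """True if (window AND block) == block for the window at (x, y)."""
    for dy, block_row in enumerate(block):
        page_row = page[y + dy]
        for dx, bit in enumerate(block_row):
            if (page_row[x + dx] & bit) != bit:
                return False
    return True


def window_count(block, page):
    """Number of window positions (one BitBlt each in the GDI version)."""
    cols = len(page[0]) - len(block[0]) + 1
    rows = len(page) - len(block) + 1
    return max(cols, 0) * max(rows, 0)


def find_block(block, page):
    """Return every (x, y) where the block pattern is covered by ink."""
    bh, bw = len(block), len(block[0])
    hits = []
    for y in range(len(page) - bh + 1):
        for x in range(len(page[0]) - bw + 1):
            if and_matches(block, page, x, y):
                hits.append((x, y))
    return hits


def paste(page, glyph, x, y):
    """Demo helper: OR a glyph into the page at (x, y)."""
    for dy, row in enumerate(glyph):
        for dx, bit in enumerate(row):
            page[y + dy][x + dx] |= bit


# Hypothetical 4x4 hollow box standing in for the font's "block" glyph.
BLOCK = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
# Hypothetical glyph for some other, supported character.
OTHER = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]

# Bitmap #2: a 16x6 "rendered string" with one good glyph and one block.
page = [[0] * 16 for _ in range(6)]
paste(page, OTHER, 1, 1)
paste(page, BLOCK, 8, 1)

hits = find_block(BLOCK, page)
```

With a 10-pixel-wide block and a 1000-pixel-wide string of the same height, `window_count` gives the 991 positions mentioned above. One caveat the sketch makes visible: because the test is "AND equals the block", a window that is entirely ink (a solid filled square, for instance) also passes, so a stricter exact comparison of the window against the block may be preferable.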

A human eye will of course perform this function in an instant, but maybe lots of BitBlts could solve this problem as well.

Rgrds Henry

# Michael Madsen on 25 Sep 2009 7:08 AM:

@Henry: This could be a problem when subpixel rendering (ClearType) is used, since I don't believe the character has to start at the same subpixel position - at least not unless you only compare one character at a time.

Actually, comparing the individual characters should be a bit easier and more reliable, since you would have more information to go on (no need to try multiple different starting positions) - but since you need to work with more individual bitmaps, any potential speed difference might be consumed by the constant changing of the bitmap.

And, of course, this approach would fail in the event that the character you're trying to draw just happens to look like the "replacement character" character. Since a lot of the Windows fonts will use a square for this, this might be the case if you want to use one of the characters actually representing a square - but that might be acceptable as a false positive.

# Henry Skoglund on 25 Sep 2009 9:26 AM:

@Michael: ClearType I think is only applied when drawing on a screen. But to avoid it anyway, you could use 1-bit color depths (i.e. black and white only) for all the bitmaps.

I agree about the lack of a universal "fool's gold" character. But if you render the trial (#1 above) bitmap in the exact same font as you render the client string in, then hopefully whoever designed that particular font chose a unique bit pattern for his/her "fool's gold", so false positives could be avoided.

# Mihai on 25 Sep 2009 2:09 PM:

The easiest way to solve a problem is to not solve it at all :-)

Just ask everybody to have all the international support installed.

(it is there in Win 7 and Vista, just do it for XP)

If that is not possible, option two is to:

- assume everything is on in Vista/Win 7 (the support is there and the fallback works pretty well)

- for XP scan the string for characters in the possible supported "blocks" (simple script LTR, complex script and RTL, East Asian).

If you find an Arabic character (for instance) and "complex script and RTL" is not installed, complain.


Yes, you can still wonder about some exotic scripts that Windows .

But you pretty much eliminated 99% of the problems. Deal with the rest in tech support, or an FAQ. Depending on the locations you are trying to sell, you might discover nobody is affected.
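The XP scan suggested above might look roughly like the following Python sketch. It is only an illustration: the group names are taken from the comment, the code point ranges are a small, deliberately incomplete sample of Unicode blocks, the assignment of blocks to groups is an assumption, and how the client finds out which groups are installed is left outside the sketch (the `installed` argument is simply passed in).

```python
# Illustrative, incomplete Unicode block ranges for each optional group.
# Which blocks belong to which group is an assumption made for this example.
GROUP_RANGES = {
    "complex script and RTL": [
        (0x0590, 0x05FF),  # Hebrew
        (0x0600, 0x06FF),  # Arabic
        (0x0900, 0x097F),  # Devanagari
        (0x0E00, 0x0E7F),  # Thai
    ],
    "East Asian": [
        (0x3040, 0x309F),  # Hiragana
        (0x30A0, 0x30FF),  # Katakana
        (0x4E00, 0x9FFF),  # CJK Unified Ideographs
        (0xAC00, 0xD7AF),  # Hangul Syllables
    ],
}


def required_groups(text):
    """Return the set of groups that some character in text falls into."""
    needed = set()
    for ch in text:
        cp = ord(ch)
        for group, ranges in GROUP_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                needed.add(group)
    return needed


def missing_groups(text, installed):
    """Groups the text needs that are not in the installed collection."""
    return required_groups(text) - set(installed)


def message_to_show(custom, builtin, installed):
    """Fall back to the built-in message if any needed group is missing."""
    return builtin if missing_groups(custom, installed) else custom
```

For example, `missing_groups("\u3053\u3093\u306b\u3061\u306f", [])` reports `{"East Asian"}` for a Japanese greeting, so `message_to_show` would return the built-in message on a client without that support. Characters outside every listed range are treated as displayable, which is exactly the remaining small percentage left to tech support.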

# Random832 on 3 Oct 2009 3:04 PM:

@Henry Skoglund: What if it substitutes another font in, and that font has a hardcoded box glyph for that character?

Unrelated to anything, what is the etymology of the term "stack" as used here, and as used in e.g. "TCP/IP stack"?

# hskoglund on 4 Oct 2009 9:12 AM:

@Random832: Well, the question was "I’m being given a Unicode string and I need to determine if it will render cleanly using the system font..."

Any font substitution then *hopefully* will only occur at the beginning of the rendering, so the "fool's gold" character will not change.

As for the meaning of "stack" here, I'd say you could also call it "text rendering technology".

# Random832 on 5 Oct 2009 3:27 PM:

I meant the situation from the first comment under Is that character in the font or isn't it? .

Font A is the main selected font - it has its box glyph that is used for any characters that can't be found in a substituted font.

Font B claims to support one character set and defines a box for some characters in that character set which it does not in fact properly support.

So for a character that B claims to support but does not _really_ support, you get B's box, whereas for other random unicode code points that nothing supports, you get A's box.

(the captchas on this site interact badly with tabbed browsing)

# hskoglund on 8 Oct 2009 3:48 PM:

Right, font switching might occur, but if we can find an event that is fired due to that, my suggestion might still hold.

The idea then, is to do the block pattern matching multiple times, as soon as the font changes (as well as doing it at the end of the text rendering as I suggested before). I.e. create multiple bitmaps #1 and #2, one pair created whenever the fonts switch, and toss them at the next event. That way the block glyph character that we match to should still be correct.

But alas, doing a cursory glance at the MSDN docs I find no such event :-(

(Nice discussion anyway!)
