Determining if a font is gonna get it done

by Michael S. Kaplan, published on 2007/03/30 23:59 -04:00, original URI:

Stuart asks:

I don't really know who else to ask this question so I'm hoping you know the answer or could find an answer for it. Uniscribe docs are somewhat lacking :(

We (Mozilla/Firefox) are using Uniscribe for text rendering now and everything works pretty well except in one case where we're trying to determine if a font had all the glyphs to render the string or not.

Certain fonts are giving us trouble. Constantia being a good offender.

We do something like:

HFONT hfont = CreateFontW(-12, 0, 0, 0, FW_NORMAL, FALSE, FALSE,
                          0, DEFAULT_CHARSET, OUT_SCREEN_OUTLINE_PRECIS,



sfp.cBytes = sizeof(sfp);

ScriptGetFontProperties(hdc, &sc, &sfp);

int nitems;
WORD *glyphs;
WORD *clusters;

memset(&control, 0, sizeof(SCRIPT_CONTROL));
memset(&state, 0, sizeof(SCRIPT_STATE));
state.uBidiLevel = 0;
state.fOverrideDirection = 1;

int maxItems = 5;
int nglyphs;
WCHAR string[3] = { 0x20, // space -- is supported by font
                    0x27c0, // three dimensional angle -- not supported
                    0 };

items = (SCRIPT_ITEM *)malloc(10 * sizeof(SCRIPT_ITEM));

ScriptItemize(string, 1, maxItems, &control, &state, items, &nitems);

glyphs = (WORD *)malloc(10 * sizeof(WORD));
clusters = (WORD *)malloc(10 * sizeof(WORD));
attr = (SCRIPT_VISATTR*)malloc(10 * sizeof(SCRIPT_VISATTR));

ScriptShape(hdc, &sc, string, 2, 10, &items[0].a, glyphs, clusters, attr, &nglyphs);

You end up with glyphs[0] and glyphs[1] both being 3.

    wgBlank 3
    wgDefault 0
    wgInvalid 3

How do you tell that one is missing the one glyph and not the other one? I would expect glyphs[1] to be 0 here. Is there some flag I can set somewhere?

Maybe there is a much better way to tell if glyphs are missing. Something like GetGlyphIndices's GGI_MARK_NONEXISTING_GLYPHS. That sure would be nice!


Well, I won't disagree that Uniscribe samples are somewhat lacking, with some exceptions.

Let's see if we can rehabilitate the above sample to make it work properly! :-)

Luckily for me, I am right down the hall from the Typography team, so I was able to scoot down and talk to some people. And in this case I did not even have to scoot anywhere, as Sergey happened to be walking right by my office! :-)

He took the sample above and was quickly able to determine what was going on:

This is old code, controlled by the psa->s.fDisplayZWG flag, that hides control characters. This font behaves the way you describe because it does not contain ZWJ/ZWNJ. I do not know why this code is there; it precedes my time. With ScriptShape, the workaround would be to set this flag to TRUE.

There are some problems if you use ScriptShape to check whether a glyph is present in the font. It will show you whether this combination of characters generates missing glyphs. But this may not hold for certain scripts, e.g. surrogates or Hangul. Have you considered ScriptGetCMap? It is less powerful (it does not understand surrogates or advanced shaping), but it will show you default glyphs without doing any "tricks".


So, the workaround for the above code would be to add the following line after the ScriptItemize call and before the ScriptShape call:

items[0].a.s.fDisplayZWG = TRUE;

Now taking a minute to talk about Sergey's other suggestion to consider -- using ScriptGetCMap rather than ScriptShape....

The real problem is that (as Sergey mentioned) it is indeed less powerful. In addition to not understanding supplementary characters (which may or may not be a problem, depending on the text involved), it cannot handle any of the cases where characters can be represented but not through a direct one-to-one mapping between characters and glyphs, and that can be blocking over large parts of Unicode. So one does give up a lot of functionality by going that route.
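To make the supplementary character limitation a bit more concrete: a supplementary character reaches a per-code-unit lookup as two UTF-16 surrogates, so one possible pre-check is to scan the string for surrogates and only trust the simpler lookup when none are present. The following is a minimal sketch of that idea in portable C; the name contains_surrogates is made up for illustration and is not part of Uniscribe, and uint16_t stands in for WCHAR so the snippet compiles without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Uniscribe API): returns 1 if the UTF-16
   buffer contains any surrogate code unit (U+D800..U+DFFF), else 0.
   uint16_t is used in place of WCHAR to keep the sketch portable. */
int contains_surrogates(const uint16_t *text, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (text[i] >= 0xD800 && text[i] <= 0xDFFF)
            return 1;
    }
    return 0;
}
```

For the string in Stuart's sample (U+0020 followed by U+27C0) this returns 0, since both characters are in the BMP. Note that this only covers the surrogate half of the limitation; it says nothing about scripts that need shaping.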

In any case, setting the fDisplayZWG flag in the relevant SCRIPT_STATE struct will help the case in question work better, though keep in mind the intent of fDisplayZWG:

TRUE if control characters are shaped as representational glyphs. Typically, control characters are shaped to the blank glyph and given a width of 0.

Note that this flag can have some unintended consequences if one is doing the actual rendering, especially if a font actually contains representational glyphs (which is usually not going to be the case). But for the requested purpose (checking for strings that can or cannot be represented by the font) it can actually do quite well....
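As for the check itself, once ScriptShape has filled in the glyph array, the test Stuart was after amounts to comparing each glyph against the wgDefault value reported by ScriptGetFontProperties. Here is a rough sketch of that comparison in portable C; count_missing_glyphs is a hypothetical name, and uint16_t stands in for WORD so that it builds without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Uniscribe API): counts how many entries in
   `glyphs` equal the font's default (missing) glyph. In real code,
   `glyphs` would be the output of ScriptShape and `default_glyph` would
   be the wgDefault field of SCRIPT_FONTPROPERTIES. */
size_t count_missing_glyphs(const uint16_t *glyphs, size_t count,
                            uint16_t default_glyph)
{
    size_t missing = 0;
    for (size_t i = 0; i < count; ++i) {
        if (glyphs[i] == default_glyph)
            ++missing;
    }
    return missing;
}
```

With the values from the question (both glyphs equal to 3 and wgDefault equal to 0) this returns 0, which is exactly the problem reported. If glyphs[1] comes back as 0, as Stuart expected it to, the count becomes 1.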


This post brought to you by ⟀ (U+27c0, a.k.a. THREE DIMENSIONAL ANGLE)

# Lionel on 31 Mar 2007 12:01 PM:

Time for some cleanup? Maybe this old code is not correct any more.

# Michael S. Kaplan on 31 Mar 2007 12:20 PM:

What makes you say that, Lionel?

# KJK::Hyperion on 31 Mar 2007 8:34 PM:

Ah-HA! that explains why, after installing Uniscribe, unpaired CRs and LFs don't generate a null glyph in notepad anymore!

# Nobody Important on 3 Apr 2007 7:51 PM:

This appears to be the bug:

Now fixed, thanks to Michael.

# Michael S. Kaplan on 4 Apr 2007 5:33 AM:

Well, like I said, Sergey provided the actual info to fix it, I just facilitated things. :-)
