Did he say shaping? It's not in the script!

by Michael S. Kaplan, published on 2006/05/31 04:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/31/611340.aspx


Back in early 2005 (in the post Keeping it simple with complex scripts), I talked a little bit about how the Uniscribe documentation illustrated complex script shaping rules with several examples drawn from Latin script cursive writing.

In retrospect this is kind of ironic, since Latin is not conventionally thought of as a 'complex script', certainly not when this documentation was written and to most people not even now.

(most of what I talk about here applies in interesting ways to complex scripts in general)

It is a funny thing, but if you use a Latin script language like English and you read something that someone wrote out in cursive, you do not give the subtle differences in how many of the letters connect a second thought. You just assume that these small differences exist, and have no problem reading it.

And clearly the person writing has no trouble with these small differences.

Developing a font that uses cursive (basically a 'handwriting' font) is a bit more challenging.

There is clearly no way, for example, to emulate the writer, who often needs to change the shape of the current letter based on the next letter. Because no rendering engine can read minds, the simple truth is that the initial form of the letter that is written may not match the final form once the next letter is typed -- unless you sacrifice the quality of what the final rendering will be by producing "position neutral cursive."

Of course if you choose to sacrifice that quality, it affects the reader's experience.

You could think of these two different methods, where one favors the writer by keeping the letters consistent and the other favors the reader by looking more like actual handwriting, as a typography issue that lots of people don't really consider very often....

In general the font has to choose one of these two approaches.
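To make the two approaches a little more concrete, here is a minimal sketch in Python. It is a toy model only, not how Uniscribe or any real font does its shaping: the set of "mid-height joiner" letters, the `.mid` glyph naming, and all three function names are assumptions made up for illustration.

```python
# Toy model of the two approaches. Everything here is hypothetical:
# real fonts express these rules in OpenType tables, not in Python.

# Assumed set of letters whose exit stroke joins the next letter at mid-height.
MID_JOINERS = set("bovw")


def shape_contextual(text):
    """'Reader' approach: a letter's glyph may depend on the letter after it."""
    glyphs = []
    for i, ch in enumerate(text):
        nxt = text[i + 1:i + 2]  # empty string when ch is the last character
        if ch in MID_JOINERS and nxt.isalpha():
            glyphs.append(ch + ".mid")  # variant that connects at mid-height
        else:
            glyphs.append(ch)           # default form
    return glyphs


def shape_neutral(text):
    """'Writer' approach: position neutral, one glyph per letter, never changes."""
    return list(text)


def typing_states(text, shaper):
    """Glyph runs a typist would see after each keystroke."""
    return [shaper(text[:n]) for n in range(1, len(text) + 1)]


if __name__ == "__main__":
    for state in typing_states("we", shape_contextual):
        print(state)
    # ['w']
    # ['w.mid', 'e']
```

With the contextual shaper, the glyph already on screen for the first letter is replaced once the second keystroke arrives, which is exactly the cost to the writer described above. With `shape_neutral` nothing ever changes after it is typed, at the cost of joins that look less like handwriting.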

With that said, let's take a look at Segoe Script, one of the new fonts that ships with Vista.

Let's take a simple, common phrase that I am sure you find yourself using all the time, such as we welcome werewolves. Let's type it into Notepad on Vista. Every phrase starts with the first letter....

Ok, simple enough. Now let's add that second letter -- note that the connection points between w and e are such that they should be at the midpoint of the letter, not the baseline. So we can watch the w glyph change:

Just like one might do in handwriting. Ok, we'll finish the sentence:

And there are a few other examples there with other letters, too. Ok, so clearly Segoe Script is one of those fonts that is better for the readers than the writers.

Or is it?

Let's do the same thing in Wordpad, which uses a RICHEDIT control rather than an EDIT control:

Hmmm... right out of the gate there is something different. I'm almost afraid to continue:

It looks like that w did not even blink! Let's look at the whole phrase:

Clearly, our "reader" font has become a "writer" font. What happened?

It gets worse, actually. Let's look at a small managed application that renders our phrase using four different techniques and the two different kinds of Edit controls:

Suddenly everything looks more complicated than it did a moment ago, doesn't it? The EDIT control with ExtTextOutW betrays the pattern for the EDIT control, in much the same way that TabbedTextOutW does for RICHEDIT.

What is the underlying issue causing these seemingly pathologically diverse results?

Well, the issue is simple enough, and it is that Latin, unlike Arabic, Devanagari, Tibetan, Sinhalese, and many others, is only sometimes considered to be a complex script. And by sometimes it is clear that I am saying in some code paths.

Latin text that goes through fun wrappers around text rendering like Uniscribe, TabbedTextOutW, and DrawTextExW will end up being treated in XP SP2, Vista, and other recent platforms as a complex script, while both the simpler (e.g. SetWindowTextW) and lower level (e.g. ExtTextOutW) functions will treat it as if it is not.

And although ExtTextOutW has ETO_IGNORELANGUAGE, which is essentially an ETO_STOPTREATINGMELIKEIAMSOCOMPLEXYOUMISERABLECONTROLFREAK flag, it has no ETO_DONTIGNORELANGUAGE, which would be more of an ETO_IAMCOMPLEXHEARMEROARYOUMISERABLESIMPLETON kind of flag.

(Hard to believe that they don't have me authoring the names of more constants in the Platform SDK headers, isn't it?)

So the behavior you get here will be very much dependent on what method you use to get the text drawn and what control you use.
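The description above can be summarized as a small decision table. The sketch below is only a model of what this post says, written in Python for brevity. It is not the actual GDI logic: the table contents simply restate the paragraphs above, and `gets_shaped` is a hypothetical helper. The one real constant is ETO_IGNORELANGUAGE, whose value is taken from wingdi.h.

```python
# A model of the behavior described in this post, NOT the real GDI code path.

ETO_IGNORELANGUAGE = 0x1000  # value as defined in wingdi.h

# Which entry points treat Latin text as complex, per the description above
# (XP SP2, Vista, and other recent platforms).
TREATS_LATIN_AS_COMPLEX = {
    "Uniscribe": True,
    "TabbedTextOutW": True,
    "DrawTextExW": True,
    "SetWindowTextW": False,
    "ExtTextOutW": False,
}

# Scripts the post lists as pretty much always considered complex.
ALWAYS_COMPLEX = {"Arabic", "Devanagari", "Tibetan", "Sinhalese"}


def gets_shaped(script, api, flags=0):
    """Return True if text in `script` drawn via `api` would get complex shaping."""
    # The flag only ever turns shaping off; there is no flag to turn it on.
    if api == "ExtTextOutW" and flags & ETO_IGNORELANGUAGE:
        return False
    if script in ALWAYS_COMPLEX:
        return True   # does not depend on the function called
    if script == "Latin":
        return TREATS_LATIN_AS_COMPLEX.get(api, False)
    return False


if __name__ == "__main__":
    for api in TREATS_LATIN_AS_COMPLEX:
        print(api, gets_shaped("Latin", api), gets_shaped("Arabic", api))
```

The asymmetry is the whole point: the flag argument can take a script that would have been shaped and make it simple, but nothing a caller passes to ExtTextOutW can make Latin come back as complex.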

Unlike the situation in scripts that are pretty much always considered complex, which do not depend on the function called to know this extra work is needed....

Of course, treating Latin as a complex script was not done exclusively for the sake of cursive Latin fonts; it was instead done to support the many African languages that need the text to be considered complex to get all of the right shaping for diacritics.

Which means that the controls and functions that screw this up are being all that they can be for some languages, a topic I will talk about more another day.

Another topic I'll hit on in a future post is some of the additional issues with cursive fonts.

(Special thanks to MVP 'Ted' for first pointing out this issue to me, and Peter Constable for his help in getting the understandable if not intuitive explanation together!)

 

This post brought to you by w (U+0077, a.k.a. LATIN SMALL LETTER W)


# SDiZ on 31 May 2006 5:02 AM:

Will shaping work with CJK vertical-writing-style?
Something similar to this => http://en.wikipedia.org/wiki/Cursive_script_%28East_Asia%29

# Phylyp on 31 May 2006 5:09 AM:

Wow, this is a really interesting post!

# Michael S. Kaplan on 31 May 2006 10:22 AM:

Hi SDiZ -- Vertical text is indeed possible (not just for CJK but in Vista for Mongolian!), but I do not know of any Grass script fonts (MS does not ship any as far as I know). So you would probably have to find your own.....


Hi Phylyp -- thanks!

# Ruben on 1 Jun 2006 2:53 PM:

Cool. But is it just me, or does Segoe Script look exceptionally ugly and hard to read, with letters jumping up and down like that?

# Michael S. Kaplan on 2 Jun 2006 12:29 AM:

Hi Ruben,

You have clearly never seen my handwriting before if you think Segoe Script is hard to read. :-)

# Ruben on 2 Jun 2006 5:12 PM:

Let's hope we're not going to see a Kaplan Script on Windows in the future then ;-)

But seriously, I think the offending script is a great showcase of OpenType features; it's an example to font designers as to what you can accomplish. Technically, that is. But by Bodoni, that's ugly. (Linotype's Zapfino, available on OSX, looks a lot better, BTW.)

# Dean Harding on 5 Jun 2006 8:34 PM:

I don't think Zapfino has contextual shaping, though, does it?

# Ruben on 6 Jun 2006 4:16 PM:

OK, Zapfino doesn't have contextual features as far as I can see. I must have been confusing it with Zapfino Extra (http://www.linotype.com/zapfino). Très cool.

# Michael S. Kaplan on 6 Jun 2006 4:54 PM:

Bickham Script Pro also does this sort of thing, I believe....

# Ruben on 7 Jun 2006 3:31 PM:

Now, if only GDI+ could be convinced to support OpenType fonts, we could actually use them under .NET! As luck would have it, GDI+/WinForms is firmly TrueType-only. Yet these fonts are PostScript based OpenType fonts. So WinForms bends over backwards to disallow these wonderful fonts.

[rant]Judging by the number of MS programs that refuse to support this type of fonts properly (or at all), MS must truly hate Adobe, even long before the Office 2007/PDF debacle. And chuckle over us poor souls buying into the OpenType propaganda. The only places where you'll find real support for OpenType is in Adobe programs and on Apple's OSX. Office 2007 STILL doesn't support OpenType fully![/rant]

Oh well, gonna have to wait for WPF then...

# Michael S. Kaplan on 7 Jun 2006 4:06 PM:

Hi Ruben,

Sorry, I am not going to swing at a pitch in the dirt like this one.

But if you think that GDI+ does not support many of the various language/script tables defined in OpenType, then you don't know as much about GDI+ as you think....

No one hates anyone on this blog though. If you want to discuss that sort of thing you may want to find a different place to hold such a discussion. :-)

# Ruben on 8 Jun 2006 3:42 PM:

This is getting waaay off topic, but GDI+ really doesn't support OpenType. (According to my definition of 'support', that is.) So maybe MS doesn't hate Adobe. Sorry about the wording--as you can see, I'm slightly passionate about this subject.

Sure, it partially supports some predefined tables. Ha!

One of the crucial parts of the OpenType specification is the dual outline format: TrueType vs. CFF/PostScript. The Bickham Script you mentioned is a CFF/PostScript based OpenType font. And GDI+ really won't allow such fonts. If you try to access such a font by name, you'll get Microsoft Sans Serif (System.Drawing.Font does that anyway). And if you try to load the .otf file, you'll get a file not found exception under .NET (that's right: file not found, not even file not supported). WinForms even goes so far as to disallow installed OpenType and Type 1 fonts, even when using GDI for rendering!

Another selling point of OpenType is the availability of optional tables for advanced typography, such as small caps, all caps, swash, oldstyle figures, discretionary ligatures, etc. (like the Cleartype fonts or Palatino Linotype have). Not a single MS product or platform apart from WPF supports this. And if you can't/won't use WPF, you'll need to mess around with Uniscribe yourself, if I'm correct, because neither GDI nor GDI+ support this.

So, from the OpenType specs, the following is not supported by GDI+:

- CFF outlines (80% of all OpenType fonts, not counting the oldstyle TrueType fonts you can buy everywhere at $1.99 that don't even support Latin 1 fully)
- selection of active features (aka tables)
- kerning (TrueType or OpenType style)

which pretty much leaves us with TrueType + (some) advanced script support, at best. But that's not OpenType support by a long shot.

So, as far as full blown OpenType typography support on Windows: it's going to be a loooong wait. Even though it's been six years after the introduction of OpenType.

# Michael S. Kaplan on 8 Jun 2006 3:47 PM:

Well, on an internationalization blog such as this one, it is easy and consistent to define OpenType support along the lines of the language/script support that exists -- and while GDI+ does not support as much as Vista or XPSP2 GDI/Uniscribe, it does support a great deal here.

But yes, this is incredibly off topic.... :-)

# Ted. on 9 Jun 2006 2:30 PM:

Thanks for this - I want to address two points: SetWindowTextW is not a text rendering function, it simply sets the internal text buffer to the text you pass it.  Some other API is still responsible for rendering the text to screen (whatever the control happens to be using internally).  

Second, you mentioned that ExtTextOut has no ETO_DONTIGNORELANGUAGE, but how about if it did? How likely would it be for something like that to be added to Vista?  It seems to me that would be the ideal solution, i.e. allow an app to force it to assume complex all the time.  I guess another would be to change over to TabbedTextOut (using a preprocessor macro remapping ExtTextOut to TabbedTextOut).  Other than that, what's your solution (i.e. shouldn't you get the Word 2007 guys to at least fix it!)

