by Michael S. Kaplan, published on 2006/05/31 04:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/31/611340.aspx
Back in early 2005 (in the post Keeping it simple with complex scripts), I talked a little bit about how the Uniscribe documentation illustrated complex script shaping rules with several examples drawn from cursive writing in the Latin script.
In retrospect this is kind of ironic, since Latin is not conventionally thought of as a 'complex script', certainly not when that documentation was written, and to most people not even now.
(Most of what I talk about here applies in interesting ways to complex scripts in general.)
It is a funny thing, but if you use a Latin script language like English and you read something that someone wrote out in cursive, you do not give the subtle differences in how the letters connect a second thought. You just assume that these small differences exist, and have no problem reading it.
And clearly the person writing has no trouble with these small differences, either.
Developing a font that uses cursive (basically a 'handwriting' font) is a bit more challenging.
There is clearly no way, for example, to fully emulate the writer, who often changes the shape of the current letter based on the letter that comes next. Because no rendering engine can read minds, the simple truth is that the form a letter is first drawn with may not match its final form once the next letter is typed -- unless you sacrifice the quality of the final rendering by producing "position neutral cursive."
Of course if you choose to sacrifice that quality, it affects the reader's experience.
You could think of the choice between these two methods (one favors the writer by keeping each letter's shape consistent as you type, the other favors the reader by looking more like actual handwriting) as a typography issue that most people don't really consider very often....
In general, the font has to choose one of these two approaches.
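To make the trade-off concrete, here is a minimal Python sketch of the two approaches. It is a toy model under stated assumptions, not how Segoe Script or any real shaping engine works: the glyph names (w.final, w.midexit, and so on) and the tiny contextual table are hypothetical, picked only to mirror the kind of w-before-e connection a cursive font might need.

```python
# Toy model of two ways a cursive font can pick glyphs. The glyph names and
# the contextual table are hypothetical, not Segoe Script's real data.

# Contextual alternates: (letter, next letter) -> glyph variant.
# Here w joins a following e (or o) at the midpoint instead of the baseline.
CONTEXTUAL = {
    ("w", "e"): "w.midexit",
    ("w", "o"): "w.midexit",
}


def contextual_glyphs(text):
    """Reader-friendly: each letter's glyph depends on the letter after it."""
    glyphs = []
    for i, ch in enumerate(text):
        if not ch.isalpha():
            glyphs.append(ch)  # spaces and punctuation pass through unchanged
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if not nxt.isalpha():
            glyphs.append(ch + ".final")  # nothing (yet) to connect to
        else:
            glyphs.append(CONTEXTUAL.get((ch, nxt), ch + ".baseline"))
    return glyphs


def neutral_glyphs(text):
    """Writer-friendly "position neutral" cursive: one shape per letter."""
    return [ch + ".neutral" if ch.isalpha() else ch for ch in text]


# Simulate typing the phrase one keystroke at a time.
phrase = "we welcome werewolves"
for n in range(1, 4):
    typed = phrase[:n]
    print(repr(typed), contextual_glyphs(typed), neutral_glyphs(typed))
```

After one keystroke the contextual version draws w.final; once the e is typed, the same w becomes w.midexit, which is the "letter changes under you" effect. The neutral version never revisits a letter once it is drawn, at the cost of less natural joins.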
With that said, let's take a look at Segoe Script, one of the new fonts that ship with Vista.
Let's take a simple, common phrase that I am sure you find yourself using all the time, such as "we welcome werewolves", and type it into Notepad on Vista. Every phrase starts with the first letter....
Ok, simple enough. Now let's add that second letter -- note that the connection point between w and e should be at the midpoint of the letter, not at the baseline. So we can watch the w glyph change:
Just like one might do in handwriting. Ok, we'll finish the sentence:
And there are a few other examples there with other letters, too. Ok, so clearly Segoe Script is one of those fonts that is better for the readers than the writers.
Or is it?
Let's do the same thing in Wordpad, which uses a RICHEDIT control rather than an EDIT control:
Hmmm... right out of the gate there is something different. I'm almost afraid to continue:
It looks like that w did not even blink! Let's look at the whole phrase:
Clearly, our "reader" font has become a "writer" font. What happened?
It gets worse, actually. Let's look at a small managed application that renders our phrase using four different techniques and the two different kinds of Edit controls:
Suddenly everything looks more complicated than it did a moment ago, doesn't it? ExtTextOutW breaks the pattern set by the EDIT control, in just the same way that TabbedTextOutW breaks it for RICHEDIT.
What is the underlying issue causing these seemingly pathologically diverse results?
Well, the issue is simple enough: Latin, unlike Arabic, Devanagari, Tibetan, Sinhala, and many others, is only sometimes considered to be a complex script. And by "sometimes" I mean "in some code paths."
In XP SP2, Vista, and other recent platforms, fun wrappers around text rendering like Uniscribe, TabbedTextOutW, and DrawTextExW end up treating Latin as a complex script, while both the simpler functions (e.g. SetWindowTextW) and the lower level ones (e.g. ExtTextOutW) treat it as though it is not.
And although ExtTextOutW has ETO_IGNORELANGUAGE, which is essentially an ETO_STOPTREATINGMELIKEIAMSOCOMPLEXYOUMISERABLECONTROLFREAK flag, it has no ETO_DONTIGNORELANGUAGE, which would be more of an ETO_IAMCOMPLEXHEARMEROARYOUMISERABLESIMPLETON kind of flag.
(Hard to believe that they don't have me authoring the names of more constants in the Platform SDK headers, isn't it?)
So the behavior you get here will be very much dependent on what method you use to get the text drawn and what control you use.
This is unlike the situation with scripts that are pretty much always considered complex, where the function called does not decide whether this extra work is needed....
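The rules described above can be summarized as a small decision function. This is a hypothetical Python model that encodes only the claims made in this post: it does not call any Windows API, the function name gets_complex_shaping and the API name strings are made up for illustration, and the ignore_language parameter merely stands in for passing ETO_IGNORELANGUAGE to ExtTextOutW. Real behavior also depends on the platform version, as noted earlier.

```python
# Hypothetical model of the rules described in this post. It encodes the
# post's claims as data; it does not call any real Windows API.

# Scripts that are "pretty much always" treated as complex.
ALWAYS_COMPLEX = {"Arabic", "Devanagari", "Tibetan", "Sinhala"}

# Per the post: on XP SP2, Vista, and later, these paths treat Latin as complex.
LATIN_COMPLEX_APIS = {"Uniscribe", "TabbedTextOutW", "DrawTextExW"}


def gets_complex_shaping(script, api, ignore_language=False):
    """Return True if text in `script` drawn via `api` is shaped as complex.

    `ignore_language` stands in for ExtTextOutW's ETO_IGNORELANGUAGE flag,
    which can turn complex handling off. There is no matching flag to turn
    it on, so Latin through ExtTextOutW is never shaped as complex here.
    """
    if api == "ExtTextOutW" and ignore_language:
        return False
    if script in ALWAYS_COMPLEX:
        return True
    return script == "Latin" and api in LATIN_COMPLEX_APIS


for api in ("Uniscribe", "DrawTextExW", "TabbedTextOutW", "ExtTextOutW", "SetWindowTextW"):
    latin = gets_complex_shaping("Latin", api)
    arabic = gets_complex_shaping("Arabic", api)
    print(f"{api:<15} Latin complex: {latin!s:<5}  Arabic complex: {arabic}")
```

Running it prints one row per API: Latin gets complex shaping only through Uniscribe, DrawTextExW, and TabbedTextOutW, while Arabic gets it everywhere unless ExtTextOutW is told to ignore language.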
Of course, treating Latin as a complex script was not done exclusively for the sake of cursive Latin fonts; it was done instead to support the many African languages that need the text to be treated as complex to get all of the right shaping for diacritics.
Which means that the controls and functions that screw this up are being all that they can be for some languages, a topic I will talk about more another day.
Another topic I'll hit on in a future post is some of the additional issues with cursive fonts.
(Special thanks to MVP 'Ted' for first pointing out this issue to me, and Peter Constable for his help in getting the understandable if not intuitive explanation together!)
This post brought to you by w (U+0077, a.k.a. LATIN SMALL LETTER W)
referenced by
2009/06/09 When keeping things on a level Plane[ 1] doesn't work anymore
2008/07/26 When the lines get too complex, the characters may go off script
2007/03/13 We need to be optimizing for more than just the simple cases
2006/08/11 Are ligatures supposed to be thought of as 'single characters' ?
2006/07/10 The PUA isn't complex enough
2006/06/29 Working beyond the BMP is going off script (according to GDI)
2006/06/16 Microsoft *does* support OpenType!
2006/06/16 Neither GDI nor Uniscribe solve the ultimate font problem completely, either