Optimized for English (oh, and also Japanese, and maybe a few others)

by Michael S. Kaplan, published on 2006/08/14 00:11 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2006/08/14/698304.aspx


The recent post about "Are ligatures supposed to be thought of as 'single characters'?" had a comment from RubenP that I thought could use some further conversation:

It must be said, but all the ClearType fonts with automatic fi ligatures look exceptionally bad for the sequence 'fij'; if you remember, the ij is quite frequent in Dutch, so that's a little troublesome.  (To me at least ;-)

But then again, the few fonts that contain a combining acute accent hardly ever actually combine it with the j, and if they do, the accent is markedly different from the accent on the (precomposed) i. Adding acutes to ij is actually something you'd want in Dutch (the acute is an emphasis mark and ij is a vowel; well, a diphthong actually). But because of the very poor support for this kind of thing, even the official rule has become i acute + j, rather than i acute + j acute.

Oh, and how does one stop these ligatures from happening? For example, in Turkish? IIRC the fi ligature is a big no-no in Turkish typography, because you cannot distinguish it from f + dotless i.
With such silly things, I guess non-American digital typography still has a long way to go...

It is a fair point. What is often hinted at (as in Bill Hill's first post on fontblog) is that the two languages that got the most research and attention when it comes to ClearType and the many ClearType fonts are English and Japanese. And there is no shortcut to skip that research step....

It becomes obvious, when one considers the needs of languages like Dutch and Turkish such as those that RubenP pointed out, that not all of the Western Latin script languages had their individual needs truly considered when some of the so-called "C* fonts" were developed.

The needs here are indeed sometimes script-specific but more often language-specific. And it is way too easy (when adding features that might be thought to look good for one language) to unintentionally screw over another language. Not to screw it over too much, mind you. Just to screw it over about the usual amount, if you know what I mean.

It's not like you can change these defaults later -- imagine what it would do to page flow and formatting in documents if such a global change were made -- a backcompat nightmare, to say the least!

Perhaps, in retrospect, a more generic approach to issues like the fi ligature could have been taken in the C* fonts. After all, this is a lesson we already learned with Microsoft Sans Serif and Tahoma. But typeface design at its best is a much more organic process than trying to imitate another font. So in the end, if a particular feature is on by default in a font and that feature is not so good for your language, then perhaps it is logical to conclude that this is not the best font for the language in question? :-)

So while it is true that many people are excited about the optional language features in OpenType and the readability gains of ClearType, I find myself much more excited about the next ten years -- when the work that has happened here can be further tuned to cater to the needs of even more languages than the ones for which ClearType is optimized now. And when the ability to work with optional OpenType features is available in products like Microsoft Word and Publisher. When the promises delivered on in the technologies of Vista and Office 2007 are extended to cover so much more of the world....

In the meantime, my Visual Studio font is either Consolas or Courier New, depending on how much "Terminal Services to XP" work I have to do (since "ClearType over TS to an XP box" is not really quite there just yet!).

Makes for an exciting future, in any case. :-)

 

This post brought to you by ﬁ and ĳ (U+FB01 and U+0133, a.k.a. LATIN SMALL LIGATURE FI and LATIN SMALL LIGATURE IJ)
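The two characters in the sign-off can be inspected with Python's standard `unicodedata` module. The sketch below (the `describe` helper is just an illustration) reports each code point, its Unicode name, and its NFKC form. Both characters carry compatibility decompositions, so NFKC turns them back into plain "fi" and "ij". Note that this concerns encoded characters only: the automatic fi ligatures discussed in the post are glyph substitutions made by the font at display time, and the stored text stays f followed by i.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name, and NFKC form of one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        # NFKC applies the compatibility decomposition, which splits a
        # presentation-form ligature back into its plain letters.
        unicodedata.normalize("NFKC", ch),
    )


for ch in ("\ufb01", "\u0133"):
    print(describe(ch))
```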


# Adam on Monday, August 14, 2006 4:53 AM:

"It's not like you can change these defaults later -- imagine what it would do to page flow and formatting in documents if such a global change were made -- a backcompat nightmare, to say the least!"

Wha...?

I'm imagining what it would do to page flow and formatting in documents, and I don't see what the problem is. Some words get put on different lines, and some paragraphs become one line shorter or longer. So some table columns get a tiny bit wider and some get a tiny bit thinner. Very occasionally a paragraph may spill onto an extra page, or unspill from an extra page just before a page breaking element (new section header) causing all the following elements to be a page (or even two if the section element is e.g. always on an odd page) different from last time the document was viewed.

So what? Are you saying that our document processors can't handle that?

Last I heard, Word does that between different versions anyway, and can also do it in the same version depending on the printer/paper settings of the computer it's loaded on.

Heck, if my document processor gets a better layout engine, I *want* it to rearrange the document so it's better than it was before. That's what it's supposed to do! That's the point. It's *supposed* to be automating these things.

Isn't it?

# Michael S. Kaplan on Monday, August 14, 2006 5:06 AM:

Hi Adam,

Actually, even one pixel differences can cause that sort of problem -- and if documents suddenly become hundreds of pages longer, the universe breaks as far as Word is concerned. They look at regressions here *very* carefully and treat them as huge bugs.

# Adam on Monday, August 14, 2006 9:51 AM:

"if documents suddenly become hundreds of pages longer, the universe breaks as far as Word is concerned."

*boggle* Why? What can't Word handle about this? What's wrong with it? Can it be fixed?

"They look at regressions here *very* carefully and treat them as huge bugs."

How is it a "regression" if the new rendering is *better* than the old? That's like saying that all the CSS fixes in IE7 are "regressions" from IE6. "Different" does not necessarily imply "going backwards".

Say, with the new processor speeds available, you figured out a way to add TeX's paragraph-at-a-time linebreaking/hyphenation rules to Word in realtime - something that improved the "color" of the document and could reasonably be objectively regarded by professional typesetters as "better". That could make large documents "hundreds of pages longer", but wouldn't be a regression.

So how is fixing this sort of thing up a regression?

# Michael S. Kaplan on Monday, August 14, 2006 9:59 AM:

Hi Adam,

Word "handles" it by repaginating.

But if upgrading to a new version of Word means having to check and fix every document, then people don't upgrade. So the "fix" here is a break -- if the fonts change.

So the fonts, for the sake of back compat, don't break old documents.

# Adam on Monday, August 14, 2006 10:27 AM:

Sorry, having another thick day. What's wrong with repaginating? Why would you have to "check and fix" all your documents? How does this break a document?

# Michael S. Kaplan on Monday, August 14, 2006 10:43 AM:

Because documents can have lots of things that interact -- like figure labels on the same pages as the figures, pages that do not have just single lines on them, descriptions that are close enough to what they are describing, tables that line up, and so on.

Not everyone cares, but lots of people do. So it is a big deal to keep things working here.

# Adam on Monday, August 14, 2006 11:21 AM:

People do orphan control, "keep with previous/next", allowing breaks within tables, etc... *by hand* on 100+ page documents?!?

Wow! People are /weird/.

# Michael S. Kaplan on Monday, August 14, 2006 12:00 PM:

Most people do not understand how to use some of those advanced features in Word and many do not even know they exist.

I won't say people are weird, but I will say that people who do all that work don't want an upgrade to break them....

# RubenP on Monday, August 14, 2006 5:58 PM:

Well, I've known about the orphan control and such, but somehow Word still never seems to be willing to do my bidding. :-(

Plus, you do need to apply such tools diligently, because adding a line halfway through your document can really screw up the partitioning of the rest of the document *because* of orphan control and "keep with next paragraph". Technically speaking, removing a word in TeX could potentially cause a longer paragraph than you started out with. (It's easier to get Word to behave on an 80+ page document, mind you. In other words: *very* hard.)

Still, for some reason, TeX seems a lot more robust in this sense (and others). Now if only someone could coerce TeX into properly supporting multiple columns and floating figures with text wrapping around them... OpenType in TeX is already well underway (XeTeX, IIRC), and there you actually do get control over each and every feature (such as complex scripts, ligatures, or explicitly *no* ligatures).

It boggles the mind when an ancient system like TeX can be extended to support OpenType even though it started out as a 7-bit system, but the only significant upgrade Word has gotten since Word 95 seems to be a spiffy new GUI. I know from good sources that typographers were (again) not pleased with the 'new' Word.

Still, Word and its ilk beat the heck out of the ancient TeX when you're writing a simple document, and the ease-of-use argument still holds. Just don't expect pretty typography: it's a word processor, not a typesetter.

# Adam on Monday, August 14, 2006 6:44 PM:

Ruben > "Plus, you do need to apply such tools diligently, because adding a line halfway through your document can really screw up the partitioning of the rest of the document *because* of orphan control and "keep with next paragraph". Technically speaking, removing a word in TeX could potentially cause a longer paragraph than you started out with."

I still fail to see *why* having a longer paragraph just because you've removed a word is inherently a problem - especially if orphan control, "keep with next", etc... are doing their job properly.

How do things "screw up" exactly? How are you differentiating between a "screw up" and "the right thing" for the new text?

# Dean Harding on Monday, August 14, 2006 10:32 PM:

Adam: Imagine you have a 1,000 page document. If page 1 suddenly becomes one line longer, it will cause page 2 to reflow, and page 3 and so on. After a few hundred pages, you might be moving stuff down dozens of lines on each page - it's a snowball effect.

If you add figures and tables into the equation, you might find that the figure that used to be on page 547 is now on 548, but the description text for it is mixed in with the text on page 546 and 547 still. That means you'll have to go in and re-write that section so that the description of the figure is closer to the figure itself.

Now repeat that for all of the figures you have, all the tables you have, and the 1,000 pages and you can see how a one-line difference on page 1 is such a huge deal.
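Dean's snowball effect shows up even in a deliberately simplified model. The sketch below is a hypothetical greedy paginator (nothing like Word's real layout engine) that only records which page each paragraph starts on. Making the first paragraph one line longer is enough to push a later paragraph onto the next page, and in a real document anything anchored near that paragraph could move with it.

```python
def paginate(paragraph_lines: list[int], lines_per_page: int) -> list[int]:
    """Toy greedy paginator: return the 1-based page on which each paragraph
    starts. Paragraphs may split across pages; widow/orphan control,
    keep-with-next, and figures are deliberately not modelled."""
    starts = []
    used = 0  # lines laid out so far
    for n in paragraph_lines:
        starts.append(used // lines_per_page + 1)
        used += n
    return starts


before = paginate([13] * 7, lines_per_page=20)
after = paginate([14] + [13] * 6, lines_per_page=20)  # paragraph 1 grows by one line
moved = [i + 1 for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(before)  # [1, 1, 2, 2, 3, 4, 4]
print(after)   # [1, 1, 2, 3, 3, 4, 4]
print(moved)   # [4]: the fourth paragraph now starts one page later
```

Scale the same effect up to a 1,000-page document and the number of paragraphs that change pages grows accordingly.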

# Adam on Tuesday, August 15, 2006 3:43 AM:

"Imagine you have a 1,000 page document. If page 1 suddenly becomes one line longer, it will cause page 2 to reflow, and page 3 and so on. After a few hundred pages, you might be moving stuff down dozens of lines on each page - it's a snowball effect."

Um, no. *I* won't be moving anything. The reflow/layout engine will do that. That's what it does. That's what it's for.

"If you add figures and tables into the equation, you might find that the figure that used to be on page 547 is now on 548, but the description text for it is mixed in with the text on page 546 and 547 still. That means you'll have to go in and re-write that section so that the description of the figure is closer to the figure itself."

Or just mark the preceding paragraph "keep with next", the next paragraph as "keep with previous" and both "do not break". Reflow engine sorts it all out again, and the whole section is now immune from being screwed up against any further changes.

Your document is also now immune against screwups from being printed out on different paper sizes, or against weirdness if you decide to put extra whitespace above/below your chapter/section headings. (Given that with a long document you are probably more likely to worry about the content before playing with things like header style formatting, this is a valid concern.) Or if you've got a new image to replace diagram 34, but it's not exactly the same size/aspect ratio. In fact, the new diagram 34 is a better aspect ratio for what it's displaying than the old one was, and you don't want to squish it and spend ages trying to make it *exactly* the same size as the old one was.

Laying stuff out on a page is a completely automatable task, and has been since around 1978. Having a document reflow to fit new edits and technologies (e.g. better font rendering) is *how it's supposed to work*.
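Adam's suggestion can be sketched with another hypothetical toy paginator (the `Block` class and `paginate_blocks` function are illustrations, not Word's object model or its algorithm). Blocks never split, and a block flagged `keep_with_next` travels with the block after it. With the flag set, a figure description that would otherwise stay behind on page 1 moves to page 2 along with its figure.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    lines: int
    keep_with_next: bool = False


def paginate_blocks(blocks: list[Block], lines_per_page: int) -> dict[str, int]:
    """Toy paginator: blocks never split, and a run of blocks linked by
    keep_with_next is placed as one unit, moving to a fresh page if the
    whole unit does not fit. Units taller than a page are not handled."""
    # Group consecutive blocks chained together by keep_with_next.
    groups: list[list[Block]] = []
    current: list[Block] = []
    for block in blocks:
        current.append(block)
        if not block.keep_with_next:
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    pages: dict[str, int] = {}
    page, used = 1, 0
    for group in groups:
        size = sum(b.lines for b in group)
        if used > 0 and used + size > lines_per_page:
            page, used = page + 1, 0  # start the whole group on a new page
        for b in group:
            pages[b.name] = page
        used += size
    return pages


loose = [Block("intro", 15), Block("description", 3), Block("figure", 6)]
kept = [Block("intro", 15), Block("description", 3, keep_with_next=True), Block("figure", 6)]
print(paginate_blocks(loose, 20))  # {'intro': 1, 'description': 1, 'figure': 2}
print(paginate_blocks(kept, 20))   # {'intro': 1, 'description': 2, 'figure': 2}
```

The sketch also illustrates the counterpoint made in the replies: the flag only helps if the author set it, and nothing in the model can infer on its own that the description belongs with the figure.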

# Michael S. Kaplan on Tuesday, August 15, 2006 6:40 AM:

Hi Adam --

Having written a book in Word in the past, worked with publishers, and talked to them, I can tell you for a fact that WRITING everything that way is not a natural task that people simply do. It is only something that is done when one is having trouble making a particular break work.

Once again, no one is doing it unless circumstances force it. And this is what *development editors* of books are used to seeing.

I can understand that you may be a bit stubborn about accepting this one (esp. considering your single-minded traffic on the subject here!), but are you willing to concede that:

(a) not everyone does these things, and
(b) even those who do it may not do it all the time, and
(c) given (a) and (b), NOT changing font metrics avoids breaking users as easily?

# Adam on Tuesday, August 15, 2006 7:19 AM:

*grumble*

Yeah, you're right, of course. :)

Again, I am the weird one. I have to admit, I find writing the content first and sorting out presentation at the end more natural. But, as people have often told me, I do occasionally lose track of how much weight other people put on style. :-/

It is a pity you guys can't /ever/ update Word's layout engine to fix this sort of thing though - something I'd never quite realised before. That's a real shame.

# Michael S. Kaplan on Tuesday, August 15, 2006 7:48 AM:

Ah, but what could Word really do to fix things, in the layout engine or even anywhere else?

I do know that attempts to set these attributes automatically have been made in the past (though I am not sure how recently), and they have tended not to work very well -- it is simply too hard to guess at the intent of the author in all of these situations....

# Adam on Tuesday, August 15, 2006 8:39 AM:

Sorry, I mean fix things like the font problem you were originally talking about.

Although they're not fixes, I did sort of mean to include enhancements like better line-breaking algorithms (e.g. as used by TeX) and possibly things like better kerning, etc...

However, I didn't mean "fixing" default settings like keep-with-next. Although such parameters do affect what the layout engine does, I wouldn't class such a change as an improvement to the layout engine per se. That'd be the sort of thing you'd put in the document template.

# Michael S. Kaplan on Tuesday, August 15, 2006 9:32 AM:

Hi Adam, 

     Sorry, I mean fix things like the font problem you were originally talking about.

     Although they're not fixes, I did sort of mean to include enhancements like better line-breaking algorithms (e.g. as used by TeX) and possibly things like better kerning, etc...


Well, some things can be fixed, but removing an fi ligature would (for example) affect things in subtle ways. The key is that it may have to stay a ligature, though they could perhaps make the spacing a bit different if they wanted it to act differently.

Temporarily they are saved by the fact that neither Word nor RichEdit is treating Latin as a complex script (the point I made in this post), so changes here would not break Word quite as much. :-)

# RubenP on Tuesday, August 15, 2006 6:03 PM:

In an ideal world, it would be programs like Word that would use the ligatures, not Notepad. I can't really think why the situation is reversed, to be honest. And in programs like Word, you indicate the language of the text. That way (because OpenType was designed that way, not because Office actually implements it that way, because it doesn't), the font designers could be told to not apply the fi-ligature for Turkish (that could be a rule laid out by the various localization experts), or to adjust the fi-ligature for Turkish, keeping the same metrics, but making the dot on the i visible again.

Still, that would only be possible if Office actually supported the various OpenType features that relate not to complex scripts but to typography. These f-ligatures are typographical, not semantic like the Arabic ligatures.

Maybe in the next version of Word. Word 2011 if we're lucky ;-)

As an example that's even worse, many Adobe OpenType fonts implement a Th-ligature, to show off the kewlness of OpenType, I guess. But unfortunately, not many people think it's appropriate. Unlike the f-ligatures such as fi fj fl fb fh fk that were invented purely for readability, Th shouldn't really melt together in most fonts and it's even more English-centric than an fi ligature.

Oh, and try the automatic ligature example with German typographers! They believe that ligatures shouldn't happen across the boundaries of compound words, so Kaufläche does have an fl-ligature, but Kaufleute doesn't. Not even OpenType is going to help us here! A less WYSIWYG system like TeX has explicit provisions for this (because no-one in Germany has a keyboard containing the zero-width non-joiner, zero width no-break space or word joiner), so you'd type Kaufl"ache and Kauf"|leute or Kauf\/leute. It ain't pretty, though.
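In plain Unicode text, the usual way to ask for "no ligature here" is U+200C ZERO WIDTH NON-JOINER between the two letters. The sketch below is a toy model, not how a shaping engine works: real ligatures are glyph substitutions inside the font, and `toy_ligate` fakes one by inserting the presentation-form character U+FB02. The helper names are hypothetical. It shows why a ZWNJ at the Kauf|leute seam blocks the fl ligature, while Kau|fläche, where f and l belong to the same component, keeps it.

```python
ZWNJ = "\u200c"         # ZERO WIDTH NON-JOINER
FL_LIGATURE = "\ufb02"  # LATIN SMALL LIGATURE FL, used here only to make the effect visible


def join_components(*parts: str) -> str:
    """Hypothetical helper: join compound-word components with ZWNJ at each seam."""
    return ZWNJ.join(parts)


def toy_ligate(text: str) -> str:
    """Stand-in for a font's automatic ligature feature: ligate every
    adjacent f+l. A ZWNJ between the letters means they are no longer
    adjacent, so no ligature is formed there."""
    return text.replace("fl", FL_LIGATURE)


print(toy_ligate("Kaufläche"))                        # fl sits inside "fläche": ligated
print(toy_ligate("Kaufleute"))                        # naive: ligated across Kauf|leute
print(toy_ligate(join_components("Kauf", "leute")))  # seam marked with ZWNJ: no ligature
```

The catch RubenP points out remains: someone has to know where the seam is and type the ZWNJ, which ordinary keyboards do not offer.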

# John Hudson on Tuesday, August 15, 2006 6:33 PM:

Regarding the fi ligature in the sequence fij and suppression of ligature formation in Turkish:

The idea that fij looks bad must be news to Dutch typographers and type makers, who have been employing fi ligatures in their books for centuries. I have at hand, for example, Jan van Krimpen's important _Memorandum over het maken van ontwerpen van Monotype-letters..._, which displays this typographic convention. In short, use of the fi ligature in Dutch is fijn.

Suppression of the fi ligatures in Turkish requires support for OpenType language system tags and the Localised Forms 'locl' feature. It is unfortunate that such support has been considered a low priority by software developers. It should be seen as being as essential as other language shaping layout features, such as those employed to ensure the correct forms of Arabic letters are displayed in the appropriate contexts. This is especially true in the case of something like the Turkish fi vs fı situation, in which there is the possibility of linguistic confusion.

I disagree, Michael, that font developers should be tailoring fonts to the limitations of current software, providing a generic approach that supports a broad range of languages at the lowest common denominator. The C* fonts are fine: they all have appropriate OTL language system tag structures for Turkish ligature handling. It is time for system and application developers to get off their butts and support this stuff.
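John's point is that the decision has to be keyed on the language of the text, which the application must pass down to the layout engine. The sketch below is a toy model of that control flow only, not real OpenType processing: actual fonts do this with GSUB lookups selected by script and language system tags, and a font may well substitute an alternate glyph through 'locl' rather than simply skip the ligature (RubenP's suggestion of keeping the metrics but restoring the dot). `shape_fi` and `NO_FI_LIGATURE` are hypothetical names; 'TRK ' and 'NLD ' are the OpenType language system tags for Turkish and Dutch.

```python
FI_LIGATURE = "\ufb01"  # LATIN SMALL LIGATURE FI, standing in for the ligature glyph

# Hypothetical rule table: OpenType language system tags (four characters,
# space-padded) under which the automatic fi ligature is switched off.
NO_FI_LIGATURE = {"TRK "}  # Turkish: the ligature would hide the dot that separates i from ı


def shape_fi(text: str, lang_tag: str) -> str:
    """Toy language-sensitive shaping: ligate f+i unless the text is tagged
    with a language whose rules forbid it."""
    if lang_tag in NO_FI_LIGATURE:
        return text
    return text.replace("fi", FI_LIGATURE)


print(shape_fi("fijn", "NLD "))  # Dutch: ligature applied
print(shape_fi("fil", "TRK "))   # Turkish: left as f + i
```

If the application never tags the text with a language, the Turkish branch is simply never reached, which is exactly the gap being complained about here.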

# Christoph Päper on Wednesday, August 16, 2006 8:26 AM:

So Word, i.e. its layout engine, cannot be updated as it should, because it’s used by too many people who don’t know how to use it properly? The resulting dumbing down is a major reason for advanced users to do away with MS products completely in my experience.
IMVHO word processors should never have had drop-downs and buttons for font size, style and family except in style definition dialogs. Word’s styles side pane—whatever it’s called actually—works just fine.

Ruben, you’re right about German ligature rules—which also somewhat applied to ß (ss) until it became a true (lowercase-only) letter in 1996—and they make a lot of sense for any language in which compounds are written in one word, i.e. their components aren’t separated by spaces or hyphens as common in English for instance. Perhaps ZWJ and ZWNJ (or similar) should indeed be featured on standard keyboards, but I haven’t figured out yet where (and there are more important additions, like proper quote marks).

I’m awaiting the day an OpenType font is released that reliably replaces English th with thorn and eth respectively. I’m not serious, of course, but in some typefaces it makes sense, or it’s at least a nice field for experiments and innovations, to have every digraph (e.g. ph, ch, ck) ligated. (You could also derive w and y from u for languages where they are used as (semi-)vowels and from v otherwise, among other strange stuff.)

Dutch test: i+j: íj́; ı+j: ı́j́; í+j: íj́; ij: ij́, ij̋; ÿ: ÿ́, ÿ̋; y: y̋.

# RubenP on Wednesday, August 16, 2006 4:31 PM:

John: I think you got my point the wrong way. The fi-ligature is no problem with Dutch per se. It's the design of the ClearType fonts that is bad. If you look at the fi-ligature, you'll notice the top curve of the f gets an entirely different shape, and the dots on the i and j are quite a lot lower than the curve of the ligature ends. (Ack)
It's a bit "look at me, I'm a ligature". My personal opinion is that you shouldn't really consciously notice that there's a ligature.

Chris: I think we can agree these combining acutes don't really work, right? :-) It's a misunderstanding, though, that ij = ÿ; in Dutch handwriting, ij tends to take the shape of a round ÿ. In typing, it's always i+j. It's up to OpenType (and Word, dammit) to provide such a cursive ligature under language tag NLD *at the font designer's discretion*. Calibri has this feature under the 'dlig' substitutions (no wonder, as it was designed by Lucas de Groot, a Dutchman by birth). Shame on the Office team for not wanting to let anyone know.

# RubenP on Wednesday, August 16, 2006 4:36 PM:

Oh, there are two exceptions to the ij ligaturing that I know of (both of non-Dutch origin): bijoux [bi'Zu] and bijectie [bi'jEksi] (j is a consonant here, not part of the ij diphthong [Ei]). And let's face it, those are not words you use regularly anyway :-)
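The rule described above (ligate ij, except where j is a consonant) can be sketched as an exception-list check. This is a rough illustration under stated assumptions: the names are hypothetical, the exception list holds only the two words from the comment, and the U+0133 character stands in for what a font would really do as a glyph substitution (for instance under 'dlig').

```python
IJ_LIGATURE = "\u0133"  # LATIN SMALL LIGATURE IJ, standing in for a ligature glyph

# The two words from the comment above, where j is a consonant rather than
# part of the ij diphthong.
IJ_EXCEPTIONS = {"bijoux", "bijectie"}


def ligate_ij(word: str) -> str:
    """Toy Dutch ij ligature: replace i+j with the ligature unless the word
    is a listed exception. Exact-word matching only, so inflected or
    derived forms of the exceptions are not caught."""
    if word.lower() in IJ_EXCEPTIONS:
        return word
    return word.replace("ij", IJ_LIGATURE)


for w in ("fijn", "bijoux", "bijectie"):
    print(w, "->", ligate_ij(w))
```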

# Jean Yao on Friday, August 18, 2006 6:59 PM:

Please advise how to remove these ligature special characters in Word 2003? I have not been able to find anyone to show me the light.

Thanks,

# Michael S. Kaplan on Saturday, August 19, 2006 12:28 AM:

Hello Jean,

Well, you can use some other font, that doesn't define them?

referenced by

2008/02/22 Optimized for English (oh, and also Invariant, and NOTHING ELSE) Redux

2007/06/10 Did someone re-calibri-ate a font?
