The difference between Six Sigma and Sigma Diaresis is one must never fail; the other seems to do so by default

by Michael S. Kaplan, published on 2008/09/29 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/09/29/8968429.aspx

As the regular font ninjas and font experts and font mavens will likely be quick to agree with me about, I am not a font maven, expert, or ninja.

But every once in a while I can contribute productively. :-)

The other day, over on the VOLT users community forum, an interesting question was raised, by JGlavy:

Greetings,

I'm trying to make a font opentype friendly toward using Greek letters with non-canonical diacritics (such as used in Arvantic and Karamanli, etc). I can't even get the dieresis (combining, or otherwise) to keep from colliding with the capital Sigma. I've tried work-arounds such as precomposing a Sigma with dieresis but can't get it to output the precomposition when typing Σ and ¨ . I can make up some completely arbitrary letter like S with ¨ and then link it to the precomposed Sigma-with-dieresis and THAT will output properly....but only with Latin letters.

Any advice on how I can resolve this...or is this one of the Uniscribe limitations? It does seem that the Greek range won't let me put any diacritics over Capital or lower case that are as tall as capital letters.

Please help

JGlavy

Now of course the first part of this is easy and any OpenType maven can get into it -- the liga or Standard Ligatures OpenType feature, which you can read about here:

Tag: 'liga'

Friendly name: Standard Ligatures

Registered by: Microsoft/Adobe

Function: Replaces a sequence of glyphs with a single glyph which is preferred for typographic purposes. This feature covers the ligatures which the designer/manufacturer judges should be used in normal conditions.

Example: The glyph for ffl replaces the sequence of glyphs f f l.

Recommended implementation: The liga table maps sequences of glyphs to corresponding ligatures (GSUB lookup type 4). Ligatures with more components must be stored ahead of those with fewer components in order to be found. The set of standard ligatures will vary by design and script.

Application interface: For sets of GIDs found in the liga coverage table, the application passes the sequence of GIDs to the table and gets back a single new GID. Note that full sequences must be passed.

UI suggestion: This feature serves a critical function in some contexts, and should be active by default.

Script/language sensitivity: Applies to virtually all scripts.

Feature interaction: This feature may be used in combination with other substitution (GSUB) features, whose results it may override.
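To make the "Recommended implementation" part above a bit more concrete, here is a toy Python model of a ligature lookup. It is only a sketch of the idea and not how any real shaping engine is written: the glyph names (including the precomposed Sigmadieresis) and the apply_ligatures helper are made up for illustration, and a real font stores glyph IDs in a binary GSUB table rather than names in a dictionary.

```python
# Toy model of a GSUB type 4 (ligature) lookup.
# All glyph names below are hypothetical; real lookups work on glyph IDs.
LIGATURES = {
    ("f", "f", "l"): "f_f_l",
    ("f", "f"): "f_f",
    ("f", "l"): "f_l",
    ("Sigma", "dieresiscomb"): "Sigmadieresis",  # assumed precomposed glyph
}

# Try longer sequences first, mirroring the rule that ligatures with more
# components must be stored ahead of those with fewer components.
ORDERED = sorted(LIGATURES.items(), key=lambda item: len(item[0]), reverse=True)


def apply_ligatures(glyphs):
    """Replace each matching glyph sequence with its single ligature glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        for sequence, ligature in ORDERED:
            if tuple(glyphs[i:i + len(sequence)]) == sequence:
                out.append(ligature)
                i += len(sequence)
                break
        else:
            # No ligature starts at this position; keep the glyph as-is.
            out.append(glyphs[i])
            i += 1
    return out


print(apply_ligatures(["f", "f", "l"]))
print(apply_ligatures(["Sigma", "dieresiscomb", "a"]))
```

If the shorter f f entry were tried before f f l, the three-glyph ligature would never be found, which is the whole point of the ordering rule.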

Okay, that seems easy enough.

Update 30 Sep 2008: Actually, it turns out the ccmp feature is better for this than the liga one; you can see the comments for details on why. I think I mentioned, not a font ninja? :-)

So all I need is a SIGMA and a DIAERESIS and I am done, right?

Well, not quite. And this is where I can become more useful, since we are moving into a place where I am (on a scale of ONE TO NINJA) going to place higher than I do when it comes to typography -- keyboards. :-)

So we'll back up. We assume that JGlavy knows about the font stuff since he is talking about several things intelligently that I don't fully grok (this is admittedly using the same principle I suggested here that Dale had an even better citation for!).

And we look at what is being done:

I've tried work-arounds such as precomposing a Sigma with dieresis but can't get it to output the precomposition when typing Σ and ¨

The important question here -- what is the ¨ here?

The liga entry has you include individual Unicode values. But almost no keyboards include U+00a8, aka DIAERESIS, except in the case of dead key combinations when no valid pair of characters is found or the broken Romanian keyboard is used.

Dead keys will, like fonts, use the precomposed characters.

Unicode would want you to use U+0308, aka COMBINING DIAERESIS. So you would type Σ (U+03a3, aka GREEK CAPITAL LETTER SIGMA) followed by that U+0308. Only it does not tend to exist on keyboards unless you include it in a custom keyboard you create!

Now of course the liga feature in OpenType has no higher knowledge of what we'll call "stupid sequences", so in theory nothing stops you from programming any sequence in and having the font use it whenever it notices that sequence. But in the interests of text that is correct and easy to search later, let's try to get the right sequence in the text stream -- U+03a3 U+0308.
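If you want to check which characters actually ended up in your text, a few lines of Python with the standard unicodedata module will tell you. This is just a quick sketch; the variable names are mine. It also shows why the font has to do the work here: Unicode has a precomposed capital iota with dialytika (U+03AA), but normalizing the sigma sequence to NFC leaves it as two code points, since there is no precomposed character to compose to.

```python
import unicodedata

SIGMA = "\u03a3"                # GREEK CAPITAL LETTER SIGMA
SPACING_DIAERESIS = "\u00a8"    # DIAERESIS (spacing, not a combining mark)
COMBINING_DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Print the name and canonical combining class of each character.
for ch in (SIGMA, SPACING_DIAERESIS, COMBINING_DIAERESIS):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"(combining class {unicodedata.combining(ch)})")

# The sequence we want in the text stream: U+03A3 U+0308.
sequence = SIGMA + COMBINING_DIAERESIS
composed = unicodedata.normalize("NFC", sequence)
print("sigma sequence unchanged by NFC:", composed == sequence)

# For comparison, capital iota plus the same mark does compose.
iota = unicodedata.normalize("NFC", "\u0399" + COMBINING_DIAERESIS)
print("iota sequence composes to:", f"U+{ord(iota):04X}")
```

A combining class of 0 for U+00a8 is the giveaway that it is a spacing character and will never attach itself to the sigma.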

You'll need to have an input method that lets you type it (or build your own, a-la-MSKLC!), and the font that does the magic so you can see the dots above the sigma.

Now, changing tracks a bit, if you use that sequence without help, you get Σ̈ and you can maybe make out the dots hiding behind the sigma. I wonder whether the habit of not including diacritics on the capital letters in Greek had more to do with the fact that there was nothing there to make them shape more harmoniously together (ref: Sometimes, uppercasing sucks), and that maybe if the right glyphs were added here then everyone would be okay with the diacritics.

You could then solve it all with some liga entries in your font.

Now the bonus of this supposition is that even if I am dead wrong, you can still solve the problem I mention in Sometimes, uppercasing sucks with the liga entry, simply using the letter without the diacritic as the form to show when combined with the capital letter. Thus being wrong is no blocker to me having provided the correct solution. :-)

This blog brought to you by all of the previously mentioned characters in this post

John Hudson on 29 Sep 2008 10:15 PM:

<liga> is not the correct layout feature for this sort of substitution; for this and other one-to-many substitutions involving precomposed diacritic combinations you should use <ccmp> 'Glyph Composition/Decomposition'.

Alternatively, you could use GPOS mark attachment positioning.

Michael S. Kaplan on 29 Sep 2008 10:29 PM:

When you say "not correct", what are the consequences? Or what are the benefits of using <ccmp> ?

Andrew West on 30 Sep 2008 4:45 AM:

<liga> is not correct because it is intended for composing ligatures (e.g. for composing a ligature of the letters ffi). On the other hand <ccmp> is intended for general purpose glyph composition or decomposition, including substitution of a sequence of base character plus combining mark or marks for a single glyph. The benefits of using <ccmp> is that it applies to all scripts, and it is always the first OT feature to be applied (which means you can get all your glyph composition/decomposition out of the way first, and then do other things with the resultant glyph stream); whereas <liga> only applies to "virtually all scripts", so may not work in all cases.

Michael S. Kaplan on 30 Sep 2008 10:22 AM:

Good enough for me. :-)

I added a note about this in the blog. Thanks guys....
