Truly a font of absolute last resort

by Michael S. Kaplan, published on 2008/06/12 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/06/12/8592568.aspx


At long last, after considerable back and forth and hemming and hawing, Apple has made their Last Resort font available, and you can get it here.

Of course Mac OS X has had this for some time, but now it is available for everyone else....

You can tell that Windows did not have this font in mind in its antiquated font installation UI:

See that line of "A" characters --- well, that is kind of what The Quick Brown Fox Jumped Over The Lazy Dog looks like if shown only with this font.

It looks even funnier in the font viewer:

where you can see it in all kinds of different sizes.

I wonder whether Windows would ever really try to optimize this experience if the font was not being included on the platform.

I mean sure it looks silly, but putting in code to handle a font one doesn't include? Hmmm, that would be weird.

The text above the EULA on the Unicode site is a little odd in some respects, e.g.

Overall, there are a number of advantages to using the Last Resort font for unrepresentable characters.

  1. Operating systems are freed from the overhead of providing a full Unicode font.
  2. Users see something more meaningful than a black box for unrepresentable characters.
  3. Users familiar with the scripts being represented with the Last Resort font will readily identify what needs to be installed to represent the text.
  4. Users unfamiliar with the missing scripts are shown easily-identified symbols rather than lengthy strings of unidentifiable characters.

Of those four points, #1 makes no sense -- the "overhead" of covering Unicode is still there; if the earlier screen shots prove anything they prove the font is useless to convey information beyond script identity. And #4 makes no sense to me in the context of #2 -- what is the difference between "a black box for unrepresentable characters" and "lengthy strings of unidentifiable characters" exactly, in this context?

For Windows, the font is kind of useless unless it is put in a place that lets it act as a fallback; it's not like explicitly picking the font to find out the range of some unidentified characters would be a common operation....

And then there is the mildly scary note about the data in the font and where it comes from:

The Last Resort font includes glyphs for scripts proposed for future encoding in Unicode and ISO/IEC 10646. There is no guarantee that all of these scripts will ever be encoded in future versions of the standard; the script names are taken from the UTC/WG2 Roadmap dated 2001-06-14.

Especially since the roadmap has had subtle and in some cases not-so-subtle changes since 2001, from new ranges to changed sizes of existing ranges. I have to wonder whether

Assuming Apple provided the information for the page describing the font, all three of these choices do not show Apple in the best possible light, though there are obviously different levels and degrees of sin involved here. :-)

Over on The Unicode List, the font was discussed a bit.

Like when Jukka K. Korpela commented:

I think the idea could be explained a little more, since there is now some ambiguity and uncertainty. The idea as I understand it is to provide a glyph for every Unicode code position, using a generic symbol (a symbol suggesting the general nature of the code position, such as "Latin letter" or "noncharacter") together with additional information in "fine print". This means that it should only be used as a "backup" font.

Even if this will be clarified, I'm still puzzled by the technicalities. The LastResort font does not seem to behave like a normal font. For example, in MS Word's font list, it appears as a sequence of A's in boxes, which is understandable (in a way) but confusing. If I try to change the font of a piece of text to LastResort, it won't change. In WordPad, it changes. If I copy the text to Word, it stays in LastResort. Is this due to some special treatment of fonts in MS Word? If I have text in Latin, Greek, and Cyrillic letters in WordPad and I change the font to LastResort, the Latin letters turn to the LastResort Glyph but the other letters are unaffected.

I'm afraid some guidance on the intended use and typical problems in it is needed to make LastResort really useful.

Adam Twardoch replied to part of that comment:

> Is this due to some special treatment of fonts in MS Word?
No, this is because the font is buggy.

And Jeroen Ruigrok van der Werven responded to Adam:

>No, this is because the font is buggy.
You're the second person I hear about that actually.

While Andrew West had a slightly different opinion:

> You're the second person I hear about that actually.
I don't think that the font is buggy, but 387,471 groups in the Format 12 CMAP subtable is possibly a little suboptimal.
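For anyone wondering why the group count is so high: in a Format 12 CMAP subtable, each group maps a run of consecutive code points to a run of consecutive glyph IDs. A font that points a whole block at one shared glyph therefore cannot use one group per block; it needs roughly one group per code point. Here is a minimal sketch of that lookup. The `Group` type, the function names, and glyph ID 7 are made up for illustration and are not taken from the font.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Group(NamedTuple):
    start_char: int   # first code point covered by the group
    end_char: int     # last code point covered (inclusive)
    start_glyph: int  # glyph ID for start_char; later code points get consecutive IDs


def lookup(groups: List[Group], cp: int) -> Optional[int]:
    """Find the glyph ID for cp in groups sorted by start_char, or None."""
    # Rebuilding the key list on every call is wasteful; it keeps the sketch short.
    starts = [g.start_char for g in groups]
    i = bisect_right(starts, cp) - 1
    if i < 0 or cp > groups[i].end_char:
        return None
    g = groups[i]
    return g.start_glyph + (cp - g.start_char)


def groups_for_shared_glyph(first: int, last: int, glyph: int) -> List[Group]:
    """Map every code point in [first, last] to the same glyph.

    A multi-character group would hand out consecutive glyph IDs,
    so each code point needs a group of its own.
    """
    return [Group(cp, cp, glyph) for cp in range(first, last + 1)]


# Hypothetical example: the Greek and Coptic block sharing made-up glyph ID 7.
greek = groups_for_shared_glyph(0x0370, 0x03FF, glyph=7)
# An ordinary font with sequential glyph IDs covers A-Z in a single group.
latin_caps = [Group(0x41, 0x5A, 10)]
```

With these definitions the Greek block alone takes `len(greek)` groups, one per code point, while `latin_caps` covers 26 letters with one group. That is at least consistent with a group count that nearly equals the number of mapped characters.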

Now it may or may not be buggy, but I suspect the behavior of not being able to switch to the font has more to do with Word's behavior in allowing font choice. Also combined with the subrange support as you can see in the TTF Extension utility:

It looks like that hasn't been updated much either, suggesting that there are some real differences between what the font thinks it supports and what the OS will interpret it as supporting.

You could flip a coin to decide whose fault that is. :-)

Andreas Prilop looked at the same matter from a different point of view, once again blaming the font:

The font is clearly broken. You can see this easily in Internet Explorer. You can define the "last resort font" only for the Latin script, but not for any other script. This means that the 'OS/2' table in the font is wrong. It claims that the font covers only Latin letters.

Someone needs to fix the 'OS/2' table in that font.

But given the info in ttfext I think this might be an MLang issue, with it being unwilling to claim that a font pointing multiple CMAP entries to the same glyph counts as the CMAP entry being supported. This is not exactly like the MLang bugs I mentioned in The undocumented limits are the ones that can trip you; it is more an incompatible architecture decision that someone would explicitly have to decide to either call a bug and alter or consider to be by design.
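To make the disagreement concrete, here is a rough sketch of the comparison involved: the 'OS/2' table carries four 32-bit Unicode range fields whose bits advertise scripts, while the CMAP says what is actually mapped. The three bit assignments below (0, 7 and 9) are from the OpenType specification; the function names and the sample values are hypothetical and were not read from the Last Resort font.

```python
from typing import Dict, List, Set, Tuple

# A small subset of the 'OS/2' Unicode range bits: bit -> (name, first, last).
RANGE_BITS: Dict[int, Tuple[str, int, int]] = {
    0: ("Basic Latin", 0x0000, 0x007F),
    7: ("Greek and Coptic", 0x0370, 0x03FF),
    9: ("Cyrillic", 0x0400, 0x04FF),
}


def claimed_bits(r1: int, r2: int, r3: int, r4: int) -> Set[int]:
    """Combine the four 32-bit range fields and return the set bit numbers."""
    value = r1 | (r2 << 32) | (r3 << 64) | (r4 << 96)
    return {bit for bit in range(128) if (value >> bit) & 1}


def mapped_but_unclaimed(claimed: Set[int], cmap_codepoints: Set[int]) -> List[str]:
    """Names of known ranges the CMAP covers but the range bits do not claim."""
    missing = []
    for bit, (name, first, last) in sorted(RANGE_BITS.items()):
        covered = any(first <= cp <= last for cp in cmap_codepoints)
        if covered and bit not in claimed:
            missing.append(name)
    return missing


# Hypothetical font: claims only Basic Latin, yet maps Latin, Greek and Cyrillic.
claimed = claimed_bits(0x00000001, 0, 0, 0)
cmap = {0x0041, 0x03B1, 0x0416}
gaps = mapped_but_unclaimed(claimed, cmap)
```

A consumer that trusts only the range bits would treat such a font as Latin-only, whatever its CMAP says. Whether the bits or the consumer should change is the coin flip.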

Don Osborn mirrored some of my own thoughts and concerns about expectations:

This is interesting, but at first I thought "last resort" meant that this was a font that covered a selection of characters that are under-represented among the most widely available and commonly used fonts (however that would be determined!).

If I understand correctly, what it actually seems to be is a way of having something more informative than an empty box or question mark when a character in a text is not included on any font installed on the user's computer. It somehow determines the appropriate character block in which the missing character is encoded and returns the symbol for that block.

If that is correct, it might help to introduce the concept at the top in this way. It also means that the first advantage - "Operating systems are freed from the overhead of providing a full Unicode font" - is not entirely accurate. You would still need to find and load an appropriate font for the missing characters (for the script[s] involved or even a full Unicode font if it comes to that), but the LRF lets you know what character block(s) needs to be covered.
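The "somehow" in Don's description amounts to a range lookup. A minimal sketch, assuming a tiny hand-picked table of blocks (the real font covers every block; the table, the function name and the "undefined" label here are only illustrative):

```python
from bisect import bisect_right

# (first, last, block name), sorted by first code point. A small sample only.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0700, 0x074F, "Syriac"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def fallback_glyph_name(ch: str) -> str:
    """Return the block whose symbol would stand in for ch, or "undefined"."""
    cp = ord(ch)
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "undefined"
```

Every character in a block gets the same answer, which is exactly why the font can tell you which script is missing but not which character.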

I myself would settle for good instructions on how to integrate the font into the experience for various operating systems (there are special issues beyond the ones already mentioned on Windows that I could perhaps cover another day).

Ed Trager mentioned an alternative that is available:

Hi, Everyone,

This might be a good time to note that SIL provides a Unicode "fallback" font under the Open Font License which displays the Unicode value in hexadecimal:

      
http://scripts.sil.org/UnicodeBMPFallbackFont

The SIL fallback font can be quite useful for debugging purposes.

and Simon Montagu added:

This also might be a good time to note that the next version of Firefox, due to be released some time this month, falls back to displaying the Unicode value in hexadecimal for codepoints not covered by any available font.

to which Jukka responded:

This presumably implies that if the Last Resort (or LastResort?) font has been installed, a glyph from it is displayed instead. And this is better for people with no idea of Unicode, as well as to some who have some idea. It might be worse for people who know Unicode: they will see just the generic glyph (which they might or might not recognize in its intended meaning, by intuition or by having learned it), not information about the specific code point.

To get the best of both worlds, the user should actually see the generic glyph with the code number embedded into it (or available on mouseover, but that goes beyond font issues). There's actually "fine print" in the glyphs, and I expected it to contain such information, and I was disappointed at seeing that it, too, is just generic. But of course the font would become much bigger if it had a different glyph for every code position.

This did not make much sense to me -- couldn't this work be done algorithmically, producing the actual image on the fly based on the character? Or would performance be that horrendous with such an approach? Any font experts want to weigh in on feasibility?
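The per-character part of that computation is small. Here is a sketch that works out the hex digits a renderer would draw inside the box; the two-row layout and the function name are my assumptions for illustration, not a description of how the SIL font or Firefox actually do it, and it says nothing about the cost of drawing the result.

```python
from typing import Tuple


def hex_box_rows(cp: int) -> Tuple[str, str]:
    """Split a code point's hex digits into two rows for a fallback box.

    BMP code points use four digits (two rows of two); anything above
    U+FFFF uses six digits (two rows of three).
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    digits = f"{cp:04X}" if cp <= 0xFFFF else f"{cp:06X}"
    half = len(digits) // 2
    return digits[:half], digits[half:]
```

For U+0735 this gives the rows "07" and "35", and a supplementary-plane character such as U+1F600 gives "01F" and "600".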

Wayne suggested:

I've found a bunch of problems with this font; the code that generates them seems to make a mess of certain groups.  cf. U+0735 SYRIAC ZQAPHA DOTTED has an example of the overlaying seen elsewhere, as well as incorrect boundaries in the rendering.

It's a useful fallback, though, if the rendering bug is found.

but this may actually be Uniscribe trying to do special things with complex script text, which brings up the perennial arguments of people who hate the Uniscribe shaping engine/typography split, as well as those who want a way to set in a font that I AM NOT COMPLEX so that Uniscribe knows to stay out of matters.

Marion Gunn provided an amusing final comment on the font itself:

For my own part, I have yet to get it to work on my old (10.4.11) office Apple (or even to appear as a listed font on the usual fonts list inside pre-existing MSWord docs). Is that down to my fault/stupidity (entirely possible), or to some other reason? Appreciative of any intelligent replies sent to me  privately or via list

Amusing why? Well, as Apple employee John H. Jenkins explains:

Um, Mac OS X already comes with the Last Resort font.  Why would you want to install it?  (I would assume that the system sees that it conflicts with the installed LR font and ignores it.)

Now the silliest bit of silliness is the proposal N3412 that suggests encoding the glyphs used by this font.

To this proposal, Adam Twardoch replied:

I think the idea to add the glyphs used in the Last Resort font as new Unicode characters is absurd (I still don't understand if this was an April Fool's joke or not).

Those stub glyphs that are used as replacement glyphs for Unicode codepoints, or one could view them as representative glyphs for entire Unicode blocks. Why would glyphs used solely to solve an intermediate technical problem that is only a meta-problem of the Unicode encoding standard, the Alans Smithee of Unicode characters if you will, get an identity on their own?

Of course, if a new "Last Resort" block is added, then it will also need a Last Resort glyph for that block. Will that glyph also be added as a new Unicode character? If so, will it be placed inside of the Last Resort block itself? (That would be rather self-referential, wouldn't it? Even without it, it is.)

I cannot imagine any single occasion why a human user would want to explicitly key in a Last Resort character to exchange some encoded information.

You don't seriously wanna do it, do you?

to which Doug Ewell responded:

Some of the most respected people ever to be involved in Unicode support the emoji proposal, so I pretty much believe everything now.

Though thankfully, most people have seemed pretty uniformly against this notion's justifications in the proposal on the Unicode List so I won't say much more about it for now. But the link may be a source of amusement for you, the reader.... :-)

 

This blog sponsored by every bleeding character in Unicode!


Andrew West on 12 Jun 2008 11:41 AM:

The font version is "6.0d1e3 (Unicode 5.0.0)", and as far as I can tell the font does seem to reflect Unicode 5.0 fairly accurately. It seems to have separate glyphs for each block in Unicode 5.0, all characters in blocks that are new to Unicode 5.1 are mapped to one of the font's "undefined" glyphs (depending upon which plane they are in), and there do not appear to be any reserved characters that are mapped to anything other than one or other of the font's "undefined" glyphs.

I have been looking at this font in some detail over the past couple of days, and although there are some interesting rendering behaviour issues associated with it, so far I have not found any major bug in the font tables (though if you look at the copyright field of the Name table you may notice that the copyright sign is replaced by U+FFA9, presumably due to bad conversion of Mac encoding to Unicode). But I am ashamed to admit that the huge size of the CMAP table (387,471 entries mapping 388,232 characters in the Format 12 subtable) did uncover a rather horrendous bug in BabelMap (fixed in the version I released this morning), which I think justifies my statement on the unicore list recently that "if more applications were to calculate Unicode coverage of fonts by parsing the CMAP table themselves then more applications would be crashing than there are now"

Si on 12 Jun 2008 12:49 PM:

I thought the Unicode list comment that Firefox would start displaying codepoints in hex rather than missing glyphs interesting. I have a feeling that will make for some extremely broken layouts on multilingual web pages on some systems.

Mihai on 12 Jun 2008 4:25 PM:

A good example where the user being able to change the font fallback list would be nice :-)

Günther on 13 Jun 2008 5:43 AM:

Displaying boxes with the unicode code point in hex for missing glyphs is pretty common. My browser (Galeon using the Mozilla engine on Debian GNU/Linux) does it, as do all other programs I use, and it does not destroy the layout on for example http://www.wikipedia.org/. In fact, just now I notice that those boxes come in different sizes, presumably because the layout code knows some of the properties of the glyphs from unicode tables or something.

Si on 13 Jun 2008 2:57 PM:

"Displaying boxes with the unicode code point in hex"

Okay, I must have misread the Firefox mail - I thought they were talking codepoints, not codepoints in boxes (SIL style).

Cheers, Si

Henrik Holmegaard, technical writer, mag.scient.soc. on 12 Sep 2008 5:56 AM:

If there can be no 'Unicode font' since the character set supports more characters (+ 100000) than the CMAP supports (65536) then it follows that some sort of solution to suggest to the everyday enduser how to render incoming character specifications that are out of gamut for the current configuration of the SFNT-housed fonts in the folders watched by the operating system is needed. A rendering has to be assigned, and information about what rendering to assign is important, but one would wish that the industrial design of that information were improved.

How the fallback font for identifying character specifications is implemented is a minor matter when 500 million in the EU can neither identify their native character spaces nor their native character names in the Apple Character Palette and the Microsoft Character Map. It is a bug in the metadata model of ISO-IEC 10646 that English transliteration and English translation is required for long identifiers. The Character Palette addresses the issue of English transliteration, but not the issue of English translation.

The Last Resort Font, the command Show Character Selected in Application in the Character Palette, the Favourites dialog in the Character Palette ... these are like little journeys in a land that lies far, far, far away from Communication 596 issued by the European Commission on 22 November 2005 to introduce a policy on multilingual literacy and learning. Interestingly, when I search by typing Æ in the Microsoft Character Map in Windows XP, I get 'Character not found' which I assume means that Æ is not in the standard character set? At least the Apple Character Map is not as daft.

Best wishes,

Henrik Holmegaard

Michael S. Kaplan on 15 Sep 2008 3:30 AM:

????

On Windows, the goal is to cover as much as possible and automatically set the font if the one chosen does not have the characters. No fallback font is needed....

As for daft, the search is by code point, not character. :-)

Henrik Holmegaard, technical writer on 18 Sep 2008 4:08 AM:

By the bye, technical information on the fallback functionality in SFNT Spline Font file format imaging is published in Wikipedia, see below. The limit on the code points in the obligatory CMAP Character Map is published in the TrueType Specification version 1.0 of 1990, the softcopy being Microsoft Word for Macintosh version 4.

Hopefully, Microsoft will publish the previous post at some point.

With best wishes,

Henrik Holmegaard

Reference:

http://en.wikipedia.org/wiki/LastResort#Apple.27s_LastResort_font

Michael S. Kaplan on 18 Sep 2008 4:31 AM:

I suspect that one day I might understand what the hell you are talking about -- though I doubt it will be happening any time soon (not enough English in your English!).

Again, please understand that the megafont is NOT Microsoft's preferred architectural approach here. So what you want is at odds with recommended best practices on the platform.

As for the "daft" comment before -- the Windows character map searches by code point, not character. Were you using it incorrectly?

Henrik Holmegaard, technical writer on 18 Sep 2008 5:09 AM:

> the Windows character map searches by code point, not character. Were you using it incorrectly?

No, I was not using the Character Map incorrectly. Follow these steps to learn why one would not want to use the Microsoft Windows Character Map to teach the character spaces of the official writing systems of the European Union in a primary school class or a secondary school class. Note that dansk is an official writing system of the European Union.

Begin by launching Tegnoversigt and checking Avanceret visning. As the implementation is such that searching is limited to the obligatory CMAP Character Map of the currently selected SFNT Spline Font, select an SFNT Spline Font in whose obligatory CMAP Character Map you feel confident that the character space you wish to work with is imageable. Then follow these steps:

Tegnsæt: Unicode

Grupper efter: Alle

Søg efter: latin small letter a [result is U+0061: Latin Small Letter A]

Søg efter: Latin Small Letter a [result is as above]

Søg efter: Latin Small Letter A [result is as above]

Søg efter: LATIN SMALL LETTER A [result is as above]

Søg efter: latin small letter æ [result is: Ingen tilgængelige tegn]

Søg efter: Latin Small Letter æ [result is as above]

Søg efter: Latin Small Letter Æ [result is as above]

Søg efter: LATIN SMALL LETTER Æ [result is as above]

Ingen tilgængelige tegn is plural and the English is No available characters.

In other words, outside the United Kingdom and the Republic of Ireland there is no EU teacher in no EU school who can use the Microsoft Character Map to teach the foundations of the intelligent composition model that separates processing of character specifications from processing of glyph shapings. This has been elevated into the Cultural Diversity Focus Group of the CEN standards track and is about to be elevated into the Ministries of Education further to Communication 596 issued by the European Commission on 22 November 2005, with Microsoft Corporation on copy as a matter of course.

With best wishes,

Henrik Holmegaard

technical writer

 

Maybe others understand all of this, I have no idea what you are talking about. As such, and as this is completely off-topic for this post anyway, I'm cutting you off now.

Sorry...

Michael

