Why is the NULL GLYPH not the glyph representing NULL?

by Michael S. Kaplan, published on 2007/06/19 03:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/06/19/3397122.aspx


So yesterday Judy and I were talking about the 'NULL GLYPH', which is what several of the typography folks I have known over the years use as shorthand for the glyph to show when the requested glyph is not available.

Though technically, that is not the actual glyph used when one puts a NULL (U+0000) into a text stream.

Which is another interesting point, because the glyph that is in the font for U+0000 usually isn't shown either -- it is usually stripped out (occasionally it is treated as the end of the string and it truncates the text, too!).

But anyway, the thing that many people (myself included) often call the 'NULL GLYPH' is actually the NOTDEF glyph.

Looking at Microsoft's Recommendations for OpenType Fonts for more info:

First Four Glyphs in Fonts

TrueType outline fonts should have the following four glyphs at the glyph ID indicated. These were listed in Apple's original TrueType specification. These glyphs are recommended to allow for the same version of the font to work on both Windows and Macintosh.
Glyph ID    Glyph name    Unicode value
0           .notdef       undefined
1           .null         U+0000
2           CR            U+000D
3           space         U+0020


Shape of .notdef glyph

The .notdef glyph is very important for providing the user feedback that a glyph is not found in the font. This glyph should not be left without an outline as the user will only see what looks like a space if a glyph is missing and not be aware of the active font's limitation.

It is recommended that the shape of the .notdef glyph be either an empty rectangle, a rectangle with a question mark inside of it, or a rectangle with an "X". Creative shapes, like swirls or other symbols, may not be recognized by users as indicating that a glyph is missing from the font and is not being displayed at that location.

Suggested shapes of .notdef glyph
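
To make the role of glyph ID 0 concrete, here is a minimal sketch of a character-to-glyph lookup that falls back to .notdef when a code point is not mapped. The toy font is hypothetical: the names GLYPH_ORDER, CMAP, glyph_ids and glyph_names are made up for illustration, and a real font's cmap table is far more involved than a dictionary.

```python
# Toy model of a font: NOT a real font parser, just an illustration.
# The first four entries follow the recommended glyph order quoted above.
NOTDEF_GID = 0
GLYPH_ORDER = [".notdef", ".null", "CR", "space", "A", "B"]

# Hypothetical character map: Unicode code point -> glyph ID.
CMAP = {0x0000: 1, 0x000D: 2, 0x0020: 3, 0x0041: 4, 0x0042: 5}


def glyph_ids(text, cmap=CMAP):
    """Map each character to a glyph ID, using glyph 0 when the lookup fails."""
    return [cmap.get(ord(ch), NOTDEF_GID) for ch in text]


def glyph_names(text):
    """Same lookup, but reported as glyph names for readability."""
    return [GLYPH_ORDER[gid] for gid in glyph_ids(text)]


if __name__ == "__main__":
    # U+4E2D is not in the toy cmap, so it lands on .notdef (glyph 0);
    # U+0000 has its own glyph, .null (glyph 1), which is a different thing.
    print(glyph_names("AB\u4e2d"))
    print(glyph_names("A\x00 "))
```

In this sketch the only way to reach .notdef is a failed lookup, while U+0000 maps to its own .null glyph, which is the distinction the post is drawing.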

Note that U+000d isn't really seen all that much, either.

And U+0020 is not the kind of thing you can really see (which was the whole reason that the MSKLC font that shows characters with no visible representation that I mentioned before proved to be so useful!).

Kind of ironic (in the Alanis or perhaps the Katie sense) that to technically be a valid TrueType font, a font is expected to include four specific glyphs, three of which are never actually going to be shown to users as the glyph in the font and the fourth of which is always shown any time another glyph can't be found.

But anyway, it is not the 'NULL GLYPH', it is the 'NOTDEF GLYPH'. I'll try to remember that going forward.

Otherwise, it reminds me a bit of Raymond Chen's classic Why do some people call the taskbar the "tray"?.

And I don't want to be wrong! :-)

 

This post brought to you by U+000d, a.k.a. CARRIAGE RETURN


# ph_arnaud on 19 Jun 2007 11:35 AM:

Hi,

Any chance that the 'NOTDEF GLYPH' would be a truly last resort glyph, and the normally displayed one for such a situation would actually be a dynamically generated glyph in the future - for example a box with numbers where the contents indicate the Unicode codepoint requested but not found?

I don't think it is an original idea, I'm sure I saw this already on some other platform and thought it was very informative.

thanks,

# Roie on 19 Jun 2007 11:50 AM:

Wasn't U+0000 always called NUL (only one L), at least in ASCII? I vaguely remember being corrected over that once.

# Michael S. Kaplan on 19 Jun 2007 3:11 PM:

I think the notdef showing the code point value is already on Mac OS X right now? It is an idea that I think has come up before....

# Michael S. Kaplan on 19 Jun 2007 3:13 PM:

The standard that gave three-letter "names" to the C0 and C1 control chars certainly looked at it that way, though I doubt programmers ever did, really.

# Greg_H on 20 Jun 2007 12:06 AM:

I would argue that there is a conceptual difference between the NULL glyph and a NULL character. From TrueType's point of view, the NULL glyph is the glyph that gets mapped if a cmap lookup fails. A NULL character has whatever meaning the character set defines.

.notdef is PostScript's nomenclature, NULL glyph is TrueType's nomenclature.

Greg

P.S. Personally I like more interesting NULL glyphs.

# Michael S. Kaplan on 20 Jun 2007 12:21 AM:

Hey Greg,

Well, that both explains the way people have been referring to it and also makes the title more fun, to boot!

I agree with you and think the Easter Egg ones are much more fun, the TrueType info is pretty stodgy, IMO. Not everything has to be boring.

# Nick Lamb on 21 Jun 2007 5:43 AM:

“I don't think it is an original idea, I'm sure I saw this already on some other platform and thought it was very informative.”

Most Pango backends do this, with a simple outline box and four or six digits of hexadecimal to indicate which code point is unavailable. For tiny font sizes the box is wide and the code point written horizontally, for medium and large sizes it's arranged to be closer to the normal aspect ratio of a character. Because Pango has comprehensive fallback by default it will only show this when you have no fonts installed that supply a suitable glyph. Unlike a generic null glyph this is usually enough for technical support (or geekier users) to figure out which font is needed.

OS X does something pretty nice here too: it creates a glyph containing a sample symbol (one representative symbol from each major Unicode range is included in OS X for this purpose) in a rounded box with the code point written along the edge of the box in tiny text. This means users can see at a glance that the unreadable text is Han characters or Greek or fancy Mathematical symbols or whatever, and if they zoom the text they can read which particular code point is missing.

The simple dotted box is OK for 8x11 pixel terminal character sets or embedded applications with very limited resources, but Pango and OS X have a clearly superior solution for typical applications like reading web pages, email or spreadsheets.

Anyway the boxed ? mentioned in the article seems worth deprecating in the Unicode era because we use a somewhat similar glyph (the diamond question mark) for the Unicode replacement character U+FFFD. It needs to be clear that /those/ characters are broken and can't be fixed by installing a different font.
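
As a rough illustration of the "box with four or six hex digits" idea described in this comment, here is a minimal sketch that computes the label for a missing code point. It is not Pango's actual code: the function name hex_box_rows and the choice to split the digits into two equal rows are assumptions made for the example.

```python
def hex_box_rows(code_point):
    """Return the rows of hex digits to draw inside a missing-glyph box.

    Hypothetical helper: four digits for code points up to U+FFFF,
    six digits above that, split into two rows of equal length.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Four digits cover the BMP; anything above U+FFFF is padded to six.
    width = 4 if code_point <= 0xFFFF else 6
    digits = f"{code_point:0{width}X}"
    half = width // 2
    return [digits[:half], digits[half:]]


if __name__ == "__main__":
    for cp in (0x4E2D, 0x1F600):
        print(f"U+{cp:04X}", hex_box_rows(cp))
```

A renderer would then draw these rows inside an outline box; for very small sizes the comment notes the digits are written on a single horizontal line instead.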


referenced by

2008/03/19 The forensic typographers found no link to Lucida Console, and the D.A. had nothing to fallback to

2007/12/12 SiaO as The Red Carpet (aka Characters just want to be seen)
