Unicode [prematurely?] posts the rules of how to handle the display of what you do not (in fact) support the display of

by Michael S. Kaplan, published on 2007/10/11 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/10/11/5398306.aspx


I am not speaking for Microsoft here, by the way, and I have no idea how they feel about the issues here.

First some definitions:

UNSUPPORTED -- adj, not sustained or maintained by nonmaterial aid; "unsupported accusations" 

UNSUPPORTABLE -- adj, not able to be supported or defended

UNICORN -- n, A mythical creature resembling a horse, with a single horn in the center of its forehead: often symbolic of chastity or purity.

UNICODE -- n, Encoding standard that has stated rules on how to support the unsupported that in my opinion will ultimately prove to be unsupportable.

It was over a month ago that I posted A dash of this, a dash of that (aka MS v. Adobe? Shaping engines v. fonts? Typographers v. Unicode? Everyone v. everyone else!). In it I pointed out that I really didn't care much for the position Mark Davis was taking, which was essentially to dictate what fonts should be doing in several cases where they do not support some particular characters.

In other words, a list of requirements was given to describe how to support what a person has explicitly decided not to support....

Let's ignore the logical problems with telling people to spend time supporting the things they have chosen (perhaps explicitly) not to support. And further let's ignore statements made in ignorance about the nature of fonts (e.g. "no size burden since the many characters can be mapped to the same glyph" which ignores the fact that the pointers themselves are a potentially unacceptable size burden!).

Having anyone in the position of working for companies that do not produce fonts stating opinions on what fonts or typographers should do (as fact) just strikes me as not the most sensible strategy.

I think it is fair to say that Mark either did not see my criticism or did not agree with it.

Because a new FAQ page has now been posted (it lists the Q&A as provided by MD and KW, that is, Mark Davis and Ken Whistler), without so much as a UTC meeting. The only Unicore conversation I recall seeing had enough people disagreeing with some of these opinions that, in my opinion, the matter should have been left to the UTC. The page is entitled

Display of Unsupported Characters

Take a look at this table of contents at the top of the document:

  • How should characters be displayed if the rendering system doesn't fully support them?
  • Which characters should be displayed as a visible but blank space?
  • Which characters should be displayed with a missing glyph?
  • What about default ignorable code points, then?
  • So which default ignorable code points should be invisible, if not supported?
  • What about unsupported variation selector sequences?
  • Are there any other special cases?
  • How does the recommendation not to give any visible display for a subset of default ignorable code points affect font design?
  • Does that mean that a font can never display one of these characters?
Wow, maybe they should have named the page: Unicode to Typography: You're My Bitch Now (ref) and just made the title as blatant as the page's expectations of an industry that has its own practices and ran its businesses long before Unicode even existed (all without being told its responsibilities for supporting what it does not support!).

    In fairness, they have a fair bit to dictate to rendering engines in this FAQ, too. I guess they want to tell everyone what their jobs are. :-(

    But the main justification is the inadequacy/inconsistency of handling the situation with fonts. There is an underlying theme that fonts do not act consistently here and therefore higher level layers like shaping engines should not respect the fonts here, even though those shaping engines suffer from the same history of error and inconsistency. It is unsurprising (though quite disappointing) if representatives whose companies do not produce fonts choose not to side with the fonts in such a situation. Which is clearly what appears to have been done here....

    In my opinion, James Kass said it best in his response to the very first FAQ point:

    Display issues such as these are beyond the scope of a character encoding standard.  Suggestions are fine, defining expected behavior is not.

    I guess maybe Microsoft doesn't have to mind so much since a lot of the FAQ matches what they do anyway, or at least try to. But the balance is not the same on all platforms so acting like it is ignores the strengths of other solutions to rendering problems.

    Because of this, not all of the font folks seem to think that this FAQ is appropriate - for example, James Kass (who also spoke out against Mark's initial words) posted a critique of several points in the FAQ on the Unicode List.

    Presumably the basis of this FAQ (and the logic by which it was posted without a UTC meeting talking about it) is that all of the points existed elsewhere in the standard and had simply never been collected in one place. However, it seems clear from some of the reaction that not everyone agrees with this not-too-well-communicated existing information. If one argues the information was not well-known and required an FAQ, it is equally possible to argue that not everyone agrees with the points therein.

    But since not all of the answers have citations to the appropriate places in the standard, it appears that perhaps some of it is new information (like lots of points in the Indic FAQ, which are argued in UTC).

    So the FAQ is either inappropriate (requiring UTC consensus?) or incomplete (requiring more citations?). Or both.

    Given that, posting what appears to be opinions as FAQ without that formal set of conversations is about as fair as me posting my opinions on the topic here. :-)

    And that isn't fair, at all (even when I do it, as I am doing now!). But then I am not running an internationally cited standard maintained by an industrial consortium, either. I am a lowly employee of a company with over 80,000 people on the payroll, skating the thin edge of executive tolerability on a regular basis.

    So I am allowed to do things like this; Unicode and its representatives are supposed to be better than that....

    I won't be at this UTC (I will be speaking at the co-located Internationalization and Unicode conference), which might be a good thing since I might be a lot more negative than Microsoft might be in this case, and when I'm in the meeting I have to represent Microsoft. But I hope there is some discussion there about what I truly think was not a smart document to post in its current form.

     

    This post brought to you by ≟ (U+225F, a.k.a. QUESTIONED EQUAL TO)


    # Mikkin on 11 Oct 2007 4:06 PM:

    Unicorns indeed! Specifying how to support the unsupported verges on a Russellian paradox.

    # Andrew on 11 Oct 2007 7:07 PM:

    I love the first question.

    How do you actually create a font that has different behaviors not only across different rendering systems, but also across different versions of the same rendering system?

    # William Overington on 12 Oct 2007 2:50 AM:

    > Unicorns indeed! Specifying how to support the unsupported verges on a Russellian paradox.

    I had not heard of a Russellian paradox and its relation to unicorns until reading about it in your blog.  I have looked it up on the web and started to read and learn.

    Here is something related to a particular type of unicorn which I wrote some years ago, which might hopefully be of interest.

    http://www.users.globalnet.co.uk/~ngo/euto0008.htm

    The illustrations of the software unicorns were produced using some Microsoft clipart that came with Microsoft Office 97 with the manipulation of the layers in the more complex picture being done using Microsoft PowerPoint.

    The story is one of a collection which is indexed at the following page.

    http://www.users.globalnet.co.uk/~ngo/euto0000.htm

    There is also a screensaver, also produced using some Microsoft Office clipart, in conjunction with a piece of screensaver-producing software.

    http://www.users.globalnet.co.uk/~ngo/euto2001.htm

    Maybe some software unicorns are charging about!

    Yet a castle of software

    May fall to the ground

    If over its drawbridge

    Their golden hooves pound!

    As to the design of the .notdef glyph, I use my own design.

    It is in various of my fonts, for example the following.

    http://www.users.globalnet.co.uk/~ngo/CHRONTXT.TTF

    Some of my fonts have a .notdef glyph which is a variation on my original design.

    http://www.users.globalnet.co.uk/~ngo/fonts.htm

    William Overington

    12 October 2007

    # Andrew West on 12 Oct 2007 5:35 AM:

    I'm 100% in agreement with you on this one.

    Q: Which characters should be displayed as a visible but blank space?

    A: This is the easy one: all the characters that have the White_Space property, also generically known as “whitespace characters”. This set includes SPACE, of course, but also such characters as the tab control character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the White_Space values in PropList.txt.

    Easy !? Yeah right. If it was so easy MD wouldn't be rushing to next week's UTC meeting to ask for a change in the general category of U+1680 OGHAM SPACE MARK (something that is not done lightly). Until and if U+1680 has its White_Space property unceremoniously stripped from it (Unicode 5.1 next Spring at the earliest) then this FAQ answer is just plain wrong. Talk about putting the cart before the horse!
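The point about U+1680 can be checked with a few lines of Python. This is only a rough sketch: the standard `unicodedata` module does not expose the White_Space property, so the ranges below are an assumption, transcribed by hand from a recent PropList.txt (older versions of the data may differ), and `has_white_space_property` and `describe` are hypothetical helper names, not part of any library.

```python
import unicodedata

# Assumption: White_Space=Yes ranges copied by hand from a recent
# PropList.txt; the list has changed between Unicode versions.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085),
    (0x00A0, 0x00A0), (0x1680, 0x1680), (0x2000, 0x200A),
    (0x2028, 0x2029), (0x202F, 0x202F), (0x205F, 0x205F),
    (0x3000, 0x3000),
]


def has_white_space_property(ch: str) -> bool:
    """Return True if the single character ch falls in the table above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in WHITE_SPACE_RANGES)


def describe(ch: str) -> tuple:
    """Code point, name, General_Category and the White_Space flag."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),  # controls such as TAB have no name
        unicodedata.category(ch),
        has_white_space_property(ch),
    )


if __name__ == "__main__":
    # SPACE, tab, NO-BREAK SPACE, LINE SEPARATOR, OGHAM SPACE MARK, and a letter
    for ch in ["\u0020", "\u0009", "\u00A0", "\u2028", "\u1680", "A"]:
        print(describe(ch))
```

Under these assumptions U+1680 comes back flagged as White_Space with General_Category Zs, so the quoted FAQ answer, read literally, puts it in the "visible but blank space" group along with SPACE and NO-BREAK SPACE.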

    # James Kass on 12 Oct 2007 11:19 AM:

    Unicorns are pretty cool.  No comment on chastity.  When "Russellian" is mentioned, I think of Jane.  Is that a paradox?

    Here's a possible alternate response to FAQ-How should unsupported characters be displayed?

    A.  The display of unsupported characters depends on the system. It should be noted that, if the system does not support the character, the character will probably not display well.

