Fight the Future? (#6 of ??), aka If we don't need it yet can we say it is a last resort?

by Michael S. Kaplan, published on 2008/04/06 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/06/8362202.aspx


Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in The End? still applies; the allusion to the X-Files continues for people who understand such references....

So the conversation has passed 90 messages, and it does not yet look like it is done. It all started with a brief piece of mail from Michael Everson:

Subject: N3412: Last Resort Pictures

Please see
http://www.evertype.com/standards/iso10646/pdf/n3412-last-resort.pdf

You can read the proposal if you would like. I will quote a bit from it here:

1. Introduction. The Last Resort font is a collection of glyphs which represents types of UCS characters. These glyphs are designed to allow users to recognize that an encoded value is a specific type of UCS character, a Private Use Area character, an unassigned character, or one of the illegal character codes. Apple Computer and SIL are two organizations which have shipped Last Resort fonts for some time now. Recently Apple decided to make its Last Resort font available for public distribution via the Unicode Consortium’s website (sometime after the publication of Unicode 5.1).

One of the principles of a Last Resort font is that it is normally displayed when no other font is available for the character in question. Similarly, the Control characters encoded at U+0000-001F cannot be represented in text, since they are intended to “do” things (even if few of them do much on modern operating systems). At U+2400-243F the CONTROL PICTURES block provides glyphs for those characters so that they can be discussed and displayed in text. This facility would be as valuable for the Last Resort functionality as it is for the control characters. Accordingly, this proposal requests the encoding of a new block, U+E0200-E03FF LAST RESORT PICTURES, in Plane 14 of the UCS.
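To make the Control Pictures relationship concrete: for the 32 C0 controls, the picture sits at a fixed offset (U+2400 plus the control's own code point), while SPACE and DELETE get their own symbols at U+2420 and U+2421. The Python below is a minimal sketch of just that mapping; the function names are hypothetical, and it deliberately ignores the other symbols in the block (such as U+2424 SYMBOL FOR NEWLINE).

```python
# Minimal sketch: map C0 controls (U+0000-U+001F), SPACE, and DELETE to their
# visible stand-ins in the CONTROL PICTURES block (U+2400-U+243F).
# to_control_picture and make_visible are hypothetical names for illustration.

def to_control_picture(ch: str) -> str:
    """Return the Control Pictures symbol for a C0 control, SPACE, or DEL.

    Any other character is returned unchanged.
    """
    cp = ord(ch)
    if cp <= 0x1F:
        # U+2400 SYMBOL FOR NULL ... U+241F SYMBOL FOR UNIT SEPARATOR
        return chr(0x2400 + cp)
    if cp == 0x20:
        return "\u2420"  # SYMBOL FOR SPACE
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    return ch


def make_visible(text: str) -> str:
    """Replace each control character (and space) with its picture, e.g. for a dump or log."""
    return "".join(to_control_picture(c) for c in text)
```

For example, make_visible("GET /\r\n") returns "GET␠/␍␊": the characters are displayed rather than acted upon, which is the whole purpose of the block.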

Paul Nelson of Microsoft was one of the many to comment:

"The multi-script researcher and Unicode contributor Michael Everson of Dublin drew all these glyphs for Apple." (http://developer.apple.com/textfonts/LastResortFont/) It would be great if the proposal had Apple as a contributor to the proposal.

> 5a. Are the proposed characters in current use by the user community?
> The glyphs are, but they are unavailable for use in text.

Why are we encoding these? Isn't Unicode a ***character*** encoding?

The proposal makes an assertion that encoding the characters will cause the Last Resort Font to be updated more quickly. My guess is that the act of encoding these characters and keeping the block current takes longer than updating the font.

I don't see the merit of encoding these glyphs as characters. It is better to have an informal standard for people who wish to make this type of font. I believe that using Apple's web site and keeping their repertoire current will be more effective than spending time standardizing them.

Good point. From the proposal's introduction you might think that Apple and SIL both support it. But John H. Jenkins of Apple had some initial thoughts:

I obviously can't commit the corporation to contributing to or cosponsoring a proposal without some internal discussion first, which is currently underway.

I would like to express a personal concern that the glyphs used are from Apple's LastResort font.  Apple isn't opposed to the font being publicly distributed through such channels as may be legally authorized to do so, but I'm personally uncomfortable with anything which would create the impression that this specific LastResort font is up for grabs and may be distributed/modified by anybody without permission from the font's owner.  (A similar problem would exist for Zapf Dingbats were Zapf Dingbat clones not widely available.)  Perhaps it would be prudent to use a set of glyphs which are distinctly *not* the ones which Apple uses.

Almost five hours later, he followed up with an official comment:

Apple uses these symbols exclusively as glyphs for characters which otherwise do not have glyphs and never as characters in their own right.  In light of the general policy of not encoding symbols which do not have an established use as characters in information interchange, we currently oppose this proposal and do not desire to be added as contributors or cosponsors.
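Apple's description (the symbols serve only as glyphs for characters that otherwise have none), together with the point made in the thread that the underlying data can be generated from the Unicode Character Database, suggests roughly what a renderer does at display time. The Python below is a minimal sketch under stated assumptions: the font coverage sets and the three-entry BLOCKS table are hypothetical stand-ins (a real system would read each font's full character coverage and the UCD's complete Blocks.txt), the category labels are illustrative rather than Apple's, and unicodedata reflects whatever Unicode version the Python build ships with.

```python
import unicodedata

# HYPOTHETICAL excerpt of the UCD's Blocks.txt: (first, last, block name).
# A real implementation would load the whole file instead.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0F00, 0x0FFF, "Tibetan"),
]


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane (U+xxFFFE, U+xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def last_resort_category(cp: int) -> str:
    """Decide which kind of Last Resort glyph to show, using existing properties only."""
    if is_noncharacter(cp):
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"
    if category == "Co":
        return "private use"
    if category == "Cn":
        return "unassigned"
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return "block: " + name
    return "block: (not in this excerpt)"


def choose_glyph_source(ch: str, fonts) -> str:
    """fonts: list of (font_name, set of covered code points) in preference order.

    Returns the first font covering ch, else a Last Resort glyph label.
    """
    cp = ord(ch)
    for font_name, coverage in fonts:
        if cp in coverage:
            return font_name
    return "LastResort[" + last_resort_category(cp) + "]"
```

With a single hypothetical font covering only printable ASCII, "A" resolves to that font while U+0F40 TIBETAN LETTER KA falls through to "LastResort[block: Tibetan]". Everything the fallback needs comes from properties that are already standardized, which is the same observation several people in the thread made.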

Okay, so Apple is not in love with this proposal officially (I don't think we heard what SIL thought), and at least unofficially one person from Microsoft doesn't think it's Disco Lemonade either. Other comments from Michel Suignard and Peter Constable do not suggest a whole lotta love of this proposal from Microsoft either, and with one of them most likely doing the voting next time, this might border on evidentiary.

Anyway, it went on for over 90 messages, about one third of them from Mr. Everson, with most of the non-Eversonian responses ranging between "No" and "No, until there is more evidence of actual need", as summed up well by James Kass:

So the question is, are these symbols recognized by a sufficient number of people and is there a demonstrated need for them to be interchanged in plain text?  (OK, that's two questions.)

Clearly everyone following this thread recognizes these symbols, and that should be a sufficient number of people.  I look forward to seeing future versions of this proposal if the plain-text interchange need can be clearly demonstrated.

Other points were raised as well, including the fact that this data already exists and could be generated from the existing information on encoded characters and blocks in the Unicode Character Database. All of this led a few colleagues of mine to ask me off-list questions like:

btw, what the hell is Michael Everson's deal? I've been following this with half an eye and he seems to be going off the deep end

That was probably the most polite one of the bunch. :-)

In truth there isn't anything wrong with him. It actually reminded me of Rodney Dangerfield's character's response to Sam Kinison's character in Back to School (YouTube clip here; the line is at the end):

Good teacher. He really seems to care. About what I have no idea.

When everything is treated with equal intensity (the only differentiating factor being the amount of apparent opposition from others), it is left to other people to do the mental and actual triage of what is most important to do. That is why he has told me in public that he disagrees with the opposition to the Elvish scripts or Klingon (my opinions were given in Fictional could make things less functional), and he sees no reason for the former to be put off or the latter to be denied.

Makes me almost wish some company like Microsoft would hire him so that presumably his manager could help set the priorities on these items (since Ireland doesn't). :-)

Remember that I am not speaking for anyone else here. :-)

Nothing in Unicode wanted to support this blog about characters that they too deem unworthy of inclusion in Unicode....


John Cowan on 6 Apr 2008 1:50 PM:

You talk as if being hired were something that *happens* to you.  Michael hasn't got a job because, very plainly, Michael doesn't want one.  No job could help him serve his goals.  No company can possibly serve the breadth of concerns that Michael serves: the entire Fourth World community of minority literacy.

And if Michael has a bee in his bonnet over this and some other projects, well, it's preferable to complete cerebral inactivity.

Michael S. Kaplan on 6 Apr 2008 1:59 PM:

This is understandable, though the lack of triage in what he pushes and how, his dismissive manner toward proposals that come from people other than him, and many other factors in his approach work against him (and to some extent against Ireland) in both WG-2 and UTC.

He could take steps to improve that situation, and chooses not to. That works against the interests of that entire Fourth World community of minority literacy. If he wants to serve them, and could do so better but chooses not to, one has to wonder why he never tries....

Leo Davidson on 6 Apr 2008 5:51 PM:

Have I understood correctly that there is one range (U+0000-001F) of "control" (do-things) characters in Unicode and another range (U+2400-243F) that is there to provide symbols for those characters? (And presumably some other things, as it's a bigger range.)

That seems fairly silly to me. If people want to discuss the do-things characters why can't they just use their names? Why do we need symbols for everything? Should we have Unicode characters for every person's name as well?

What if someone wants to discuss the characters in the U+2400-243F range, should we have another range of meta-glyphs for those, and another range of meta-meta-glyphs for those, and so on?

Perhaps I have completely misunderstood things -- I hope so! -- but it seems crazy to me.

As a programmer it really bugs me that UTF-16 was not enough and we now have surrogate characters (or UTF-32) to worry about. UTF-16 seemed good compared to UTF-8 because we didn't have to deal with the complexity of coding to support variable-sized characters, or the even bigger complexity of working out what kind of support is in all the APIs and libraries we use. To learn that we now do have the complexity of UTF-8 is sad, but one might think "oh well, for languages like Japanese it is still more efficient to use UTF-16 than UTF-8 so UTF-16 is still a better encoding for the world to use, and I guess they just ran out of characters so that's life..."

...But to continually read that the 65,000-odd characters in UTF-16 appear to have been squandered on really, really silly things (like meta-glyphs and fictional/long-dead languages only used by a handful of specialists) really gets my goat.

Surely the bugs, complexity and general hassle caused by programs and programmers having to (and failing to, each in their own special way) support variable-length characters is orders of magnitude greater than the convenience factor of having a standard encoding or glyph for something that about 50 people in the world will ever use. :-(

I mean, would it really matter if scholars studying ancient hieroglyphics had to use the equivalent of a code page or private character range + custom font? Would the minor hassle caused to them really outweigh the massive hassle caused to every single other computer user on the planet because of the huge complexities (and thus bugs) of surrogate characters? Do scholars studying Meroitic really need the ability to copy and paste that stuff into a shopping list in Notepad? I'm guessing not...

I know I'm complaining after the horse has already bolted from the barn but... Well... Argh! Why's it gotta be the way it is? :( :-)
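For readers wondering what the surrogate handling Leo complains about looks like in practice: any code point above U+FFFF, including every code point in the Plane 14 block the proposal asks for (U+E0200-E03FF), is stored in UTF-16 as two 16-bit code units. The Python below is a minimal sketch of the standard encode/decode arithmetic; the function names are hypothetical, and it is meant to show where the variable-length logic comes from, not to replace a real codec.

```python
def to_utf16_units(cp: int) -> list:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: U+%04X" % cp)
    if cp < 0x10000:
        return [cp]                   # BMP: a single code unit
    v = cp - 0x10000                  # 20 bits remain
    high = 0xD800 + (v >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(units: list) -> list:
    """Decode UTF-16 code units to code points, rejecting unpaired surrogates."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate at index %d" % i)
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            out.append(u)
            i += 1
    return out
```

For example, to_utf16_units(0xE0200) gives [0xDB40, 0xDE00], so even the first character of the proposed block would occupy two code units, and any code that indexes or slices UTF-16 strings has to avoid splitting such a pair.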

Andrew West on 7 Apr 2008 5:29 AM:

Michael working for Microsoft?! Exactly how would that work? If I've learnt anything from your blog (especially in recent weeks) it is that Microsoft stifles all free thought and actively discourages people from doing anything outside their strict job description (remember your India trip?). We'd never see another proposal out of Michael if he ever got ensnared by Microsoft.

Michael S. Kaplan on 7 Apr 2008 8:48 AM:

Well, in fairness what I am going through is by no means Microsoft policy -- it is the people managing me, but there are plenty of places within Microsoft where such things are valued and even expected....

Michael S. Kaplan on 7 Apr 2008 8:52 AM:

On the other hand, Michael has made enemies over the years, just like me. Given the perceived tendency of him to act like a man with a mission who can't be bothered to worry about things outside his expertise, it is almost like he is a younger version of me (since that is how I used to think about things before I learned that things I know nothing about are important to me beyond the obstacles they present)....

John Cowan on 10 Apr 2008 10:25 AM:

ME can be crude and insensitive, like most people at one time or another.  He is so more than most, perhaps.  In any case, we don't have a replacement for him and aren't likely to.

Leo: The Control Pictures are actually in use by line protocol analyzers, which need to display control characters rather than acting on them; that's why they were encoded, not out of some abstract desire to have symbols, but because symbols were in fact already in use.  Their cost in BMP space is trivial.  The real culprit is the excessive encoding of CJK characters.

Michael: If what you are undergoing is not Microsoft policy, it also apparently is not Microsoft policy to prevent it, which means that in effect you have a feudalism -- you are a vassal with a bad lord and no effective appeal to the overlord.  Which sucks.

Michael S. Kaplan on 10 Apr 2008 3:02 PM:

In the end his work would be more effective if he either started letting people help out with prioritization OR if he just learned it himself. Unfortunately he has proven to not be receptive to either idea....

Agree++ on the feudalistic aspects of this, FWIW. I love the work (especially the community aspects of it), but not so much the job, these days.

John Cowan on 17 Apr 2008 9:38 AM:

/me alludes vaguely to a tale in which a contractor employed by Company B was working on-site at Company A and did something semi-public that upset a lot of A employees.  An A manager called up a B manager and said "Eek."  The B manager spoke harshly to the B employee and said "No dog biscuit for three days!"

Many more A employees, including one of the most prominent of the original protestors, spoke up at this, with the result that another A manager spoke firmly to another B manager and said "Fix it."  Fixed it was forthwith; dog biscuit restored.

Feudalism doesn't rule everywhere.

Henrik Holmegaard, technical writer on 12 Sep 2008 5:49 PM:

"In light of the general policy of not encoding symbols which do not have an established use as characters in information interchange"

Accessorising affects the architecture of device independence: for instance, advocating control characters for ligation (which makes e.g. HTML unsearchable as per Appendix F of ISO/IEC 10646:2003), or advocating device links in applications for colour correction (which makes it impossible to match to the monitor as per ISO 15076).

Could the cause of such challenges be that the Unicode community and the ICC community have no fully formed language of device independence? It is advisable that the character connection space and the colour connection space be as Euclidean as possible, and that the intelligent composition model and the intelligent separation model be as modular as possible.

Whatever is at work here, it's interesting to watch.

/hh

Michael S. Kaplan on 15 Sep 2008 3:33 AM:

Huh? Unicode does not encode color, and many implementations can ignore control characters in search. As usual, you are being verbose about things that make no sense. :-(

