Every character has a story #11: U+???? (The Invisible Letter)

by Michael S. Kaplan, published on 2005/06/14 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/06/14/428797.aspx


The story today is about a character that has not been encoded in Unicode as of yet. In fact, it was brought to the Unicode Technical Committee (UTC) once already and was turned down. But it had enough of a history behind it that I thought it might be worth having a story told....

It all started with some great work from the Council for Information Technology (CINTEC), the Information and Communication Technology Agency of Sri Lanka (ICTA), and others on a document entitled SRI LANKA STANDARD SINHALA CHARACTER CODE FOR INFORMATION INTERCHANGE. The second version of the document that we saw was submitted to the UTC at the 99th UTC meeting, held in Toronto on June 15-18, 2004. The document was very well put together.

If I recall, the only real point of contention that came up was a description in the document of how to handle a vowel sign (which is a combining character) without a preceding consonant (to act as a base):

A vowel sign without an associated consonant may be displayed by preceding it with a zero-width non-joiner (zwnj) character. e.g. ා = 200C 0DCF (zwnj + ා).

Several examples of doing this followed. A number of people did not like the idea, as it really did not fit with the conventional usage of U+200c (ZERO WIDTH NON-JOINER).

This is obviously not a new requirement; for as long as there have been combining characters, there has been a need to describe how to display them when no base character was present. There have always been two Unicode code points that have been recommended for this purpose:

U+0020 (SPACE)
U+00a0 (NO-BREAK SPACE)

However, for some time the SPACE has been a problematic choice here, due to the way standards such as HTML allow the removal of spaces preceding or following text. It is a huge burden on formatters and parsers to have to handle this scenario and work out that a SPACE character being used as a base character is not extraneous....

So the feedback was set to be given to the Sri Lankan NB to commend them for the excellent proposal with one suggested change -- to use U+00a0 rather than U+200c to work as the base.

The thorny issue of what to do with U+0020 was talked about for a bit -- clearly some text would have to be drafted explaining that while in some situations it represents a reasonable choice, in others it was a really problematic one due to the spaces being removed, and then we moved on to other issues.
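To make the difference between the three candidate base characters a bit more concrete, here is a minimal Python sketch. It is only an illustration, not anything from the Sri Lankan document or the UTC discussion: the helper names (isolated, survives_html_trim) are made up for this example, and the "trimming" is a rough stand-in for what an HTML processor might do with leading and trailing ASCII whitespace.

```python
import unicodedata

# U+0DCF is the Sinhala vowel sign used in the example quoted above.
VOWEL_SIGN = "\u0dcf"

# The three base characters discussed in this post.
SPACE = "\u0020"
NO_BREAK_SPACE = "\u00a0"
ZWNJ = "\u200c"

# Assumption: the whitespace an HTML processor may strip is modeled here
# as ASCII whitespace only (space, tab, LF, CR, FF).
HTML_WHITESPACE = " \t\n\r\f"


def isolated(base, mark=VOWEL_SIGN):
    """Build the 'base + combining mark' sequence (hypothetical helper)."""
    return base + mark


def survives_html_trim(text):
    """Return True if trimming ASCII whitespace leaves the text unchanged."""
    return text.strip(HTML_WHITESPACE) == text


for label, base in (("SPACE", SPACE),
                    ("NO-BREAK SPACE", NO_BREAK_SPACE),
                    ("ZWNJ", ZWNJ)):
    seq = isolated(base)
    code_points = " ".join(f"{ord(ch):04X}" for ch in seq)
    print(label, code_points,
          "category of base:", unicodedata.category(base),
          "survives trim:", survives_html_trim(seq))
```

In this sketch the SPACE-based sequence loses its base as soon as the text is trimmed, while the U+00a0 and U+200c sequences come through untouched. The general category check also hints at why U+200c was an awkward fit: it is a format character (Cf), not a space.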

And then, on the way to the WG2 meeting, Michael Everson, after consulting with Peter Constable, Rick McGowan, and Ken Whistler, put together a document.

The name? Proposal to add INVISIBLE LETTER to the UCS.

The idea was a dedicated code point to act as an invisible base character for precisely this situation. And there was even some art, provided by Mr. Everson:

What is that smudge in the middle of the invisible whatever? Hmmm, let's look at 200%.

Hmmmm. Still can't make it out. But it does look like more than a smudge. Let's take a look at 400%.

Clearly text saying something. My eyesight is 20/20 OS and 20/25 OD, but maybe I am just tired. Let's blow it up to 800% and see what this says.

Geez Mr. Everson, I guess you really are a geek! I'm not sure I would have immortalized it in a proposal (or if I did I probably would not say anything until it was too late).

Anyway, it went to WG2 as an FYI (I was not there, but I heard that Mr. Everson had hinted that the text said something, and the chair (Mr. Mike Ksar) did what I did and blew up the image on the overhead. He was not amused at the time). 2 x :-)

At the next UTC meeting (#100) in Redmond, WA on August 10-13, the proposal was reviewed and the decision was made to make it a Public Review Issue (PRI), #41.

Then at the next UTC meeting (#101) in Cupertino, CA on November 15-18, a motion to accept the encoding of INVISIBLE LETTER failed (3 for, 6 against, and 1.5 abstained).

And then, the next motion, to add the INVISIBLE LETTER to the list of rejected characters, also failed (5 for, 0.5 against, 5 abstained).

I won't comment on that set of votes other than to note that most decisions these days in the UTC are actually passed by consensus, not via a motion and a vote (for example at that meeting there were 37 consensus decisions and 7 motions). This was a pretty amazingly contentious issue! Maybe we were all blowing off some steam? :-)

Personally, I would rather it had been a small Invisible Jet, instead. Ms. Bennett has long pined for a Wonder Woman analogue for ✈ (U+2708, a.k.a. AIRPLANE). What better location for an invisible jet than hidden in an invisible letter?

Or we could just go back to the actual work....

So, worthy of a story, right? :-)

 

This post brought to you by "✈" (U+2708, a.k.a. AIRPLANE)


# Jeremy D on 14 Jun 2005 9:20 AM:

That character has no chance to survive make its time.
Launch every 'glyph'.

...I don't know whether to be amused or sad that I thought of that.

# Michael S. Kaplan on 14 Jun 2005 9:42 AM:

To quote Addy Santo:

All Your Post Are Belong To Us! Someone Set Up Us The Blog!


:-)

# Maurits [MSFT] on 14 Jun 2005 11:45 AM:

Did Wonder Woman ever forget where she parked her invisible jet?

I suppose the invisible jet would share a grapheme with all other invisible characters...

# Michael S. Kaplan on 14 Jun 2005 9:59 PM:

It is funny, we always got to see the Invisible Jet, those white lines and all. It really skewed our view on things, I think.

I wonder how the tower gave her clearance?

# Tim Smith on 15 Jun 2005 12:14 PM:

*Must not make sexist (and thirdgrader) comment about where would they place the running lights...*

