If it's not in Unicode, don't expect it in Windows

by Michael S. Kaplan, published on 2006/01/06 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/06/509784.aspx

I was reminded the other day of a scene from News Radio in which a woman tells David Foley, "That's ironic. You know, like rain on your wedding day?", to which Dave replies, "No, that's not ironic; that's unfortunate."

It also reminded me of the conversation it inspired, with a linguist or two that I know, about the difference between regular ironic and Alanis ironic.

It all came back to me when, over in fontblog, I read Kevin Larson's introduction of the Irony Mark.

Now, although there is a French Wikipedia entry for Point d'ironie, I want to make sure everyone understands that we (Microsoft) don't add characters to Windows unless they are in Unicode.

Of course that article suggests the possibility of using U+061f (ARABIC QUESTION MARK) for this character, but John Hudson points out in a comment to the fontblog post that this can cause other problems due to RtL and LtR context issues. Not to mention that it won't look like the character in question.
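To make the directionality concern a bit more concrete, here is a minimal Python sketch. It is not from the fontblog discussion, and `describe` is a hypothetical helper name. It uses the standard `unicodedata` module to compare the Arabic question mark with the ordinary one:

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, bidi class) for one character."""
    # describe() is a hypothetical helper for illustration only.
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

for ch in ("\u061F", "?"):
    print(describe(ch))
# ('U+061F', 'ARABIC QUESTION MARK', 'AL')
# ('U+003F', 'QUESTION MARK', 'ON')
```

The bidi class `AL` marks a strong right-to-left character, while `ON` is a neutral that takes its direction from the surrounding text. A strong right-to-left character dropped into left-to-right text can affect how the neutral characters and digits next to it are laid out, which is presumably the sort of context issue being pointed out.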

I just didn't want anyone to assume that there were random groups at Microsoft trying to introduce characters that were not in Unicode into our fonts -- because with the exception of the various symbol fonts (which use the Private Use Area), this is something that we try to stay out of as a company.

Even when we are talking about really important characters like Wonder Woman's INVISIBLE JET! :-)


This post of course brought to you by "؟" (U+061f, a.k.a. ARABIC QUESTION MARK)

# Vorn on 6 Jan 2006 12:57 PM:

Man, what am I gonna do when I need a glyph for the Invisible Jet, then?


# Michael S. Kaplan on 6 Jan 2006 12:59 PM:

Hi Vorn!

Yes, a question that countless throngs wonder about.... :-)

# Maurits [MSFT] on 6 Jan 2006 2:20 PM:

Here you go

# Maurits [MSFT] on 6 Jan 2006 3:47 PM:

Well, Unicode does already include the interrobang... and there's a proposal to include the gnaborretni:

Don't we also need:

* An Arabic/Spanish question mark? (backwards ¿)
* An Arabic interrobang? (backwards ? + !)
* An Arabic gnaborretni? (backwards ¿ + ¡)

# Michael S. Kaplan on 6 Jan 2006 6:16 PM:

The proposal was turned down, and there has been no interest expressed in the others....

# Maurits [MSFT] on 6 Jan 2006 7:52 PM:

Re: jet: All invisible things look alike؟ (note use of ironic point)

Let's talk about horizontal symmetry for a second...

There's evidence that Arabs use exclamation marks, even though there is no ARABIC EXCLAMATION MARK sequence. I suppose they're just using U+0021, since it has horizontal symmetry?

But commas and semicolons do have their Arabic counterparts...

This seems to be pictorial evidence of a ?! sequence in Arabic, unless I'm misreading a character that looks like ؟


The article talks about the prevalence of ؟ and ! in Arabic journalism... maybe I'll take a trip over there and push the interrobang؟

# Michael S. Kaplan on 6 Jan 2006 7:56 PM:

There is of course no evidence that they need or want *this* character -- which is the real point of it all. There is no actual user community looking for it other than people who are bored and want goofy characters added. :-)

# Michael S. Kaplan on 6 Jan 2006 7:58 PM:

Some would call using U+0020 as the INVISIBLE JET an over-unification. :-)

# Jan Billing on 6 Jan 2006 8:04 PM:

I once spent a little while making a Unicode version of Wingdings that used the Symbols and Dingbats area of Unicode plus Private Use Area for anything that didn't fit. I've no idea why I did it, since of course I can't legally distribute it. Oh well. At least I have it on my computer, so I can feel better about using the symbols in it!

# Gabe on 8 Jan 2006 3:52 AM:

That French Wikipedia article was pretty cool, if for no other reason than to see what the French call their punctuation.

For example, what we call a comma they call a virgule. This is interesting, because in English a virgule is a slash, and it turns out that a comma and a slash have common origins. The French term for slash is "barre oblique", which I assume is basically just French for "angled bar". A second instance of this sort of thing is that their name for a pilcrow (the backwards-P we use for a paragraph mark) is "pied de mouche", while the word "paragraphe" is used to describe the section mark (vertically stacked double-S).

Another interesting difference is that their bracket ([) is called a crochet. I thought it was odd that they would name a bracket after a method for fashioning afghans until I realized that crochet probably means "hook".

The biggest thing I don't understand is that their braces ({}) are called "accolades".

Only apostrophe, point (period), and prime (a tick mark) properly round-trip between English and French.

# Dean Harding on 8 Jan 2006 5:43 AM:

> Some would call using U+0020 as the INVISIBLE JET an over-unification

That's right. There's a big difference between empty space and something that just *looks* like empty space...

# Michael S. Kaplan on 8 Jan 2006 9:55 AM:

Hi Dean! Well, Wonder Woman would certainly tend to think so if she could not find her plane on the roster with her special font to let her see it!

# Pinocchio Poppins on 15 Mar 2006 8:23 PM:

So what about scripts of minority languages? The Unicode Consortium has been glacial at adding scripts to Unicode. See http://www.linguistics.berkeley.edu/sei/

# Michael S. Kaplan on 15 Mar 2006 8:43 PM:

Well, glacial is a relative term. And there is also ISO 10646 here as well.

But if a system is based on Unicode, there is no mechanism to randomly add characters that are not....

I am a huge fan of some aspects of the SEI, and I am almost violently against some other aspects. Not something I would necessarily blog about, but....

referenced by

2008/03/13 Yet another face to irony
