by Michael S. Kaplan, published on 2007/08/18 19:59 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/08/18/4455146.aspx
I am embarrassed by the Heinlein allusion in the title but then I am embarrassed by the whole post so consider this me trying to work outside my comfort zone....
As more and more of the genuine work of Unicode (trying to encode all the scripts of the world, past and present) is accomplished, a great deal of the work to do in the way of new scripts starts to slow down.
Then, as I pointed out in Fictional could make things less functional, there are entities that start to feel that "Unicode may not be done, but it is done enough for us." Those are the companies that were in it to get their particular needs met within a character encoding standard, and it is a perfectly valid approach.
Other entities actually care about the remaining scripts and want to see that work done. Organizations like the Script Encoding Initiative diligently work to try to close that gap, though the pace is not going to set any speed records. This is hard work, and it is work for specialists, at least for most of the work to get proposals together.
I often wondered, in looking at many of the doctoral dissertations of linguists, who are in such a wonderful position to help provide this information for new scripts, why none of them are engaged directly. I mean, take this bit of Kieran's post about how she ended up at Microsoft:
A really great dissertation is read by maybe 100, 200 if we're being generous, people. Maybe it is of use to 10 or 15% of them. And that's a really, really great one.
(I know she was making a slightly different point about a different kind of linguistics dissertation but I am pretty sure the numbers are on the same order of magnitude here, too.)
Why don't more of them take that work to the next step and get that language into Unicode, taking genuine steps to do work that will be seen by a lot more people and will be hugely appreciated in this world where the proposals aren't getting produced fast enough because the people qualified to do it are not a huge group?
But that's an aside, I'll talk more about that another day (though if there are any linguists working on their dissertation about a language written in an as-yet-unencoded script, they should feel free to contact me and maybe we can work on a proposal to get that script in Unicode -- a true cherry on top of a Ph.D. sundae knowing that the work will be of tremendous benefit to a lot more than the small numbers quoted above! :-)
A third thing that some people involved in Unicode will find themselves doing is trying to perfect what is currently encoded. This can have bad consequences for existing implementations, so it is probably a good thing that after the Bidi mirroring snafu the pendulum will tend to swing the other way for a bit. The burned child fears the fire, after all!
This group can still do lots of productive work, writing Unicode Technical Notes and generally working to improve their implementations. They are not a thumb twiddling sort and even if they aren't going to be able to change as many properties, they have plenty of work they can do.
A fourth thing that some people will do is move their focus into other areas like the CLDR, which works to try and provide locale data. Obviously working on a whole new standard within the standard is a way to occupy one's time if one is staying in standards.
There is also a fifth thing that is actually much like the fourth thing (basically people taking up new causes, new work items, new ways to keep busy), what I believe is the answer to the question raised in the title of this post -- the fact that there is now a Symbols Subcommittee, which according to this page:
Discusses and makes recommendations about the encoding of symbols, such as wingdings, train schedule symbols, mobile phone symbols, etc.
This may sound scary to you. It does to me. And not just because every single message sent to their mailing list is copied to the list for Unicode members, making me wonder why they even bother to have a mailing list as I receive two of each message (the mail system at unicode.org is not smart enough to avoid the duplicate sends!).
And several people have pointed out, both to me and on the list recently, this text from the introduction to the Unicode Standard, 5.0:
Note, however, that the Unicode Standard does not encode idiosyncratic, personal, novel, or private-use characters, nor does it encode logos or graphics. Graphologies unrelated to text, such as dance notations, are likewise outside the scope of the Unicode Standard. Font variants are explicitly not encoded. The Unicode Standard reserves 6,400 code points in the BMP for private use, which may be used to assign codes to characters not included in the repertoire of the Unicode Standard. Another 131,068 private-use code points are available outside the BMP, should 6,400 prove insufficient for particular applications.
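The two private-use figures in that passage can be checked with a little arithmetic. What follows is only a rough sketch in Python: the constant and function names are my own, the ranges are the private-use ranges as I understand them (U+E000..U+F8FF in the BMP, plus planes 15 and 16 minus the two noncharacters that end each plane), and the cross-check relies on whatever version of the Unicode data ships with the local Python.

```python
import unicodedata

# Private-use ranges (names here are illustrative, not from the standard).
BMP_PUA = range(0xE000, 0xF8FF + 1)
# Planes 15 and 16 are private use, except that the last two code points
# of each plane (xFFFE and xFFFF) are noncharacters.
PLANE_15_PUA = range(0xF0000, 0xFFFFD + 1)
PLANE_16_PUA = range(0x100000, 0x10FFFD + 1)

bmp_count = len(BMP_PUA)
supplementary_count = len(PLANE_15_PUA) + len(PLANE_16_PUA)


def count_private_use(start, stop):
    """Count code points in [start, stop) whose General_Category is Co."""
    return sum(
        1 for cp in range(start, stop)
        if unicodedata.category(chr(cp)) == "Co"
    )


print(bmp_count, supplementary_count)  # prints: 6400 131068
```

Calling `count_private_use(0, 0x10000)` and `count_private_use(0x10000, 0x110000)` should give the same two numbers from the character database itself, which is a reasonable sanity check that the quoted text and the data agree.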
Keeping that text in mind, consider this: one of the big items that the Symbols Subcommittee is talking about right now is whether to encode the Emoji (絵文字), the symbols that are so popular in the Japanese wireless market. Suddenly, lots of characters previously rejected may be okay to encode now if some of these people get their way, and the worry about the obviously faddish nature of things like Emoji will come full circle when wireless operators claim they need Emoji in text streams so that they can document the Emoji they support.
Thinking back to the innocent days of the contributions of William Overington and Bernard R. Miller, I cannot avoid a sense of deja vu in all this.
We could call it the next Comet Circumflex system, or the new golden ligatures, or the courtyard codes used for numbers and chess pieces, or we could notice that there are specific symbols in Bytext (a link that it pains me to give, truly, but it is topical -- read this question from his FAQ if you doubt me) that look a lot like the symbols I am talking about and how hypocritical it feels to have told this person that Unicode is not built for what these people wanted to do only to later form a committee to talk about the same thing those people wanted to do.
I thought all of those things were a bad idea then, and I still think so now, by the way.
Ignoring all that, I have a hard time seeing myself either
And there is also a proposal to encode the Japanese TV symbols used by ARIB, as well. The proposal was even written by a Microsoft employee.
So I guess we're getting into symbols too, though weren't we anyway with our [probably also to be encoded] WingDings and WebDings fonts (which have their own problems in Microsoft products and have for years because they aren't encoded and the silly silly features in WordMail AutoCorrect!).
It is a slippery slope that we head down here, and clearly I am not speaking for either Microsoft or Unicode when I say that I think it is a really bad road to be heading down.
This post brought to you by ☹ (U+2639, a.k.a. WHITE FROWNING FACE)
JM on 20 Aug 2007 4:00 AM:
This is going to sound silly, but many of your blog entry titles are really unhelpful. I use RSS to check the blogs I read, and not everything is always of interest. So a title like this is helpful: "Additional personal speculation on the Vista MUI SKU Story". A title like this is not: "Who are the heirs of Bernard R. Miller? (aka U+2323 when you say that!)" That's fairly typical of other entries, too... Maybe it's my fault for not getting the Heinlein reference, but in general, I prefer my titles to be direct. If you're naming a TV episode, you can get away with a clever title; if you're naming a blog entry you can't.
Now, most of your entries happen to be interesting enough that I can read them anyway, even if the title gives me no clue as to what it's going to be about. And I realize you often write about highly specialized topics that don't admit very short titles. But still.
Otherwise, keep up the good work. :-)
Michael S. Kaplan on 20 Aug 2007 4:08 AM:
I do feel pretty strongly about my titles and being able to express myself creatively through them.
Scoble told me once it negatively impacts my readership, but if I were in it for the numbers there are at least 20 things I would do differently than I do now (titles would be just one item to change)....
JM on 20 Aug 2007 4:52 AM:
Ah, I see -- the titles are not *for* me. (In case you don't get *my* reference: http://www.penny-arcade.com/comic/2004/03/24).
Well, it's your party, and it won't stop me from reading your blog regardless. As long as you don't feel strongly about expressing yourself creatively through variable names... (Please tell me you don't. :-)
Michael S. Kaplan on 20 Aug 2007 4:56 AM:
Ah, I *did* get that reference (though in fairness I have to admit it is because someone forwarded it to me a while back!).
For variable names, I usually like to stay consistent with the code around me. Which is kind of like the opposite, a chameleonly kind of expression? :-)
William Overington on 26 Aug 2007 6:55 AM:
> We could call it the next Comet Circumflex system, or the new golden ligatures, or the courtyard codes used for numbers and chess pieces, ...
The web pages for the original ideas are still on the web.
Since that time I have become very interested in designing fonts and some readers might like to have a look at those of the fonts which I have produced which are available for free download from the following web page.
Some of those fonts include glyphs for ligatures within the Unicode Private Use Area, many using the mappings of the golden ligatures collection. I have not updated the golden ligatures documents on the web, yet there are a few additions which can be found in the Quest text and Chronicle Text and 10000 fonts. Many of my fonts do include a glyph for a ct ligature mapped to U+E707. It can be useful in desktop publishing situations where the software application does not support OpenType glyph substitution. My fonts are TrueType, not OpenType, though I am hoping to be able to have access to software to produce and use OpenType fonts at some future time. I like to think of my adding ligature glyphs in fonts in the Unicode Private Use Area as a way of making them available for use now by those of us who have fewer facilities than do some other people and also as somewhere to store them in the hope that one day they will be used in OpenType fonts, though I like to think that in my fonts they will always also be available using a golden ligatures codepoint.
The courtyard codes were commented upon in a post in the Unicode mailing list some years ago. Searching for Mars weather in the archives finds the post easily. Maybe reading that post again now might be helpful. Progress cannot be stopped and the definition of character stated in that post does allow for progress.
Something which I particularly like about Bytext is the arrowed brackets which are intended for use with superscripts and subscripts and for designating in a linear run of text the lower and upper limits of integrals and summations in mathematics. Perhaps your committee could have a look at those please?
> ... and how hypocritical it feels to have told this person that Unicode is not built for what these people wanted to do only to later form a committee to talk about the same thing those people wanted to do.
It is, in my opinion, not hypocritical of you if you give proper academic credit to the people who put forward the ideas in the first place, perhaps mentioning, as you have, that you were against those ideas at that time. At the time of writing your blog you thought that those of my ideas which you mentioned were each a bad idea, maybe you still do. You are welcome to your opinion, I refine my ideas as I proceed. Reasoned debate is fine. If my ideas are accepted in time, including perhaps the one about special angled brackets for markup languages rather than using the angled brackets of basic latin, then it would not be hypocritical for you to be chairing a committee that accepts them if proper academic credit is given: there is no need to mention changing your mind, the academic credit of invention would be adequate.
While writing, I wonder if you might like to have a look at an idea which I have for a symbol please?
That has potentially far-reaching implications. If your committee can get that encoded into Unicode it would protect it throughout the world.
26 August 2007
William Overington on 28 Aug 2007 3:02 AM:
> ... every single message sent to their mailing list is copied to the list for Unicode members, ...
Are any of these messages available anywhere for viewing by the public please?
I have started a thread about the topic at the following place.
28 August 2007
James Kass on 4 Sep 2007 9:35 PM:
Emoticons again. Interrobang.
Since the graphics being discussed may be easily interpreted as something completely different from the matching suggested semantics, it might not be a bad idea to provide some alternate descriptions.
For instance, under the animated graphic which is supposed to represent the concept of "ROLLING ON THE FLOOR LAUGHING", a possible alternative interpretation/use of the icon might be:
"SEND HELP AT ONCE -- AM HAVING SEIZURE".
And, the GIF which is supposed to signify the symbol known as "PICTURE OF COW" might be annotated as "BROWNISH VAGUELY BLOB SHAPED OBJECT WITH SOME TINY SQUIGGLES", because that's exactly what it looks like on a thirteen inch monitor at normal text sizes.
That really crude symbol. (The one which inspires so much sophomoric humor.) Now, if you're familiar with...
Hmmm, this could go on and on, much to the chagrin of the humor-impaired. Funny people are already thinking up even more suggestions.
So here's a different idea.
People could each be responsible for entering their own interpretations for the ambiguous pictures. After the picture, there should be a user-defined descriptive string entered.
You're seeing where this is heading, of course. Plane Fourteen Language Tags could be used for those descriptive strings. After all, they're not really being used for much of anything else, and it's such a shame to see them go to waste.
That's not really a very good idea, either, though. Why would anyone bother entering a P14LT user-defined descriptive string when they could just use, ah, text?
William Overington on 6 Sep 2007 5:11 AM:
James Kass wrote as follows.
An interesting aspect of emoji is that they localize in the mind of the user, into the language of the user.
So, if the emoji were developed then they could be used to facilitate communication between people who do not understand the same languages.
Suppose that mobile telephones were developed such that they have an alternative form of output for text messages which could be used instead of a telephone call when desired. That form of output could be as an infra-red burst of data, using technology similar to that used for television remote control devices.
The signals could be received by another such unit, or by a display unit on a desktop, or by an autoresponding computer-driven information point.
Then a person could ask such questions as WHERE IS THE NEAREST VEGAN RESTAURANT PLEASE? and the question be understood in a tourist information centre even if the tourist and the staff do not understand the same language.
Certainly, there are potentially an extremely large number of potential questions, yet maybe a sequence of two or three emoji could be used so as to convey information.
For example, suppose that WHERE IS THE NEAREST ... PLEASE? is encoded as one emoji item, and RESTAURANT is encoded as another emoji item and VEGAN is encoded as another emoji item. That particular combination of three emoji, in any order, could convey the desired meaning.
If one adds other emoji items for nouns such as HOTEL and CAFÉ and emoji for various adjectives and emoji for a way of indicating directions then one could develop the system into something potentially useful to many people.
There could also be an emoji item to indicate that a specific name follows. For example, suppose that one is in Vienna and wishes to ask the directions to the Café Mozart. There could be an emoji item such as WHERE IS ... PLEASE? and that could be used in conjunction with the emoji item for CAFÉ and the emoji item indicating that a specific name follows and the word Mozart. Maybe the rule would be that the emoji item indicating that a specific name follows is never the last emoji item in a sequence so that the sequence always ends with an emoji item.
There are lots of empty planes in the Unicode map. I feel that there is a great potential to use some of them to develop such systems and other systems. For example, some of the code points in one of the planes could be used to encode colours.
Some time ago I carried out some experiments with encoding graphics using some Unicode Private Use Area characters.
6 September 2007