Attempting to discourage some variations?

by Michael S. Kaplan, published on 2010/05/20 07:01 -04:00, original URI:

This is not something I have talked about very much.

I mean, I mentioned it a little bit in UCS-2 to UTF-16, Part 10: Variation[ Selector] on a theme... a while back.

And if you look far and wide, you may find additional odd mentions from time to time.

Variation Selectors.

Those funky characters that are meant to be invisible but that, if they happen to follow a character and the pair is on a magical list, combine with that previous character to change its appearance in some way.
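For the curious, here is a minimal sketch of spotting these characters in a string. It assumes only the two general-purpose blocks, Variation Selectors (U+FE00..U+FE0F) and Variation Selectors Supplement (U+E0100..U+E01EF); the function names are made up for illustration, and it does not consult the "magical list" at all, so it reports every base-plus-selector pair whether or not that sequence is actually defined.

```python
def is_variation_selector(ch: str) -> bool:
    """True if ch is in the Variation Selectors or VS Supplement block."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def find_variation_sequences(text: str) -> list[tuple[str, str]]:
    """Return (base, selector) pairs where a selector directly follows a base.

    Hypothetical helper: it does NOT check whether the pair is a
    registered or standardized sequence, only that the shape matches.
    """
    pairs = []
    for prev, cur in zip(text, text[1:]):
        # A selector only modifies the character immediately before it.
        if is_variation_selector(cur) and not is_variation_selector(prev):
            pairs.append((prev, cur))
    return pairs
```

Calling `find_variation_sequences("a\u2764\uFE0Fb")` picks out the heart followed by U+FE0F and ignores the plain letters on either side.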

Now, to whatever extent I jumped the shark on Unicode, it wasn't UTF-8S (now called CESU-8) or variation selectors; it was emoji.

But it was very nearly variation selectors....

And not really variation selectors per se; it was much more

Ideographic Variation Sequences.

which, together with UTS #37 (Unicode Ideographic Variation Database), give Unicode, the standard that encodes scripts not languages and therefore cannot encode every random variation of a character, a means by which anyone who is dealing with CJKV, has some money, and wants to encode their specific graphical forms of already encoded ideographs can basically do so, with minimal oversight compared to the process by which new characters and scripts are usually encoded.
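To make the shape of such a sequence concrete, here is a rough sketch of a check for whether a pair of characters could be an Ideographic Variation Sequence. The assumptions are mine: the selector must come from the Variation Selectors Supplement (U+E0100..U+E01EF, that is, VS17 through VS256), and the base is approximated as any character whose Unicode name starts with "CJK UNIFIED IDEOGRAPH-". The function name is hypothetical, and passing this check says nothing about whether the sequence is actually registered in the database.

```python
import unicodedata


def is_ivs_candidate(base: str, selector: str) -> bool:
    """Rough shape check for an Ideographic Variation Sequence.

    Assumption: the base is a unified ideograph (approximated by its
    Unicode character name) and the selector is VS17..VS256.
    This does NOT look anything up in the registered collections.
    """
    in_supplement = 0xE0100 <= ord(selector) <= 0xE01EF
    name = unicodedata.name(base, "")  # "" when the character has no name
    return in_supplement and name.startswith("CJK UNIFIED IDEOGRAPH-")
```

So `is_ivs_candidate("\u845B", "\U000E0100")` is true, while the same ideograph followed by U+FE00, or a Latin letter followed by U+E0100, is not.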

If you think about this and decide it sounds like a train wreck waiting to happen, then you know how I feel....

Now the recent Unicode Technical Committee meeting that just happened included an interesting point:

What was that thing King Canute was saying about laws to sweep back the tides?

I can't remember, exactly.

But I look forward to the first time something is rejected that is part of a set that one side feels is not an overlap and is not the same ideograph. And to the fact that this mechanism, meant to minimize the need for oversight, now requires a whole lot of people to be watching things to avoid the duplication that this whole approach seems architected to allow.

Now when I say I look forward to this, I am being sarcastic. It actually makes me somewhat happy I don't have to be in those meetings anymore, even though usually I tend to miss them a bit....

Michael S. Kaplan on 22 May 2010 8:28 AM:

From Ken Lunde (sent via e-mail during the blog upgrade):

What is missing from the context with regard to getting two registrants to agree on sharing authority for the same IVS, meaning that the IVS is shared by more than one IVD Collection, is that this is done only if the affected IVD Collections are deemed parallel in their intent. This was discussed at the UTC meeting that resulted in the UTS #37 changes. The registered Adobe-Japan1 IVD Collection and the submitted Hanyo-Denshi IVD Collection are largely parallel, so depending on your point of view, the worst possible scenario happened, or the best possible scenario happened. In other words, it forced the UTC to take action on an issue that was inevitable. For example, if Hong Kong were to submit an IVD Collection, there'd be little need to coordinate or share IVSes, even if some of the glyphs happened to look the same, because they have different intents (they are intended for a different locale).

Michael S. Kaplan on 23 May 2010 9:10 AM:

This still seems to me like a train wreck, since collections that come up later won't have the benefit of this, and registrants who don't want to "share authority" won't choose to do so (and the IRG won't be around to enforce togetherness of essentially identical ideographs)....

Michael S. Kaplan on 27 May 2010 12:48 AM:

It would seem that there are some people out there who shared my worries about the general direction of variation sequences who welcome the opportunity for more oversight in cases where duplication can be avoided (see the miraicorp tweet here, for example:
