If this post really describes a bug, would I actually put it in the WYNN column?

by Michael S. Kaplan, published on 2007/07/31 19:59 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/31/4155336.aspx


There are several scripts that have the notion of case, like Latin, Cyrillic, Greek, Armenian, Coptic, Glagolitic, and others.

There are some folks like Michael Everson who believe that even if a letter is not attested in both cases, it may well be one day, and that both cased forms should therefore be encoded.

Some people agree with him, others would rather wait for letters to be attested first.

It is a perennial battle. :-)

But anyway, earlier today when I posted "See that version there? It is going down, man! #2 (aka Everybody WYNNs)", something occurred to me about U+01bf, a.k.a. LATIN LETTER WYNN, which has been around at least since Unicode 1.1.

And that is the fact that Unicode also has U+01f7, a.k.a. LATIN CAPITAL LETTER WYNN, and has had it since Unicode 3.0!

But the Swedish tables do not put these two letters next to each other.

This other letter is in our collation tables too, with a special weight for Turkmen (0x0442) that does not put it anywhere near the lowercase version.

And our default table does not put them anywhere near each other, either.

Of course in our casing table update in Vista, both characters are there and map to each other (prior to Vista they did not; this is one of the many mappings that was missing!).
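For anyone who wants to check the casing relationship outside of Windows, here is a minimal sketch in Python. It relies on the Unicode case mappings bundled with the Python build rather than the Vista casing table, so it only illustrates the pairing as Unicode defines it; the helper name `is_case_pair` is made up for this example.

```python
# Check whether two characters map to each other under the simple
# upper/lower case mappings in Python's bundled Unicode data.
# (This is NOT the Windows casing table; it is an illustration only.)

small_wynn = "\u01bf"    # LATIN LETTER WYNN
capital_wynn = "\u01f7"  # LATIN CAPITAL LETTER WYNN


def is_case_pair(lower: str, upper: str) -> bool:
    """Return True if `lower` uppercases to `upper` and vice versa."""
    return lower.upper() == upper and upper.lower() == lower


print(is_case_pair(small_wynn, capital_wynn))
```

A round trip in both directions is checked on purpose, since a mapping that only goes one way would not make the two characters a true casing pair.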

And Unicode's default collation table does have them right next to each other.

Now this is somewhere between zero and three problems, depending on whether Swedish/Finnish, Turkmen, or the default table should be putting them near each other. They do not seem to be put together in languages that use one or the other, so it is honestly unclear whether one would expect them to be put together.

Is this yet another time where collation != case (like I have mentioned before), and an interesting one, to boot? :-)

Anyone from Sweden, Finland, Turkmenistan, or elsewhere have any opinions here about this issue with WYNN and CAPITAL WYNN?

And where does U+16b9 (RUNIC LETTER WUNJO WYNN W) fit into all of this? :-)
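To see the three characters side by side, a small sketch like the following lists each one's code point, character name, and general category. It uses Python's standard `unicodedata` module, so the output reflects whichever Unicode version that Python build ships with; the `describe` helper is just a name chosen for this example.

```python
import unicodedata

# The three wynn-related characters discussed in this post.
WYNNS = ["\u01bf", "\u01f7", "\u16b9"]


def describe(ch: str) -> tuple:
    """Return (code point label, character name, general category)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )


for ch in WYNNS:
    print(*describe(ch))
```

The general category is the interesting column here, since it shows which of the three are treated as cased letters at all.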

Clearly the letter is no longer what it was in Old English. But on the other hand, what was it then? Are there any actual bugs here?

Every letter with three characters in Unicode behind it has a story too, I guess!

(more on Wynn, here)

 

This post brought to you by Ƿ (U+01f7, a.k.a. LATIN CAPITAL LETTER WYNN)


Wilhelm Svenselius on 1 Aug 2007 1:46 AM:

I'm Swedish, and my only input on this is that I have never seen the "WYNN" character before, let alone used it, but not considering W and V separate characters (as separate as A and Q, really) would seem extremely strange to me.

Åke Persson on 1 Aug 2007 5:50 AM:

In the Vista default table, U+01f7 LATIN CAPITAL LETTER WYNN has been given a weight to sort as a variation of the letter P, while U+01bf LATIN LETTER WYNN is sorted as a variation of the letter W. Being the only casing pair not sorted close to each other, this definitely looks like a bug :-)

BTW, another strange thing in the default table is that U+0192 LATIN SMALL LETTER F WITH HOOK is sorted after U+0191 LATIN CAPITAL LETTER F WITH HOOK. Being the only casing pair where the small letter is not sorted before the corresponding capital letter, this also looks like a bug :-)

Åke Persson on 1 Aug 2007 4:15 PM:

Yet another amazing fact about the Vista default table;-)

The letter YOGH (U+021D, U+021C) is sorted as a variation of the letter E!


referenced by

2010/03/09 Coloring outside the lines in the a-ness of the Hungarian Technical Sort
