by Michael S. Kaplan, published on 2006/02/21 03:11 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/02/21/535808.aspx
The commutative property of addition clearly does not apply to combining marks in Unicode.
Or at least it is not supposed to.
I mean, A + B is not the same as B + A in any situation where the order is meant to determine how the marks are placed relative to each other.
Anyway, regular reader Mike Dunn asked in the Suggestion Box about an exception to this:
After reading your post about putting lots of diacritics on a letter, I wondered what determines the order that they appear in.
I looked at the sequences 0065 0302 0303 and 0065 0303 0302 using Tahoma on XPSP2 in Notepad and Word 2000, and in both cases the diacritics appear in the same order (tilde above the circumflex). This is the right order for Vietnamese, but if I were writing IPA, I would want the circumflex on top. Can the order be changed with control characters?
That, my dear Mike, is an excellent question, and (now that you asked it) one whose answer I was very curious about. Why do these two sequences, U+0065 U+0302 U+0303 and U+0065 U+0303 U+0302, look the same, anyway?
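Both U+0302 (COMBINING CIRCUMFLEX ACCENT) and U+0303 (COMBINING TILDE) have the same canonical combining class value, 230, which means 'Above'. Canonical reordering only swaps two adjacent marks when the first has a higher non-zero combining class than the second, so marks of equal class keep the order they were typed in, and the two sequences are not canonically equivalent. There is no valid Unicode reason for them to be reordered.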
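To see the combining classes for yourself, here is a small sketch using Python's standard `unicodedata` module (Python is just my choice for illustration; nothing in the original question involves it, and the variable names are made up). It prints the class of each mark and shows that NFD, which applies canonical ordering, leaves both sequences exactly as typed:

```python
import unicodedata

circ_then_tilde = "\u0065\u0302\u0303"  # e + circumflex + tilde
tilde_then_circ = "\u0065\u0303\u0302"  # e + tilde + circumflex

# Both marks have canonical combining class 230 ("Above").
for mark in ("\u0302", "\u0303"):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: class {unicodedata.combining(mark)}")

# NFD applies canonical ordering, which never swaps marks of equal class,
# so each sequence keeps its own order.
nfd_a = unicodedata.normalize("NFD", circ_then_tilde)
nfd_b = unicodedata.normalize("NFD", tilde_then_circ)
print(nfd_a == circ_then_tilde, nfd_b == tilde_then_circ)  # True True
print(nfd_a == nfd_b)  # False: the two sequences are not canonically equivalent
```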
Now it is true that one of these sequences has a precomposed form in Unicode (U+1EC5) and the other does not, but still!
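The precomposed asymmetry is easy to see with NFC, which composes as far as Unicode's precomposed characters allow. Again a Python sketch with the standard `unicodedata` module; the helper name `codepoints` is just for illustration:

```python
import unicodedata

def codepoints(s: str) -> list[str]:
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

circ_then_tilde = "\u0065\u0302\u0303"  # e + circumflex + tilde
tilde_then_circ = "\u0065\u0303\u0302"  # e + tilde + circumflex

# e + circumflex composes to U+00EA, which then composes with the tilde to U+1EC5.
print(codepoints(unicodedata.normalize("NFC", circ_then_tilde)))  # ['U+1EC5']
# e + tilde composes to U+1EBD, but no precomposed character adds a circumflex on top.
print(codepoints(unicodedata.normalize("NFC", tilde_then_circ)))  # ['U+1EBD', 'U+0302']
```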
I was determined to find out what was going on.
Luckily, the best freaking font team in the world is down the hall, so all I had to do was walk over and ask somebody.
Hmmmm.... seems like a lot of people are out right now. I made it all the way down to Nick's office, where he was talking to Mushegh. Aha, maybe they would be able to help.
I started by apologizing to them, since although I do not consider them to be "the dregs" in any kind of quality sense, they ended up being treated as the dregs due to the distance between my office and theirs. They smiled, which I took as a good sign. And then I asked them about the above....
This is actually a known issue. It is a side effect of a bug in the way the code looked for precomposed forms (on the assumption that a precomposed version, if it exists, is more likely to look correct). The bug sometimes caused the code to find a precomposed character whose marks were in a different order than the combining sequence being rendered....
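The post does not show the actual Windows code, so the following is only a hypothetical Python sketch of the kind of bug described: a precomposed-form lookup that checks which marks are present but not their order. The table `PRECOMPOSED` and the functions `lookup_buggy`, `lookup_fixed`, and `describe` are invented for illustration:

```python
import unicodedata
from collections import Counter
from typing import Optional

# HYPOTHETICAL: a tiny table of precomposed characters a font might cover,
# keyed by each character's canonical decomposition (base + marks, in order).
PRECOMPOSED = {
    unicodedata.normalize("NFD", ch): ch
    for ch in ("\u00EA", "\u1EBD", "\u1EC5")  # e-circumflex, e-tilde, e-circumflex-tilde
}

def lookup_buggy(seq: str) -> Optional[str]:
    """HYPOTHETICAL buggy lookup: matches the base and the set of marks, ignoring mark order."""
    for decomposed, ch in PRECOMPOSED.items():
        if decomposed[0] == seq[0] and Counter(decomposed[1:]) == Counter(seq[1:]):
            return ch
    return None

def lookup_fixed(seq: str) -> Optional[str]:
    """HYPOTHETICAL fixed lookup: the marks must match in exactly this order."""
    return PRECOMPOSED.get(unicodedata.normalize("NFD", seq))

def describe(ch: Optional[str]) -> str:
    return "no precomposed form" if ch is None else f"U+{ord(ch):04X} {unicodedata.name(ch)}"

tilde_then_circ = "\u0065\u0303\u0302"  # e + tilde + circumflex (IPA wants circumflex on top)
print(describe(lookup_buggy(tilde_then_circ)))  # U+1EC5 LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE
print(describe(lookup_fixed(tilde_then_circ)))  # no precomposed form
```

With the order-insensitive lookup, e + tilde + circumflex picks up U+1EC5, whose tilde sits on top, which is the symptom Mike described; the order-sensitive lookup finds nothing and leaves the marks to be stacked in the order they were typed.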
The good news is that Nick himself had checked in the fix for this bug in Vista, which now does things correctly.
It has not been backported to the prior versions of Windows, though that is the sort of thing which can of course be considered and triaged appropriately....
As for the other part of the question, how to force the right behavior on the downlevel platforms, not many ideas were forthcoming.
Obviously, if you are building the font yourself, you decide which precomposed characters will exist in it; you can even leave them all out and rely on the attachment points and such to build up the right character.
If you are not building the font yourself, you would have to find a way to break up the sequences without changing the display, which can be a real challenge (no one thought of anything offhand).
One way I did find was to use U+1EBD U+0302 (LATIN SMALL LETTER E WITH TILDE and COMBINING CIRCUMFLEX ACCENT), although it worked in some fonts (such as Segoe UI) and not so well in others (such as Tahoma).
If you do not have Segoe UI installed, it will not look good, so don't bother reporting that as a bug!
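Why is this a legitimate workaround rather than a different piece of text? U+1EBD decomposes canonically to U+0065 U+0303, so U+1EBD U+0302 spells exactly the same text as e + tilde + circumflex, and not the same text as U+1EC5. A quick Python check with the standard `unicodedata` module (the variable and helper names are illustrative). My guess, and it is only a guess, is that starting from the precomposed e-with-tilde keeps the buggy lookup from matching the whole sequence against U+1EC5:

```python
import unicodedata

workaround = "\u1EBD\u0302"      # LATIN SMALL LETTER E WITH TILDE + COMBINING CIRCUMFLEX ACCENT
intended = "\u0065\u0303\u0302"  # e + tilde + circumflex (circumflex on top)
vietnamese = "\u1EC5"            # e with circumflex, tilde on top

def nfd_labels(s: str) -> list[str]:
    """Canonical decomposition of s as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]

print(nfd_labels(workaround))  # ['U+0065', 'U+0303', 'U+0302']
print(nfd_labels(workaround) == nfd_labels(intended))    # True: same text
print(nfd_labels(workaround) == nfd_labels(vietnamese))  # False: different text
```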
So anyway, I headed back to my office and decided not to rely only on office locations to decide where I visit first, because sometimes the best people to talk to would otherwise be dismissed as the dregs, and neither Mushegh nor Nick qualifies as the dregs in my book. :-)
This post brought to you by "ễ" (U+1EC5, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE)
# Phylyp on 21 Feb 2006 4:38 AM:
# Michael S. Kaplan on 21 Feb 2006 7:55 AM:
# Andrew West on 21 Feb 2006 9:29 AM:
# bmm6o on 21 Feb 2006 11:31 AM:
# Michael S. Kaplan on 21 Feb 2006 12:04 PM:
# Johan Petersson on 21 Feb 2006 12:44 PM:
# Maurits [MSFT] on 21 Feb 2006 12:54 PM:
# Maurits [MSFT] on 21 Feb 2006 1:02 PM:
# Michael S. Kaplan on 21 Feb 2006 1:46 PM:
# Maurits [MSFT] on 21 Feb 2006 2:15 PM:
# Michael S. Kaplan on 21 Feb 2006 2:22 PM:
# Dean Harding on 21 Feb 2006 6:09 PM:
# Gabe on 22 Feb 2006 2:11 AM:
# Michael S. Kaplan on 22 Feb 2006 7:19 AM:
# Gabe on 22 Feb 2006 4:42 PM: