by Michael S. Kaplan, published on 2007/08/17 03:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/08/17/4425267.aspx
From the TV show Angel, the episode entitled Epiphany:
Angel: I'm still not sure what happened...
Lorne: What's not to understand? You think you're the first guy who ever rolled over, saw what was lying next to him, and went GYAAAAH!?
Okay, my challenge now is to appear to tie THAT quote in while veering back into the appropriate. Hopefully I do well.
For a long time now, there have been implementers of Unicode who really feel the need to stop tweaking existing property values in the Unicode Standard, since the changes can have a huge impact on their implementations and on their customers, an impact that is not always well understood until it is too late.
Opposing those people are the implementers of Unicode who notice discrepancies or inconsistencies and, in the interests of making values more understandable and consistent, try from time to time (and UTC meeting to UTC meeting) to tweak those properties.
You can see the conflict here:
The second group says "It is messy how it is now."
And the first group says "Life can be messy, let's leave it alone."
So the second group says "This is a bug we should fix. What's the harm?"
And the first group says "I don't have what is bad about it right in front of me, but when we do this sort of thing, stuff breaks."
So the second group says "Well, let's just make it a Public Review Issue and then we'll know if anyone feels like we have broken them, and we can reconsider."
Then that first group, with no convincing thing to say against that argument, reluctantly concurs.
Then, a few months later, we are back in the UTC and there is little feedback against it and minimal feedback at all. With no reason to avoid the change, no one really has much to argue against. So it gets into the standard.
And then, half a year later, someone stands up, points out that the Emperor was wearing no clothes that day, and we have a corrigendum.
This time around there were two Public Review Issues: #80, which talks about (among other things) changing the Bidi_Mirrored property on some characters, including the quotation mark characters, and #91, which is the update to UAX #9 itself. Yada yada yada, and the change made it into Unicode 5.0.
Luckily Jonathan Kew produced a document that was discussed in this recent UTC meeting which pointed out all of the things this broke. A corrigendum has been issued, which you can read about in Corrigendum #6: Bidi Mirroring. The data changes listed there are:
Changes to Bidi Mirroring
When this corrigendum is applied to Unicode 5.0.0, the Bidi_Mirrored property of the characters 2018..201F and 301D..301F is changed to "false" and their Bidi_Mirroring_Glyph is adjusted accordingly. Make the following changes to data files:
1. Change the 11 lines in UnicodeData.txt which define properties for these characters to have the following contents:
2018;LEFT SINGLE QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE TURNED COMMA QUOTATION MARK;;;;
2019;RIGHT SINGLE QUOTATION MARK;Pf;0;ON;;;;;N;SINGLE COMMA QUOTATION MARK;;;;
201A;SINGLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW SINGLE COMMA QUOTATION MARK;;;;
201B;SINGLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE REVERSED COMMA QUOTATION MARK;;;;
201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE TURNED COMMA QUOTATION MARK;;;;
201D;RIGHT DOUBLE QUOTATION MARK;Pf;0;ON;;;;;N;DOUBLE COMMA QUOTATION MARK;;;;
201E;DOUBLE LOW-9 QUOTATION MARK;Ps;0;ON;;;;;N;LOW DOUBLE COMMA QUOTATION MARK;;;;
201F;DOUBLE HIGH-REVERSED-9 QUOTATION MARK;Pi;0;ON;;;;;N;DOUBLE REVERSED COMMA QUOTATION MARK;;;;
301D;REVERSED DOUBLE PRIME QUOTATION MARK;Ps;0;ON;;;;;N;;;;;
301E;DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;
301F;LOW DOUBLE PRIME QUOTATION MARK;Pe;0;ON;;;;;N;;;;;
2. Remove the following 9 lines from DerivedBinaryProperties.txt, and change the count at the bottom of the file accordingly from 537 to 526:
2018 ; Bidi_Mirrored # Pi LEFT SINGLE QUOTATION MARK
2019 ; Bidi_Mirrored # Pf RIGHT SINGLE QUOTATION MARK
201A ; Bidi_Mirrored # Ps SINGLE LOW-9 QUOTATION MARK
201B..201C ; Bidi_Mirrored # Pi  SINGLE HIGH-REVERSED-9 QUOTATION MARK..LEFT DOUBLE QUOTATION MARK
201D ; Bidi_Mirrored # Pf RIGHT DOUBLE QUOTATION MARK
201E ; Bidi_Mirrored # Ps DOUBLE LOW-9 QUOTATION MARK
201F ; Bidi_Mirrored # Pi DOUBLE HIGH-REVERSED-9 QUOTATION MARK
301D ; Bidi_Mirrored # Ps REVERSED DOUBLE PRIME QUOTATION MARK
301E..301F ; Bidi_Mirrored # Pe  DOUBLE PRIME QUOTATION MARK..LOW DOUBLE PRIME QUOTATION MARK
# Total code points: 526
3. Remove the following 11 lines from BidiMirroring.txt:
2018; 2019 # [BEST FIT] LEFT SINGLE QUOTATION MARK
2019; 2018 # [BEST FIT] RIGHT SINGLE QUOTATION MARK
# 201A; SINGLE LOW-9 QUOTATION MARK
# 201B; SINGLE HIGH-REVERSED-9 QUOTATION MARK
201C; 201D # [BEST FIT] LEFT DOUBLE QUOTATION MARK
201D; 201C # [BEST FIT] RIGHT DOUBLE QUOTATION MARK
# 201E; DOUBLE LOW-9 QUOTATION MARK
# 201F; DOUBLE HIGH-REVERSED-9 QUOTATION MARK
301D; 301E # REVERSED DOUBLE PRIME QUOTATION MARK
301E; 301D # DOUBLE PRIME QUOTATION MARK
# 301F; LOW DOUBLE PRIME QUOTATION MARK
For more info (and soon for the link to updated data files for those who need them), see the link to the Corrigendum.
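If you want to see which side of the corrigendum a given platform landed on, here is a minimal sketch in Python. It is only an illustration, not anything from the Corrigendum itself: the names AFFECTED, bidi_mirrored_field, and still_mirrored are made up for this example, and it assumes the usual UnicodeData.txt layout in which the Bidi_Mirrored value is the tenth semicolon-separated field. The standard library's unicodedata.mirrored() reports whatever Unicode version that Python build ships with.

```python
import unicodedata

# The eleven code points named in the corrigendum:
# U+2018..U+201F and U+301D..U+301F (hypothetical constant name).
AFFECTED = list(range(0x2018, 0x2020)) + list(range(0x301D, 0x3020))


def bidi_mirrored_field(line: str) -> str:
    """Return the Bidi_Mirrored field ('Y' or 'N') of one UnicodeData.txt line.

    Assumes the standard layout, where Bidi_Mirrored is field index 9.
    """
    fields = line.strip().split(";")
    return fields[9]


def still_mirrored(code_points):
    """List the code points that this Python's Unicode tables mark as mirrored."""
    return [f"U+{cp:04X}" for cp in code_points if unicodedata.mirrored(chr(cp))]


# One of the replacement lines quoted from the corrigendum above.
sample = "2018;LEFT SINGLE QUOTATION MARK;Pi;0;ON;;;;;N;SINGLE TURNED COMMA QUOTATION MARK;;;;"

print("Field in the corrected data line:", bidi_mirrored_field(sample))
print("Unicode version in this Python:", unicodedata.unidata_version)
print("Affected characters still mirrored:", still_mirrored(AFFECTED))
```

On a runtime whose character data already includes the corrigendum, the last list should come back empty; anything that shows up there is a sign that the tables in use predate the fix.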
The wider issue here goes back to the original group #1 and group #2 I first talked about, and the dynamics which will likely need to start changing there in order for big companies (like Microsoft and others) to be able/willing to pick up changes more quickly.
Obviously, staying multiple versions behind is the safest thing to do in products, but it is not the most helpful for the standard or, frankly, for customers, so a better balance has to be reached when issues like this come up. The dynamic in the UTC needs to shift toward fewer changes, with the common default being no change unless there are significant problems reported by implementers that affect customers, significant discussion, and then significant feedback when the PRI is issued....
This post brought to you by ‘ (U+2018, a.k.a. LEFT SINGLE QUOTATION MARK)