by Michael S. Kaplan, published on 2007/08/24 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/08/24/4536979.aspx
That night I saw in the pipeline fair
A character that wasn't there
Non-existence won't stop the encoding; it's true
So it's coming soon to a Unicode near you!
It all started with Every character has a story #15: CAPITAL SHARP S (not encoded), and then continued in Every character has a story #26: CAPITAL SHARP S (might be encoded?).
And you can read the title of this latest blog post and know what is happening now without any hints from me....
Though I must admit the trip has been both long and strange.
It was decided within both ISO 10646 and Unicode that this interesting character was indeed going to be encoded (as per the pipeline, it was officially accepted on May 18th of this year and as of April 27th is in Stage 5 of the ISO process).
And I have probably learned more about the nature of letters within typography than any experience before or since!
Immediately after this process started, there was a whole bunch of discussion on the Unicode List about a very important topic:
WHAT DOES A CAPITAL SHARP S LOOK LIKE?!?
There were a whole bunch of proposals here, and much of the conversation then took a southward turn.
Like people suggesting that DIN should be dissolved by law for supporting the proposal.
And others pointing out that the proposal specified an enlarged version of ß, nothing more and nothing less.
But I have told you about the Unicode List: the next 100 messages oscillated between discussions of typographic innovations that would make sense if the letter did indeed exist (based on different theories of its etymology) and people who remained unconvinced by the proposal even after it had been accepted, since in their view it isn't a freaking letter in the first place.
Plus lots of SZ vs. SS arguments.
An informal survey of the Germans I knew all seemed to fall squarely in the camp of the insanity of DIN, though many of them considered the opinion to be redundant....
And then with a few people talking about the consequences for Unicode properties, just to add the vague scent of relevance to the discussion. :-)
John Hudson had in my opinion the most amusing observation:
The irony of the recent exchanges is not lost on me:
On the one hand, we have Marnen Laibow-Koser, who thinks that this character should not exist, but that it does, and therefore needs to be encoded.
On the other hand, we have me, who thinks that this character should exist, but that it does not, and therefore does not need to be encoded.
For Microsoft, it raises some interesting questions for both collation and case for the next version of Windows.
I mean, think about the issues I have already talked about in posts like What the %#$* is wrong with German sorting? where we make ss equal to ß so that the uppercase version "SS" will sort near the ß in a sort ignoring case -- where we do things that make less linguistic sense in order to give regular results that are intuitive.
So who would expect that if U+00df is equal to ss that U+1e9e wouldn't be made equal to SS? Meaning that in the collation tables, U+00df and U+1e9e would simply be case variants, with no real choice in the matter.
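To make the "case variants" idea concrete, here is a minimal sketch in Python. It is only a toy model of the reasoning above, not the actual Windows collation tables: the `EXPANSIONS` table and the helpers `expand`, `primary_key`, and `differ_only_by_case` are hypothetical names invented for illustration. The assumption is simply that ß is treated as "ss" and U+1e9e as "SS" before a case-insensitive comparison.

```python
# Toy expansion table (hypothetical; NOT the real Windows collation data).
EXPANSIONS = {
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
    "\u1e9e": "SS",  # LATIN CAPITAL LETTER SHARP S
}


def expand(text: str) -> str:
    """Replace each sharp s with its two-letter equivalent."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def primary_key(text: str) -> str:
    """Key for a comparison that ignores case, applied after expansion."""
    return expand(text).lower()


def differ_only_by_case(a: str, b: str) -> bool:
    """True if two distinct strings are equal once case is ignored."""
    return a != b and primary_key(a) == primary_key(b)
```

Under these assumptions, `differ_only_by_case("\u00df", "\u1e9e")` is true for the same reason that `differ_only_by_case("ss", "SS")` is: once each sharp s expands to its own two-letter form, the only thing left to tell them apart is case.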
And as to casing....
Now just because we make the relationship in casing does not mean we make it in collation. After all, as I have pointed out several times before, collation != case.
But on the other hand, the case table is used in order to enforce the case insensitivity in the NT object namespace and the file system. And one clear issue is that there is no good reason to allow one to put filenames differing only by the presence of U+00df and U+1e9e in the same directory. Users would either never try it or they would never expect it to work. So it is quite possible that in the next version of Windows (which only does simple casing) it may make the most sense to make the two characters case variants of each other -- to enforce reasonable use of both letters!
There is still lots of time to decide, though at present I am leaning this way since it will give the most intuitive behavior for end users (even at the expense of giving slightly unintuitive results for developers).
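The file name scenario can be sketched as follows. This is a rough Python illustration, not how the NT object namespace or any file system is implemented: `simple_upper`, `name_key`, `collides`, and the `sharp_s_pair` switch are all hypothetical. Python's `str.upper()` applies full case mappings (so ß becomes "SS"), so the sketch approximates a simple, one-to-one case table by keeping only those uppercase mappings that produce a single character.

```python
def simple_upper(ch: str, sharp_s_pair: bool = False) -> str:
    """Approximate a one-to-one uppercase table for a single character.

    With sharp_s_pair=True, U+00DF is mapped to U+1E9E, i.e. the two
    characters are treated as case variants (the option being considered).
    """
    if sharp_s_pair and ch == "\u00df":
        return "\u1e9e"
    upper = ch.upper()
    # Drop multi-character results such as "SS"; a simple table cannot hold them.
    return upper if len(upper) == 1 else ch


def name_key(name: str, sharp_s_pair: bool = False) -> str:
    """Uppercase a name one character at a time, as a simple case table would."""
    return "".join(simple_upper(ch, sharp_s_pair) for ch in name)


def collides(a: str, b: str, sharp_s_pair: bool = False) -> bool:
    """True if two names would be the same entry in a case-insensitive directory."""
    return name_key(a, sharp_s_pair) == name_key(b, sharp_s_pair)
```

In this sketch, `collides("stra\u00dfe.txt", "STRA\u1e9eE.TXT")` is false when the two characters are unrelated in the table, so both files could sit in one directory; passing `sharp_s_pair=True` makes the names collide, which is the behavior being leaned toward above.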
This post brought to you by ß and ẞ (U+00df and U+1e9e, LATIN SMALL LETTER SHARP S and LATIN CAPITAL LETTER SHARP S)
# Mihai on 24 Aug 2007 4:26 PM:
There are a lot of problems with this, and its non-reversible mapping from ß to SS.
So I will come with another insane proposal: Germans should take the uppercase ß and start using it:
- it is encoded in Unicode
- it solves a lot of technical problems (thing that can surely be appreciated by the German mind, so good with engineering and technical problems :-)
- the experience of spelling reforms is still fresh
So it should be easy, and it will help all of us :-)
# Michael S. Kaplan on 24 Aug 2007 6:49 PM:
Well, note that your proposal maps to exactly what I suggested for Microsoft to do. :-)
# Theo on 26 Aug 2007 11:27 AM:
It blows my mind that Microsoft, in general, can be so brazen that
a)it finds no problem in implementing a broken version of German sorting rules (and judging by the blog's author, it's not as if much time or thought was spent on this),
b)the .Net architecture is incapable of dealing with the upper/lower case problems,
c)the blog author still, many months after the problem surfaced, has not bothered to track down the various German orthographies and uses (spelling reform, regional uses - Swiss, Austrian, German etc). It's a very simple 5 minute search, for the basics: http://en.wikipedia.org/wiki/German_language#Present
And Microsoft expects the world to dump all else and use its defective products?