Every character has a story #32: U+1e9e (CAPITAL SHARP S, Microsoft edition - Part 1)

by Michael S. Kaplan, published on 2009/07/28 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/07/28/9850675.aspx


 

Previous blogs about this letter:

Now once again, keep in mind that for most of the German speaking world this still isn't a letter....

So in Windows 7, which once again (just like Vista) was updated to the most recent version of Unicode it could be, LATIN CAPITAL LETTER SHARP S needed to be integrated.

Let's take a look at it in WordPad, using the Segoe UI font:

Ok, interesting. Obviously they couldn't make it much taller. So they made it a little wider and left it at that.

I think that is probably what Unicode did too.

Makes me wonder what happened in Character Map. Let's take a look:

 

Hmm... Undefined? Oh I guess someone forgot to regenerate the list of names that Charmap uses. Luckily it can still display characters even when it doesn't know what they are.

Any testers want to put that bug in? :-)
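For anyone who wants to check the name without Charmap, here is a minimal sketch in Python. It assumes the interpreter's bundled Unicode data is version 5.1 or later, and `describe` is just a hypothetical helper name for illustration; it has nothing to do with how Charmap or getuname.dll actually look names up.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character.

    Falls back to '<undefined>' when the Unicode data shipped with
    this Python build has no name for the code point, which is
    roughly the situation a stale name table would be in.
    """
    name = unicodedata.name(ch, "<undefined>")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    print(describe("\u1e9e"))  # the capital sharp s
    print(describe("\u00df"))  # the lowercase sharp s
```

A name table that predates Unicode 5.1 would hit the fallback for U+1E9E, even though a font can still draw the glyph.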

You know I kind of wonder what they did for fonts that can't change the width.

Let's take a look at Consolas:

It's not there.

Oh damn, let's look at some other fonts, too.

Like Tahoma:

and Microsoft Sans Serif:

and the fixed width font that is in the font link chain, Courier New:

 

And I am sincerely curious what the upper and lower case look like next to each other on that one. Let's take a look:

Interesting!

And it does meet the fixed width rules -- notice how the surrounding text lines up?

Though it makes me wonder what might have changed from the old font's lowercase character. Just a little bit curious....

Okay, so Courier New has it, yet Consolas does not.

Uh oh -- is this a C* font thing?

Let's look at Calibri, the default font in WordPad:

Crap.

Notice how RichEdit doesn't seem to be looking very hard for the substitute. Thank goodness Word is not this lazy!

How about everyone's favorite uber-font, Arial Unicode MS?

Double crap.

Or maybe we'll get another 20 or 30 people who will agree with me that Arial Unicode MS effectively [bites|sucks|blows].

Silver lining of a sort....

One last font I want to check out though.

Times New Roman:

  

Wow, I think I like this one best -- this is one I can really tell the difference on. Much more than the others. Truly.

Okay, let's move on, there is kind of a pattern and kind of a logic here. I'm happy. Well, as happy as I can be about a letter that doesn't really exist in the first place....

But just wait until tomorrow when I do part 2 of this blog. :-)


# Peter Gibbons on 28 Jul 2009 1:16 PM:

If you have a look at the string resources in getuname.dll, where the character map gets the names from, you'll notice that many if not all character names that were added with Unicode 5.1 are missing. For example the range for the Sundanese Script starting at 0x1B80. The other scripts are:

But what's more important is that the collation algorithms seem to process "ẞ" right. At least in explorer with filenames.

Regards,

Peter

# Michael S. Kaplan on 28 Jul 2009 1:56 PM:

Yep, the name thing was my point. :-)

I'll be jumping into the other issues tomorrow....

# Michael Everson on 28 Jul 2009 4:24 PM:

I drew the one used in the Unicode chart in close consultation with Andreas Stötzner. There are fairly useful specifications out there about how to construct the character using bits and pieces of other characters in the font to get the right proportions.

# Gwyn on 28 Jul 2009 5:03 PM:

Really stupid question here, but I can't seem to find that character in charmap. What did you do to be able to select it? The "Go to Unicode" function does not appear to be able to find it either. If I search for "sharp", it finds the LATIN SMALL LETTER SHARP S ok, but not the capital one.

# John Hudson on 28 Jul 2009 6:31 PM:

The uppercase eszett didn't make it into the recent extensions to Calibri, Cambria and Consolas. It wasn't included in Unicode when work on those extensions was spec'd.

# Michael S. Kaplan on 28 Jul 2009 9:11 PM:

Gwyn -- new for Windows 7, it is....

# Mihai on 29 Jul 2009 3:34 PM:

Someone at Ubuntu should get a memo on how that is supposed to work.

The conversion tables in Ubuntu version (8.10) map lower case sharp s to upper case sharp s (all locales, including German).

# Gwyn on 29 Jul 2009 7:24 PM:

Ok cool, thanks I'm not going mad then :) Carry on

# Michael S. Kaplan on 29 Jul 2009 8:09 PM:

Mihai -- I think you wanted the part 2 blog here.

I am jealous of Ubuntu -- they did the thing I wish Windows had. It is the better behavior in my opinion....

# Mihai on 30 Jul 2009 12:43 PM:

> I am jealous of Ubuntu -- they did the thing I wish Windows had.

> It is the better behavior in my opinion....

It feels a bit tricky.

I think the mapping should be to "SS", at least for the German locale.

But Ubuntu is limited by the design of the POSIX API, which does the case conversion in place.

I would really like a mapping to "SS" in public case conversion API, the way ICU (and Mac OS) do. That is what the (German) users expect.

# Michael S. Kaplan on 30 Jul 2009 1:10 PM:

Microsoft does simple casing here too.

I am willing to bet that within five years they will want simple (1 to 1) mappings to use the Capital Sharp S. What we should have done and what Ubuntu apparently does....

# Mihai on 4 Aug 2009 12:04 PM:

"I am willing to bet that within five years they will want simple (1 to 1) mappings to use the Capital Sharp S. What we should have done and what Ubuntu apparently does...."

Yes, it might make sense because the API is crippled (like the POSIX one). But as an API consumer I would want what my client wants: proper linguistic behavior.

So I would want a non-simple casing API, mapping to SS, like ICU.
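The two behaviors argued over in this thread can be sketched side by side in Python. This is an illustration only: Python's built-in `str.upper()` happens to apply the full (one-to-many) Unicode mapping, while `simple_upper` is a hypothetical helper written here to mimic a strict one-to-one mapping that sends ß to U+1E9E, as Mihai describes for Ubuntu 8.10. It is not the Windows, POSIX or ICU API.

```python
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
SMALL_SHARP_S = "\u00df"    # LATIN SMALL LETTER SHARP S


def simple_upper(text: str) -> str:
    """Uppercase one character at a time, never changing the length.

    Hypothetical one-to-one mapping: the lowercase sharp s becomes
    U+1E9E, and any other character whose uppercase form would
    expand to several characters is left unchanged.
    """
    out = []
    for ch in text:
        up = ch.upper()
        if ch == SMALL_SHARP_S:
            out.append(CAPITAL_SHARP_S)
        elif len(up) == 1:
            out.append(up)
        else:
            out.append(ch)  # no single-character uppercase; keep as-is
    return "".join(out)


if __name__ == "__main__":
    word = "stra\u00dfe"
    print(word.upper())        # full mapping: the string grows by one
    print(simple_upper(word))  # one-to-one mapping: same length
```

The length change is the practical point: an API that converts in place cannot produce "SS", which is why a one-to-one mapping to the capital sharp s is the only way such an API can uppercase ß at all.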



referenced by

2009/07/29 Every character has a story #33: U+1e9e (CAPITAL SHARP S, Microsoft edition - Part 2)
