Collation != case, still

by Michael S. Kaplan, published on 2006/08/08 14:10 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/08/08/692390.aspx


Richard asked in the Suggestion Box, and I decided to dispatch quickly:

Why is it that English (en-US, because there is no en-GB) Windows and .NET don't know how to upper case a Latin Small Letter Sharp S even with the de-DE locale specified:

"\u00DF".ToUpper(CultureInfo.GetCultureInfo("de-DE"))

does not return "SS", but "ß"?

The Unicode casing file CaseFolding.txt has

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

Is this a Windows limitation? (Which would not help, given I'm trying to put together a demo of doing the right thing to build I18n into an application update.)

This is a question I have talked about many times in the past, as a simple search for U+00df indicates. And most importantly, since Casing and IgnoreCase are still not the same thing and Collation != Case (a.k.a. Collation <> Case), for now this is how casing will work on Microsoft platforms -- what Unicode refers to as simple casing....
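To make the distinction concrete, here is a minimal sketch in Python rather than .NET, chosen because Python's built-in str.upper() applies Unicode's full case mappings and so shows the other behavior side by side. The helper simple_upper is hypothetical and only approximates simple casing: it keeps any character whose full upper-case form expands to more than one code point, which is right for U+00DF but ignores the few characters that have a distinct one-to-one mapping of their own. It is not how Windows or .NET implement their casing tables.

```python
def simple_upper(text: str) -> str:
    """Approximate one-to-one ("simple") upper-casing.

    Assumption: a character whose full upper-case mapping expands to
    several code points is left unchanged. That holds for U+00DF, but
    this sketch ignores characters with a separate simple mapping.
    """
    out = []
    for ch in text:
        up = ch.upper()  # full case mapping, may be longer than one char
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "Stra\u00dfe"

# Simple casing: string length never changes, so the sharp s survives.
print(simple_upper(word))  # STRAßE

# Full casing: the sharp s expands to "SS".
print(word.upper())  # STRASSE

# Case folding (the CaseFolding.txt entry quoted above) is for
# caseless matching, not for producing upper case.
print(word.casefold() == "STRASSE".casefold())  # True
```

The last line is the closest analogue to what the question was really after: two strings can be treated as equal for matching purposes even when upper-casing one does not produce the other.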

 

This post sponsored by "ß" (U+00df, LATIN SMALL LETTER SHARP S)


# J. Daniel Smith on 8 Aug 2006 5:15 PM:

I know there are "good reasons" for things working the way they do...but anybody who has had even a semester of high-school German knows that ß upper-cases to SS; that is, "Straße" (street) becomes "STRASSE" (although I seem to recall that perhaps the rules are different in Austria and/or Switzerland?)

Since in .NET, ToUpper() returns a new string, it "should" be easier to "fix" this problem in that environment.

# Michael S. Kaplan on 8 Aug 2006 5:26 PM:

Well, "should be" is a relative term -- it is still using the same casing tables to do the work. We are more flexible in collation, so that is where we provide the support....

# Richard on 9 Aug 2006 5:44 AM:

Thanks...

(Sharp S search failed to find anything... didn't try just the code point.)

# Richard on 10 Aug 2006 5:19 AM:

Or, rather I should say,

Search for "Sharp S" failed to find anything about case folding (quite a few hits around collation/equality.)


referenced by

2009/07/29 Every character has a story #33: U+1e9e (CAPITAL SHARP S, Microsoft edition - Part 2)

2008/02/24 The idea has to do more than just make sense to me (aka How S-Sharp are *you* feeling today?)

2007/10/25 Jokes that aren't really all that funny in the end (aka At least SQL Server isn't on our case)

2007/08/24 Every character has a story #28: U+1e9e (CAPITAL SHARP S)

2007/07/31 If this post really describes a bug, would I actually put it in the WYNN column?

2006/12/07 SQL and the CLR: Part 1 (the things we can make work well)
