by Michael S. Kaplan, published on 2004/12/03 08:13 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/03/274288.aspx
I think the Turkish folks have it right.
After all, say that we had all of the following characters in English:
Wouldn't we do the case mapping to put the dotted and dotless variants together (so that both #1/#4 and #3/#2 would be case pairs)? Be honest, doesn't that make more sense?
We even have a good reason, if you think about it. I mean, it's not like the "I" in "him" sounds like the one in "nice", and neither of them sounds like the one in "niece", and none of them sounds like the one with no sound in "friend". So with all of those different sounds, English would be a lot simpler if we had an extra pair of letters to work with. I have talked to a lot of native speakers of other languages about languages (occupational hazard), and many suggest that one of the hard things about learning English is the multiple sounds for the same letter. We could actually move towards simplifying things by adding the complication of a few variations on letters....
Ah well, that probably won't happen. But hopefully you can see the basis that languages might have for wanting an "Å" or an "Ö" or a "Č" or an "İ" in their midst. And then, like I pointed out at the beginning of this post, if all of the variants of "I" did exist, it would be crazy to case them in any other way....
Of course, as you may have imagined, this plan does not exactly co-exist well with case-insensitive registries or filesystems (like FAT and NTFS). Suddenly that idea that seems more sensible looks like an awful security risk. I do not even have to imagine: I have built versions of Windows on my own development machine that would not boot because they were unable to find the "HKLM\SOFTWARE\MICROSOFT\Windows" registry key, and I have heard tales of the ones that were unable to find WIN.ini. And I have witnessed code reviews that had scores of developers scan through thousands of files in the .NET Framework to (among other things) make sure that "Turkic" casing was not used when looking at the filesystem or the registry. It's amazing how difficult and expensive it can be to make a product behave intuitively....
See how I slipped the proper design into that last paragraph? If you said "yes" then I feel very clever, otherwise I don't. :-)
The right design is to use CultureInfo.CurrentCulture in your .NET code any time you want to get the (possibly different) casing behavior seen in Turkish and Azeri, like in strings that your end users would see. At the same time you would use CultureInfo.InvariantCulture for those cases where you want the invariant, unchanging behavior, like file names and registry keys. And in unmanaged code you want to call LCMapString with the LCMAP_UPPERCASE/LCMAP_LOWERCASE transformations, either passing or omitting the LCMAP_LINGUISTIC_CASING flag depending on the same conditions.
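The split between user-visible text and identifiers can be sketched in code. The example below is not .NET; it uses Java, whose String.toUpperCase(Locale) has the same Turkish special case, with Locale.ROOT standing in (as a loose analogy only) for the invariant culture. The class and helper names are hypothetical and exist just for illustration.

```java
import java.util.Locale;

public class CasingChoice {
    // Hypothetical helper: casing for text that an end user will read,
    // so the user's own locale decides the result.
    static String upperForDisplay(String text, Locale userLocale) {
        return text.toUpperCase(userLocale);
    }

    // Hypothetical helper: casing for identifiers such as file names or
    // registry-style keys, which must not change with the user's locale.
    static String upperForIdentifier(String text) {
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Turkish rules pair "i" with the dotted capital (U+0130).
        String shown = upperForDisplay("istanbul", turkish);
        System.out.println(shown.equals("\u0130STANBUL"));   // true

        // Locale-independent rules pair "i" with plain "I".
        System.out.println(upperForIdentifier("microsoft")); // MICROSOFT
    }
}
```

The point of having two helpers is that the caller has to say which kind of string it is holding, rather than letting whatever the current locale happens to be make the decision.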
It's easy to remember it and do it, if you learn it in the first place. :-)
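For anyone who has not seen the failure happen, here is a rough sketch of how a lookup like the WIN.ini one can go wrong. It is again Java rather than Windows code, the map is only a stand-in for a case-insensitive store, and the normalizer names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class KeyLookup {
    // Hypothetical normalizer that (wrongly) applies a user locale to an identifier.
    static String normalizeLinguistic(String key, Locale locale) {
        return key.toUpperCase(locale);
    }

    // Hypothetical normalizer that ignores the user's locale entirely.
    static String normalizeInvariant(String key) {
        return key.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Stand-in for a store whose keys were written in plain uppercase.
        Map<String, String> store = new HashMap<>();
        store.put("WIN.INI", "settings");

        // Under Turkish rules every "i" becomes U+0130, so the key no longer matches.
        System.out.println(store.get(normalizeLinguistic("win.ini", turkish))); // null

        // With locale-independent casing the key is found.
        System.out.println(store.get(normalizeInvariant("win.ini")));           // settings
    }
}
```

Nothing in the first lookup is a bug in the casing tables; the casing is correct for Turkish text and simply wrong for a file name.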
# Norman Diamond on 24 Dec 2004 7:41 PM:
# Michael Kaplan on 24 Dec 2004 8:06 PM:
# Vorn on 24 Dec 2004 10:59 PM:
# cumaozturk on 27 Dec 2008 10:18 AM:
Help Turkısh
# cumaozturk on 27 Dec 2008 10:20 AM:
Turkısh help
referenced by
2013/04/04 You need to dot every İ, not dot any I, dot every i, not dot any ı, and cross every t in Turkish
2010/09/26 If case conversion were harder, people would do it less
2008/11/14 When features collide (aka Your LCID sucks, but sometimes the bug sucks more)
2008/06/25 Seeing the tears, my heart went out to her as I asked her "Why the Long S?"
2007/04/25 The nature of OrdinalIgnoreCase vs. intuitive expectations
2005/08/02 New in Vista Beta 1: more use of the word 'linguistic'
2005/06/05 The dasBlog 'Turkish I' thing figured out
2005/04/04 When casing does not need to roundtrip in .NET
2005/03/04 "Michael, why does ToTitleCase suck so much?"
2005/01/16 My apparent obsession with "case" puns
2005/01/16 How [case-]insensitive (apologies to Frank Sinatra)
2004/12/11 What does "linguistic casing" mean?