What does "linguistic casing" mean?

by Michael S. Kaplan, published on 2004/12/11 09:51 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/11/279942.aspx

The LCMapString API is the ultimate Win32 API that does casing. All of those functions in USER32 (CharUpper, CharLower, CharUpperBuff, and CharLowerBuff) are wrappers around calls to LCMapString with the LCMAP_UPPERCASE and LCMAP_LOWERCASE flags. All of these functions do the simple mappings like a --> A --> a, å --> Å --> å, etc.

For the most part, these mappings will properly roundtrip, which is a good thing since the mapping from lowercase to uppercase is used by the file system for its case insensitivity. All of these operations are linguistically sensible.

So what does LCMAP_LINGUISTIC_CASING mean? What happens when you pass that flag?

Well, two things:

  1. You get the right behavior for Turkic locales like Turkish and Azeri;
  2. You get a bunch of one-way mappings on all locales, e.g. U+03F1 (Greek Rho Symbol) will uppercase to U+03A1 (Greek Capital Letter Rho), which will lowercase to U+03C1 (Greek Small Letter Rho), so the original U+03F1 does not come back.
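
To make those two points concrete, here is a rough sketch in Python rather than Win32. It does not call LCMapString at all: Python's built-in str.upper() and str.lower() are used only as a stand-in for the Unicode case mappings, and the turkic_upper and turkic_lower helpers are hypothetical names made up for this illustration, covering just the dotted and dotless i pairs.

```python
# Illustration only: Python's built-in casing stands in for the Unicode
# case mappings. Nothing here calls LCMapString.

RHO_SYMBOL = "\u03f1"   # GREEK RHO SYMBOL
CAPITAL_RHO = "\u03a1"  # GREEK CAPITAL LETTER RHO
SMALL_RHO = "\u03c1"    # GREEK SMALL LETTER RHO

# One-way mapping: the rho symbol uppercases to capital rho, but capital
# rho lowercases to the ordinary small rho, not back to the symbol.
upper = RHO_SYMBOL.upper()
lower_again = upper.lower()

# Hypothetical helpers (not part of any library) for the Turkic i's:
# dotted i <-> capital I with dot above, dotless i <-> plain capital I.
_TURKIC_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
_TURKIC_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkic_upper(text):
    """Uppercase text, handling the two i's the Turkic way first."""
    return text.translate(_TURKIC_UPPER).upper()


def turkic_lower(text):
    """Lowercase text, handling the two I's the Turkic way first."""
    return text.translate(_TURKIC_LOWER).lower()


if __name__ == "__main__":
    print(upper == CAPITAL_RHO, lower_again == SMALL_RHO)
    print(turkic_upper("istanbul"), turkic_lower("ISPARTA"))
```

The i's are swapped before the general casing call so that the default mapping never sees them; everything else in the string falls through to the ordinary behavior.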

Now neither of these seems very "linguistic" to me. But they do fit into the Unicode Standard definitions of simple case mapping. Since both are described in Unicode, we should maybe have called it LCMAP_UNICODE_SIMPLE_CASING instead?

It is only the simple Unicode casing that is covered. No Microsoft products currently handle full Unicode casing, which can actually change the size of the target string.
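
As a small illustration of what "change the size of the target string" means, the sketch below again leans on Python, whose str.upper() applies the full Unicode mappings. The length_change helper is a hypothetical name for this example only, and nothing here reflects how any Microsoft API behaves.

```python
# Illustration only: Python's str.upper() uses the full Unicode case
# mappings, so some characters expand into more than one character.

def length_change(text):
    """Return how many characters the string gains when uppercased."""
    return len(text.upper()) - len(text)


if __name__ == "__main__":
    # U+00DF (sharp s) has no single-character uppercase under the simple
    # mappings; the full mapping turns it into two letters.
    print("stra\u00dfe".upper(), length_change("stra\u00dfe"))
    # U+FB01 (the fi ligature) expands in the same way.
    print("\ufb01".upper(), length_change("\ufb01"))
```

A caller that sizes its output buffer from the input length would come up short on strings like these, which is one reason full casing is a different kind of feature from the simple one-to-one mappings.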

We do have our eye on this feature, though, since it has been requested from time to time. Perhaps it would be higher on our list of things to do if more people were asking....


