by Michael S. Kaplan, published on 2005/05/26 00:01 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/05/26/421987.aspx
Back at the beginning of April, I explained the one scenario where casing does not need to roundtrip in .NET -- the Greek final sigma.
Anyway, the day before yesterday I got an email from someone who had been reading my blog and was looking at all of the one-way mappings that are in the linguistic tables (accessed with the LCMAP_LINGUISTIC_CASING flag, which I have discussed previously). He was wondering why that FINAL SIGMA could not be put into the linguistic tables since it is a one-way mapping.
A fair question, one I thought worthy of a post. :-)
If you are a native speaker of Greek, then you know that both ς (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA) and σ (U+03c3, a.k.a. GREEK SMALL LETTER SIGMA) do indeed uppercase to Σ (U+03a3, a.k.a. GREEK CAPITAL LETTER SIGMA). But if we moved this mapping into the linguistic table, then suddenly ς would no longer be uppercased by the CharUpper/CharUpperBuff functions, nor by the default call to the LCMapString function with the LCMAP_UPPERCASE flag.
Obviously that would not be a good thing.
Try to imagine how you would feel if attempting to uppercase the string hello came out as HELLo. Wouldn't you consider it a bug? Especially if it used to come out as the HELLO you were expecting? You might be thinking about telling the platform GooDBYE, if you know what I mean.
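To make the HELLo analogy concrete, here is a small Python sketch rather than anything from the Win32 tables themselves. It uppercases and lowercases one character at a time, which is roughly the level LCMapString works at, and it includes a hypothetical function (the name and behavior are my own illustration, not an actual API) that shows what default uppercasing would produce if the ς mapping lived only in the linguistic table.

```python
FINAL_SIGMA = "\u03c2"    # ς GREEK SMALL LETTER FINAL SIGMA
SMALL_SIGMA = "\u03c3"    # σ GREEK SMALL LETTER SIGMA
CAPITAL_SIGMA = "\u03a3"  # Σ GREEK CAPITAL LETTER SIGMA


def upper_per_char(text):
    """Uppercase each character independently, with no context."""
    return "".join(ch.upper() for ch in text)


def lower_per_char(text):
    """Lowercase each character independently, with no context."""
    return "".join(ch.lower() for ch in text)


def upper_if_final_sigma_were_linguistic_only(text):
    """Hypothetical: default uppercasing if the final sigma mapping were
    missing from the default table (an assumption for illustration)."""
    return "".join(ch if ch == FINAL_SIGMA else ch.upper() for ch in text)


word = "\u03c3\u03b1\u03c2"  # σας, which ends in a final sigma

# Both sigmas map to the same capital letter, so the mapping is one-way.
shouted = upper_per_char(word)

# Lowercasing character by character cannot know which sigma to pick,
# so the result no longer matches the original word.
round_tripped = lower_per_char(shouted)

# The Greek equivalent of HELLo: the last letter is left in lowercase.
broken = upper_if_final_sigma_were_linguistic_only(word)
```

Here `shouted` is ΣΑΣ, `round_tripped` is σασ rather than σας, and `broken` is ΣΑς, which is the kind of result nobody would thank us for.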
Of course, ideally the lowercasing functions would notice whether the Σ was at the end of a word and then choose ς or σ accordingly. But LCMapString does not really look beyond the character level here, so until it does that would not really be an option.
Though of course a more sophisticated application might work to provide results beyond the character boundary. Though I do not envy such programs; the boundary for them becomes quite fuzzy if you have non-Greek characters after the ς. Does that count as a new word or doesn't it? That is the kind of question where an API can never win -- no matter which way it goes, there will be some people who do not like the answer.
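As a rough sketch of what such an application might do (again in Python, and under my own simplifying assumptions rather than any rule LCMapString implements), the function below lowercases Σ to ς only when it follows a letter and is not followed by one. The `greek_only` parameter is a hypothetical switch that captures the fuzzy question above: does a non-Greek letter after the sigma continue the word or not?

```python
import unicodedata

CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς


def is_greek_letter(ch):
    """True for letters whose Unicode name starts with GREEK."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")


def lower_with_final_sigma(text, greek_only=False):
    """Lowercase text, choosing ς for a Σ that ends a word.

    greek_only=False: any letter after Σ continues the word.
    greek_only=True:  only a Greek letter after Σ continues the word.
    """
    continues_word = is_greek_letter if greek_only else str.isalpha
    out = []
    for i, ch in enumerate(text):
        if ch != CAPITAL_SIGMA:
            out.append(ch.lower())
            continue
        prev_is_letter = i > 0 and text[i - 1].isalpha()
        next_continues = i + 1 < len(text) and continues_word(text[i + 1])
        if prev_is_letter and not next_continues:
            out.append(FINAL_SIGMA)
        else:
            out.append(SMALL_SIGMA)
    return "".join(out)


plain = lower_with_final_sigma("\u03a3\u0391\u03a3 \u03a3\u0391\u03a3")
mixed_any_letter = lower_with_final_sigma("\u03a3\u0391\u03a3x")
mixed_greek_only = lower_with_final_sigma("\u03a3\u0391\u03a3x", greek_only=True)
```

With pure Greek input both policies agree and ΣΑΣ ΣΑΣ becomes σας σας. With a trailing Latin x, one policy yields σασx and the other σαςx, and there will be someone who dislikes each of them.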
Anyway, that is why the uppercasing of ς is not confined to the linguistic table: there are too many cases where the results simply would not make sense, at least not as things are implemented currently....
This post brought to you by "ς" (U+03c2, a.k.a. GREEK SMALL LETTER FINAL SIGMA)
A character that wonders whether Unicode would have been simpler if it did not exist as an independent entity, and fonts could then decide whether to make it a "final" form or not....
referenced by
2009/07/30 I know I'll Never say Never... again, at least
2008/06/25 Seeing the tears, my heart went out to her as I asked her "Why the Long S?"
2007/09/14 How do I feel about lstrcmpi? I think it blows....
2007/06/12 The difference between 'Dangeous Characters' and 'Dangerous Minds' is the lack of Michelle Pfeiffer
2005/06/24 LCMapString's *other* job