If you decompose those city elders, you might be able to sort them out!

by Michael S. Kaplan, published on 2006/12/27 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/27/1366186.aspx


Clearly the City Elders in Athens and Sparta and Thebes and Argos, being long dead, have decomposed at this point.

Now, yesterday I posted "The city elders won't give this string weight, either" (aka "On being consistently dead wrong", aka "Ordinal or bust?").

Perhaps we can use the wisdom of ancient Greece to help us?

The cause I identified there really was behind the reported problem, and one can rely on Vista's support for Unicode 5.0 as I said. But there is another solution that works for a wider range of cases.

Unicode normalization can actually help here. Taking inspiration from the current state of the long-dead City Elders, you can decompose the Greek text so that it sorts.

You see, it goes something like this (using Richard's example of U+1F96; rendering may vary for you depending on your OS, browser, and fonts):

U+1F96 (GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI) decomposes to:

ᾖ U+1F26 U+0345 (GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI) which decomposes to:

ᾖ U+1F20 U+0342 U+0345 (GREEK SMALL LETTER ETA WITH PSILI + COMBINING GREEK PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI) which decomposes to:

ᾖ U+03B7 U+0313 U+0342 U+0345 (GREEK SMALL LETTER ETA + COMBINING COMMA ABOVE + COMBINING GREEK PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI)

and everything in that bottom row does have weight in both Server 2003 and the .NET Framework.

Thus "\u1f96".Normalize(NormalizationForm.FormD) will give you a string that is entirely sortable...
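The post only shows the .NET call. As a rough cross-check outside .NET, here is a minimal Java sketch. It assumes `java.text.Normalizer` with `Normalizer.Form.NFD` as a stand-in for `NormalizationForm.FormD`. It decomposes U+1F96 and prints the resulting code points, which should match the bottom row of the chain above. The class and helper names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; Java's NFD stands in for .NET's NormalizationForm.FormD.
class DecomposeEta {
    // Formats each code point of s as "U+XXXX", separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
        String eta = "\u1F96";
        // NFD applies canonical decompositions recursively, then puts combining marks in canonical order
        String nfd = Normalizer.normalize(eta, Normalizer.Form.NFD);
        System.out.println(codePoints(nfd)); // U+03B7 U+0313 U+0342 U+0345
    }
}
```

All three decomposition steps from the post happen in a single call. You never apply them one at a time yourself.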

And the same trick can be extended to the rest of the Greek Extended text!
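To apply this to sorting in general, one option is to decompose only at comparison time. That leaves the original strings untouched while the comparer sees base letters plus combining marks. Below is a minimal Java sketch of that pattern. It is an assumption for illustration: Java's `Collator` is not the Windows/.NET comparison the post is about, and `sortByDecomposition` is a hypothetical helper name.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DecomposedSort {
    // Hypothetical helper: returns a sorted copy of the original strings, but each
    // comparison is made on their NFD (decomposed) forms. Java's Collator is only a
    // stand-in here for the Windows/.NET comparers discussed in the post.
    static List<String> sortByDecomposition(List<String> input, Collator collator) {
        List<String> sorted = new ArrayList<>(input); // copy; the caller's list is not modified
        sorted.sort((a, b) -> collator.compare(
                Normalizer.normalize(a, Normalizer.Form.NFD),
                Normalizer.normalize(b, Normalizer.Form.NFD)));
        return sorted;
    }

    public static void main(String[] args) {
        Collator greek = Collator.getInstance(Locale.forLanguageTag("el"));
        // U+1F96 (precomposed), U+03B7 (plain eta), U+1F20 (eta with psili)
        List<String> words = List.of("\u1F96", "\u03B7", "\u1F20");
        System.out.println(sortByDecomposition(words, greek));
    }
}
```

Java's own `Collator` can also decompose internally via `setDecomposition(Collator.CANONICAL_DECOMPOSITION)`. The explicit `Normalizer` call is shown because it is the same move as the post's `Normalize(NormalizationForm.FormD)`, and it helps with comparers that do not decompose on their own. For large lists, normalizing each string once up front would avoid repeating the work on every comparison.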

Now obviously this will not work for characters in scripts that versions of Windows prior to Vista don't handle at all, like Tibetan or Mongolian, since decomposing them still leaves characters with no weight. But Greek has been supported for a long time, and Unicode normalization gives a solution that will work quite well in Microsoft and third-party products not yet running on Vista! :-)


This post brought to you by ᾖ (U+1F96, a.k.a. GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI)


referenced by

2007/12/07 If it isn't really Tibetan, could it pinch hit for Burmese?

2007/06/07 Putting the camel's nose in Building 24

2007/02/18 He had the strength of an OX[IA], I tell you