by Michael S. Kaplan, published on 2005/04/30 03:39 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/04/30/413676.aspx
Ok, here is the updated code for that internationally savvy palindrome checker. It supports that interesting situation with ligatures like fl (U+fb02, a.k.a. LATIN SMALL LIGATURE FL) vs. an lf on the far side, originally suggested by our old friend Maurits (with comments):
//////////////////////////////////////////
//
// IsPalindrome
//
// in : a string
// out: true if the string passed in is a palindrome.
//
// NOTES: This function handles both canonical/compatibility
// equivalences and grapheme clusters (a.k.a. text
// elements) as defined by the Unicode Standard.
//
//////////////////////////////////////////
bool IsPalindrome(string st) {
// A null string or a ZLS is not a palindrome
if ((null == st) || (0 == st.Length)) return false;// Convert to NFKC and set up the text element detection object
StringInfo si = new StringInfo(st.Normalize(NormalizationForm.FormKC));
int count = si.LengthInTextElements;for (int i = 0; i < (count / 2); i++) {
// get the text elements for comparison
string st1 = si.SubstringByTextElements(i, 1);
string st2 = si.SubstringByTextElements(count - i - 1, 1);// see if the text elements on each side are linguistuically equivalent
if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
// they are not, so it is not a palindrome.
return(false);
}
}// both ends appear to be equivalent; it is a palindrome.
return (true);
}
Now, Maurits went on in his Channel 9 posting to discuss sort elements, or cases when two or more characters are to be given a single sort weight (kind of the opposite of an expansion like these ligatures). However, in my opinion these are not really suitable for a palindrome detection algorithm, as I don't think they are usually treated as letters except in the case where they are also treated as unique text elements (the case covered by the StringInfo code).
Any native speakers of languages with such constructs as the Spanish ch and the Hungarian dzs who think they should or should not be treated as a unit in trying to detect palindomosity should feel free to leave a comment to that effect. Also, if any of my collagues in the GIFT group agree or disagree here (and they are reading this!) they are invited to do the same (or stop me in the hall and accost me with this information!).
If I mistaken on this point, then a very interesting problem develops since there is really not an easy method for detecting such cases given the current collation function set (although I can imagine a few avenues of attack and we obviously have the underlying data if we had to support an IsPalindome function in Win32 NLS!). This might even make an interesting interview question one day for a very talented candidate.... :-)
Now, aside from all that, it is important to note that normalization makes some uses of text elements in this context completely unneccessary -- after all, either technology will treat U+0061 U+030a (a + combining ring) as identical to U+00e5 (a ring), one by conversion and the other by giving identical sort weights. Therefore, there is some overlap between the two technologies. However, there are some differences:
So it is fair to say that both technologies as provided in the Whidbey release are potentially useful in (of all things) the detection of palindromes, today and tomorow. I am sure the people who spec'ed, developed, and tested these feature are very proud for the technologial advances!
This post brought to you by "แซ์" (U+0e41 U+0e0b U+0e4c, a.k.a. THAI CHARACTERS SARA AE + SO SO + THANTHAKHAT)
The last two codepoints of which make up a text element, but all three of which make up a unique sort element!
# Krisztian Pinter on 30 Apr 2005 11:46 AM:
# Michael S. Kaplan on 30 Apr 2005 11:51 AM:
# Michael S. Kaplan on 30 Apr 2005 11:52 AM:
# krisztian pinter on 2 May 2005 11:33 AM:
# Michael S. Kaplan on 2 May 2005 11:47 AM:
# Maurits [MSFT] on 2 May 2005 12:15 PM:
# Maurits [MSFT] on 2 May 2005 12:28 PM:
# Michael S. Kaplan on 2 May 2005 12:32 PM:
# Maurits [MSFT] on 2 May 2005 12:38 PM:
# Tanveer Badar on 20 Dec 2007 9:25 AM:
Why didn't you do this
if ( string.IsNullOrEmpty( st ) ) return false;
instead of
if ((null == st) || (0 == st.Length)) return false;
Or are there any internationalization issues with IsNullOrEmpty?
# Michael S. Kaplan on 20 Dec 2007 11:46 AM:
No, it is just a choice to not push stack by mking it a method call. We all live in a world where we choose what code we will write (or if/when we will write code!).
referenced by
2005/06/15 Once more into the palindrome