Looking for that internationally savvy palindrome checker....

by Michael S. Kaplan, published on 2005/04/28 10:42 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/04/28/413019.aspx


Jonathan Payne asked if I had an international thought about the palindrome pseudo interview question at this site:

http://channel9.msdn.com/ShowPost.aspx?PostID=19171  

I did. :-)

Using the new StringInfo stuff in Whidbey Beta 2:


bool IsPalindrome(string st) {
    StringInfo si = new StringInfo(st);
    int count = si.LengthInTextElements;

    if (count == 0) return false;

    for (int i = 0; i < (count / 2); i++) {
        string st1 = si.SubstringByTextElements(i, 1);
        string st2 = si.SubstringByTextElements(count - i - 1, 1);

        if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
            return(false);
        }
    }

    return (true);
}

 

Quickest way to handle all those cool issues like cultural sensitivity and combining characters and supplementary characters and such!


# Michael S. Kaplan on Thursday, April 28, 2005 11:42 AM:

By the way, this is a good example of an interesting trick you can do to be impressive interviews in almost any group other than ours -- in most interview loops, devs are imperssed by people who point out the internationsl scenarios in the questions.

Does not work as well in our group, as we may be more likely to find mistakes and then it just looks like you are trying to show off and failing. But it is great in many other cases. :-)

# ronab49 on Thursday, April 28, 2005 1:54 PM:

you will get better performance (for a non-opitimizing compiler) and a more reaable code if you manually move the calculation of the for loop length before the for loop:

...
int nComparision = ((count % 2 != 0) ? count - 1 : count) / 2;
for (int i = 0; i < nComparisons; i++) {
...

Now, it is easy to see that the first line above is the same as (assuming count is >= 0):
...
int nComparision = count / 2;
...


I do not know about the details of StringInfo functions, but you can get better performance if it is possible to precompute the SubsStringByTextElements(j, 1) for all consecutive values of j, in less time than computing SubsStringByTextElements(j, 1) in some random order. That is, I would write the following code (I also like to consider zero length strings as palindromes):

bool IsPalindrome(string st) {
StringInfo si = new StringInfo(st);
int count = si.LengthInTextElements;

TextElementEnumerator k=si.GetTextElementEnumerator(st);

vector <string *> precomp_st;
int count=0;
do {
precomp_st[count++] = k.GetTextElement();
} while (k.MoveNext());

int nComparision = count / 2;

for (int i = 0; i < nComparisons; i++) {
if (CultureInfo.CurrentCulture.CompareInfo.Compare(*precomp_st[i], *precomp_st[count-i-1]) != 0)
return(false);
}

return (true);
}

# Michael S. Kaplan on Thursday, April 28, 2005 2:14 PM:

Interesting -- of course creating another object (a text element enumerator) when you already did the work to create an object that has the divisions might negatively impact some of the perf gains....

Also, there was a recent blog post on Brad Abram's blog at http://blogs.msdn.com/brada/archive/2005/04/23/411321.aspx which suggests that caching outside the loop can hurt performance.

(I knew about this post and still split out the count to make it look more readable!)

Finally, this code will fail for odd numbers of text elements since is just doing a "divide by 2" and not checking for the odd number case. :-)

Do you get the job if you optimize the code in a way that makes it not work and potentially slower? :-)

# Jonathan Payne on Thursday, April 28, 2005 4:31 PM:

Thank you - I suspect you would have done very well in that interview!

The classes you used look like a very neat way of handling this sort of problem. I guess in a lot of cases, working with strings in this way should be the preferred (or at least more correct) option over using array indexing even if internationalisation isn’t an immediate goal.

# Maurits on Thursday, April 28, 2005 5:07 PM:

I've written a test-driving app for palindrome-checking using any combination of CompareOptions:
http://channel9.msdn.com/ShowPost.aspx?PostID=63167#63167

Also see my follow-up posts to your post linked above

# ronab49 on Thursday, April 28, 2005 5:59 PM:

-- Interesting -- of course creating another object (a text element enumerator) when you already did the work to create an object that has the divisions might negatively impact some of the perf gains....

Not in C or C++. And I do not know enough about C# or VB to claim otherwise. However, I am going to provide you running times using VS .NET 2003 in a few minutes.

-- Also, there was a recent blog post on Brad Abram's blog at http://blogs.msdn.com/brada/archive/2005/04/23/411321.aspx which suggests that caching outside the loop can hurt performance.

Yep, I had read it, the premise was that if the JIT cannot deduce the range of the index variable at runtime, it will try to perform range checking for every access. However, my experince with compiler are that they are not that smart to deduce the effective range of slightly complex expression like the one you used.

Moreover, I believe the code behaves correctly for the cases with the odd lengths. Unless you point me to a case where it does not!

Now, back to the TextEnumerator ... As I wrote before, I do not know the details of StringInfo, but my educated guess is that everytime you use its length or substring functions, it will scan the string from its beginning. So, for a string of length N, the code gave about is O(N^2). However, it becomes O(N) if a textenumerator is used.

Besides, this annoying perf improvements. If you precomputer the text elemnts, using the collating functions could be much faster.

--Do you get the job if you optimize the code in a way that makes it not work and potentially slower? :-)

Was this an interview question ? Oh no! I should have reviewed my code before I hit the submit button then. :)

# Maurits on Thursday, April 28, 2005 6:04 PM:

--
I do not know the details of StringInfo, but my educated guess is that everytime you use its length or substring functions, it will scan the string from its beginning
--

Hmmm... I'd guess that it builds an array of text elements, similarly to the way I use the TextElementEnumerator to build a String[]
http://channel9.msdn.com/ShowPost.aspx?PageIndex=5&PostID=63246#63246

If that is in fact the case, then it wouldn't need to rescan.

# Michael S. Kaplan on Thursday, April 28, 2005 6:39 PM:

Basically, StringInfo is a wrappear around ParseCombiningCharacters. It calls the method once and never calls it again unless you change the string.

It has size is the size of an array of ints that store the text elements, so it does not walk the string again for the count (and array stores the length).

The object has the string in there, and is uses a char array sort of access -- so it is not going through the string each time, it would be like C's &lpwz[n].

PLUS if it is not a palindrome the original code exists without getting every string, while your code always gets all strings.

I am pretty the original code may be faster, especially for long non-palindrome strings.

# Michael S. Kaplan on Thursday, April 28, 2005 7:00 PM:

For the odd number case of 7, it goes like this:

0 vs. 7-0-1 == 6
1 vs. 7-1-1 == 5
2 vs. 7-2-1 == 4
3 vs. 7-3-1 == 3

So, will it work? Yes. But did you just needlessly compare a character to itself? Yes. :-)

So you could cache the calced smaller number and save yourself an extra compare....

# Michael S. Kaplan on Thursday, April 28, 2005 7:04 PM:

Maurits -- The new StringInfo stuff does not cache the strings. Only the string iteslf and the index values of each text element are cached (see the ParseCombiningCharacters method for details)

Grabbing a substring does alloc a new string, but it does not have to walk the string to do so....

# ronab49 on Thursday, April 28, 2005 8:14 PM:

-- Basically, StringInfo is a wrappear around ParseCombiningCharacters. It calls the method once and never calls it again unless you change the string.

Great! I rest the case then. All I was trying to do was what StringInfo will do by itself.

However, you should be able to see that the two expression:
((count % 2 != 0) ? count - 1 : count) / 2
and
count / 2
evaulate to the same number for all positive integer values of count.

...

The difference in using
((count % 2 != 0) ? count - 1 : count) / 2
versus
count / 2
as a loop upper bound is neglibile when optimizing the code.

But with nonoptimized code, the difference is
observable:

((count % 2 != 0) ? count - 1 : count) / 2
Total Time: 0.9375

count / 2
Total Time: 0.859375

# Michael S. Kaplan on Thursday, April 28, 2005 10:06 PM:

Hey ronab49 -- Well what about the third case -- do the calc outside the loop? That and a bunch of long strings may be slightly better....

Plus it just feels wrong to me to compare that middle character to itself. :-)

# Maurits on Thursday, April 28, 2005 11:44 PM:

Oh, I see... ParseCombiningCharacters stores the byte offset of the *i*th grapheme in offset[i], allowing for a fast lookup. If the data is seven-bit, or all single-byte Unicode, then offset[0] == 0, offset[1] == 1, ... offset[i] == i - but if any graphemes are multibyte (2 bytes, 3 bytes, 4 bytes...) then offset[i] > i...

Do you perform another internal optimization and store offset_optimal[i] as offset[i] - i? So if the data is all single-byte, offset_optimal[i] == 0 for all i... and in general offset_optimal[i] is the number of "extra bytes" in graphemes 0 through i - 1
?

# Michael S. Kaplan on Thursday, April 28, 2005 11:55 PM:

We do not do any other optimizations, no. The goal was a more usable method than ParseCombiningCharacters that would not be slower -- but it still keeps that int array that it returns....

But it is pretty fast. And arrays keep their lengths in a member rather than alking the array for the value.

# ronab49 on Friday, April 29, 2005 12:24 AM:

well, neither your code nor mine, compares the middle character to itself. Say, for count == 7, nComparisons will be (7 / 2) = 3. The compared grpahemes are then:

0 and 6
1 and 5
2 and 4

and the loop stops right here.

BTW, why is not there a preview button on the comments form?

# Michael S. Kaplan on Friday, April 29, 2005 12:34 AM:

Blame Community Server for the preview thing, it is out of my control....

# Michael S. Kaplan on Friday, April 29, 2005 12:36 AM:

And you are right on the count thing,not sure what I was thinking....

# Uwe Keim on Friday, April 29, 2005 3:25 AM:

Am I paranoid that I always use

"...if ( count <= 0 ) ..."

instead of

"...if ( count == 0 ) ..."

whenever I have int returns like in "string.Length"?

referenced by

2008/09/18 UCS-2 to UTF-16, Part 3: It starts with cursor movement (where MS simultaneously gets better and worse)

2008/07/24 When you assess, you make an...

2007/05/09 Sometimes you need more than StringInfo

2005/06/15 Once more into the palindrome

2005/04/30 Normalization vs. .NET text elements

2005/04/29 Where did the new StringInfo stuff come from?

go to newer or older post, or back to index or month or day