String.StartsWith || String.EndsWith != String.Equals, either

by Michael S. Kaplan, published on 2008/05/21 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/21/8520660.aspx

Research Developer Mahmoud's question was quite reasonable on its face:

Hi All,
I was having a problem, and I kept debugging it until I reached a very strange result. Below is simplified code that shows the problem I am facing.

        static void Main(string[] args) {
            char c1 = (char)0x0629; //Arabic letter Taa-Marbouta
            char c2 = (char)0x062a; //Arabic letter Taa-Maftohaa

            string s1 = c1.ToString();
            string s2 = c2.ToString();

            if (s1.Equals(s2)) {
                // s1 doesn't equal s2 so it won't enter
                Console.WriteLine("Won't enter the if statement because the two strings are not equal");
            }

            if (s1.EndsWith(s2)) {
                // Although s1 doesn't equal s2, and both of them are
                // 1 char string, s1 is considered ending with s2 ??
                Console.WriteLine("Shouldn't enter here also, however, it enters and prints this line!!");
            }

            if (s2.EndsWith(s1)) {
                // Although s1 doesn't equal s2, and both of them are
                // 1 char string, s2 is considered ending with s1 ??
                Console.WriteLine("Shouldn't enter here also, however, it enters and prints this line!!");
            }

            // Print their length, just to make sure both of them contains only one character
            Console.WriteLine("s1 length : " + s1.Length);
            Console.WriteLine("s2 length : " + s2.Length);
        }

As you can see from the code, although s1 and s2 are two different strings, each is considered to end with the other. Does anyone have any ideas?

Thank you,
Mahmoud

This behavior is actually expected, and by design.

It is somewhat related to something I was talking about in Something .NET does less intuitively than they ought, where I referenced Josh Free's String.Compare() != String.Equals().

That is because, in most though not all versions of the .NET Framework in the world today, String.StartsWith and String.EndsWith (like the String.Compare of Josh's post) are in the same kind of "linguistic comparison" family, a family that String.Equals just is not a member of, in any version....
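One way to see the family split is to ask for the comparison type explicitly. The sketch below assumes the StringComparison overloads of StartsWith and EndsWith (present since .NET Framework 2.0); the class name is only illustrative. The ordinal results are fixed, while the culture-sensitive ones depend on the collation data of the runtime and OS, so no output is claimed for them.

```csharp
using System;

class EndsWithFamilies
{
    static void Main()
    {
        string s1 = "\u0629"; // ARABIC LETTER TEH MARBUTA
        string s2 = "\u062A"; // ARABIC LETTER TEH

        // Ordinal family: compares UTF-16 code units, as String.Equals(string) does.
        Console.WriteLine(s1.Equals(s2));                               // False
        Console.WriteLine(s1.EndsWith(s2, StringComparison.Ordinal));   // False
        Console.WriteLine(s1.StartsWith(s2, StringComparison.Ordinal)); // False

        // Linguistic family: the one-argument EndsWith(string) uses the current culture.
        // The result depends on the collation data in use, so none is stated here.
        Console.WriteLine(s1.EndsWith(s2));
        Console.WriteLine(s1.EndsWith(s2, StringComparison.CurrentCulture));
    }
}
```

Passing StringComparison.Ordinal is the usual way to opt a StartsWith or EndsWith call out of the linguistic family when code-unit matching is what is wanted.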

The two characters in question:

U+0629 ة ARABIC LETTER TEH MARBUTA

U+062a ت ARABIC LETTER TEH

are considered linguistically equal to each other prior to Vista and almost equal to each other in Vista and later -- which is where the seemingly odd equivalences are coming from above.
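A quick way to check how a given machine treats the two letters is to compare them directly. This is a minimal sketch using String.CompareOrdinal and the StringComparison overload of String.Compare; the class name is illustrative, and the linguistic results are deliberately left unstated because they vary with the OS and runtime version.

```csharp
using System;

class CompareFamilies
{
    static void Main()
    {
        string s1 = "\u0629"; // ARABIC LETTER TEH MARBUTA
        string s2 = "\u062A"; // ARABIC LETTER TEH

        // Ordinal: U+0629 comes before U+062A by code unit, so the result is negative.
        Console.WriteLine(string.CompareOrdinal(s1, s2) < 0); // True

        // Linguistic: 0 means the collation data treats the letters as equal;
        // a non-zero value means it tells them apart at some weight level.
        Console.WriteLine(string.Compare(s1, s2, StringComparison.InvariantCulture));
        Console.WriteLine(string.Compare(s1, s2, StringComparison.CurrentCulture));
    }
}
```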

Since Arabic is in the default collation table, one can even test this in .NET on Vista by comparing en-US results to en-IN results: en-IN goes through the synthetic, "Windows only" path and so gets the updated collation results that Vista provides.
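The en-US versus en-IN comparison can be sketched with CompareInfo, which exposes Compare, IsPrefix and IsSuffix per culture. The Report helper and class name are hypothetical, and whether the two cultures actually disagree depends on the .NET version and OS (the difference described here is specific to the .NET Framework on Vista), so the sketch prints the results without asserting them.

```csharp
using System;
using System.Globalization;

class CultureProbe
{
    // Hypothetical helper: prints how one culture's collation relates two strings.
    static void Report(string cultureName, string a, string b)
    {
        CompareInfo ci = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        Console.WriteLine("{0}: Compare={1} IsPrefix={2} IsSuffix={3}",
            cultureName, ci.Compare(a, b), ci.IsPrefix(a, b), ci.IsSuffix(a, b));
    }

    static void Main()
    {
        string s1 = "\u0629"; // ARABIC LETTER TEH MARBUTA
        string s2 = "\u062A"; // ARABIC LETTER TEH

        Report("en-US", s1, s2);
        Report("en-IN", s1, s2);
    }
}
```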

Now the pseudo-mathematical expression in the title:

String.StartsWith || String.EndsWith != String.Equals

is really not entirely accurate since of course these are not analogous methods that do the same type of thing anyway.

Perhaps

String.StartsWith || String.EndsWith !≘ String.Equals

would be a bit better? :-)

I'll explain that "most though not all" stuff in a future blog post.

This blog brought to you by ≘ (U+2258, aka CORRESPONDS TO)

# Ben Bryant on 21 May 2008 3:02 PM:

just wish there was a method name prefix or suffix to identify the "linguistic comparison" family of the function, a la CompareStringOrdinal vs CompareString?

I discussed this kind of "family" problem for an older family split in an old post called: "The secret family split in Windows code page functions" at http://codesnipers.com/?q=node/46

referenced by

2008/09/25 You're not my type if you have no culture
