by Michael S. Kaplan, published on 2008/05/21 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/21/8520660.aspx
Research Developer Mahmoud's question was quite reasonable on its face:
Hi All,
I was having a problem, and I kept debugging it until I reached a very strange issue. Below is simplified code that shows the problem I am facing.
static void Main(string[] args) {
    char c1 = (char)0x0629; //Arabic letter Taa-Marbouta
    char c2 = (char)0x062a; //Arabic letter Taa-Maftohaa
    string s1 = c1.ToString();
    string s2 = c2.ToString();
    if (s1.Equals(s2)) {
        // s1 doesn't equal s2 so it won't enter
        Console.WriteLine("Won't enter the if statement because the two strings are not equal");
    }
    if (s1.EndsWith(s2)) {
        // Although s1 doesn't equal s2, and both of them are
        // 1 char string, s1 is considered ending with s2 ??
        Console.WriteLine("Shouldn't enter here also, however, it enters and prints this line!!");
    }
    if (s2.EndsWith(s1)) {
        // Although s1 doesn't equal s2, and both of them are
        // 1 char string, s2 is considered ending with s1 ??
        Console.WriteLine("Shouldn't enter here also, however, it enters and prints this line!!");
    }
    // Print their length, just to make sure both of them contains only one character
    Console.WriteLine("s1 length : " + s1.Length);
    Console.WriteLine("s2 length : " + s2.Length);
}
As you can see from the code, although s1 and s2 are two different strings, each is considered to end with the other. Does anyone have any ideas?
Thank you,
Mahmoud
This behavior is actually expected, and by design.
It is somewhat related to something I was talking about in "Something .NET does less intuitively than they ought", where I referenced Josh Free's "String.Compare() != String.Equals()".
Because in most though not all versions of the .NET Framework in the world today, the culture-sensitive methods of the String class, String.StartsWith and String.EndsWith among them, are in the same kind of "linguistic comparison" family, a family that String.Equals just is not a member of, in any version....
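To make the family split concrete, here is a minimal sketch that asks for each kind of comparison explicitly through the StringComparison overloads of String.EndsWith. The class and helper names (ComparisonFamilies, EndsWithOrdinal, EndsWithLinguistic) are made up for illustration and are not from the original question. Only the ordinal result is predictable; the linguistic one depends on the collation data of the framework and operating system in use, so no output is claimed for it.

```csharp
using System;

static class ComparisonFamilies
{
    // Hypothetical helper: suffix check on raw UTF-16 code units,
    // the same family that String.Equals(string) belongs to.
    public static bool EndsWithOrdinal(string text, string suffix)
    {
        return text.EndsWith(suffix, StringComparison.Ordinal);
    }

    // Hypothetical helper: suffix check using the current culture's
    // collation rules, i.e. the "linguistic comparison" family.
    public static bool EndsWithLinguistic(string text, string suffix)
    {
        return text.EndsWith(suffix, StringComparison.CurrentCulture);
    }

    static void Main()
    {
        string s1 = "\u0629"; // ARABIC LETTER TEH MARBUTA
        string s2 = "\u062A"; // ARABIC LETTER TEH

        // Ordinal: the code units differ, so this is false everywhere.
        Console.WriteLine("Ordinal:    " + EndsWithOrdinal(s1, s2));

        // Linguistic: varies with framework and OS collation data.
        Console.WriteLine("Linguistic: " + EndsWithLinguistic(s1, s2));
    }
}
```

If the intent is "do these strings end with exactly these characters", passing StringComparison.Ordinal is probably the safer default, since it does not change with the collation tables underneath.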
The two characters in question:
U+0629 ة ARABIC LETTER TEH MARBUTA
U+062A ت ARABIC LETTER TEH
are considered linguistically equal to each other prior to Vista and almost equal to each other in Vista and later -- which is where the seemingly odd equivalences are coming from above.
Since Arabic is in the default collation table, one can even test this in .NET on Vista by comparing en-US results to en-IN results, since the en-IN will go through the synthetic, "Windows only" path and will get the updated collation results that Vista provides.
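A rough way to try that comparison is sketched below. It assumes both culture names are available on the machine; on systems without en-IN, CultureInfo.GetCultureInfo will throw. The class and helper names (CollationProbe, CompareUnder, IsSuffixUnder) are hypothetical. No expected output is shown, because the results depend on which collation path and data each culture ends up using.

```csharp
using System;
using System.Globalization;

static class CollationProbe
{
    // Hypothetical helper: linguistic comparison of a and b
    // under the named culture's collation rules.
    public static int CompareUnder(string cultureName, string a, string b)
    {
        CompareInfo ci = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        return ci.Compare(a, b, CompareOptions.None);
    }

    // Hypothetical helper: linguistic suffix test under the named culture.
    public static bool IsSuffixUnder(string cultureName, string text, string suffix)
    {
        CompareInfo ci = CultureInfo.GetCultureInfo(cultureName).CompareInfo;
        return ci.IsSuffix(text, suffix, CompareOptions.None);
    }

    static void Main()
    {
        string s1 = "\u0629"; // ARABIC LETTER TEH MARBUTA
        string s2 = "\u062A"; // ARABIC LETTER TEH

        // Print both results per culture so any difference is visible.
        foreach (string name in new[] { "en-US", "en-IN" })
        {
            Console.WriteLine("{0}: Compare = {1}, IsSuffix = {2}",
                name,
                CompareUnder(name, s1, s2),
                IsSuffixUnder(name, s1, s2));
        }
    }
}
```

A Compare result of 0 for one culture and non-zero for the other would be the visible sign that the two cultures are reading different collation data.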
Now the pseudo-mathematical expression in the title:
String.StartsWith || String.EndsWith != String.Equals
is really not entirely accurate since of course these are not analogous methods that do the same type of thing anyway.
Perhaps
String.StartsWith || String.EndsWith !≘ String.Equals
would be a bit better? :-)
I'll explain that "most though not all" stuff in a future blog post.
This blog brought to you by ≘ (U+2258, aka CORRESPONDS TO)
# Ben Bryant on 21 May 2008 3:02 PM:
just wish there was a method name prefix or suffix to identify the "linguistic comparison" family of the function, a la CompareStringOrdinal vs CompareString?
I discussed this kind of "family" problem for an older family split in an old post called: "The secret family split in Windows code page functions" at http://codesnipers.com/?q=node/46
referenced by
2008/09/25 You're not my type if you have no culture