by Michael S. Kaplan, published on 2005/04/29 02:36 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/04/29/413366.aspx
I used it in a very confusing and obfuscated way in Normalization as obfuscation in C#. And then yesterday I used it again in my internationally savvy palindrome checker, in a slightly more intuitive manner.
It is the all new StringInfo class in Whidbey.
Now the old StringInfo class had only static methods -- in other words it was a walking FxCop violation.
And the main method it had was StringInfo.ParseCombiningCharacters, a static method that takes a string and returns an array of int values, each of which is an index into that string showing where a new text element starts. A text element could be a single letter, a letter and a diacritic, a letter and a bunch of diacritics, a high and low surrogate representing a surrogate pair, etc.
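To make the return value concrete, here is a small sketch. The class name ParseDemo and the helper TextElementStarts are made up for this illustration; the only framework call is StringInfo.ParseCombiningCharacters. The sample string is an assumption too: a plain letter, a letter plus one combining accent, and one surrogate pair.

```csharp
using System;
using System.Globalization;

public static class ParseDemo
{
    // Hypothetical helper: returns the index at which each text element starts.
    public static int[] TextElementStarts(string s)
    {
        return StringInfo.ParseCombiningCharacters(s);
    }

    public static void Main()
    {
        // 'a', then 'e' + U+0301 COMBINING ACUTE ACCENT,
        // then U+10400 written as the surrogate pair D801 DC00.
        string s = "ae\u0301\uD801\uDC00";

        int[] starts = TextElementStarts(s);

        Console.WriteLine(s.Length);                  // 5 UTF-16 code units
        Console.WriteLine(starts.Length);             // 3 text elements
        Console.WriteLine(string.Join(",", starts));  // 0,1,3
    }
}
```

Note that the array only tells you where each element begins; working out where one ends (the next start, or the end of the string) is left to the caller, which is a good part of why the method is awkward to use directly.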
ParseCombiningCharacters is an incredibly useful method, but it is not very intuitive to use, and certainly not to use effectively. The same goes for the other methods for dealing with text elements (GetTextElementEnumerator and GetNextTextElement) -- people were just getting confused.
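For reference, this is roughly what using those other two methods looks like. The wrapper names (EnumeratorDemo, TextElements, ElementAt) are hypothetical; GetTextElementEnumerator, GetNextTextElement and TextElementEnumerator.GetTextElement are the real framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class EnumeratorDemo
{
    // Hypothetical helper: collects every text element of s, in order.
    public static List<string> TextElements(string s)
    {
        List<string> result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            result.Add(e.GetTextElement());
        }
        return result;
    }

    // Hypothetical helper: the text element that begins at a given code unit index.
    public static string ElementAt(string s, int index)
    {
        return StringInfo.GetNextTextElement(s, index);
    }

    public static void Main()
    {
        string s = "ae\u0301\uD801\uDC00";
        List<string> parts = TextElements(s);

        Console.WriteLine(parts.Count);            // 3
        Console.WriteLine(parts[1].Length);        // 2 (letter plus combining accent)
        Console.WriteLine(ElementAt(s, 3).Length); // 2 (the surrogate pair)
    }
}
```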
But people have no problem understanding the need to be able to count entities based on what a typical user might think a character is. Once one explains what a text element is, they immediately understand the need for ways to make use of them.
So we had some meetings to talk about how to make the ways to work with text elements more intuitive, at least as intuitive as the concept of a text element itself. In the last of those meetings, someone pointed out that people usually had no problem understanding the semantics of the Substring method or the Length property of System.String. Maybe we could learn a lesson from that?
And voilà, the SubstringByTextElements method and the LengthInTextElements property were born!
Each behaves just like its cousin, the Substring method or the Length property, but rather than being based on UTF-16 code units, they are based on text elements, or what the user might reasonably point to and call a character. The same thing that the Win32 CharNext and CharPrev functions do (at least, when we have not accidentally broken them!).
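A quick side-by-side of the cousins, as a sketch. The class name CousinsDemo, the Describe helper, and the sample string are assumptions made for the example; the StringInfo constructor, LengthInTextElements, and SubstringByTextElements are the members described above.

```csharp
using System;
using System.Globalization;

public static class CousinsDemo
{
    // Hypothetical helper: returns { code unit count, text element count } for s.
    public static int[] Describe(string s)
    {
        StringInfo si = new StringInfo(s);
        return new int[] { s.Length, si.LengthInTextElements };
    }

    public static void Main()
    {
        // 'a', 'e' + combining acute, U+10400 as a surrogate pair, 'z'.
        string s = "ae\u0301\uD801\uDC00z";
        StringInfo si = new StringInfo(s);

        Console.WriteLine(s.Length);                // 6
        Console.WriteLine(si.LengthInTextElements); // 4

        // Substring counts code units, so this takes only the 'e' and its accent.
        string byCodeUnits = s.Substring(1, 2);
        // SubstringByTextElements counts text elements, so this takes
        // the accented 'e' and the whole surrogate pair.
        string byElements = si.SubstringByTextElements(1, 2);

        Console.WriteLine(byCodeUnits.Length); // 2
        Console.WriteLine(byElements.Length);  // 4
    }
}
```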
Now the method and property are useless if there is not some object that they can hang off of which has the string. People were leery about adding them directly to System.String since they really want to try to keep that object as lightweight as they can (and some would even say they are not trying hard enough on that). That's when somebody remembered this class you could instantiate yet had no instance methods, this FxCop violation with a hat. So we added a constructor that takes a string, and a StringInfo.String property that lets you retrieve the string later, or change it without having to tear down the object.
Now we were rolling....
Internally, it just uses that incredibly useful but not-so-intuitive StringInfo.ParseCombiningCharacters and stores that System.Int32 array. That makes StringInfo.LengthInTextElements a simple call to Length on the array, and StringInfo.SubstringByTextElements a simple tip-toe through the array, using the very start and length parameters that the method takes in order to know where and how far to go. So we get to be intuitive and pretty fast at the same time. And we get to get rid of that FxCop issue, to boot. Everybody wins!
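The idea is simple enough to sketch. To be clear, this is not the framework source: TextElementView is a hypothetical stand-in class, and the argument checking (including allowing an empty substring at the very end) is my own choice for the illustration rather than a claim about what StringInfo does.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in showing the approach described above.
public sealed class TextElementView
{
    private readonly string text;
    // Index of the first UTF-16 code unit of each text element.
    private readonly int[] starts;

    public TextElementView(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        text = value;
        starts = StringInfo.ParseCombiningCharacters(value);
    }

    // The element count is just the length of the stored index array.
    public int LengthInTextElements
    {
        get { return starts.Length; }
    }

    public string SubstringByTextElements(int startingTextElement, int lengthInTextElements)
    {
        if (startingTextElement < 0 || startingTextElement > starts.Length)
            throw new ArgumentOutOfRangeException("startingTextElement");
        if (lengthInTextElements < 0 || lengthInTextElements > starts.Length - startingTextElement)
            throw new ArgumentOutOfRangeException("lengthInTextElements");
        if (lengthInTextElements == 0) return string.Empty;

        // Where the first requested element begins, in code units.
        int begin = starts[startingTextElement];

        // Where the element after the last requested one begins,
        // or the end of the string if there is no such element.
        int next = startingTextElement + lengthInTextElements;
        int end = next < starts.Length ? starts[next] : text.Length;

        return text.Substring(begin, end - begin);
    }
}
```

The expensive part (walking the string to find element boundaries) happens once in the constructor; after that, both members are array lookups plus one ordinary Substring call.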
This post brought to you by "¾" (U+00be, a.k.a. VULGAR FRACTION THREE QUARTERS)