String Indexing?

by Michael S. Kaplan, published on 2007/03/04 18:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/04/1806219.aspx

Sometimes I see a documentation topic that bothers me a little bit.

And then occasionally I'll see one that bothers me a lot.

Like that .NET Framework Developer's Guide: Custom Case Mappings and Sorting topic that I complained about in Custom Case Mappings?

(Note that none of that topic seems to have been fixed, though I had suggested the need for such when I posted my critique!)

Well, I was pointed to another topic the other day -- this time it is the .NET Framework Developer's Guide: String Indexing topic.

It is in the VS 2005 documentation, yet it mentions none of the new StringInfo methods that I talk about in this post, which were added specifically because there was so much feedback about how confusing the existing methods were.

Bummer.

And then there is this bit of text:

Next, a string containing surrogate pairs is created. In strSurrogates, for example, the Unicode code U+DACE represents a high surrogate and the Unicode code U+DEFF represents a low surrogate. Together, these codes represent a surrogate pair and must be parsed as a single text element.

Now I am all for getting more people to be aware of supplementary characters and the surrogate pairs by which they are encoded in UTF-16. But a quick scan of the ranges listed in The Basics of Supplementary shows that the high surrogate in question (U+dace) is used for the representation of Plane 12 characters.

And plane 12 is currently reserved, has no weight in sorting, and has no allocation in the Unicode Roadmap that is proposed, planned, or even provisionally thought to be needed.
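The plane 12 claim is easy to check by hand, since decoding a UTF-16 surrogate pair is just a bit of arithmetic. Here is a minimal sketch in Python (not the language of the documentation sample); the helper names decode_surrogate_pair and plane_of are made up for illustration and are not part of any .NET or Python API:

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"U+{high:04X} is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"U+{low:04X} is not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset past the BMP.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def plane_of(code_point: int) -> int:
    """Each plane holds 0x10000 code points, so the plane is the bits above 16."""
    return code_point >> 16


# The pair quoted from the String Indexing topic.
code_point = decode_surrogate_pair(0xDACE, 0xDEFF)
print(f"U+{code_point:04X} is in plane {plane_of(code_point)}")
```

Running this prints "U+C3AFF is in plane 12", which is the supplementary code point the documentation's example pair actually encodes.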

Which on a scale of 1 to lame does not do so well as examples go....

The end note kind of takes the cake though, for me:

If you execute this code in a console application, the specified Unicode text elements will not be displayed correctly because the console environment does not support all the Unicode characters.

Okay, so why did we go through this?

Summary: why would the sample be built around an Arabic script example that can't ever work in the console and a weightless plane 12 example that can't ever work anywhere? And why wrap it all up in the part of the technology that was deemed so hard to understand that we added a whole new model for handling it, one that is documented in the very StringInfo class that the Developer's Guide topic points to?

This post brought to you by U+dace, a HIGH SURROGATE code value used for Plane 12 characters
