Update on the CharUnicodeInfo class

by Michael S. Kaplan, published on 2005/09/09 15:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/09/462934.aspx


Back in January of this year I said a little bit about the new CharUnicodeInfo class.

And then back in March of this year I talked about the stability of the Unicode character database.

I specifically tied these two together at the end of that second post:

So what does that mean for us in the world of the .NET Framework and the new class in Whidbey that captures (among other items) the Unicode general category, as described in A little bit about the new CharUnicodeInfo class?

Well, it means two things, primarily:

  1. These values will not change very often.
  2. There are times that some will change. Not many, and there is always a carefully thought out reason, but it can happen. And the class is not called "CharMicrosoftSpinOnUnicode" which means that by and large the class needs to follow the standard. Any code that you write using the CharUnicodeInfo class must take this into account....

As Microsoft gets better and better about standards, it will become more and more important for code to recognize that this sort of thing is possible.

That text was recently put to the test in the first release to include CharUnicodeInfo, with a decision to update the version of Unicode it supports from 3.2 to 4.1.
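To make the effect of such a version bump concrete, here is a minimal sketch in Python rather than .NET, since Python's standard `unicodedata` module happens to ship a frozen Unicode 3.2 snapshot (`unicodedata.ucd_3_2_0`) alongside its current data. It is only an analogy for what a caller of CharUnicodeInfo might see: the "new" side is whatever Unicode version your interpreter carries, not 4.1, and the helper name `category_changes` is made up for illustration.

```python
import unicodedata


def category_changes(chars):
    """Return {char: (old, new)} for characters whose general category
    differs between the Unicode 3.2 snapshot and the current database.

    Hypothetical helper for illustration; not part of any library.
    """
    old_db = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 data
    changes = {}
    for ch in chars:
        old = old_db.category(ch)
        new = unicodedata.category(ch)
        if old != new:
            changes[ch] = (old, new)
    return changes


if __name__ == "__main__":
    print("old data:", unicodedata.ucd_3_2_0.unidata_version)
    print("new data:", unicodedata.unidata_version)
    # A stable letter, ZERO WIDTH SPACE, and a character assigned after 3.2.
    for ch, (old, new) in category_changes("A\u200b\U0001F600").items():
        print(f"U+{ord(ch):04X}: {old} -> {new}")
```

Code that caches or hard-codes category values would want a check along these lines when the underlying Unicode version changes.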

When I reread that stability post, I remembered that I was actually at all of the Unicode Technical Committee meetings where the changes between 3.2 and 4.1 were decided, and I can promise you that the compatibility concerns about those changes were taken very seriously and discussed very extensively.

And I remember the sense of deja vu I got when I was explaining to the various people involved with Whidbey breaking changes why this update was important. They had the same concerns, even for a breaking change between beta versions of the class. But the truth is that a class purporting to represent Unicode has to represent Unicode. Truly. Even if it does mean an occasional change that impacts code that depends on the results.

I can promise you that the people at UTC meetings representing Microsoft will be taking a stronger interest in changes made here, to make sure they are as small as they possibly can be. And no smaller....

 

This post brought to you by "þ" (U+00fe, a.k.a. LATIN SMALL LETTER THORN)


