Crossing the DIGITal divide...

by Michael S. Kaplan, published on 2004/12/01 06:58 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/01/272864.aspx


Digit support has been in Windows for a while. There are the longstanding LCTYPEs for GetLocaleInfo and SetLocaleInfo (LOCALE_SNATIVEDIGITS and LOCALE_IDIGITSUBSTITUTION) and their crazy, weird support in Regional and Language Options, both of which let the system render digits according to the locale or user preference. There is also the equally longstanding MAP_FOLDDIGITS mapping type in FoldString, which maps many of the various ways to represent digits to plain old 0 through 9.
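
FoldString is a Win32 call, so purely as a portable illustration, here is a minimal Python sketch of what MAP_FOLDDIGITS-style folding does. It assumes that "a digit" means any character with a Unicode decimal digit value (what unicodedata.decimal reports); Windows' own mapping tables may cover a different set of characters. Superscript digits, for example, have no decimal value in Unicode and are left alone here. The function name fold_digits is hypothetical.

```python
import unicodedata


def fold_digits(text: str) -> str:
    # Hypothetical helper: replace each character that has a Unicode decimal
    # digit value with the ASCII digit of the same value; leave everything
    # else untouched. Assumed to approximate FoldString + MAP_FOLDDIGITS.
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None when ch is not a decimal digit
        out.append(chr(ord("0") + value) if value is not None else ch)
    return "".join(out)


# Arabic-Indic 1 2 3, then Devanagari 1 2
print(fold_digits("\u0661\u0662\u0663 / \u0967\u0968"))  # 123 / 12
```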

DIGITal Addendum (08:45am): It occurred to me that I was a bit lax about covering character property support on Windows. Although that support is pretty limited in most cases (a topic for another day!), for digits there is a great detection method: the GetStringTypeEx API with the CT_CTYPE1 info type. In the returned array of character type information, the C1_DIGIT flag is set for each character that is a digit.
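
GetStringTypeEx is also Win32-only, so here is a minimal Python sketch of the same per-character detection idea. It assumes the Unicode general category Nd (Number, Decimal Digit) is a fair stand-in for the C1_DIGIT flag; that is an assumption, not a guarantee that the two sets match. Note too that the Windows API works on UTF-16 code units, while Python iterates code points. The name digit_flags is hypothetical.

```python
import unicodedata
from typing import List


def digit_flags(text: str) -> List[bool]:
    # Hypothetical helper: one flag per code point, True where the Unicode
    # general category is Nd. Stands in (by assumption) for the C1_DIGIT
    # bit that GetStringTypeEx reports with CT_CTYPE1.
    return [unicodedata.category(ch) == "Nd" for ch in text]


# 'a', ASCII 1, Arabic-Indic 3, full-width 7
print(digit_flags("a1\u0663\uff17"))  # [False, True, True, True]
```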

Of course, life is easier on Windows, since there is no parsing support at all. Support in the .NET Framework has been rockier. No shipping version has an equivalent of LOCALE_SNATIVEDIGITS or LOCALE_IDIGITSUBSTITUTION, and there is an honest mismatch, noted by Raymond Chen and others, between Char.IsDigit (which is based on Unicode properties) and the semantics of the Parse methods (which accept only ASCII 0 through 9). Parsing of native digits is not supported even in the newest "coming out RSN" (Real Soon Now) version of the .NET Framework (Whidbey). Whidbey does add equivalents of the two LCTYPEs in new NumberFormatInfo properties (.NativeDigits and .DigitSubstitution), but they are purely informational, with no effect on rendering at all.

To make it even more confusing, Whidbey adds a new CharUnicodeInfo class with a GetDecimalDigitValue method. That method returns the decimal digit value of any character in the Nd (Number, Decimal Digit) category, so a developer can now not only detect what is a digit but also find out which digit it is: the crucial step needed to make parsing support work.
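
To show why knowing *which* digit a character is matters, here is a minimal Python sketch of the missing parsing step. It uses unicodedata.decimal with a default of -1 as a stand-in for GetDecimalDigitValue (which returns -1 for characters that are not decimal digits). The function name parse_unicode_int is hypothetical; it handles only non-negative integers, and it does not reject strings that mix digits from different scripts.

```python
import unicodedata


def parse_unicode_int(text: str) -> int:
    # Hypothetical parser: accumulate the decimal value of each digit,
    # whatever script it comes from. unicodedata.decimal(ch, -1) plays the
    # role of GetDecimalDigitValue here (by assumption).
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        d = unicodedata.decimal(ch, -1)  # -1 means "not a decimal digit"
        if d < 0:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + d
    return value


# Devanagari 1 2 3
print(parse_unicode_int("\u0967\u0968\u0969"))  # 123
```

For comparison, Python's built-in int() already accepts Unicode decimal digits, which is exactly the behavior the Whidbey Parse methods lack.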

So the interesting question is: why are these half measures there -- why do half of the feature without doing the other half?

Of course, part of the problem is that calling it one feature (digit support) does not change the fact that it involves several entirely separate features, owned by different teams.

Putting on the apologist hat (just for a moment, don't worry!), the items being added at least let developers know what is available. Developers can then take it to the next level and add more sophisticated parsing support, so that things that should be easy (like converting U+FF10 through U+FF19, the full-width digits ０１２３４５６７８９, to their narrower look-alikes) are at least possible without hacks. And maybe the next version of the CLR after Whidbey will take the extra step of parsing them in the .Parse methods and in regular expressions.
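
For the specific full-width case, no table lookup is needed: U+FF10 through U+FF19 are contiguous and in the same order as ASCII 0 through 9, so subtracting a constant offset (0xFEE0) maps each one to its look-alike. Here is a minimal Python sketch of that idea; the names narrow_fullwidth_digits, FULLWIDTH_ZERO, FULLWIDTH_NINE, and OFFSET are hypothetical.

```python
FULLWIDTH_ZERO = 0xFF10  # U+FF10 FULLWIDTH DIGIT ZERO
FULLWIDTH_NINE = 0xFF19  # U+FF19 FULLWIDTH DIGIT NINE
OFFSET = FULLWIDTH_ZERO - ord("0")  # 0xFEE0: distance from U+FF10 back to U+0030


def narrow_fullwidth_digits(text: str) -> str:
    # Hypothetical helper: shift only the ten full-width digits down to
    # ASCII; every other character passes through unchanged.
    return "".join(
        chr(ord(ch) - OFFSET) if FULLWIDTH_ZERO <= ord(ch) <= FULLWIDTH_NINE else ch
        for ch in text
    )


print(narrow_fullwidth_digits("\uff12\uff10\uff10\uff14/12/01"))  # 2004/12/01
```

The offset trick only works because this block is contiguous; digits from other scripts need a per-character digit-value lookup instead.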

But the big reason NumberFormatInfo.NativeDigits and NumberFormatInfo.DigitSubstitution are being added in Whidbey is CUSTOM CULTURES. This feature will be shared with future versions of Windows, where Longhorn will use the same support to allow for the unmanaged analogue (CUSTOM LOCALES). For that "feature sharing" to work, it was important to make sure that at least the existing Windows support had a framework analogue.

Now the .NET Framework will have to decide, in the next version, how to take the next two steps (rendering and parsing) and fill in the holes in the digit support feature. Each version makes for a stronger foundation (which is awesome) without providing the actual structure (which is not awesome in the slightest).

