The downside of managing to go native...

by Michael S. Kaplan, published on 2011/06/21 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/06/21/10177222.aspx

It is common knowledge to those not guilty of my dear boy type offenses that native, Win32 NLS pre-dates the managed System.Globalization classes by several years.

And it is perhaps less widely known, but still fairly common knowledge, that the principal developer of the former was in fact the initial architect of the latter.

It is also pretty common knowledge that the underlying data of one has always at a minimum been an extension of the other, eventually leading to a common data store and format, and a not insignificant amount of code sharing.

But even knowing all that, it is easy to forget some basic compatibility issues that exist between these two fraternal twins.

Like just yesterday when a tester asked:

Hi,

A component writes a date string using GetDateFormatEx() API with DATE_AUTOLAYOUT:

GetDateFormatEx(
        LOCALE_NAME_INVARIANT,
        DATE_AUTOLAYOUT | DATE_SHORTDATE,
        SysTime,
        NULL,
        dateStr,
        MAX_PATH,
        NULL);

This produces a string containing 'LEFT-TO-RIGHT MARK' (U+200E) characters.

DateTime.Parse() for this string is failing with “String was not recognized as a valid DateTime” because of these extra chars.

What is the correct way to parse the date string (in C#)??

Now GetDateFormat[Ex] has had this support in some form for quite a while:

Value	Meaning
DATE_AUTOLAYOUT	Windows 7 and later: Detect the need for right-to-left and left-to-right reading layout using the locale and calendar information, and add marks accordingly. This value cannot be used with DATE_LTRREADING or DATE_RTLREADING. DATE_AUTOLAYOUT is preferred over DATE_LTRREADING and DATE_RTLREADING because it uses the locales and calendars to determine the correct addition of marks.
DATE_LTRREADING	Add marks for left-to-right reading layout. This value cannot be used with DATE_RTLREADING.
DATE_RTLREADING	Add marks for right-to-left reading layout. This value cannot be used with DATE_LTRREADING.

But the last decade of managed code support in the System.Globalization namespace has been unable to produce any version that will use this functionality to format date strings.

And that same decade has failed to produce any code designed to parse strings produced via any of these flags.

Note that "support" for parsing would simply be adding the ability to ignore U+200e and U+200f, but supporting the parsing would certainly lead to a demand for support of the formatting.

Unfortunately, it is very common for tests of many different components to be written in managed code -- which means this question comes up a lot more often than one might expect, given the need to use these flags to get strings that will display properly....

The workaround?

You will need to walk the string and strip out every instance of the following characters. The first two in the table below are inserted by GetDateFormat[Ex] when any of the three flags above is passed; the rest could be inserted by other, more sophisticated processes (or by RTL language localizers doing their job):

Code point	Character name
U+200e	LEFT-TO-RIGHT MARK
U+200f	RIGHT-TO-LEFT MARK
U+202a	LEFT-TO-RIGHT EMBEDDING
U+202b	RIGHT-TO-LEFT EMBEDDING
U+202c	POP DIRECTIONAL FORMATTING
U+202d	LEFT-TO-RIGHT OVERRIDE
U+202e	RIGHT-TO-LEFT OVERRIDE
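
For the curious, here is a minimal sketch of that walk in portable C, meant to be run over the buffer that GetDateFormat[Ex] filled in before the string is handed off to managed code. The names is_bidi_control and strip_bidi_controls are made up for this illustration and are not part of NLS or any other library. The same loop carries over to C# by walking the chars of the string, copying the ones to keep, and passing the result to DateTime.Parse().

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Nonzero for the seven directional formatting characters in the table:
   U+200E, U+200F, and the contiguous range U+202A..U+202E.
   (Hypothetical helper, named for this sketch only.) */
static int is_bidi_control(wchar_t ch)
{
    return ch == 0x200E || ch == 0x200F || (ch >= 0x202A && ch <= 0x202E);
}

/* Removes those characters from the NUL-terminated string s, in place,
   and returns the new length in wchar_t units.
   (Hypothetical helper, named for this sketch only.) */
size_t strip_bidi_controls(wchar_t *s)
{
    size_t in;
    size_t out = 0;

    assert(s != NULL);

    for (in = 0; s[in] != L'\0'; ++in) {
        if (!is_bidi_control(s[in])) {
            s[out++] = s[in];   /* keep everything that is not a mark */
        }
    }
    s[out] = L'\0';             /* re-terminate the shortened string */
    return out;
}
```

Stripping in place is safe here because the output can never be longer than the input. Note that this throws away only the seven code points in the table and nothing else, so whatever remains still has to be in a form that the parser on the other side understands.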

Now looking at the reason that we could really go more than a decade without managed code supporting something that native code added so long ago, there are a few (competing?) theories:

the flags are primarily for UI strings, so a programmatic library like .NET might simply decide that support is not a priority;
while the globalization classes are owned by the same team that owns the NLS code, the parsing/formatting code is not; this support would be a fairly significant feature to add to the code that the current owners might consider out of scope;
with different owners of different pieces, the situation is fairly stable/functional; a big effort to enhance the code might lead to a change in ownership, and the status quo is often preferred (note that this is one of the main issues that blocks custom calendar support, as I discuss in "Will someone take up the job of Calendar support in .Net, please?", arguably a more "important" issue to solve).

In the long run, given that there are such issues, it would be nice if some team just forgot about the politics and tried to solve the problems....

From a Microsoft standpoint, the number of groups that write automation in managed code is significant enough that I think fixing these problems could be justified solely on "being a good internal Microsoft citizen" grounds. But maybe that's just me. :-)
