by Michael S. Kaplan, published on 2007/11/02 10:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/11/02/5808539.aspx
Oli reported what he thought was a bug:
Hi Michael,
I found some inconsistencies in the culture info provided on XP. For example, some cultures have a short time pattern with seconds in the pattern (e.g. Welsh). It seems that overall the only distinction between a long and a short time pattern is that the former includes seconds, and the latter doesn't, so I'd take the Welsh case as an inconsistency.
Also for Welsh the (default) short time pattern is equivalent to the long time pattern, and none of the culture info date/time patterns includes a time pattern without seconds.
The same is the case for some other locales/cultures.
Let me know whether to file bugs through MS Connect (which usually are ignored or classified "by design") or whether you'd like me to send you more information about the inconsistencies directly.
Cheers, Oli
PS: The most annoying bug "by design" is that .Net's regular expression evaluator applies a shortest match strategy (instead of the usual longest match) on disjunctions. "a|ab" applied on "ab" will only match/consume "a", while "ab|a" applied on "ab" will match the whole input. The response I got was that perl does it just like that... My suggestion to then add a "longest match" variant of the disjunction operator so that one would have some control over what's going on was unfortunately ignored.
Let's leave aside that RegEx thing as I have no strong opinion on it and this blog is all about my strong opinions. :-)
The issue with the culture differences is by design.
It comes down to a basic difference between Windows-Only cultures and built-in .NET Framework cultures, especially as it relates to the fundamental problems with that abomination, the SHORT TIME, that I have taunted previously in posts like "Customizing the SHORT time format?" and "We do seem to be short on time...".
In short, Windows-Only cultures are synthetic objects built from the built-in support within Windows, and since Windows does not distinguish between long time and short time as two unique settings, there is no short time value available that the owners of the data have reviewed.
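To make that consequence concrete, here is a minimal sketch in Python of how such a fallback could behave. Everything in it is hypothetical: the `Culture` type, the `BUILT_IN` and `PLATFORM_TIME_FORMAT` tables, and `get_culture` are illustrative stand-ins rather than the .NET Framework's actual implementation, and the pattern strings are sample values, not verified locale data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Culture:
    name: str
    long_time: str
    short_time: str
    synthetic: bool = False


# Hypothetical built-in data: long and short patterns are stored separately.
BUILT_IN = {
    "en-US": Culture("en-US", long_time="h:mm:ss tt", short_time="h:mm tt"),
}

# Hypothetical platform data: only ONE time format per locale.
PLATFORM_TIME_FORMAT = {
    "en-US": "h:mm:ss tt",
    "cy-GB": "HH:mm:ss",
}


def get_culture(name: str) -> Culture:
    """Return built-in data when present, else synthesize from the platform."""
    built_in = BUILT_IN.get(name)
    if built_in is not None:
        return built_in
    time_format = PLATFORM_TIME_FORMAT.get(name)
    if time_format is None:
        raise LookupError(f"unknown culture: {name}")
    # The platform has no separate short time to ask for, so both
    # fields end up holding the same pattern, seconds included.
    return Culture(name, long_time=time_format, short_time=time_format,
                   synthetic=True)
```

In this sketch, `get_culture("cy-GB")` comes back with identical long and short time patterns, which is the shape of what Oli observed.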
Now given that the equivalent functionality of the "short time" format is available in Windows via flags to GetTimeFormat that I discussed in "We do seem to be short on time...", it is possible to hypothesize working backward from it -- using GetTimeFormat with the TIME_NOSECONDS flag to format any time value using a short time format, or better yet providing the short time format itself by reverse engineering the results of that call.
But that is also a lot of work to support a time format which frankly speaking is really awful and has nothing to do with anything other than maintaining a legacy VB feature that has nothing to do with internationalization.... :-)
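For the curious, the working-backward idea might look something like the following Python sketch. The `strip_seconds` helper is hypothetical and deliberately naive: it assumes a Windows-style format picture in which `s` or `ss` is the seconds field, drops that field along with whatever separator sits directly in front of it, and ignores quoted literals entirely. It is an approximation for illustration, not a reimplementation of what GetTimeFormat does with TIME_NOSECONDS.

```python
import re

# Optional run of separator characters (anything that is not a letter or a
# quote), followed by the seconds field itself: "s" or "ss".
_SECONDS_FIELD = re.compile(r"[^A-Za-z']*s{1,2}")


def strip_seconds(time_pattern: str) -> str:
    """Derive a seconds-free pattern from a long time pattern (naive sketch)."""
    # count=1: only the first seconds field is removed.
    return _SECONDS_FIELD.sub("", time_pattern, count=1)


print(strip_seconds("HH:mm:ss"))    # HH:mm
print(strip_seconds("h:mm:ss tt"))  # h:mm tt
print(strip_seconds("HH:mm"))       # HH:mm (nothing to remove)
```

Even this toy version hints at the work involved: quoted literals, unusual separators, and patterns that put other fields after the seconds would all need real handling.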
But let me take a step back and talk about Windows Only cultures. :-)
Ignoring occasional heroics that I find myself involved with (which, let's be honest are really the exception, not the rule!), there are a handful of times that I feel like I've been involved in something really unique, and these beasts are one of them....
It was back in the early days of MSKLC, and Igor (then an NLS tester) had happened to randomly try to combine the Enabled Language Kits (ELK) feature (also in beta at the time) with MSKLC. And this was before the update to the .NET Framework that kept managed applications from crashing if you used an ELK locale for the user locale -- crashing because it was unable to create a CultureInfo from the validated return value of GetUserDefaultLCID....
So, predictably (in retrospect)1, MSKLC crashed.
The short term fix was to just make CurrentCulture something else entirely so the call would succeed and just fill in some other locale. One that was valid.
But the scenario was an intuitively compelling one, much more compelling than supporting Indic locale data on platforms that did not contain fonts -- because in this case the support was there and users would have a good faith basis for expecting them to work.
But everyone dreaded the long term fix, as the path that everyone's mind seemed to be on involved maintaining two huge dynamic data sets and updating the .NET ones any time the Windows ones were. In perpetuity (or whatever the "supported platform" definition of perpetuity entailed)....
Everyone hated this road, but they felt like it was the only responsible one to take. We walked down this path, resigned.
I thought of another idea one night. What I was doing that night was really amusing and I might talk about it another day, if I am ever again as drunk as I was that night and able to blog at the same time.
The next morning at work, I put on a PM hat that was lying around (people tell me my office is a mess and they aren't wrong when they tell me that) and wrote a doc suggesting a feature of synthetic cultures -- any time Windows knew of a locale and .NET didn't, the managed code would just build its own synthetic CultureInfo by asking Windows for all of the information.
The PM hat was not a 100% fit on me then, though. So while the doc was more than a spec since it had actual implementation suggestions, it was also less than a spec since it did not have the feature name, general terminology suggestions, explicit justification, or scenarios filled in (all crucial to getting it approved as a feature!) - luckily there were real PMs to finish it up and turn the document into an actual spec.
Thusly, synthetic cultures became Windows only cultures!
And Tarek (yes, that Tarek, the one who wrote the cool GetDateFormat wrapper!) put on his developer hat and implemented almost everything from that spec, including just about every feature I suggested even though I assumed it would get whittled down at the time.
Now he was able to do this because the testers (who were wearing their tester hats already) actually felt it would be EASIER to test with the fuller coverage of the feature, since there was a rich suite of tests covering managed vs. unmanaged functional parity to leverage. Every different behavior would be a special case that would be more work to test.
A great example of cross-discipline teamwork to solve a genuine problem originally found almost entirely by accident by noticing two unrelated features not working well with each other!
And an important functional direction to properly handle a world where both the Framework and an underlying platform exist, both have locale/culture support, and there can be a reasonable expectation of parity between them -- more on this (and some of the unexpected challenges) another time. :-)
1 - Predictably (in retrospect) is a phrase that really captures the spirit of the kind of problem that you completely understand (just as soon as you understand it), with no new knowledge required in order to realize the cause -- just having the problem described allows you to immediately understand what is happening.
This post brought to you by h (U+0068, a.k.a. LATIN SMALL LETTER H)
# Maurits [MSFT] on 5 Nov 2007 12:26 AM:
RE: regex:
It has nothing to do with shortest vs. longest. If there's a series of options separated by |, the regex engine will try to match each option /in turn/.
For example, consider these two patterns:
1) a(b|bc)d
2) a(bc|b)d
And these two strings:
I) abdabcd
II) abcdabd
1) and 2) will both match I) on the "abd"; and they will both match II) on "abcd". So it has nothing to do with the length of the match, but rather the positioning of the matches in the string.
It is possible to generate silly regexes, of course. Take the final d off of 1), and it will *always* match the first "ab" sequence... never using the "c".
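The behavior described in this comment can be checked with a short Python sketch. Python's `re` module is used here as a stand-in for the .NET engine, on the assumption that what matters is the shared behavior of trying alternatives left to right; the `first_match` helper is just a convenience for this example.

```python
import re


def first_match(pattern: str, text: str) -> str:
    """Return the text of the first match, or an empty string if none."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


# Oli's example: the first alternative that succeeds wins.
print(first_match("a|ab", "ab"))  # a
print(first_match("ab|a", "ab"))  # ab

# Maurits' example: with the trailing "d", the order of the alternatives
# no longer changes the result, because the engine backtracks as needed.
for pattern in ("a(b|bc)d", "a(bc|b)d"):
    print(pattern, first_match(pattern, "abdabcd"), first_match(pattern, "abcdabd"))
# a(b|bc)d abd abcd
# a(bc|b)d abd abcd

# Without the trailing "d", the "bc" branch is never needed.
print(first_match("a(b|bc)", "abcd"))  # ab
```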
referenced by
2010/01/27 We do seem to be short on time... (Windows 7 edition)
2008/10/19 I see LONG TIME and SHORT TIME; where are SHORTER TIME and SHORTEST TIME?
2008/09/25 You're not my type if you have no culture
2008/03/16 On changing the world, or at least the way people order things in it
2007/12/22 CultureInfo subsetting attempts that suck
2007/11/26 When yesterday's workaround becomes tomorrow's potential solution...