'Evil date parsing', Parse, and ParseExact

by Michael S. Kaplan, published on 2004/11/26 12:30 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/11/26/270769.aspx


'Evil date parsing' has quite an ignoble history. Rooted in COM (which was itself rooted in older versions of Visual Basic), conversions from string to date had the simple job of making a string into a date, no matter what the cost. The benefits are obvious, but the problems range from performance issues to the cost of getting bad data through improper parsing. The latter is in fact why the parsing was considered evil for many purposes: when the format is dd/mm/yy, there are just as many people who want "01/13/2008" to fail as who want it to succeed. The fact that it would succeed simply clouded the issue of the meaning of "01/06/2004", since the difference between January 6th and June 1st is obvious and frightening for some applications.
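
The ambiguity is easy to reproduce outside of COM. What follows is only a minimal sketch in Python, not the COM or .NET code under discussion; it uses the standard library's `datetime.strptime`, and the two format strings are illustrative assumptions standing in for the day-first and month-first conventions:

```python
from datetime import datetime

# Illustrative format strings (assumptions, not taken from any locale data).
DAY_FIRST = "%d/%m/%Y"    # dd/mm/yyyy
MONTH_FIRST = "%m/%d/%Y"  # mm/dd/yyyy

text = "01/06/2004"

# The same string is valid under both conventions but names different days.
as_day_first = datetime.strptime(text, DAY_FIRST).date()
as_month_first = datetime.strptime(text, MONTH_FIRST).date()

print(as_day_first)    # 2004-06-01 (June 1st)
print(as_month_first)  # 2004-01-06 (January 6th)

# "01/13/2008" has no valid day-first reading: there is no month 13.
try:
    datetime.strptime("01/13/2008", DAY_FIRST)
    day_first_rejected = False
except ValueError:
    day_first_rejected = True

print(day_first_rejected)  # True
```

A parser that quietly falls back to the month-first reading for the second string is exactly the behavior that half of the customers love and the other half consider evil.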

The .NET Framework's DateTime.Parse method has goals not unlike those of older products, and because of this it often suffers from the same problems. This road leads to less performant code, and for every customer who loves that it parsed their date there will be another who is unhappy that it missed an entirely reasonable format. They did solve some of the evil problems, but plenty more came out. By trying to work for everyone, the method ends up with four groups of customers:

  1. those who are unhappy,
  2. those who are happy,
  3. those who are happy now but will be unhappy when they find out how screwed up their data is due to incorrect parsing, and
  4. those who are unhappy now because the code that used to work in VB5 or VB6 no longer works since the parsing changed.

ParseExact, on the other hand, takes the exact formats specified in the DateTimeFormatInfo object and uses them -- it uses nothing else. There is no forgiveness for data that does not match, and the issue of whether or not gratuitous spaces should be forgiven makes for an interesting argument in the hallways of some buildings at Microsoft. Its goal is more along the lines of "I gave you the format, now I will give you the strings in that format; just do the freaking job." This makes it faster and more exact as a semantic, and as such it is much better suited to cases where the flexibility of the other method is not desired. As a veteran of bugs implicit in evil date parsing, I am quite fond of a method with none of the problems it can cause.
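
The "I gave you the format, just do the job" contract can be sketched in a few lines. This is a Python analog under stated assumptions, not the .NET implementation of ParseExact: `parse_exact` and `ALLOWED` are hypothetical names made up for illustration, and the single ISO-style format is just an example choice.

```python
from datetime import datetime

def parse_exact(text, formats):
    """Hypothetical helper: return a datetime only if text matches one of
    the given formats exactly; otherwise raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next explicitly allowed format
    raise ValueError(f"{text!r} matches none of the formats {formats!r}")

# The caller states up front which formats are acceptable (assumed here).
ALLOWED = ["%Y-%m-%d"]

ok = parse_exact("2004-11-26", ALLOWED)
print(ok.date())  # 2004-11-26

# Anything outside the stated formats is refused rather than guessed at.
try:
    parse_exact("26/11/2004", ALLOWED)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```

The point of the design is that a mismatch is an error the caller sees immediately, instead of a plausible but wrong date that surfaces much later.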

In order to protect your own code, you may want to consider using ParseExact when you can, to help avoid those other problems. Flexibility is great when you need it, but when you don't, it's better not to risk the problems that are the price of flexibility....


# Adrian on 26 Nov 2004 12:23 PM:

I always use the '6 jun 2004' format in code, be it SQL or C#, and whenever possible enforce this through UI controls.
I have seen systems require the user to set his/her date format in regional settings because the original programmers couldn't cope with the database assuming mmddyy where the local tradition is ddmmyy [shudder]

# AndrewSeven on 26 Nov 2004 3:51 PM:

Recently, I had an xml file that had a date in it.
I don't know what the name of the date format was.

I would have liked to be able to pass a format string and the date string to get a date.

# James Hancock on 27 Nov 2004 11:51 AM:

I've run into this all of the time. My solution is to use MMM/d/yyyy as the forced date format at all times, because it appears to be the only format that the .net framework won't screw up if the region changes (i.e. Canada versus the US, Japan versus the US etc.)

My biggest problem now is getting a nice UTC date/time stored and actually getting it to come back at the right time even with daylight saving shifts. The only way I have been able to do this is to have a separate date field and an int for the number of minutes from midnight, and make everything relative to Jan 1st 1900, which shouldn't change. Otherwise we end up with headaches.

I really really really wish there was a format that .net would parse with time zone info on it. (i.e. that used by POP3 mail servers)

2004/01/06 1:00 -0500 should be parsed and it should parse correctly and consistently. Currently .net will throw an error trying to parse that, which is ironic because it's the standard for POP3.

That in EVERY format and semantic should always translate to:

January 6th 2004 1:00 AM in EST (or CDT)

Given this information we can then handle appointments that are not time zone centric and act accordingly even if two offices sharing data are in two different time zones using two different regional settings.

I hear that SQL Server 2005 is going to handle time zone information finally; now I wish .net did string date parsing correctly with it. (Aside from anything else, it would save me a lot of frustration with email messages...)

Oh and a tip for everyone, because I think I get asked this more often than anything else. When writing inserts and updates to SQL Server or any other database server, always use yyyy-MM-dd HH:mm:ss as the format for the date string you pass, because it will always be recognized by SQL Server correctly, unlike every other format, which it just interprets however it feels like and often fails on entirely, especially MM/dd/yy or dd/MM/yy (yes, both of these should be abolished... we need one worldwide standard!)

# michkap on 5 Dec 2004 4:15 PM:

To Adrian and James -- note that both of your solutions ('d mmm yyyy' and 'mmm/d/yyyy') can suffer from three problems:

1) The "abbreviated month" in many cultures is not really abbreviated, and thus you get full month names. This may be okay, but sometimes people assume the behavior will be different.

2) They both rely on the legacy behavior in the Parse method, which will consistently be able to understand the format. It may carry an extra performance penalty because of that.

3) Users may not like either format, and one common requirement of parsing is getting user input and converting it to actual data. Users will often expect the formats they put into Regional Options.

Time zones are a huge subject all on their own -- I will probably take them up some day....

Finally, to Andrew -- see the ParseExact method, it may be exactly what you need.

# Norman Diamond on 21 Dec 2004 6:11 PM:

On Linux systems I frequently see dates displayed in a format that looks like an equivalent to Windows MMM d yyyy. MMM is one or two digits with the Kanji meaning "month". d is one or two digits without the Kanji meaning "day", and yyyy is four digits without the Kanji meaning "year". In addition to the strange ordering, it is really disconcerting to have Kanji attached to exactly one of these three components.

The usual human readable forms are variations of y M d, in which:
y is either an era name followed by one or two digits for the year and the Kanji meaning "year", or optionally a designation of the Christian era with four digits for the year and the Kanji meaning "year";
M is one or two digits followed by the Kanji meaning "month"; and
d is one or two digits followed by the Kanji meaning "day".
If a month or day only needs one digit, it is optional to do two digits with a leading zero. I haven't seen that done with a year though.

I usually abbreviate it to yyyy.M.dd. Though just now I notice that at least one Windows XP system has lost the personalized regional setting that I set around a year ago. (If I recall correctly, Windows 95 used to forget it very frequently.)

# Michael Kaplan on 21 Dec 2004 10:15 PM:

I know of no issue that causes settings to be lost. Given how they are stored in the registry, the only ways I know of that they could be lost are either (1) the entire user profile being lost or (2) some app calling SetLocaleInfo to change settings.

#2 is likely if you run misbehaving applications that do not respect user settings, unfortunately.

# BugBuster on 5 Nov 2009 11:55 PM:

This article only scratches the surface. The truth is, it is even more evil than described, because UI controls like the datepicker aren't doing things quite right. I am talking about functionality like this: specify a format string as a parameter in web.config describing the desired date format, read it, feed the datepicker with it, and use the same format for display and for conversion between string and datetime.

If you're trying to implement that in a multi-cultural world (as we live in, in case you haven't noticed yet), you are opening a can of worms turning development into a coding hell.

# Michael S. Kaplan on 7 Nov 2009 5:38 PM:

Since I was referring to things that are pretty far underneath while you are referring to high level UI controls, it would be more accurate to say that I am *not* scratching the surface while you are. :-)

# BugBuster on 10 Nov 2009 4:02 AM:

I didn't want to criticize your article, only the inconsistent way date/time format is handled.

And yes, I am scratching the (UI) surface, since you need to present dates properly to the end user, which isn't easy to do in a consistent way when multi-language support is needed.

# Michael S. Kaplan on 10 Nov 2009 8:06 AM:

I didn't think you were criticizing, I was just calling attention to the fact that this nearly 5-year-old resurrection of the post was not entirely the same topic -- it is related, but only tangentially.... :-)

On the other hand, nothing much else was going on with it, so no harm in the re-purpose!

# X-man on 15 Feb 2012 8:47 PM:

Good 1...

Titillated my tiny brain.. ;)


referenced by

2010/02/11 Evil Date Parsing lives! Viva Evil Date Parsing!

2006/01/13 Some date conversions are evil in other ways

2005/12/26 COleDateTime's ParseDateTime and locales....

2005/11/17 Parse vs. ParseExact, revisited

2004/12/20 The Time [Zone] is Right...
