Parse vs. ParseExact, revisited

by Michael S. Kaplan, published on 2005/11/17 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/17/492197.aspx


It is hard to believe that it has been almost a year since I talked about 'Evil date parsing', Parse, and ParseExact. But that post was done on November 26, 2004, so it must be true.

But the same sort of issue has come up recently in all sorts of other areas beyond that of dates. For example:

Both of these examples boil down to the fundamental distinction between Parse and ParseExact:

For better or worse (using those criteria) both 3,3 and 1,,00,0 are perfectly valid to the former and unrecoverable gibberish to the latter.
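As a rough illustration of the numeric case, here is a minimal sketch. Numeric types have no ParseExact of their own, so the "strict" half below is an assumption on my part: it narrows the NumberStyles flags so that group separators are rejected. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class LooseVersusStrict
{
    // Loose: double.Parse defaults to NumberStyles.Float | NumberStyles.AllowThousands,
    // so group separators in the integer part are tolerated without any
    // check on group sizes.
    public static double ParseLoose(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }

    // Stricter (hypothetical helper): only an optional leading sign and a
    // decimal point are allowed, so any group separator makes the parse fail.
    public static bool TryParseStrict(string text, out double value)
    {
        const NumberStyles strict =
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;
        return double.TryParse(text, strict, CultureInfo.InvariantCulture, out value);
    }
}
```

With the invariant culture, ParseLoose("3,3") yields 33, while TryParseStrict("3,3", out _) returns false and TryParseStrict("3.3", out _) returns true.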

In many cases it can be worse, if you ask me. This is especially true when the values are filled in programmatically, where no user is looking at the number to notice the typo (which may very well indicate a bug in an algorithm of some sort). It is also true for many strings typed into the user interface, since few users are immune to typos and there are many UI situations where people will type without recognizing the mistake.

This is hardly the fault of either method, though. If there is a fault, it lies with the consumers of the methods, whose choice makes their interfaces either very strict or very casual.

I can understand the decision to allow more flexibility, especially considering the wide range of skills among developers who use the .NET Framework and Visual Studio. But for you non-Microsofties, as developers who may be using the .NET Framework yourselves, it may make sense to be less flexible, to defend yourselves from the same type of issue.

I end up getting involved because of how often cultural preferences enter into the way that both methods work, which can be an additional source of confusion when trying to track down bugs.

(I admit it is unfair of me to assume that you are a smarter developer just because you happen to read this blog, but there it is -- if you are someone who does read this blog, the internationalization issues may occur to you more readily!)

One of the people asking about this issue suggested:

There should be something in between -- Parse is *way* too loose, and ParseExact is *way* too strict. Why couldn't there be some setting in between, or some global setting to choose the level of strictness in property setting?

Both suggestions are appealing, and the image of Goldilocks complaining about ParsePoppaBear being too strict, ParseMamaBear being too casual, and ParseBabyBear being just right is pretty darn funny. But this would be hard to implement in practice, since this simple model is implemented in many different methods that run across a ton of different properties. Each of those methods would need to have this third method or this configurable switch defined; more importantly, each method would have to define what it means to vary the strictness and when not to.

It is easy to get people to agree on what they find to be outrageous -- extreme cases are not the problem, though. The problem is where to draw the line between outrageous and harmless, recoverable error.

So odds are, we are stuck with these forgiving and unforgiving constructs -- and it is up to each developer which way they want to go when they are in the world of morphing strings into other types....
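For dates, the choice between the two might look something like the sketch below. The class name, the method names, and the "yyyy-MM-dd" format are all assumptions picked for illustration; the point is only that the caller chooses the strictness, not the framework.

```csharp
using System;
using System.Globalization;

public static class DateStrictness
{
    // Forgiving: DateTime.Parse tries to recognize many different layouts.
    // It throws FormatException when it cannot make sense of the text.
    public static DateTime ParseLoose(string text)
    {
        return DateTime.Parse(text, CultureInfo.InvariantCulture);
    }

    // Unforgiving: only text that matches the one format string is accepted.
    // The format "yyyy-MM-dd" is an arbitrary choice for this sketch.
    public static bool TryParseStrict(string text, out DateTime value)
    {
        return DateTime.TryParseExact(
            text, "yyyy-MM-dd", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out value);
    }
}
```

Here ParseLoose accepts both "11/17/2005" and "2005-11-17", while TryParseStrict accepts only the latter.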

 

This post brought to you by "≠" (U+2260, a.k.a. NOT EQUAL TO)


# CornedBee on 17 Nov 2005 5:39 AM:

A few weeks ago, I read a quote from someone on the IE team, saying that they'd be very strict in the parsing of some new technology, having learned their lesson about forgiving parsing with HTML.
Perhaps the XAML people should talk to those guys.

In general, I think being forgiving when parsing is a mistake. Giving good error messages is way better. I don't think it's too much to ask of people to follow a specification, if only for the sake of interoperability. Let's take your example:
<Rectangle Width="3,3">

At first glance, I thought this specified a rectangle 3 units high and 3 units wide. I didn't look too closely at the attribute name, and my native German brain was faster at identifying the "3,3" as two coordinates than it was at giving "Width" its exact meaning; only that it was something related to size had trickled through.

Then I re-read the thing and realized that Width couldn't refer to height. The second impulse was that you had given the rectangle a width of 3.3. Again the reason lies in my being Austrian: as you probably know, the decimal point in the German locales is ',', not '.', which is the thousands separator. So 3.3 was a reasonable guess, unlike 33, which is counterintuitive (what is that comma doing there?)

Only by actually reading the text did I find out that it was supposed to be 33.

Bottom line: loose parsing encourages sloppy coding. But sloppily coded stuff is harder to read not only for the machine, but also for humans. It is a good idea to enforce strict rules.

# AndrewSeven on 17 Nov 2005 9:47 AM:

Most parsing should be strict. Would the loose interpretation of "3,3" be dependent on the current culture?
Many non-English notations use the comma as the decimal "separator". Is it called a "point" in German even though it is a comma?

Loose parsing has a place too, but mostly for treating user input where you will re-present the value for the user to verify after a loose parse.
When you pass the value into the system, the parse should then be strict.


# Mihai on 17 Nov 2005 12:35 PM:

I agree parsing in locale context is tricky.

But not here! XAML is a storage format and should be locale independent:
http://www.mihai-nita.net/20051025a.shtml

Or maybe I don't grok the problem this early in the morning :-)

# Maurits [MSFT] on 17 Nov 2005 4:00 PM:

XML has nothing to say about locali[sz]ation of floating-point attribute syntax, it just considers it another kind of PC Data.

But...

It is worth noting that the XML specs define the VersionNum entity as
'1.0'
and
'1.1'

respectively.

# Gabe on 18 Nov 2005 5:03 AM:

The problem with Parse is that it can allow incorrect data to be parsed without error, while ParseExact requires knowing a priori what format the input will be in.

It might be handy to have the inbetween function perform some sort of parsing and only return a valid value if the input is unambiguous. For example, "3,3" could be parsed as "3", "3.3", or "33", so it would return an error. Similarly would the date "01/02/03". However, "01/14/2005" can only parse correctly as a date one way.

I suppose this may be harder to implement than the other two, but it seems like it could be quite useful!
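A minimal sketch of that in-between idea, for dates only: try a fixed list of candidate formats and succeed only when every format that matches agrees on the same date. Nothing like this exists in the framework as far as the post says; the class name and the candidate list are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class UnambiguousDate
{
    // Hypothetical candidate formats; a real list would need careful thought.
    private static readonly string[] Candidates =
    {
        "MM/dd/yyyy", "dd/MM/yyyy", "MM/dd/yy", "dd/MM/yy", "yy/MM/dd"
    };

    // Returns true only if all matching formats produce one and the same date.
    public static bool TryParse(string text, out DateTime value)
    {
        var distinct = new HashSet<DateTime>();
        foreach (string format in Candidates)
        {
            DateTime parsed;
            if (DateTime.TryParseExact(text, format, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out parsed))
            {
                distinct.Add(parsed);
            }
        }

        value = default(DateTime);
        if (distinct.Count != 1)
        {
            return false; // no match at all, or more than one possible reading
        }
        foreach (DateTime only in distinct)
        {
            value = only;
        }
        return true;
    }
}
```

With this list, "01/14/2005" is accepted because only the month-first reading is a valid date, while "01/02/03" and "02/03/2005" are rejected as ambiguous.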

# Nick Lamb on 18 Nov 2005 11:28 AM:

Gabe, if you accept 01/14/2005 someone will write code that emits 01/14/2005, and of course then they'll be puzzled as to why it doesn't work on 02/03/2005.

# Richard on 21 Nov 2005 5:22 AM:

> XML has nothing to say about locali[sz]ation of floating-point attribute syntax

However XSD does. It specifies a period for the xsd:decimal type. xsd:float then uses this definition for the mantissa.

# Petteri on 23 Jun 2009 9:46 AM:

For me it came as a complete surprise that ParseExact won't handle the MS-defined DateSeparator that it uses in the MaskedTextBox.

Imagine my surprise when with my computer the mask

0000/00/00 in text box with datetime format yyyy/MM/dd gives

2009/06/23 and in my colleague's 2009.06.23

This comes to a full stop when you try to use yyyy/MM/dd as the format string in ParseExact: mine works, his does not.

To fix this I had to modify the parsing string by replacing the "/" with current date separator...

dateParse = dateParse.Replace("/", CultureInfo.CurrentCulture.DateTimeFormat.DateSeparator);

Now that I understand that the ParseExact is truly exact I know how to convert the strings...
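The behavior described here comes from "/" in a custom date format string standing for the culture's date separator rather than for a literal slash. A hedged sketch of the alternative to rewriting the format string is to quote the slash so it is always taken literally. The culture below is built by hand with "." as its DateSeparator (an assumption standing in for the colleague's machine), and the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class DateSeparatorDemo
{
    // A writable copy of the invariant culture whose date separator is ".",
    // used here as a stand-in for a machine configured that way.
    public static CultureInfo DotSeparatedCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.DateTimeFormat.DateSeparator = ".";
        return culture;
    }

    // Thin wrapper so the format string and culture are easy to vary.
    public static bool TryParse(string text, string format, CultureInfo culture,
                                out DateTime value)
    {
        return DateTime.TryParseExact(text, format, culture,
                                      DateTimeStyles.None, out value);
    }
}
```

Under the dot-separated culture, "2009.06.23" parses with the format "yyyy/MM/dd" because the unquoted "/" matches the culture's separator. With the quoted format "yyyy'/'MM'/'dd", "2009/06/23" parses and "2009.06.23" does not, whatever the culture. Whether "2009/06/23" is accepted with the unquoted format under such a culture may depend on the runtime version, so it is left out of the sketch.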

