Some date conversions are evil in other ways
by Michael S. Kaplan, published on 2006/01/13 00:31 -08:00, original URI: http://blogs.msdn.com/michkap/archive/2006/01/13/512408.aspx
One of the very first posts I ever did in this blog was 'Evil date parsing', Parse, and ParseExact.
In it I talked about the heritage of evil date parsing that dated all the way back to early versions of Visual Basic.
The method of simply trying to fit the value into a date no matter what has a lot of bad things to say about it, though it does have some positive moments, such as one I was asked about the other day.
Someone wanted to take the Hijri date '30-10-1426' and use it to do some work with. Easy to do in VB 6.0 or VBA 6.x with simple code like the following:
VBA.Calendar = vbCalHijri
dte = CDate("30-10-1426")
Debug.Print dte
The value returned in the debug window was 11/1/1426. Oops! And this is especially an oops because reportedly this was one of those months where the moon was not spotted until the next day, a fact that Windows knows nothing about since the historical data is not kept. In the end, VB/VBA/COM are all smart enough to increment to the next month (where smart is defined here as not throwing an error).
But it gets worse.
The person asking the question was using SQL Server's support (which I talked about a bit here), running a query like this:
select * from dbo.tblAppointments
where convert(datetime, colDate, 131)
between convert(datetime, '1-10-1426', 131)
and convert(datetime, '30-10-1426', 131)
The problem here is that although SQL Server uses the same basic algorithm for its Hijri conversions, it does not forgive anything outside of its known range. So it returns an error, with text something like "The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value."
When comparing these two sets of results that are both technically not correct, it does kind of take the phrase 'choosing the lesser of two evils' to a whole new level, if you ask me. In this case, however, with the armed with the "pragmatic" knowledge that the date value is actually legal, I guess I would prefer the VB/VBA/COM behavior in this case, since it does not cause the program to stop just because there is a limitation due to the lack of historical knowledge of prior moon sightings. At least the range checking would have worked....
On the other hand, I am sure you can hypothesize about cases where letting it continue might be more evil. So sometimes the SQL Server behavior might seem better.
It all depends on how you look at it, I guess.
What do you think is the better generic behavior in this less than ideal situation?
This post brought to you by "ڇ" (U+0687, a.k.a. ARABIC LETTER TCHEHEH)
# Maurits on Friday, January 13, 2006 3:54 AM:
If the computer doesn't know for sure, it should own up to it and say so.
Guessing is bad.
# Universalis on Friday, January 13, 2006 6:27 AM:
Some of the trouble comes from the assumption that SQL Server should be given valid dates in its search arguments. This seems unduly restrictive. For example, if you ask me to list the dates between 0 March 2006 and 0/0/2007, the answer is simple, obvious, straightforward and unambiguous, even though the dates themselves aren't valid. The same applies for dates between 1-10-1426 and 30-10-1426.
Incidentally, do Muslims really have more birthdays than the rest of us? Will a Muslim celebrate his 100th birthday after only having lived 97 years?
# Michael S. Kaplan on Friday, January 13, 2006 10:23 AM:
Indeed Universalis, and this is my conundrum on the issue!
The big worry could be similar to people being born on February 29th, a day that does not exist in over 75% of the years. :-)
# Maurits on Friday, January 13, 2006 12:11 PM:
> list the dates between 0 March 2006 and 0/0/2007
You can do this with DatePart, as a radix search...
CREATE PROCEDURE DatesBetween
(
@StartYear int,
@StartMonth int,
@StartDay int,
@EndYear int,
@EndMonth int,
@EndDay int
)
SELECT date
FROM Dates -- need to build this table first
WHERE
DatePart(year, date) BETWEEN @StartYear AND @EndYear AND
(
__ DatePart(year, date) > @StartYear OR
__ DatePart(month, date) > @StartMonth OR
__ (
_____ DatePart(month, date) = @StartMonth AND
_____ DatePart(day, date) >= @StartDay
__ )
) AND
(
__ DatePart(year, date) < @EndYear OR
__ DatePart(month, date) < @EndMonth OR
__ (
_____ DatePart(month, date) = @EndMonth AND
_____ DatePart(day, date) <= @EndDay
__ )
)
order by
__ date
Erroring on invalid datetimes is a feature, in my opinion.
# Nick Lamb on Friday, January 13, 2006 1:09 PM:
SQL Server gets this right (as well it might, this is an expected and iirc required feature of SQL). We also know that fixing this in MySQL was considered a significant improvement.
Everything that faces toward the programmer (including language features in e.g. SQL, web services and operating sytem APIs) should fail with diagnostics. That means at least a meaningful error code along with the input that caused the problem.
The documentation for such functions should clearly state what basis is used for the date-time representation or for both representations if the function performs an explicit conversion - and it should give an example of an error. Programmers can build on this strong foundation, by handling errors if they occur, and by avoiding errors where possible elsewhere in the system.
What's actually desirable in user-facing applications is a completely separate issue. For a start, (real scenario) what do you think should happen with a date of birth from medical records which reads "Winter 1923/4" ? Individual applications will need careful consideration and thorough testing, APIs with arbitrary defaults don't help much if at all.
# Michael S. Kaplan on Friday, January 13, 2006 1:22 PM:
Hi Nick --
I usually would agree with you (after all, I have been calling the 'feature' in VBA evil for a long time -- and I do not tend to be in favor of things I consider 'evil.'
In this particular case, however, it is not quite so black and white to me. Because it *was* a valid date and an inability to recognize that fact is a flaw. How serious is that flaw is hard to say, but there are definitely negative conseqences for developers and users.
# Michael S. Kaplan on Friday, January 13, 2006 1:23 PM:
Hi Maurits --
But in this case, the date is actually valid. Swallowing dates, refusing to admit thedir existence in an ostrich-like, head-in-the-sand maneuver is also a problem....
# Maurits on Friday, January 13, 2006 1:37 PM:
Indeed. There are more than two kinds of behavior, and their desirability is as follows:
1) Correctly reject all invalid dates. Accept all valid dates.
2) Correctly reject most invalid dates. Accept move valid dates. (Bug, but forgivable. Fix it or work around it.)
3) Accept anything.
4) Abandon the datetime format.
What does SQL3 have to say about datetime formats?
Frankly, I don't see Hijri support as a priority. I'd much rather have time zone support in datetime data types. There's a whole slew of problems on any computer with daylight savings time if you care about data that happens at the time change.
# Maurits on Friday, January 13, 2006 1:38 PM:
Er, "Accept move valid dates" should be "Accept most valid dates"
# Michael S. Kaplan on Friday, January 13, 2006 1:58 PM:
I'll be talking more about time zones very soon.... but I don't want to tell anyone that their system is not important, especially if we claim any level of support....
# Maurits on Friday, January 13, 2006 3:10 PM:
That's the crux of the matter... what is the cost of supporting the feature, and what is the benefit?
I must disagree that the VB/VBA behavior is "smart" - I think any Hijri calendar user would agree with me*. It's indeed very similar to the 2/29 issue you bring up. How would a Gregorian calendar user feel if their appointment they made for 2/29 showed up on 3/1 instead? If they got an error entering the appointment they could resort to other means of reminding themselves instead of silently being a day late.
* If there are any Hijri users around that disagree, please chime in...
# Michael S. Kaplan on Friday, January 13, 2006 3:47 PM:
I don't think I seriously called it smart, and I am on record on saying it is evil.
In this case it only does something smarter than denying the existence if a day somewhat by accident (and it still misrepresents the day, anyway).
The Feb. 29th issue is not an issue, unless it is not a leap year. And I would rather it show up on the day after the 28th than NEVER. :-)
# Maurits on Friday, January 13, 2006 4:04 PM:
OK, then I agree with you that it is evil and not necessarily smart. :)
With 2/29 I was trying to make an analogy. There is software out there that mishandles leap years. I personally ran into one that skipped reminders for 2/29/2000.
If the choice is between
1) Accept the appointment for 2/29/2000 but show it on 3/1/2000 because 2/29/2000 is an invalid date and I don't want to rock the boat or anything
and
2) Reject the appointment for 2/29/2000 with an "invalid date" error because everyone knows centuries are not leap years
I'll pick 2). They're both wrong, but to me, 2) is "more desireable" wrong behavior.
# Nick Lamb on Friday, January 13, 2006 5:54 PM:
"In this particular case, however, it is not quite so black and white to me. Because it *was* a valid date"
To me it /is/ black and white but you have to be careful to define your date-time system up-front. It's not enough to say "Hijri calendar" because this doesn't provide enough information. So you need to pick a specific calendar definition. Despite the political problems, I'd imagine that Windows would have to go with one [or more] of the pre-calculated calendars, which are accepted by some but not all Muslim authorities, and are based on astronomical data rather than reports of moon sightings.
So, assuming that this is all defined somewhere, 30-10-1426 is either valid or invalid, no grey areas, just like the 29th of February in any particular year of a Gregorian calendar. If it's not defined yet then someone had best go make some decisions and write them down.
As I said before, user-facing applications are a different kettle of fish. An application for scholars or followers of Islam may need to automatically transform the 30th day internally, as well as handling variable day lengths etc. But programmers whose users care will know about this, or if they don't they'll soon find out. Attempts to do it all without the programmer knowing are unlikely to succeed IMNSHO.
# Maurits on Friday, January 13, 2006 6:55 PM:
All right, 'fess up... what's the Kuwaiti algorithm?
http://en.wikipedia.org/wiki/Kuwaiti_algorithm
If a calendar is nondeterministic, there may not be an elegant solution other than to include month lengths in Windows Update patches.
# Michael S. Kaplan on Friday, January 13, 2006 9:47 PM:
I was actually involved with the topic Dr. International wrote to cover this issue (referenced by the Wikipedia article). AFAIK, the article's claims are not true.
# Marvin on Saturday, January 14, 2006 1:09 AM:
What a waste of bandwidth (in this otherwise interesting blog)! How can so many intelligent people seriously discuss this issue? You could also try to discuss methods of calculating the exact number of angels that can fit on a pinhead. A modern calendar that relies on moon sightings is simply moronic. If some people want to use it, they have bigger problems than SQL server bahvior. My suggestion to them would be to adopt a different calendar as all other peoples did in the past. If Microsoft wants to earn a few additional bucks by trying to accomodate these people... well good luck to Microsoft but I'd rather go and read about measuring angels on a pinhead.
# Michael S. Kaplan on Saturday, January 14, 2006 2:10 AM:
Hi Marvin --
Of course I am speaking for myself and not Microsoft, but I don't tend to consider anyone's religious beliefs to be moronic. The specific questions that it raises here (and how two different products handle the same algorithm) are interesting whether one believes or not....
# Marvin on Saturday, January 14, 2006 4:05 PM:
You misunderstood me. A calendar as used for business and information interchange is not the same as a religious calendar. I can calculate the time when I should sacrifice newborns to Moloch any way I want, including sigthing the moon, spitting into a puddle or standing on my head and counting to a hundred. This has nothing to do with the fact that if I want to communicate dates and times to somebody else I need to adopt some deterministic way to do so that can easily be replicated and used by other people. More so if I want to use computers to help me to determine and communicate dates. Note this way doesn't have to be the Gregorian calendar or any other existing one.
What is moronic is people trying to use a religious calendar for business and IT purposes.
As for the engineering problems I don't see any big ones here. You have two "functions" with different contracts. One considers a certain input invalid while another doesn't. Whichever behavior makes sense depends on user's requirements. If ligitimate use cases exist for both behaviors then an API author should provide two functions or allow to change behavior of a single one through some option. This situation is as old as programming itself. Where is the problem?
# Michael S. Kaplan on Saturday, January 14, 2006 4:19 PM:
Yes Marvin,
But if you look at what I said, there is honestly no way to tell from it what the purpose of the SQL Server application is -- perhaps it is for a religious organization?
I guess you may not see the conundrum here, or think there is anything interesting in this post. And that is fine -- you can certainly skip anything you don't find interesting (I know I do!).
But I know from the fact that there are people who had problems in the firt place that prompted a question and from the email I have received from this one that there at least a few others who also thought so too. I know that I can live with that.... :-)
# Marvin on Saturday, January 14, 2006 4:48 PM:
Re: "there is honestly no way to tell from it what the purpose of the SQL Server application is -- perhaps it is for a religious organization"
Then it uses the wrong technology. The 'convert' function is entirely inappropriate to handle things like dates of sacrifices. If I wanted to handle custom non-deterministic calendars (based on reign of king Arthur for example with a new day starting when a bird first heard singing) I'll get the corresponding tables and write a conversion rutine based on my interpretation of holly scriptures.
Re: "But I know from the fact that there are people who had problems in the firt place... "
Of course they do. And the solution for them is to use a good calendar to begin with rather than trying to force computers to guess when the bird first sung.
Long before computers many different peoples of different religions had realized this simple point and changed their secular calendars (while often retaining old ones for religious purposes). So, sorry, but in the year 2006 anybody who still has such problems should blame only himself.
# Michael S. Kaplan on Saturday, January 14, 2006 5:29 PM:
I guess we will have to agree to disagree. Since I don't think I am wrong, I don't think those others are wrong, and even if I did I probably would not approach it quite the way that you do (it is a tad judgmental for my tastes, but that is anyone's right)....
But to get back to the actual topic -- we have one algoruthm here that approximates a calendar in use, and there is this particular historical issue that MS products do not currently support. This makes it interesting (to some) to look at the different manifestations that the non-support takes. If that is not interesting to you then feel free to move along to another post. :-)
# Maurits on Sunday, January 15, 2006 1:05 AM:
As a counterpoint to Marvin's rather abrasive posts, I'll point out information I consciously withheld either... the Gregorian calendar has a little bit of nondeterminism too, the "leap second." So DateDiff(second, '1/1/2050', '1/1/2051') is not knowable until it is decided whether to have leap seconds in the year 2050.
# Michael S. Kaplan on Sunday, January 15, 2006 11:54 AM:
But I would not mind being a second (or a minute or an hour) late for meetings. :-) And of course there is an alternate calendar that is provided in .NET 2.0 and Vista to cover the Umm-Al Qurah (Saudi civil) calendar, as a conscious attempt to allow matters that do not relate to religious issues to have a bit more determinism....
# orcmid on Sunday, January 15, 2006 12:46 PM:
Back in the day when Turbo Pascal on CP/M-80 was a hobbyist's dream development environment, I remember cobbling up a small set of date conversions that would work between day numbers (signed ints relative to something like January 1, 1980 or thereabouts) and {yyyy, mm, dd} triples.
What thrilled me was I found a way to get both strict and sloppy in the same kit, and it even helped with testing and debugging the little functions.
First, the sloppy functions would create the proper day numbers from (yyyy, mm, dd+7), (yyyy, mm+1, dd) and so on. That is, daynumber(yyyy, 2, 28+3) would yield the day number for March 2 or March 3, and so on. These simple manipulations, as we know, are all quite handy.
The strictness, used after receiving input date that is supposed to be valid, was accomplished simply by converting to the daynumber and back to the calendar date. If it didn't agree with the input, the initially-given date was invalid. Solved the leap-year challenge, proper days in months, and whatever else. It was done with tables and the usual congruences, but you got strictness when it mattered and you had relaxed rules when that made date manipulation work well.
daynumber(yyyy, mm, dd) wasn't unreasonably flexible, and it did check ranges on its inputs so that you couldn't go to awful extremes.
The daynumber(yyyy+k, mm, dd+j) cases all worked well but what you could do to mm was a bit more restricted, because I was a bit stingy with the table for the number of days preceding each month in a given type of year. That is no longer a problem and it would be fun to do a more-comprehensive one, but still for a limited range of practical dates.
# Marvin on Monday, January 16, 2006 2:21 AM:
Maurits,
What you have written is not a counterpoint. The leap second is entirely inconsequential in both business and IT (which is what we are all talking about). It is routinely ignored or handled incorrectly by almost all software out there and nobody even notices. The few people who *do* care like timekeepers themselves, scientists and some others are exactly in the same postion as the religious people. They need to use special software and special algorithms.
But I do agree that leap second is a bad thing. Keeping the calendar in precise sync with earth rotations seems like an atavism to me. I suspect it will be abandoned when we will have colonies on Mars and Moon ;-)
Please consider a
donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.
go to newer or older post, or back to index or month or day