Sample solutions and discussion
Perl Quiz of The Week #2 (20021023)

        Write a function, days_diff, to compute the time difference, in
        days, between two dates. The dates will be strings in the format

                Wed Oct 16 2002

        For example:

                days_diff("Wed Oct 16 2002", "Wed Oct 23 2002")

        should return 7.

                days_diff("Wed Oct 16 2002", "Tue Oct 16 2001")

        should return -365.

I thought this would be an easy problem. But as I should have
remembered, almost nothing to do with date calculations is easy or
simple. Some of the varied complications are discussed below.

I had originally imagined two types of solution. One might use one of
the heavy-duty CPAN date calculation modules, such as Date::Calc or
Date::Manip; the other would use the standard Time::Local module. The
Time::Local solution I produced looked like this:

        use Time::Local 'timegm';

        my @mon = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
        my %m2n = map {$mon[$_] => $_} 0 .. 11;

        sub days_diff {
          my @times;
          for (0 .. 1) {
            my ($dw, $mn, $dm, $yr) = split /\s+/, $_[$_];
            push @times, timegm(0, 0, 0, $dm, $m2n{$mn}, $yr-1900);
          }
          ($times[1] - $times[0])/86400;
        }

Here I use the 'timegm' function to turn each date into a Unix epoch
time (number of seconds since the start of 1970), subtract the two
epoch times to find the difference in seconds, and then divide the
result by 86400 to get the number of days difference.

1. It might seem as though this could produce a fractional result.
   The question is: Is the interval between consecutive midnights in
   GMT (Greenwich Mean Time) always exactly 86400 seconds? The answer
   is no, not exactly, because civil days are occasionally 86401
   seconds long. The timekeeping authorities throw in an extra 'leap
   second' at the end of June or December to keep chronological noon
   synchronized with the actual solar noon; the extra second is
   occasionally necessary because the Earth's rotation is gradually
   slowing.
   (They might in principle subtract a second sometimes because of
   irregular variations in the rotation rate, but it's never
   happened.) In particular, there were extra seconds at the end of
   June 1997 and December 1998.

   However, as far as I know, no Unix system actually counts leap
   seconds, and no 'gmtime()' function actually accounts for them!
   Instead, they all keep epoch time on a scale that follows UTC
   (Coordinated Universal Time, the modern successor to GMT) but
   leaves the leap seconds out of the count, so that days are always
   exactly 86400 seconds long. My system documents the epoch this
   way:

        The ctime(), gmtime() and localtime() functions all take an
        argument of data type time_t which represents calendar time.
        When interpreted as an absolute time value, it represents the
        number of seconds elapsed since 00:00:00 on January 1, 1970,
        Coordinated Universal Time (UTC).

   So the function above works even across intervals where the
   calendar contains leap seconds, supposing that the 'timegm'
   function is actually producing times on this scale. If not, the
   final difference should simply be rounded off to the nearest
   integer.

2. A second issue is that in many localities, mostly in Europe and
   North America, there's a practice called 'Daylight Saving Time'
   where the clock is set forward in the spring and backward in the
   autumn. For example, in Philadelphia, there was no local time
   02:30 Sunday April 7 2002:

        plover% perl -le 'print scalar localtime(1018160911)'
        Sun Apr  7 01:28:31 2002

   But an hour later it was:

        plover% perl -le 'print scalar localtime(1018160911 + 3600)'
        Sun Apr  7 03:28:31 2002

   April 7 was only 23 hours long, because 2:00-2:59 was missing;
   similarly, yesterday (Sunday, October 27) was 25 hours long.

   GMT does not have any such adjustments, so they don't affect the
   sample function above. However, several people used the
   'timelocal' function instead of 'timegm' in similar solutions,
   leading to incorrect results.
   For example, when called like this

        days_diff("Mon Oct 21 2002", "Mon Oct 28 2002");  # Daylight saving

   their functions would return 7.04166666666606 instead of 7,
   because of the extra hour. Again, rounding off would have solved
   the problem, but the solutions I saw that used timelocal() didn't
   round off either.

3. Here's a related issue. One poster on the -discuss list presented
   a solution that would check the day of the week for validity, a
   reasonable addition:

        my $sttime = timelocal(0,0,1,$stdayno,$months{$stmonth},$styear);
        my @sttime = gmtime($sttime);
        if($stday ne $days[ $sttime[6] ]) { # (gmtime)[6] is the dayname
          return "start date was invalid\n";
        }

   But there's an inconsistency in the implementation. See it? He
   uses timelocal() to convert the argument to epoch time, but then
   gmtime() to convert it back and get the day of the week. That
   works OK where I live, because 1AM local time is either 5AM or 6AM
   UTC the same day, so the day of the week is the same. But had
   this poster run his test program in Tokyo, where 1AM local time is
   4PM UTC the previous day, he would never have had a success!

4. Here's a puzzling issue. For various reasons, none of the
   'timelocal()' solutions posted to the -discuss list actually
   works. The one I excerpted above assumes that the dates are in
   the format

        Wednesday 16 October 2002

   but the problem specification calls for

        Wed Oct 16 2002

   Two others assume that the dates will be in the format

        Wed 9 16 2002

   which seems rather unlikely. I'm curious about why the authors of
   these functions didn't get this right. Was it carelessness, or a
   deliberate modification of the spec?

5. The major defect of the sample solution above is that the timegm()
   function has a limited range. It returns its result as an integer
   number of seconds since 1970; on a machine with 32-bit integers,
   this covers a range of about 136 years, from Fri Dec 13 20:45:52
   1901 through Tue Jan 19 03:14:07 2038. Outside this range, it
   throws an exception.
   This may be acceptable to the accounting department, but the
   limitation should be noted.

6. Solutions using one of the heavyweight CPAN date calculation
   modules don't have this range limitation. These modules also do
   all the difficult work. Here's a solution I produced that uses
   the 'Date::Calc' module:

        use Date::Calc 'Delta_Days', 'Decode_Date_US';

        sub days_diff {
          my @d = @_;
          s/^\w+// for @d;  # Remove day of the week
          Delta_Days(map Decode_Date_US($_), @d);
        }

   'Decode_Date_US' attempts to parse and translate a date in US
   format, where the month precedes the day number. Unfortunately,
   the days of the week confuse this function, so I have to strip
   them out first. The function returns a year number, month number
   (1-12) and day number. The 'Delta_Days' function takes two dates
   in year, month, day format and computes the number of days
   difference between them.

   Steve Smoot produced essentially the same solution:

        sub days_diff {
          return Delta_Days(Decode_Date_US(substr($_[0],4)),
                            Decode_Date_US(substr($_[1],4)));
        }

7. Shawn Carroll benchmarked a Date::Calc solution against a
   Date::Manip solution and found that Date::Calc was about 80 times
   faster. This is probably because Date::Calc is written in C,
   while Date::Manip is in pure Perl. The Date::Manip manual
   contains an extensive discussion of this point, and of the
   tradeoffs between Date::Calc and Date::Manip.

8. Some people coded the date calculations by hand. This is tricky
   to get right, but has the benefit that if you do get it right, it
   doesn't have the range limitations of Time::Local. Date
   calculations are really complicated, and tend to end up looking
   like a big stew, so I didn't bother to debug the ones that didn't
   work. Here's one of the less stewish examples, provided by
   G. Rommel:

        sub days_diff {
            my ($start, $end) = @_;
            my ($wd1, $mo1, $day1, $yr1) = split ' ',$start;
            my ($wd2, $mo2, $day2, $yr2) = split ' ',$end;
            # Convert the month.
            my %mnum = ('Jan'=>0, 'Feb'=>1, 'Mar'=>2, 'Apr'=>3,
                        'May'=>4, 'Jun'=>5, 'Jul'=>6, 'Aug'=>7,
                        'Sep'=>8, 'Oct'=>9, 'Nov'=>10, 'Dec'=>11);
            # Days before this month this year.
            my @db = qw(0 31 59 90 120 151 181 212 243 273 304 334);
            $mn1 = $mnum{$mo1};
            $startday = int(($yr1 - 1601) * 365.2425) + $db[$mn1] + $day1;
            $startday++ if $mn1 > 1 && ($yr1%4==0) &&
                           (($yr1%400==0) || ($yr1%100!=0));
            $mn2 = $mnum{$mo2};
            $endday = int(($yr2 - 1601) * 365.2425) + $db[$mn2] + $day2;
            $endday++ if $mn2 > 1 && ($yr2%4==0) &&
                         (($yr2%400==0) || ($yr2%100!=0));
            return $endday - $startday;
        }

   Rommel transforms each input date into a count of the number of
   days since the beginning of 1601. (The 365.2425 is the average
   number of days in the Gregorian calendar year.) He then subtracts
   the counts to get the difference.

   Rommel notes that this approach fails to detect invalid dates
   ("Sep 37 2002" is interpreted as the same as "Oct 7 2002", for
   example) and that the function won't work after the year 5881210
   because the number of days will no longer fit into an integer
   variable.

9. There was a long discussion on the -discuss list about Julian
   vs. Gregorian dates. The calendar presently in use in most of the
   world is the Gregorian calendar, first introduced by Pope Gregory
   XIII in 1582. Prior to this, most European countries used the
   Julian calendar, almost the same but with a different leap day
   schedule. When you see a date like "Tuesday July 2 1776" there is
   a question about what it means; the same label may be applied by
   the Julian and Gregorian calendars to different days.

   There are some complications on top of this:

   * Not every country switched to the Gregorian calendar at the same
     time. Most Catholic countries switched immediately; other
     countries held out. Great Britain and its colonies (including
     what would eventually become the USA) switched calendars in
     1752. Russia switched in 1918; this is why the October
     Revolution is now celebrated in November.
   * The switch was accompanied by a one-time modification of the
     calendar, to bring the dates back into line with the seasons.
     In Spain, for example, October 1582 had only 21 days. In Great
     Britain, September 1752 had only 19 days. In Sweden there was a
     big mix-up too complicated to explain here.

   http://www.geocities.com/CapeCanaveral/Lab/7671/gregory.htm
   contains some interesting details about these issues.

   So one might ask: What is

        days_diff("Fri Sep 1 1752", "Sun Oct 1 1752")

   ? In Spain, it's 30, as one would expect. In England or the USA,
   one might like the answer to be 19, except that in England, Sep 1
   1752 was a Tuesday, not a Friday.

   Several people tried to take this into account. All of the
   solutions were locality-specific. For example, one gentleman
   wrote a version that was accurate in France, taking into account
   the fact that in France, Dec 10 1582 was followed immediately by
   Dec 20 1582.

   I feel that this is misguided. It's interesting, but it's a lot
   of work and the payoff seems small. The gentleman I mentioned
   before who included the correct adjustment for France had his
   function deliver an error if you asked for Dec 15 1582, which
   didn't exist in France (or, more precisely, there was no date with
   that name):

        if ($jd[$i] > 15821210 and $jd[$i] < 15821220) {
          print "Not a valid date:\nFrance switched to the Gregorian calendar in 1582\n";
          print "and 10 Dec 1582 was followed immediately by 20 Dec\n";
          exit;
        }

   This person would have had a much easier time if he had lived in
   Israel rather than in France. The entire function could have been
   replaced with:

        sub days_diff {
          my $date = shift;
          print "There is no date called '$date'\n";
          exit;
        }

   (In the Hebrew calendar, "Mon Oct 28 2002" is called "Heshvan 22,
   5763".)

   If one is going to reject historically inaccurate date names, then
   why not also throw an error for 10 Dec 1793?
   There was no date with that name either, in France, because the
   Gregorian calendar was abolished for 13 years after the Revolution
   and was replaced by a new calendar, in which 10 Dec 1793 was known
   instead as

        Decade II, Decadi de Frimaire, de l'Annee 2 de la Revolution

   or more succinctly, "20 Frimaire II", perhaps. And this says
   nothing about the question of whether France will still be using
   the Gregorian calendar 7,000 years from now.

   It's my feeling that if you're really trying to convert historic
   dates, the interface presented by days_diff() is hopelessly
   inadequate. The sample solutions pretend that the Gregorian
   calendar was in use everywhere at all times, which isn't
   historically accurate, but it's probably the best you can do
   without expending an enormous amount of effort; see the GNU Emacs
   'calendar' package, for example.

   It didn't occur to me when I posed the question that people would
   get worried about this. But the issue could turn out to be
   important for some applications. For example, if you're trying to
   compute interest payments for money borrowed before the calendar
   change, it would be unfair to charge a full month's interest for
   September 1752 or October 1582 or whatever, when those months were
   ten or eleven days short. But I think in such a case, you would
   really have to go back to the people requesting the function and
   ask what they wanted it to do.

10. Last week I observed with some surprise that when some code
    failed in some circumstances, people tended to come up with very
    complicated examples rather than simple ones. I observed this
    again this week. To illustrate the potential difficulty in
    handling Julian vs. Gregorian dates, folks brought up the
    September 1752 oddity in the British calendar.
    A simpler example would be that

        days_diff("Xxx Feb 28 1700", "Xxx Mar 1 1700")

    should return 1 if the dates are interpreted as Gregorian dates,
    but 2 if they are interpreted as Julian dates, since the Julian
    calendar has Feb 29 1700 and the Gregorian calendar omits it.

11. Astronomers use their own modification of the Julian calendar;
    they label the dates with numbers, with day 0 being a certain day
    about 6700 years ago, and increasing by 1 each day afterwards.
    If you could convert a (presumably Gregorian) date like "Wed Oct
    16 2002" to astronomical form, you could then subtract the day
    numbers of two dates to get the number of days in between.

    Unfortunately, nobody implemented this right. One programmer who
    chose this path did this:

        sub days_diff {
            my($day,$month,$mday,$year) = split(/\s+/,$_[0]);
            my($day2,$month2,$mday2,$year2) = split(/\s+/,$_[1]);
            my $monthToNum = {
                Jan => 1, Feb => 2, Mar => 3, Apr => 4,
                May => 5, Jun => 6, Jul => 7, Aug => 8,
                Sep => 9, Oct => 10, Nov => 11, Dec => 12,
            };
            my $jd_1 = _jday($year,$monthToNum->{$month},$mday);
            my $jd_2 = _jday($year2,$monthToNum->{$month2},$mday2);
            return $jd_2 - $jd_1;
        }

        sub _jday {
            my($y,$m,$d) = @_;
            my $jd = ( 1461 * ( $y + 4800 + ( $m - 14 ) / 12 ) ) / 4 +
                     ( 367 * ( $m - 2 - 12 * ( ( $m - 14 ) / 12 ) ) ) / 12 -
                     ( 3 * ( ( $y + 4900 + ( $m - 14 ) / 12 ) / 100 ) ) / 4 +
                     $d - 32075;
            return $jd;
        }

    The '_jday' function here is supposed to convert a year, month,
    and day to an astronomical Julian day number. This programmer
    said:

        What I did do was Google for 'julian day' and I found
        http://hermetic.magnet.ch/cal_stud/jdn.htm

    That's a good approach in general, but unfortunately, he cribbed
    the code without reading the accompanying discussion on that
    page:

        Days are integer values in the range 1-31, months are
        integers in the range 1-12, and years are positive or
        negative integers. Division is to be understood as in
        integer arithmetic, with remainders discarded.

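    As a rough illustration of why the last sentence of that quoted
    passage matters (a sketch only; the variable names are made up
    for the example and come from none of the posted solutions):
    Perl's '/' operator does floating-point division, so the
    remainder has to be thrown away explicitly. The int() function
    truncates toward zero, which appears to be the behavior the
    formula assumes; POSIX::floor rounds toward negative infinity and
    gives a different answer for negative values.

```perl
use strict;
use warnings;
use POSIX 'floor';

# The (month - 14) / 12 term of the formula, evaluated for January.
my $m = 1;

my $float_term = ($m - 14) / 12;          # keeps the fraction: about -1.0833
my $int_term   = int(($m - 14) / 12);     # truncates toward zero: -1
my $floor_term = floor(($m - 14) / 12);   # rounds down: -2

print "plain division: $float_term\n";
print "int():          $int_term\n";
print "floor():        $floor_term\n";
```

    Of the three, only the int() result (-1) matches integer
    arithmetic with remainders discarded; the fractional value is
    what leaks into the final day number when the remainders are
    kept.
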
    This programmer's '_jday' function doesn't discard remainders, so
    it produces mostly wrong answers. A correct version:

        sub _jday {
            my($y,$m,$d) = @_;
            my $a = int(($m-14)/12);
            my $b = int(( 1461 * ( $y + 4800 + $a ) ) / 4);
            my $c = int(( 367 * ( $m - 2 - 12 * $a ) ) / 12);
            my $e = int(( $y + 4900 + $a ) / 100 );
            my $f = int(( 3 * $e ) / 4);
            my $jd = $b + $c - $f + $d - 32075;
            return $jd;
        }

    One other person posted an astronomical Julian day solution to
    the -discuss list, and made the same mistake.

    I think the moral here is something about how you can't just
    paste code into your program and expect it to work.

12. Here's a small test suite:

        use Test;
        BEGIN { plan tests => 7 }

        END {
          ok(days_diff("Wed Oct 16 2002", "Wed Oct 23 2002"), 7);
          ok(days_diff("Wed Oct 16 2002", "Tue Oct 16 2001"), -365);
          ok(days_diff("Mon Oct 21 2002", "Mon Oct 28 2002"), 7);  # Daylight saving
          ok(days_diff("Thu Oct 31 2002", "Fri Nov 1 2002"), 1);
          ok(days_diff("Sun Jun 29 1997", "Tue Jul 1 1997"), 2);   # Leap second
          ok(days_diff("Wed Dec 30 1998", "Fri Jan 1 1999"), 2);   # Last leap second
          ok(days_diff('Wed Jul 4 1776','Tue Jul 4 1976'), 73048);
        }
        1;

    To use this, put it in a file called 'DiffTest.pm'; then add the
    line

        use DiffTest;

    to the top of the file that contains your days_diff() function.

Thanks again to all the subscribers, and to those who participated in
the discussion. I will send another quiz on Wednesday. Sample
solutions for this week's 'expert' quiz may be slightly delayed,
since I have some other things to attend to this afternoon.