Sample solutions and discussion Perl Quiz of The Week #21 (20040805) You will write a program to perform scheduling. As we all know, tasks sometimes take longer than expected. Sometimes when this happens, the final deadline of the project is affected; sometimes it isn't. For example, consider the four tasks A, B, C, and D. B and C depend on A, which means that they cannot be started until A is finished. D depends on B and C, and cannot be started until both B and C are finished: .-> B . A :-> D `-> C ' Suppose we expect the four tasks to take the following times: A: 1 day B: 2 days C: 3 days D: 1 day Then we don't expect the project to be finished for at least 5 days: one day to complete A and start C; 3 days to complete C and start D, and another day to finish D. Any delay in any of the three tasks A, C, or D will cause the completion date to slip. We say that A, C, and D are on the "critical path". But B is not on the critical path, because B can go on while C is going on, and can take up to one day longer than expected without delaying the start of D. You will write a program which will calculate critical paths. The input to the program will be a file in the following format: A 1 B 2 A C 3 A D 1 B C FINAL 0 D Each line represents one task. The first field in each line is the name of the task. The second field is the expected duration. If there are any other fields, they are the names of other tasks which must be finished before this task can start. The program will find all the tasks on the critical path to the task named FINAL and will print them out, one per line. It may happen that the input specifies tasks that cannot possibly be completed. For example: A 1 B B 1 A FINAL 0 A B Here A can't start until B is finished, but B can't start until A is finished. In such cases, the program should diagnose an error. ---------------------------------------------------------------- Ten different people provided solutions for this quiz. Seven of the ten solutions used recursion, a natural fit for this problem. Four of the solutions used a single pass over the schedule, finding the critical path for each node by finding the critical path for each of its dependencies. Another four used two passes, first calculating start and end times for each task, then finding the critical path to FINAL in the second pass. The remaining two solutions built a list of all the paths to FINAL and chose the one with the longest duration. Two of the solutions stuck with Llama-only features, avoiding references in particular. (That's always an option, but not a requirement, for the regular quizzes.) One solution was object-oriented. Also, one of the solutions was written in Ruby. I wrote a test suite to test the solutions with various valid and invalid input. When viewing these results, keep in mind that the problem description did not specify how to handle: tasks being both direct and indirect dependencies of other tasks (valid #5, #6) blank lines in the input (valid #10) invalid input other than cycles (invalid #4, #5, #6) Valid Input Invalid Input 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 .--------------------------------.--------------------. rjk | y y y y y y y y n y | y y y y y y | roel | y y y y n n y y y * | y y y n n n | xavier | y y y y y y y y y * | y y y n n n | christian | y y y y n n y y y n | y y y y y y | mjd2 | y y y y n y y y y y | y y y n - n | rod | y y y y y y y y y y | y y y n + y | adrian | y y y y y y y y y y | y y y n n n | jurgen | y y y y y y y y y * | y y y n n n | jean | y y y y y y y y y y | y y y n y y | zsban | y y y y y y y y y y | y y y y y y | `--------------------------------'--------------------' * Successful, but with warnings - Questionable; no output at all + Questionable; error message slightly misleading Valid Input 1. Simple 1 2. Simple 2 3. Multiple Critical Paths 4. Complicated 5. FINAL depends on everything 6. Extra dependencies 7. Extra tasks 8. No dependency for FINAL 9. Out of order 10. Blank lines Invalid Input 1. 1-cycle 2. 2-cycle 3. 3-cycle 4. Repeated label 5. No FINAL 6. Missing dependency I'll start by discussing my own solution, because it uses only features from the Llama (AFAIK), it's fairly short, and I think it's an interesting approach. I started by deciding that I would require the input to be in order. (That is, a task must be given before any tasks that depend on it.) This allowed me to easily create a list of all the paths through the schedule (and also made it impossible to have cycles). #!/usrs/local/bin/perl -w use strict; my %paths; my %labels; while (<>) { next unless /\S/; my($label, $duration, @dependencies) = split; die "Duplicate label $label on line $..\n" if $labels{$label}++; if (@dependencies) { foreach my $dependency (@dependencies) { die "Label $label depends on itself on line $..\n" if $label eq $dependency; die "Unknown dependency $dependency on line $..\n" if !$labels{$dependency}; foreach my $path (keys %paths) { if ($path =~ / \Q$dependency\E\z/) { $paths{"$path $label"} = $paths{$path} + $duration; } } } } else { $paths{" $label"} = $duration; } } The heart of the program is hash that contains all the paths through the schedule as keys, and the durations of the paths as values. For example, the hash might look like this: (' A' => 1, ' A B' => 3, ' A C' => 4, ' A B D' => 4, ' A C D' => 5, ' A B D FINAL' => 4, ' A C D FINAL' => 5) The leading space makes the regexes simpler. If a task has dependencies, the script finds each path in the hash that ends with one of those dependencies and appends the task to make a new path. If a task has no dependencies, a new path started consisting just of that task. die "No FINAL label.\n" if !$labels{'FINAL'}; my @critical; my $max = -1; foreach my $path (keys %paths) { next if $path !~ / FINAL\z/; if ($paths{$path} > $max) { $max = $paths{$path}; @critical = $path; } elsif ($paths{$path} == $max) { push @critical, $path; } } Now that the script has all possible paths, it simply looks at those that end at final and grabs the one(s) with the longest duration. die "No critical paths.\n" if !@critical; foreach my $path (@critical) { my @path = split ' ', $path; print "$_\n" for @path; print "\n"; } Although this approach is simple and straightforward, it can use a lot of memory! This also highlights another of the redundancies in the problem description; what to do if there are multiple critical paths to FINAL. There are three possible answers: output each path separately (A B D FINAL / A C D FINAL); output any one of the paths (A B D FINAL); output all the paths intermingled (A B C D FINAL). I accepted any of these in my test suite, although personally I feel the first one is best. [ The problem statement asks only for a list of "the tasks on the critical path". The critical path itself isn't the goal; it's the listing of the critical tasks, because these are the tasks whose slippage will delay the end of the project. So option #2 above is simply erroneous. My own preference is for option #3, since #1 is redundant. - MJD ] Now for an example of a recursive solution. I'll discuss the one provided by Rod Adams, which is short and sweet. #!/usr/bin/perl -w my %tasks = (); while (<>) { chomp; next if /^\s*$/; my ($task, $duration, @prereqs) = split /\s+/; @{$tasks{$task}}{'duration', 'prereq', 'painted'} = ($duration, \@prereqs, 0); } First the script builds a data structure containing the duration and prerequisites for each task. The 'painted' value will be used to detect cycles. my ($duration, @path) = CritPath('FINAL'); print "Critical Path: ($duration days)\n", map(" $_\t$tasks{$_}{duration}\n", @path); The call to CritPath('FINAL') will recursively find the desired critical path. sub CritPath { my ($node) = $tasks{$_[0]} || die "Bad prereq $_[0]\n"; $node->{painted}++ and die "Tasks must be acyclic.\n"; If the task is already painted, there's a cycle. my @maxpath = (0); for my $req (@{$node->{prereq}}) { my @path = CritPath($req); @maxpath = @path if $path[0] > $maxpath[0]; } Find the critical path for each of the task's dependencies and grab the one with the longest duration. $node->{painted}--; Unpaint the task because there may be a separate, non-cyclic path to it. $maxpath[0] += $node->{duration}; return (@maxpath, $_[0]); } Add the current task to the end of its critical path and return. (If the task has no dependencies, its critical path is just itself.) There was an interesting bug in Rod's script. The Bad prereq check in the original was: my ($node) = $tasks{$_[0]} or die "Bad prereq $_[0]\n"; Do you see the problem? I didn't at first. It turns out the precedence is wrong. This line is parsed as: (my ($node) = $tasks{$_[0]}) or die "Bad prereq $_[0]\n"; A list assignment in scalar context returns the number of values assigned. In this case that is always 1, even if the value happens to be undef. So the conditional is always true and the die can never execute. In the code above, I fixed this by changing to the high precedence ||-operator, which is parsed as: my ($node) = ($tasks{$_[0]} || die "Bad prereq $_[0]\n"); Another way to fix it is to use scalar assignment instead of list assignment: my $node = $tasks{$_[0]} or die "Bad prereq $_[0]\n"; Some additional notes: 1. Mine was the only solution that required the input to be given in order. Something like Mark-Jason Dominus's tsort() sub could be used to fix this shortcoming. 2. All the solutions passed the straightforward test cases. However, Zsban Ambrus's solution (which happens to be written in Ruby) was the only one that passed all the additional tests as well. That's some nice defensive programming! 3. MJD's first solution went into an infinite loop on the 'Extra tasks' test, so I left it out and just used his second one. 4. I'm familiar with MJD's philosophy on warnings from an earlier quiz. If he had turned them on, however, he would have noticed that he was still printing values from %slack even though he had removed the code that calculated those values. [ I did notice this, even without '-w'. My mistake was to leave the debugging lines uncommented in the version of the program I posted to the mailing list. - MJD ] Thanks to everyone who participated! Additional thanks to Roel van der Steen for his sample input, which revealed a bug in an early version of one of the other solutions, and to Jurgen Pletinckx for pointing to sample input from a similar problem on the perl golf mailing list. [ Ronald also set me an archive file containing the solutions and test programs; I have placed this at http://perl.plover.com/qotw/misc/r021/quiz21.tgz . Thank you, Ronald! I expect to send the new quiz later today. - MJD ]