Sample solutions and discussion
Perl Quiz of The Week #21 (20040805)

        You will write a program to perform scheduling.  As we all
        know, tasks sometimes take longer than expected.  Sometimes
        when this happens, the final deadline of the project is
        affected; sometimes it isn't.  For example, consider the four
        tasks A, B, C, and D.  B and C depend on A, which means that
        they cannot be started until A is finished. D depends on B and
        C, and cannot be started until both B and C are finished:

              .-> B .
            A        :-> D
              `-> C '

        Suppose we expect the four tasks to take the following times:

                A: 1 day
                B: 2 days
                C: 3 days
                D: 1 day

        Then we don't expect the project to be finished for at least 5
        days: one day to complete A and start C; 3 days to complete C
        and start D, and another day to finish D.  Any delay in any of
        the three tasks A, C, or D will cause the completion date to
        slip.  We say that A, C, and D are on the "critical path".
        But B is not on the critical path, because B can go on while C
        is going on, and can take up to one day longer than expected
        without delaying the start of D.

        You will write a program which will calculate critical paths.
        The input to the program will be a file in the following
        format:

                A 1
                B 2 A
                C 3 A
                D 1 B C
                FINAL 0 D

        Each line represents one task.  The first field in each line
        is the name of the task.  The second field is the expected
        duration.  If there are any other fields, they are the names
        of other tasks which must be finished before this task can
        start.

        The program will find all the tasks on the critical path to
        the task named FINAL and will print them out, one per line.

        It may happen that the input specifies tasks that cannot
        possibly be completed.  For example:

                A 1 B
                B 1 A
                FINAL 0 A B

        Here A can't start until B is finished, but B can't start
        until A is finished.  In such cases, the program should
        diagnose an error.

----------------------------------------------------------------

Ten different people provided solutions for this quiz.  Seven of the ten
solutions used recursion, a natural fit for this problem.

Four of the solutions used a single pass over the schedule, finding the
critical path for each node by finding the critical path for each of its
dependencies.  Another four used two passes, first calculating start and
end times for each task, then finding the critical path to FINAL in the
second pass.  The remaining two solutions built a list of all the paths to
FINAL and chose the one with the longest duration.

Two of the solutions stuck with Llama-only features, avoiding references in
particular.  (That's always an option, but not a requirement, for the
regular quizzes.)  One solution was object-oriented.  Also, one of the
solutions was written in Ruby.


I wrote a test suite to test the solutions with various valid and invalid
input.  When viewing these results, keep in mind that the problem
description did not specify how to handle:
  tasks being both direct and indirect dependencies of other tasks
    (valid #5, #6)
  blank lines in the input (valid #10)
  invalid input other than cycles (invalid #4, #5, #6)

                     Valid Input                Invalid Input
             1  2  3  4  5  6  7  8  9 10     1  2  3  4  5  6
          .--------------------------------.--------------------.
rjk       |  y  y  y  y  y  y  y  y  n  y  |  y  y  y  y  y  y  |
roel      |  y  y  y  y  n  n  y  y  y  *  |  y  y  y  n  n  n  |
xavier    |  y  y  y  y  y  y  y  y  y  *  |  y  y  y  n  n  n  |
christian |  y  y  y  y  n  n  y  y  y  n  |  y  y  y  y  y  y  |
mjd2      |  y  y  y  y  n  y  y  y  y  y  |  y  y  y  n  -  n  |
rod       |  y  y  y  y  y  y  y  y  y  y  |  y  y  y  n  +  y  |
adrian    |  y  y  y  y  y  y  y  y  y  y  |  y  y  y  n  n  n  |
jurgen    |  y  y  y  y  y  y  y  y  y  *  |  y  y  y  n  n  n  |
jean      |  y  y  y  y  y  y  y  y  y  y  |  y  y  y  n  y  y  |
zsban     |  y  y  y  y  y  y  y  y  y  y  |  y  y  y  y  y  y  |
          `--------------------------------'--------------------'

* Successful, but with warnings
- Questionable; no output at all
+ Questionable; error message slightly misleading

Valid Input
1.  Simple 1
2.  Simple 2
3.  Multiple Critical Paths
4.  Complicated
5.  FINAL depends on everything
6.  Extra dependencies
7.  Extra tasks
8.  No dependency for FINAL
9.  Out of order
10. Blank lines

Invalid Input
1.  1-cycle
2.  2-cycle
3.  3-cycle
4.  Repeated label
5.  No FINAL
6.  Missing dependency


I'll start by discussing my own solution, because it uses only
features from the Llama (AFAIK), it's fairly short, and I think it's
an interesting approach.

I started by deciding that I would require the input to be in order.
(That is, a task must be given before any tasks that depend on it.)
This allowed me to easily create a list of all the paths through the
schedule (and also made it impossible to have cycles).

    #!/usrs/local/bin/perl -w
    
    use strict;
    
    my %paths;
    my %labels;
    
    while (<>) {
      next unless /\S/;
    
      my($label, $duration, @dependencies) = split;
    
      die "Duplicate label $label on line $..\n" if $labels{$label}++;
    
      if (@dependencies) {
        foreach my $dependency (@dependencies) {
          die "Label $label depends on itself on line $..\n"
            if $label eq $dependency;
    
          die "Unknown dependency $dependency on line $..\n"
            if !$labels{$dependency};
    
          foreach my $path (keys %paths) {
            if ($path =~ / \Q$dependency\E\z/) {
              $paths{"$path $label"} = $paths{$path} + $duration;
            }
          }
        }
      } else {
        $paths{" $label"} = $duration;
      }
    }

The heart of the program is hash that contains all the paths through the
schedule as keys, and the durations of the paths as values.  For example,
the hash might look like this:
(' A' => 1, ' A B' => 3, ' A C' => 4, ' A B D' => 4, ' A C D' => 5,
 ' A B D FINAL' => 4, ' A C D FINAL' => 5)
The leading space makes the regexes simpler.

If a task has dependencies, the script finds each path in the hash that
ends with one of those dependencies and appends the task to make a new
path.  If a task has no dependencies, a new path started consisting just of
that task.

    die "No FINAL label.\n" if !$labels{'FINAL'};
    
    my @critical;
    my $max = -1;
    foreach my $path (keys %paths) {
      next if $path !~ / FINAL\z/;
    
      if ($paths{$path} > $max) {
        $max = $paths{$path};
        @critical = $path;
      } elsif ($paths{$path} == $max) {
        push @critical, $path;
      }
    }

Now that the script has all possible paths, it simply looks at those that
end at final and grabs the one(s) with the longest duration.

    die "No critical paths.\n" if !@critical;
    
    foreach my $path (@critical) {
      my @path = split ' ', $path;
    
      print "$_\n" for @path;
      print "\n";
      
    }

Although this approach is simple and straightforward, it can use a lot of
memory!


This also highlights another of the redundancies in the problem
description; what to do if there are multiple critical paths to FINAL.

There are three possible answers: output each path separately (A B D
FINAL / A C D FINAL); output any one of the paths (A B D FINAL);
output all the paths intermingled (A B C D FINAL).  I accepted any of
these in my test suite, although personally I feel the first one is
best.  [ The problem statement asks only for a list of "the tasks on
the critical path".  The critical path itself isn't the goal; it's the
listing of the critical tasks, because these are the tasks whose
slippage will delay the end of the project.  So option #2 above is
simply erroneous.  My own preference is for option #3, since #1 is
redundant.  - MJD ]

Now for an example of a recursive solution.  I'll discuss the one provided
by Rod Adams, which is short and sweet.

    #!/usr/bin/perl -w
    
    my %tasks = ();
    while (<>) {
      chomp;
      next if /^\s*$/;
      my ($task, $duration, @prereqs) = split /\s+/;
      @{$tasks{$task}}{'duration', 'prereq', 'painted'} =
        ($duration, \@prereqs, 0);
    }

First the script builds a data structure containing the duration and
prerequisites for each task.  The 'painted' value will be used to detect
cycles.

    my ($duration, @path) = CritPath('FINAL');
    print "Critical Path: ($duration days)\n",
          map(" $_\t$tasks{$_}{duration}\n", @path);

The call to CritPath('FINAL') will recursively find the desired critical
path.
              
    sub CritPath {      
      my ($node) = $tasks{$_[0]} ||  die "Bad prereq $_[0]\n";
      $node->{painted}++         and die "Tasks must be acyclic.\n";

If the task is already painted, there's a cycle.

      my @maxpath = (0);
      for my $req (@{$node->{prereq}}) {  
        my @path = CritPath($req);
        @maxpath = @path if $path[0] > $maxpath[0];      
      }

Find the critical path for each of the task's dependencies and grab the one
with the longest duration.

      $node->{painted}--;

Unpaint the task because there may be a separate, non-cyclic path to it.

      $maxpath[0] += $node->{duration};
      return (@maxpath, $_[0]);
    }

Add the current task to the end of its critical path and return.
(If the task has no dependencies, its critical path is just itself.)

There was an interesting bug in Rod's script.  The Bad prereq check in the
original was:

  my ($node) = $tasks{$_[0]} or  die "Bad prereq $_[0]\n";

Do you see the problem?  I didn't at first.  It turns out the precedence is
wrong.  This line is parsed as:

  (my ($node) = $tasks{$_[0]}) or  die "Bad prereq $_[0]\n";

A list assignment in scalar context returns the number of values assigned.
In this case that is always 1, even if the value happens to be undef.  So
the conditional is always true and the die can never execute.

In the code above, I fixed this by changing to the high precedence
||-operator, which is parsed as:

  my ($node) = ($tasks{$_[0]} ||  die "Bad prereq $_[0]\n");

Another way to fix it is to use scalar assignment instead of list
assignment:

  my $node = $tasks{$_[0]} or  die "Bad prereq $_[0]\n";


Some additional notes:

1. Mine was the only solution that required the input to be given in order.
   Something like Mark-Jason Dominus's tsort() sub could be used to fix
   this shortcoming.

2. All the solutions passed the straightforward test cases.  However, Zsban
   Ambrus's solution (which happens to be written in Ruby) was the only one
   that passed all the additional tests as well.  That's some nice
   defensive programming!

3. MJD's first solution went into an infinite loop on the 'Extra tasks'
   test, so I left it out and just used his second one.

4. I'm familiar with MJD's philosophy on warnings from an earlier quiz.  If
   he had turned them on, however, he would have noticed that he was still
   printing values from %slack even though he had removed the code that
   calculated those values.  [ I did notice this, even without '-w'.
   My mistake was to leave the debugging lines uncommented in the
   version of the program I posted to the mailing list. - MJD ]

Thanks to everyone who participated!  Additional thanks to Roel van der
Steen for his sample input, which revealed a bug in an early version of one
of the other solutions, and to Jurgen Pletinckx for pointing to sample
input from a similar problem on the perl golf mailing list.

[ Ronald also set me an archive file containing the solutions and test
  programs; I have placed this at
  http://perl.plover.com/qotw/misc/r021/quiz21.tgz . Thank you,
  Ronald!  I expect to send the new quiz later today.  - MJD ]