Sample solutions and discussion
Perl Expert Quiz of The Week #21 (20040812)

[ Aaron sent this to me on time, but I have been behind schedule in
  handling QOTW materials because I have had to work on my book.  I
  will send out both regular and expert quiz #22 this Wednesday.
  Thanks for your patience.  - MJD ]


        In the 1960's, the grad students at the University of Chicago math
        department worked on a series of astoundingly useless and
        time-consuming puzzles.  One of these follows.  Consider the set of
        all possible strings of the alphabet ('a' .. 'z').  Let us agree to
        consider two strings "equivalent" if the following conditions hold:

            1. They contain precisely the same letters, and
            2. They both appear in Webster's Third International
               Dictionary.

        In such a case, the two strings are considered interchangeable in
        all contexts.  For example, "am" and "ma" are equivalent, and this
        also implies that "amount" and "maount" are equivalent, as are
        "grammar" and "grmamar" and "gramamr" and "grmaamr".

        Moreover, equal letters can be cancelled from the front and back
        end of any string.  For example, "abby" and "baby" are equivalent,
        and, cancelling the trailing "by", this implies that "ab" and "ba"
        are also equivalent, and can be exchanged anywhere.  When two
        letters can be exchanged in this way, we say that they "commute".
        The third floor of the math building at UC had a huge 26x26 chart;
        the square in column i and row j contained a proof that letters i
        and j would commute.

        Sometimes these proofs can be rather elaborate.  For example, the
        dictionary has "dire" and "ride", so, by cancelling the trailing
        "e"s, one has "dir" = "rid".

        The dictionary also has

            dirten = rident

        (No, I don't know what those mean.)  Since "dir" = "rid" we have:

            rident = dirent

        and since "rident" = "dirten",

            dirten = dirent

        even though "dirent" is not a dictionary word. Cancelling the
        leading "dir" leaves:

               ten =    ent

        but

               ten =    net

        because "ten" and "net" are dictionary words, so

               ent =    net

        and, cancelling the "t",

               en  =    ne

        and we've just proved that "en" and "ne" commute.  This fact might
        be useful in later proofs.

        What's the point of all this?  Well, the goal is to find out which
        letters commute with *every* other letter; such letters are said to
        be in the "center" of the system.  As for the *point*, I'm not sure
        there is one.  Apparently the math grads at UC didn't have enough
        to occupy their time.

        The chart in the UC math building has since been lost, so your task
        is to write a program whose input is a word list, with one word per
        line, and which makes appropriate deductions and eventually
        computes the center of the system.  I don't have the headword list
        from Webster's Third, but I do have the list from Webster's Second,
        so let's use that.  You can get a copy from

            http://perl.plover.com/qotw/words/Web2.bz2
            http://perl.plover.com/qotw/words/Web2.gz

----------------------------------------------------------------

The mailing list had an initial brief flurry of discussion about the
equivalence rules.  Mark offered a useful clarification:

        Two strings are equivalent if conditions (1) and (2) hold.
        Moreover, if A is equivalent to B and B is equivalent to C, then A
        is equivalent to C.
        Moreover, if A is equivalent to B, then pAq is equivalent to pBq
        for all strings p and q.
        Moreover, if pAq is equivalent to pBq for some strings p, q, A, and
        B, then A is equivalent to B.
        Nothing else is equivalent.

        Suppose L and M are strings of length 1; that is, single letters.
        We say that L and M commute if and only if LM is equivalent to ML.

Two people posted solutions to the list.

Rod Adams approached the problem in what he described as a 'brute-force'
way: repeatedly using the inference rules to derive new equivalences.
Since the dictionary is large to begin with (234936 words), and the
cancellation and transitivity rules add more equivalences, his solution
rapidly accumulates a very large data set.  Rod used the MySQL relational
database to accommodate this data set.  (It occurred to me that it might
have been possible to use the DBD::SQLite module instead.  SQLite offers a
SQL relational DBMS without needing a server, though with a few
restrictions compared to server-based databases.)

Unfortunately, Rod's program took a great deal of time to execute.  Running
it overnight on my computer produced only three of the letters in the
center; I stopped it in the morning before it could get any further.

The other solution posted was by Daniel Martin.  Daniel's approach was in
essence quite similar to Rod's.  However, Daniel's program aggressively
prunes the search space: as new equivalences are found, existing longer
ones are thrown away if they can be derived from the current set of
equivalences.  This allows the center to be found in about 30 seconds on my
computer.  Daniel's solution also has the pleasing property that it
constructs the outline of a proof for each commutative pair.

Daniel's approach iterates over the set of equivalences.  His reduce()
function attempts to break a fact into AB=CD where either A=C or B=D is
known.  If both are known, the original fact is simply deleted.  The fact
is replaced with B=D if only A=C is known, and with A=C is only B=D is
known.

[ There was some fascinating discusion on the -discuss list about
  whether these deletions were truly safe.  Eventually everyone
  involved agreed that they were.  That part of the discussion is
  available at

        http://perl.plover.com/~alias/list.cgi?1:sss:1979

  if you want to read it. - MJD ]


        sub reduce ($\$) {
          my ($fact, $found_something_flag) = @_;
          my ($worda, $wordb) = split('=', $fact);
          my ($foundpossible) = 0;
          my ($startat) = 1;
          my (@worda) = split(//,$worda);
          my (@wordb) = split(//,$wordb);
          if ($worda eq $wordb) {
            # we shouldn't be calling this with say-nothing facts
            print STDERR "Huh? $fact:\n$known{$fact}\n";
            delete $interesting{$fact};
            return qw[];
          }
          while (substr($worda,0,$startat+1) eq substr($wordb,0,$startat+1)) {
            $startat++;
          }
          for my $i ($startat..$#worda) {
            @worda[0..$i-1] = sort @worda[0..$i-1];
            @wordb[0..$i-1] = sort @wordb[0..$i-1];
            if (join('',@worda[0..$i-1])
                eq
                join('',@wordb[0..$i-1])) {
              $foundpossible = 1;
              my $lefta = substr($worda,0,$i);
              my $leftb = substr($wordb,0,$i);
              my $righta = substr($worda,$i);
              my $rightb = substr($wordb,$i);
              my $reason;
              if ($reason = knownp($lefta, $leftb)) {
                if (knownp($righta, $rightb)) {
                  # This fact is merely the aggregation
                  # of other stuff we know
                  delete $interesting{$fact};
                  return qw[];
                } else {
                  $$found_something_flag = 1;
                  delete $interesting{$fact};
                  my $newfact = join('=', $righta,$rightb);
                  $reason = join("\n", $known{$fact}, ">$fact", $reason);
                  $reason =~ s/^/ /mg;
                  $newfact =~ s/^(\w+)(\w+)=\1/$2=/;
                  $newfact =~ s/(\w+)=(\w+)\1$/=$2/;
                  $newfact = join('=', sort split(/=/,$newfact));
                  if (! $known{$newfact} ) {
                    $known{$newfact} = $reason;
                    if (length($newfact)>5) {
                      $interesting{$newfact}=1;
                    }
                    return ($newfact);
                  } else {
                    return qw[];
                  }
                }
              } elsif ($reason = knownp($righta, $rightb)) {
                $$found_something_flag = 1;
                delete $interesting{$fact};
                my $newfact = join('=', $lefta, $leftb);
                $reason = join("\n", $known{$fact}, ">$fact", $reason);
                $reason =~ s/^/ /mg;
                $newfact =~ s/^(\w+)(\w+)=\1/$2=/;
                $newfact =~ s/(\w+)=(\w+)\1$/=$2/;
                $newfact = join('=', sort split(/=/,$newfact));
                if (! $known{$newfact} ) {
                  $known{$newfact} = $reason;
                  if (length($newfact)>5) {
                    $interesting{$newfact}=1;
                  }
                  return ($newfact);
                } else {
                  return qw[];
                }
              }
            }
          }
          if ($foundpossible) {
            return ($fact);
          } else {
            return qw[];
          }
        }

The knownp() function referred to in reduce() is a semipredicate returning
the reason its arguments are known to be equivalent, or false if they
aren't known to be equivalent.

reduce() is called repeatedly on the set of equivalences.  If it fails to
reduce any to simpler forms, the next phase involves using existing
commutative pairs to derive new equivalences, taking advantage of the
interchangeability of the commutative pairs.  This function uses string
eval to create a function which does the interchanging:

        sub make_mangle_sub {
          my @twoletterequivs = grep(5==length, keys %known);
          my @twoletterswaps = sort map { split('='); } @twoletterequivs;
          my %lettercommutes=();

          for my $letter ('a'..'z') {
            my @swaps = grep /^$letter/, @twoletterswaps;
            if (@swaps) {
              $lettercommutes{$letter} =
                  join('', sort map { substr($_,1) } @swaps);
            }
          }

          my @letters_to_swap = sort {length($lettercommutes{$a}) <=>
                                        length($lettercommutes{$b}) or
                                          $a cmp $b} keys %lettercommutes;

          my $mangle_sub_text = "sub { my \$fact = shift;\n";
          for my $letter (@letters_to_swap) {
            my $otherlet = $lettercommutes{$letter};
            $mangle_sub_text .= "\$fact =~ s/([$otherlet]+)($letter+)/\$2\$1/g;\n";
          }
          $mangle_sub_text .= "\$fact = join('=', sort split(/=/,\$fact));\n";
          $mangle_sub_text .= '$fact; }';

          eval ($mangle_sub_text);
        }

If applying the 'mangling' function to each equivalence fails to produce
any new facts, then all derivable equivalences have been found.  At this
point, any letters which commute with all other letters are in the center.
Daniel's solution identifies the center as:

        a c e i l m n o p r s t u

Thanks to all who participated, including those who took part and didn't
post a solution to the list.