Sample solutions and discussion Perl Expert Quiz of The Week #21 (20040812) [ Aaron sent this to me on time, but I have been behind schedule in handling QOTW materials because I have had to work on my book. I will send out both regular and expert quiz #22 this Wednesday. Thanks for your patience. - MJD ] In the 1960's, the grad students at the University of Chicago math department worked on a series of astoundingly useless and time-consuming puzzles. One of these follows. Consider the set of all possible strings of the alphabet ('a' .. 'z'). Let us agree to consider two strings "equivalent" if the following conditions hold: 1. They contain precisely the same letters, and 2. They both appear in Webster's Third International Dictionary. In such a case, the two strings are considered interchangeable in all contexts. For example, "am" and "ma" are equivalent, and this also implies that "amount" and "maount" are equivalent, as are "grammar" and "grmamar" and "gramamr" and "grmaamr". Moreover, equal letters can be cancelled from the front and back end of any string. For example, "abby" and "baby" are equivalent, and, cancelling the trailing "by", this implies that "ab" and "ba" are also equivalent, and can be exchanged anywhere. When two letters can be exchanged in this way, we say that they "commute". The third floor of the math building at UC had a huge 26x26 chart; the square in column i and row j contained a proof that letters i and j would commute. Sometimes these proofs can be rather elaborate. For example, the dictionary has "dire" and "ride", so, by cancelling the trailing "e"s, one has "dir" = "rid". The dictionary also has dirten = rident (No, I don't know what those mean.) Since "dir" = "rid" we have: rident = dirent and since "rident" = "dirten", dirten = dirent even though "dirent" is not a dictionary word. Cancelling the leading "dir" leaves: ten = ent but ten = net because "ten" and "net" are dictionary words, so ent = net and, cancelling the "t", en = ne and we've just proved that "en" and "ne" commute. This fact might be useful in later proofs. What's the point of all this? Well, the goal is to find out which letters commute with *every* other letter; such letters are said to be in the "center" of the system. As for the *point*, I'm not sure there is one. Apparently the math grads at UC didn't have enough to occupy their time. The chart in the UC math building has since been lost, so your task is to write a program whose input is a word list, with one word per line, and which makes appropriate deductions and eventually computes the center of the system. I don't have the headword list from Webster's Third, but I do have the list from Webster's Second, so let's use that. You can get a copy from http://perl.plover.com/qotw/words/Web2.bz2 http://perl.plover.com/qotw/words/Web2.gz ---------------------------------------------------------------- The mailing list had an initial brief flurry of discussion about the equivalence rules. Mark offered a useful clarification: Two strings are equivalent if conditions (1) and (2) hold. Moreover, if A is equivalent to B and B is equivalent to C, then A is equivalent to C. Moreover, if A is equivalent to B, then pAq is equivalent to pBq for all strings p and q. Moreover, if pAq is equivalent to pBq for some strings p, q, A, and B, then A is equivalent to B. Nothing else is equivalent. Suppose L and M are strings of length 1; that is, single letters. We say that L and M commute if and only if LM is equivalent to ML. Two people posted solutions to the list. Rod Adams approached the problem in what he described as a 'brute-force' way: repeatedly using the inference rules to derive new equivalences. Since the dictionary is large to begin with (234936 words), and the cancellation and transitivity rules add more equivalences, his solution rapidly accumulates a very large data set. Rod used the MySQL relational database to accommodate this data set. (It occurred to me that it might have been possible to use the DBD::SQLite module instead. SQLite offers a SQL relational DBMS without needing a server, though with a few restrictions compared to server-based databases.) Unfortunately, Rod's program took a great deal of time to execute. Running it overnight on my computer produced only three of the letters in the center; I stopped it in the morning before it could get any further. The other solution posted was by Daniel Martin. Daniel's approach was in essence quite similar to Rod's. However, Daniel's program aggressively prunes the search space: as new equivalences are found, existing longer ones are thrown away if they can be derived from the current set of equivalences. This allows the center to be found in about 30 seconds on my computer. Daniel's solution also has the pleasing property that it constructs the outline of a proof for each commutative pair. Daniel's approach iterates over the set of equivalences. His reduce() function attempts to break a fact into AB=CD where either A=C or B=D is known. If both are known, the original fact is simply deleted. The fact is replaced with B=D if only A=C is known, and with A=C is only B=D is known. [ There was some fascinating discusion on the -discuss list about whether these deletions were truly safe. Eventually everyone involved agreed that they were. That part of the discussion is available at http://perl.plover.com/~alias/list.cgi?1:sss:1979 if you want to read it. - MJD ] sub reduce ($\$) { my ($fact, $found_something_flag) = @_; my ($worda, $wordb) = split('=', $fact); my ($foundpossible) = 0; my ($startat) = 1; my (@worda) = split(//,$worda); my (@wordb) = split(//,$wordb); if ($worda eq $wordb) { # we shouldn't be calling this with say-nothing facts print STDERR "Huh? $fact:\n$known{$fact}\n"; delete $interesting{$fact}; return qw[]; } while (substr($worda,0,$startat+1) eq substr($wordb,0,$startat+1)) { $startat++; } for my $i ($startat..$#worda) { @worda[0..$i-1] = sort @worda[0..$i-1]; @wordb[0..$i-1] = sort @wordb[0..$i-1]; if (join('',@worda[0..$i-1]) eq join('',@wordb[0..$i-1])) { $foundpossible = 1; my $lefta = substr($worda,0,$i); my $leftb = substr($wordb,0,$i); my $righta = substr($worda,$i); my $rightb = substr($wordb,$i); my $reason; if ($reason = knownp($lefta, $leftb)) { if (knownp($righta, $rightb)) { # This fact is merely the aggregation # of other stuff we know delete $interesting{$fact}; return qw[]; } else { $$found_something_flag = 1; delete $interesting{$fact}; my $newfact = join('=', $righta,$rightb); $reason = join("\n", $known{$fact}, ">$fact", $reason); $reason =~ s/^/ /mg; $newfact =~ s/^(\w+)(\w+)=\1/$2=/; $newfact =~ s/(\w+)=(\w+)\1$/=$2/; $newfact = join('=', sort split(/=/,$newfact)); if (! $known{$newfact} ) { $known{$newfact} = $reason; if (length($newfact)>5) { $interesting{$newfact}=1; } return ($newfact); } else { return qw[]; } } } elsif ($reason = knownp($righta, $rightb)) { $$found_something_flag = 1; delete $interesting{$fact}; my $newfact = join('=', $lefta, $leftb); $reason = join("\n", $known{$fact}, ">$fact", $reason); $reason =~ s/^/ /mg; $newfact =~ s/^(\w+)(\w+)=\1/$2=/; $newfact =~ s/(\w+)=(\w+)\1$/=$2/; $newfact = join('=', sort split(/=/,$newfact)); if (! $known{$newfact} ) { $known{$newfact} = $reason; if (length($newfact)>5) { $interesting{$newfact}=1; } return ($newfact); } else { return qw[]; } } } } if ($foundpossible) { return ($fact); } else { return qw[]; } } The knownp() function referred to in reduce() is a semipredicate returning the reason its arguments are known to be equivalent, or false if they aren't known to be equivalent. reduce() is called repeatedly on the set of equivalences. If it fails to reduce any to simpler forms, the next phase involves using existing commutative pairs to derive new equivalences, taking advantage of the interchangeability of the commutative pairs. This function uses string eval to create a function which does the interchanging: sub make_mangle_sub { my @twoletterequivs = grep(5==length, keys %known); my @twoletterswaps = sort map { split('='); } @twoletterequivs; my %lettercommutes=(); for my $letter ('a'..'z') { my @swaps = grep /^$letter/, @twoletterswaps; if (@swaps) { $lettercommutes{$letter} = join('', sort map { substr($_,1) } @swaps); } } my @letters_to_swap = sort {length($lettercommutes{$a}) <=> length($lettercommutes{$b}) or $a cmp $b} keys %lettercommutes; my $mangle_sub_text = "sub { my \$fact = shift;\n"; for my $letter (@letters_to_swap) { my $otherlet = $lettercommutes{$letter}; $mangle_sub_text .= "\$fact =~ s/([$otherlet]+)($letter+)/\$2\$1/g;\n"; } $mangle_sub_text .= "\$fact = join('=', sort split(/=/,\$fact));\n"; $mangle_sub_text .= '$fact; }'; eval ($mangle_sub_text); } If applying the 'mangling' function to each equivalence fails to produce any new facts, then all derivable equivalences have been found. At this point, any letters which commute with all other letters are in the center. Daniel's solution identifies the center as: a c e i l m n o p r s t u Thanks to all who participated, including those who took part and didn't post a solution to the list.