Sample solutions and discussion Perl Quiz of The Week #4 (20021106) Two words are said to be 'anagrams' if the letters of one word can be rearranged to form the other word. For example, in English, 'ascot' and 'tacos' are anagrams; so are 'tacos' and 'coats'. A set of words that are all anagrams of one another is an 'anagram set'. For example, ascot tacos coast coats is an anagram set. Letter case doesn't matter, so, for example, 'liberating' and 'Gilbertian' are considered to be anagrams. Write a program, make_anagrams, which reads a list of words, one per line, and finds all the anagrams in the the word list. It should output an anagram listing, as follows: * 'Words' that contain digits or punctuational characters should be ignored. * Anagram sets that contain only one word should be omitted from the output. * If an anagram set contains two words, say 'halls' and 'shall', the output should contain two lines: halls shall shall halls * If an anagram set contains more than two words, the entire set should be listed under the alphabetically first word; the others should cross-reference it. For example: headskin nakedish sinkhead nakedish (See 'headskin') sinkhead (See 'headskin') * Finally, the output lines should be in alphabetic order. For example, if the input was 5th ascot ate carrot coast coats cots Dorian eat halls headskin inroad nakedish ordain Ronald's shall sinkhead tacos tea then the output should be: Dorian inroad ordain ascot coast coats tacos ate eat tea coast (See 'ascot') coats (See 'ascot') eat (See 'ate') halls shall headskin nakedish sinkhead inroad (See 'Dorian') nakedish (See 'headskin') ordain (See 'Dorian') shall halls sinkhead (See 'headskin') tacos (See 'ascot') tea (See 'ate') If you need a sample input, you may obtain English word lists from http://perl.plover.com/qotw/words/ If you prefer to do this quiz in a language other than English, please substitute whatever conventions are appropriate for that language. ---------------------------------------------------------------- Here's some sample code: #!/usr/bin/perl while (<>) { chomp; next if /[^A-Za-z]/; my $key = join "", sort(split //, lc $_); $sets{$key} .= "$_:"; } for my $wl (values %sets) { my @wl = split /:/, $wl; if (@wl == 2) { push @output, "$wl[0] $wl[1]"; push @output, "$wl[1] $wl[0]"; } elsif (@wl > 2) { my ($first, @rest) = sort insensitive @wl; push @output, "$first @rest"; for (@rest) { push @output, "$_ (See '$first')"; } } } for my $line (sort insensitive @output) { print $line, "\n"; } sub insensitive {lc($a) cmp lc($b)} The key to solving this problem was figuring out how to decide if two words were anagrams. Two words are anagrams if they have the same letters, possibly in a different order. The easy way to find out if this is the case is to take the letters and put them in some canonical order, for example by sorting them; if the sorted letter lists are the same, then the words were anagrams. Nearly everyone who solved this problem did so by splitting the incoming words into letters, sorting the letters, and joining them back together to form a hash key. For example, the hash key for the word 'Ethiopian' would be 'aehiinopt'. Then you just list each word under its own hash key, and you have a hash where the hash values are lists of anagrammatic words. The first 'while' loop reads the input, computes the key for each word, and installs it into the %sets hash under the appropriate key. At the end of the loop, the %sets values are lists of anagrams, separated by ':' characters. It would have been preferable to use a hash of arrays, but _Learning Perl_ doesn't cover that. A typical value: $sets{'einrs'} is "resin:rinse:risen:siren:". The second loop prepares the output. It scans over all the anagram lists ('$wl' stands for 'word list') and decides what the output will look like for each list. It appends lines of output to @output; later these lines will be sorted and printed. If there are exactly two words in $wl, then two output lines are appended. If there are more than two, then they're sorted into order, the entire word list is listed under the first word, and the rest have crossreferences. Finally, the output lines are printed in alphabetical order. The utility function 'insensitive', which compares two strings without regard to case, is used in two places to produce slphabetically sorted lists. There was some discussion about what to do when the input contained two words with the same spelling but different case, as with 'Polish' (pertaining to Poland) and 'polish' (to make something shiny.) The version above includes such words, which means that there are some silly outputs that inform you that 'polish' is an anagram of 'Polish'. If you don't like this, add next if $seen{lc $_}++; in the top loop; this discards any word that has been seen before with any capitalization. (This is another application of the 'canonical form' idea; this time the canonical form of a string is the all-lowercase version.) The 'expert' quiz postmortem is delayed because the results are so very interesting. There were many solutions posted to the -discuss list and much discussion, and I don't want to leave out anything interesting. Expect it tomorrow, along with a pair of new quizzes. Thanks to everyone who participated, whether or not they sent mail about it.