Sample solutions and discussion Perl Quiz of The Week #18 (20040623) [Note: This was quiz #18, not #16. I sent out the quiz with the wrong Subject line. -Mark Jason Dominus] For statewide elections, the state of California uses a "random alphabetic" sort to order the candidates' names on the ballot. This link describes that special sort: http://www.ss.ca.gov/elections/elections_ra.htm (mirrored at http://perl.plover.com/qotw/misc/r018/elections_ra.htm ) The idea is this: a random order is selected for the twenty-six letters of the alphabet, and then the names on the ballot are sorted according to this random order, instead of according to the usual order. This is the order in which the names appear on the ballot in Assembly District 1. In Assembly District 2, the order is the same, except that the first name on the list has been moved to the end of the list. In Assembly District 3, the next name on the list is moved to the end of the list, and so on. You will write a program to generate ballots. The program will be invoked like this: elect.pl [-r] [permutation] [district-number] [namefile] (Items in [square brackets] are optional.) The 'permuation' argument will default to 'randlet.txt'. If the file exists, it should contain a single line of text with a random permutation of the 26 letters of the alphabet. If this file does not exist, the program should generate one at random. The -r option will force the file to be regenerated even if it exists already. The 'district-number' is the number of the Assembly District for which the ballot will be generated. If omitted, the program should generate the ballot for Assembly District 1. The 'namefile' is a file that contains the names of the candidates. The names come one per line with first name then last name: Jill Harmon Gus Tribble Walter Reston Norma Kretschmer Fiorella Squamuglia Vern Smith Bill Smith Ed Squid John Macdonald Angus MacDonald If the namefile is omitted, the program should read the names from the standard input. The program should print out the candidates' names, one per line, in the order specified by California state law, sorted according to the random permutation of the alphabet, with names rearranged as appropriate for the specified Assembly District. For example, if the file 'permutation' contains QWERTYUIOPASDFGHJKLZXCVBNM and the 'namefile' contains the names listed above, then the output for Assembly District 1 should be: Walter Reston Gus Tribble Ed Squid Fiorella Squamuglia Vern Smith Bill Smith Jill Harmon Norma Kretschmer Angus MacDonald John Macdonald For Assembly District 4, it should be: Fiorella Squamuglia Vern Smith Bill Smith Jill Harmon Norma Kretschmer Angus MacDonald John Macdonald Walter Reston Gus Tribble Ed Squid ---------------------------------------------------------------- When I first encountered the 'random alphabetic ordering' method of the California election system I immediately thought it would make for an interesting Perl challenge and it has. These people submitted ideas/solutions (in no particular order :): Matthew Walton Christer Ekholm John J. Trammell Yitzchak Scott-Thoennes Randy W. Sims Adrian Scottow Rod Adams Kester Allen Jonathan Scott Duff Mark Jason Dominus Torsten Hofmann the hatter Patrick LeBoutillier Rick Measham Mark Morgan There was first a discussion to refine the specification. Command line arguments and how to separate first and last names and what about non-alphabetic characters in names. [Some of this was my fault. Jon's original request was for a program whose input was in a different format than what I sent out; I rewrote it considerably. I was aware that the argument format I inserted was ambiguous, but didn't think it would be a big problem. -MJD] I tried to short circuit some of this discussion as the point of the quiz was to have some fun with sorting techniques. Because of this I won't discuss the various ways of using command line options to get the required data into the program. I'll focus on the sort itself. Many people saw right away that since comparing two names would be a complex affair that this was a prime opportunity for using a Schwartzian Transform. This is a technique invented by and named after (not by) Randall Schwartz, author of the classic "Learning Perl". If any of you do not know of this technique you can read about it in several places (Perl Cookbook, Effective Perl, perldoc -f sort, etc). The key idea in a complex sort is to pre-generate a sort key for each item BEFORE you do the sort rather than each time the sort compares two elements. This is much more efficient and simply darn clever! The use of anonymous arrays and maps unwind the pregenerated sort keys from the desired result: # schwartzian transform for efficiency and fun @names = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, sortkey $_ ] } @names; How to generate a sort key in this case? The tr/// function comes in quite handy here. If the permuted alphabet in in $alphabet then this will do it: sub sortkey { my $name = shift; $name =~ tr/$alphabet/A-Z/; return $name; } After this transformation the name we can simply do a simple compare and the result will be to have respected the new alphabetic ordering. Unfortunately tr/// does not do variable interpolation (why?). So we need to wrap the tr/// in an eval. There is also the need to swap first and last names and do the sort case-insensitively. So here is what Rick Measham coded: sub encode { # Case insensitive my $switch = uc shift; my $permutation = shift; # Last name first $switch =~ s/^([\w-]+)\s+(.+)$/$2 $1/; # Switch in the new alphabet -- tr doesn't take # variables so we have to eval the # transliteration. We checked the permutation # before so we can trust it. It's just a list # of letters. eval "\$switch =~ tr/$permutation/ABCDEFGHIJKLMNOPQRSTUVWXYZ/"; return $switch; } # Schwartzian Transform -- see note below my @sorted = map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { [ encode($_, $permutation), $_ ] } @names; That pretty much summarizes what I think is the optimal approach. One improvement suggested by Mark Jason Dominus was to only do ONE eval rather than one for each name in the list. Here is his offering: [I sent this to Jon in private email ome time ago when we were first discussing the question; it wasn't intended as a complete solution to the problem. -MJD] open IN, "rlets" or die "can't open rlets\n"; my $rlets = ; print $rlets; chomp $rlets; my $candidates = shift; open IN, "<", $candidates or die "cannot open $candidates: $!\n"; my $code = q{ print sort { my $aa = lc $a; my $bb = lc $b; $aa =~ tr/RLETS/a-z/; $bb =~ tr/RLETS/a-z/; $aa cmp $bb } ; }; $code =~ s/RLETS/$rlets/g; eval $code; Note that the q{ } does a single quoting - no interpolation. I'm surprised that Perl knows about balanced {}'s and does not end the quoted string after "$aa cmp $bb". Or maybe I'm not surprised! :) This code does not use a Schwartzian Transform but is just as quick - evals are apparently quite expensive! [I think that's because Perl must run its parser. -MJD] On other points of the problem: Generating a random permutation of A to Z was accomplished in a variety of ways. Here is one by Mark Morgan: my @letters = ( 'A' .. 'Z' ); my $alphabet = ''; $alphabet .= splice( @letters, rand(@letters), 1 ) while @letters; Kester Allen and John Trammell used a module to generate the permutation: use List::Util qw/shuffle/; Rick Measham offered this regex to ensure that a permutation (not generated by the program) was an actual permutation of A..Z: die("Permutation file '$permutation_file' does not contain a permutation\n") unless $permutation =~ /^(?:([A-Z])(?!.*?\1)){26}$/; Wow! I'd have to study on that one! [Let's do that. The first part locates a capital letter, and captures it in $1: ([A-Z]) The next part uses the 'negative lookahead' construction to require that the same letter *not* appear a second time; the letter may *not* be followed by some sequence of characters (.*) and $1 again (\1): (?! .* \1) The whole thing is wrapped in /^(...){26}$/ which requires that the entire string consist of just 26 such letters, each of which does not appear a second time. I do not know why Pr. Measham used .*? instead of .* . -MJD] Matthew Walton submitted solutions in both Perl and a language called Haskell 98. I'd never heard of it! Thanks, Matthew, for opening my eyes to other realms. I do tend to get somewhat narrow minded in my Perl obsessiveness. I've heard good things about Ruby but have not perused it much yet... [Haskell is my favorite programming language! One thing I learned from Haskell is that inferring block structure from white space is not an intrinsically dumb idea; what is dumb is the way Python has implemented it. It's always nice to find out that a dumb idea is smarter than you thought it was. Haskell has two flagship features. First, it has an extremely powerful type system, and the Haskell compier figures out all the types for you so that it is very rare to have to declare any variables; if the compiler figures out the wrong type, it is almost always because you made a serious programming error. See http://perl.plover.com/yak/typing/ for more details about this. That article is about SML, which has a less sophisticated type system than Haskell, but perhaps it gives the flavor. Second, Haskell uses "lazy evaluation", which means that it never evaluates anythig it doesn't have to. Consider Perl code: my ($y) = grep /foo/, glob("*"); This searches the entire directory, constructs a list of files, scans the entire list, constructs another list of all the files whose names contain "foo", puts the first one into $y, and throws the rest away, wasting most of the work it just did. In the Haskell analog of this code, no work will be done until the value of $y is actually required, and then the directory will be searched and scanned only far enough to find the right value for $y. This happens automatically. One result is that you can write carrots = "carrot" : carrots; and get an infinite list of carrots; Haskell is happy to manipulate infinite lists. Another result is that you can write max = hd sort numbers; to get the smallest number in the list ("hd" gets the value at the head of a list) and this runs in time proportional to the length of the list, because the sort only sorts enough of the list to find the first element of the result. -MJD] Christopher Eckholm submitted his solution on a web page which made it easy to look at! [http://www.chrekh.se/qotw/elect.pl] Everyone's code was rather neatly done and well documented. It would be a pleasure to work with any of you as a team member! Jon [Thanks to everyone who participated, and particularly to those who did so in private and never told anyone about it. I will send the new quiz tomorrow. -MJD]