Sample solutions and discussion
Perl Expert Quiz of The Week #13 (20030528)

The 'MH' mail system stores email messages in a 'folder', which is
just a plain directory. Messages are files in this directory whose
names are numerals. The directory might contain other files or
subdirectories; these are not messages.

Implement a 'sortby' command that sorts the messages in a folder by
subject. It should rename the messages in the folder so that (a) the
set of message numbers before and after is the same, and (b)
afterwards, if the subjects were extracted from each message in
message number order, they would be in alphabetic order. 'sortby' is
not allowed to change the contents of the folder in any other way.

'sortby' is invoked like this:

    sortby folder-dir [-f fieldname] [-r]

'folder-dir' is the path to the folder directory. '-f fieldname', if
present, tells 'sortby' to examine the specified field of each
message, instead of the subject field. '-r', if present, reverses the
order of the sort.

For example, here's the contents of a folder before sorting:

    393 -11/18 "Barbie" use strict<

    #
    #
    # Here is a solution for the expert QOTW #13. My methodology was that I
    # found out to what each file needs to be renamed to. Then I found the
    # renaming loops[1] and renamed the files inside them by renaming the first
    # file to a temporary name.
    #
    # Regards,
    #
    # Shlomi Fish
    #
    # [1] - Since each file is renamed to one other file and each file is
    # renamed from one other file, one can show that they are renamed in loops:
    #
    # A -> B -> C -> D -> A

    use strict;

    use Mail::Box;
    use Mail::Box::MH;
    use Getopt::Long;

    my $field = "subject";
    my $reverse = 0;

    my $result = GetOptions("f=s" => \$field, "r" => \$reverse);

    my $dir_path = shift || "$ENV{HOME}/Mail/inbox";

    my $folder = Mail::Box::MH->new('folder' => $dir_path);

    my $idx = 0;
    my @messages;
    while (my $msg = $folder->message($idx++)) {
        # print $msg->messageId(), "\n";
        my $filename = $msg->filename();
        $filename =~ /\/([^\/]*)$/;
        $filename = $1;
        my $f;
        if ($field eq "from") {
            $f = $msg->from();
        }
        elsif ($field eq "to") {
            $f = $msg->to();
        }
        else {
            $f = $msg->subject();
        }
        push @messages, { 'filename' => $filename, 'field' => $f };
    }
    undef($folder);

    my @sorted_indexes =
        sort {
            ($messages[$a]->{'field'} cmp $messages[$b]->{'field'}) ||
            ($a <=> $b)
        } (0 .. $#messages);

    if ($reverse) {
        @sorted_indexes = reverse(@sorted_indexes);
    }

    my @messages_moved = (0) x @messages;

    my $temp_filename = "a6Hy0";
    while (-e "$dir_path/$temp_filename") {
        $temp_filename++;
    }

    my $i;

    chdir($dir_path);

    sub myrename {
        my ($from, $to) = @_;
        print "Renaming $from to $to\n";
        rename($from, $to);
    }

    for($i=0;$i<@messages;$i++) {
        if (!$messages_moved[$i]) {
            # Check if we move the message to itself
            # if so - do nothing except mark this message
            # as moved
            if ($sorted_indexes[$i] == $i) {
                $messages_moved[$i] = 1;
            }
            else {
                myrename($messages[$i]->{'filename'}, $temp_filename);
                my ($prev_idx, $next_idx);
                $prev_idx = $i;
                $next_idx = $sorted_indexes[$prev_idx];
                while ($next_idx != $i) {
                    myrename(
                        $messages[$next_idx]->{'filename'},
                        $messages[$prev_idx]->{'filename'}
                    );
                    $messages_moved[$prev_idx] = 1;
                    $prev_idx = $next_idx;
                    $next_idx = $sorted_indexes[$prev_idx];
                }
                $messages_moved[$prev_idx] = 1;
                myrename($temp_filename, $messages[$prev_idx]->{'filename'});
            }
        }
    }

    __END__
    ----------------------------------------------------------------------
    Shlomi Fish        shlomif@vipe.technion.ac.il
    Home Page:         http://t2.technion.ac.il/~shlomif/

    An apple a day will keep a doctor away. Two apples a day will keep
    two doctors away.

        Falk Fish

----------------------------------------------------------------

1. Shlomi had

        my $result = GetOptions("f" => \$field, "r" => \$reverse);

   but it should have been

        my $result = GetOptions("f=s" => \$field, "r" => \$reverse);

   This puzzled me for a while, since it caused the program to fail
   when run as

        perl fish.pl -f from /tmp/TEST

   Since GetOptions thought that '-f' was a flag argument, the program
   interpreted 'from' as the folder directory name. Then the following
   call

        my $folder = Mail::Box::MH->new('folder' => $dir_path);

   assigned undef to $folder (since there was no 'from' directory),
   and a later

        while (my $msg = $folder->message($idx++))

   died with "Can't call method "message" on an undefined value".

2. The program can't handle any -f flags other than 'from' or 'to',
   apparently owing to crappy design in the Mail::Box module. (Anyone
   sense a recurring theme in tonight's postmortems?)

3. Continuing the theme, look at this:

        time perl -MMail::Box::MH -e 1

        real    0m3.501s
        user    0m2.980s
        sys     0m0.180s

   Am I the only person who finds this excessive? You should see how
   long it takes to start up in the debugger.
   Wait, it gets better:

        perl -l -MMail::Box::MH -e 'print join "\n", keys %INC'

        POSIX.pm
        Time/Zone.pm
        Mail/Message/Head.pm
        List/Util.pm
        MIME/Types.pm
        Mail/Box.pm
        Mail/Message/Body/Lines.pm
        Cwd.pm
        Mail/Message/Field/Fast.pm
        Mail/Box/MH/Index.pm
        Fcntl.pm
        Symbol.pm
        MIME/Type.pm
        Mail/Box/Parser.pm
        Scalar/Util.pm
        Mail/Message/Body/File.pm
        Exporter.pm
        Mail/Message/Part.pm
        File/Spec.pm
        /usr/local/lib/perl5/5.8.0/i586-linux/auto/POSIX/autosplit.ix
        Mail/Message/Body.pm
        locale.pm
        warnings/register.pm
        XSLoader.pm
        /usr/local/lib/perl5/5.8.0/i586-linux/auto/POSIX/load_imports.al
        Object/Realize/Later.pm
        Mail/Box/Locker.pm
        IO/ScalarArray.pm
        Mail/Message/Body/Delayed.pm
        Mail/Box/MH/Labels.pm
        File/Spec/Unix.pm
        Exporter/Heavy.pm
        vars.pm
        strict.pm
        Mail/Message/Body/Multipart.pm
        Mail/Box/Dir.pm
        Mail/Box/MH/Message.pm
        AutoLoader.pm
        IO/Lines.pm
        Mail/Box/MH.pm
        IO/Handle.pm
        re.pm
        Mail/Message/Head/Partial.pm
        SelectSaver.pm
        Mail/Message/Head/Complete.pm
        Time/Local.pm
        warnings.pm
        Mail/Box/Dir/Message.pm
        Mail/Message.pm
        IO/WrapTie.pm
        Mail/Reporter.pm
        Mail/Message/Construct.pm
        Mail/Message/Head/Delayed.pm
        Mail/Message/Field.pm
        IO/Seekable.pm
        File/Copy.pm
        base.pm
        Config.pm
        File/Basename.pm
        integer.pm
        Mail/Box/Message.pm
        IO.pm
        Carp.pm
        Mail/Message/Body/Nested.pm
        overload.pm
        Date/Parse.pm
        IO/File.pm
        Mail/Address.pm
        Mail/Message/Head/Subset.pm
        DynaLoader.pm

   That sounds like a joke, doesn't it? Say "use Mail::Box::MH" and
   you load *seventy* modules.

4.
   The business end of Shlomi's program is:

        for($i=0;$i<@messages;$i++) {
            if (!$messages_moved[$i]) {
                # Check if we move the message to itself
                # if so - do nothing except mark this message
                # as moved
                if ($sorted_indexes[$i] == $i) {
                    $messages_moved[$i] = 1;
                }
                else {
                    myrename($messages[$i]->{'filename'}, $temp_filename);
                    my ($prev_idx, $next_idx);
                    $prev_idx = $i;
                    $next_idx = $sorted_indexes[$prev_idx];
                    while ($next_idx != $i) {
                        myrename(
                            $messages[$next_idx]->{'filename'},
                            $messages[$prev_idx]->{'filename'}
                        );
                        $messages_moved[$prev_idx] = 1;
                        $prev_idx = $next_idx;
                        $next_idx = $sorted_indexes[$prev_idx];
                    }
                    $messages_moved[$prev_idx] = 1;
                    myrename($temp_filename, $messages[$prev_idx]->{'filename'});
                }
            }
        }

   Here's the business end of my version:

        while (%new_number) {
          my ($cur) = keys %new_number;
          my @chain;
          do {
            push @chain, $cur;
            $cur = delete $new_number{$cur};
          } while $cur != $chain[0];
          print STDERR "Chain: (@chain)\n" if $VERBOSE;

          my ($prev, @rest) = reverse @chain;
          my $TMP = "$prev.TMP";
          rename("$dir/$prev", "$dir/$TMP") or die "$prev => $TMP: $!";
          for my $cur (@rest) {
            rename("$dir/$cur", "$dir/$prev") or die "$cur => $prev: $!";
            $prev = $cur;
          }
          rename("$dir/$TMP", "$dir/$prev") or die "$TMP => $prev: $!";
        }

   It seems to me now, from looking at Shlomi's version, that I might
   be able to save some code by eliminating @chain, and having the
   function do the rename as it traverses the data structure. Maybe
   something like this:

        my %rev = reverse %new_number;
        while (%new_number) {
          my ($first) = keys %new_number;
          rename "$dir/$first", "$dir/$first.TMP"
            or die "$first => $first.TMP: $!";
          my $prev;
          my $cur = $first;
          while (1) {
            # mark $cur as handled so that the outer loop terminates
            delete $new_number{$cur};
            $prev = $cur;
            $cur = $rev{$cur};
            last if $cur eq $first;
            rename "$dir/$cur", "$dir/$prev" or die "$cur => $prev: $!";
          }
          rename "$dir/$cur.TMP", "$dir/$prev"
            or die "$cur.TMP => $prev: $!";
        }

   And perhaps I could get rid of the 'reverse' line by building the
   hash in the correct direction to begin with.
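   The idea of building the hash in the other direction can be
   sketched as follows. This is only an illustration, not code from
   either submitted program: the 'renumber' function and the %old_for
   hash (new message number => old message number) are hypothetical
   names, and the demonstration folder is a throwaway temporary
   directory holding three tiny fake messages.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, not taken from either submitted program.
# %$old_for maps each NEW message number to the OLD number of the
# message that should end up there. It is assumed to be a permutation
# of one set of numbers, so every chain of renames closes into a loop.
sub renumber {
    my ($dir, $old_for) = @_;
    my %old_for = %$old_for;    # work on a copy; handled entries are deleted
    while (%old_for) {
        my ($first) = keys %old_for;
        if ($old_for{$first} == $first) {    # message is already in place
            delete $old_for{$first};
            next;
        }
        my $tmp = "$first.TMP";
        rename "$dir/$first", "$dir/$tmp" or die "$first => $tmp: $!";
        my $slot = $first;                   # the number that is vacant right now
        while (1) {
            my $src = delete $old_for{$slot};
            last if $src == $first;          # loop closed; the rest is in $tmp
            rename "$dir/$src", "$dir/$slot" or die "$src => $slot: $!";
            $slot = $src;
        }
        rename "$dir/$tmp", "$dir/$slot" or die "$tmp => $slot: $!";
    }
}

# Demonstration on a scratch folder with made-up subjects.
my $dir = tempdir(CLEANUP => 1);
my %subject = (1 => 'cherry', 2 => 'apple', 3 => 'banana');
for my $n (keys %subject) {
    open my $fh, '>', "$dir/$n" or die "$dir/$n: $!";
    print $fh "Subject: $subject{$n}\n\nbody\n";
    close $fh;
}

my @numbers    = sort { $a <=> $b } keys %subject;
my @by_subject = sort { $subject{$a} cmp $subject{$b} or $a <=> $b } @numbers;

my %old_for;
@old_for{@numbers} = @by_subject;    # slot $numbers[$i] receives $by_subject[$i]

renumber($dir, \%old_for);
```

   Because the hash already points from each vacant number to the
   message that belongs there, the traversal needs neither @chain nor
   a reversed copy of the hash, and 'delete' still doubles as the
   "already moved" marker.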
   I was influenced to think of the '@chain' idea because the standard
   MH 'sortm' command prints out diagnostic messages that say
   "renaming message chain from 841 to 1".

   I did save some code by using 'delete' to mark the messages as
   having been moved; this seems less verbose than using an explicit
   flag as Shlomi's program does.

5. Most of the remaining details in my program have to do with option
   parsing and so forth. I have a private 'MH' module that deals with
   reading in my MH profile file and locating my folders and so on.
   Most of it isn't very interesting, but the code is at

        http://perl.plover.com/qotw/misc/e013/mjd.pl

   if you want to see it.

6. The standard MH 'sortm' program performs a bunch of other
   MH-specific tasks in addition to renaming the message files. For
   example, an MH folder can have a set of 'sequences' associated with
   it, which are essentially sets of messages. The folder directory
   contains a file named '.mh_sequences' that records these sets.
   When MH's 'sortm' command renames the messages, it also adjusts the
   contents of the sequences file.

   I was going to say

        Since Shlomi's program is going to the work of loading
        Mail::Box::MH, it might as well use the
        Mail::Box::MH->rename_message method, which will do the work
        of modifying the sequences file.

   But then I looked, and it appears that there is no method for
   renumbering messages. I had supposed that there would be, because
   the (humongous) Mail::Box::MH object does indeed contain a
   subobject representing the sequences. But no, apparently not.

   My program also ignores this issue. I wanted to say that it would
   be easy to fix, by replacing the

        rename($old, $new)

   calls with

        system("mh-rename-message +folder $old $new")

   invoking the MH shell command for renumbering a message. But there
   is no such MH command either.

   The idea behind MH was that there should be a lot of little shell
   utilities which would be easy to combine into more complex
   utilities using shell scripts or whatever.
   As a longtime user of MH, my conclusion is that this was only
   partially successful, partly through misdesign (why no
   'mh-rename-message' command?) and partly because the idea has
   inherent problems.

   As an example of the inherent problems, consider this common MH
   locution:

        scan `pick -from dominus`

   The 'pick' command scans the current folder and generates a list of
   all the message numbers of messages sent by dominus. The 'scan'
   command then scans these messages and prints out a summary. 'pick'
   often takes a long time to run, since it must read and parse every
   message in the folder. While 'pick' is running, I have to be
   careful not to issue any MH commands that will change the current
   folder, say by typing the command in another window. If I do, then
   when the 'scan' runs it will scan the right message numbers but in
   the wrong folder.

7. H. Dieter Pearcey reminded me:

        Have you seen Simon's comments re: Mail::* at
        http://ddtm.simon-cozens.org/~simon/email.html ?

        The whole Mail::Box stuff in particular frightens me.

   I read this article last month, but I had forgotten about it.
   Thanks, Dieter.

   I would have to agree. (I received Dieter's message *after* I had
   written about the 70 modules above.) Mail::Box looks to me like a
   piece of software that was prematurely overwritten. Someone decided
   to build the Mailbox Class of All Mailbox Classes, but did so
   without a clear idea of what specific tasks it would be used for.
   As a result, it has a very beautiful structure, a lot of
   unnecessary machinery, and it's missing methods that it needs to
   make it really useful.

8. In the solutions for regular quiz #13, I said:

        RFC822 address syntax is horrendously complicated and grossly
        overengineered. But it works well enough for almost all
        examples that one encounters in practice. (Which is why
        RFC822 is overengineered.)

   This sentence was badly composed. The 'it' refers to my crappy
   parsing function, not to the RFC822 syntax.
   My idea was that since my crappy function works almost all the
   time, this shows that the many many complex features of RFC822
   syntax are rarely used, and that is what leads me to conclude that
   it is overengineered.

My thanks to Shlomi and to everyone who worked on this quiz in
private. I hope to bring the new postmortems for quizzes #14 on
Monday.