Perl Program Repair Shop
************************
#HTML#
Mark Jason Dominus
++++++++++++++++++
Plover Systems Co.
++++++++++++++++++
mjd@plover.com
Chicago.pm Edition
++++++++++++++++++
v0.1 (Feb, 2006)
++++++++++++++++
* Slides at:
http://perl.plover.com/flagbook/yak/Chi/
http://perl.plover.com/flagbook/yak/Chi/Chi.tgz
http://perl.plover.com/flagbook/yak/Chi/Chi.zip
* Sample source code at:
http://perl.plover.com/flagbook/yak/Chi/sources/
----------------------------------------------------------------
What is the class about?
***********************
* Programmers write the first thing that comes into their heads
* That's OK
* But you should make an editing pass afterwards to clean it up
* There are a couple of dozen really common mistakes most people make
* They are easy to recognize and easy to repair
* Even if you fix just these errors, the code becomes _drastically_ better
----------------------------------------------------------------
#RTIMG# redflag2
Red Flags
*********
* A red flag is a visible sign in the code that something might be suboptimal
================================================================
* *Notice*: [R[might be]R]
* *Notice*: [R[suboptimal]R], not [R[wrong]R]
================================================================
* It is _not_ a sure sign that something is wrong
* NOT NOT NOT NOT
================================================================
* Instead, it's a sign that you should pause and reflect
* Ask, "is this really written in the best way?"
* Try to write it a different way (or two, or three)
* Compare and evaluate
* Decide
* And then _perhaps_ replace the code with a different version
----------------------------------------------------------------
Today's examples
****************
* [[Chicago.pm]] folks sent me code samples
* (Late)
* None of them were really suitable, for various reasons
* Too short
* Selectively edited
* Too obscure
* Written in Java (!!!)
* I'll do what I can
* Then we'll have a longer example form somewhere else
----------------------------------------------------------------
#LTIMG# Scream-narrow.jpg
Horrors
********
* But I know you came here to see some horrors
* So here's one
----------------------------------------------------------------
Introduction
************
foreach $array (values %orbital_elements_hash)
{
my $index = 0;
my $orbital_element = $xmldocument->createElement("orbital-element");
foreach $dataitem (@$array)
...
* OK, nothing strange about that
* It's the contents of the inner [[foreach]] loop that are amazing
----------------------------------------------------------------
The Loaded Uzi
**************
foreach $dataitem (@$array)
{
$keplerian_elements->appendChild($orbital_element);
if($index == 0)
{
print "$dataitem ";
}
if($index == 1)
{
my $nasa_catalog_number = $xmldocument->createElement("nasa-catalog-number");
$orbital_element->appendChild($nasa_catalog_number);
$nasa_catalog_number->appendChild($xmldocument->createTextNode($dataitem));
}
elsif($index == 2)
{
...
----------------------------------------------------------------
The Loaded Uzi
**************
foreach $dataitem (@$array)
{
......
elsif($index == 15)
{
my $mean_motion = $xmldocument->createElement("mean-motion");
$orbital_element->appendChild($mean_motion);
$mean_motion->appendChild($xmldocument->createTextNode($dataitem));
}
elsif($index == 16)
{
# revolution-at-epoch
# first-derivative-of-mean-motion
}
#* $index++;
}
----------------------------------------------------------------
The Loaded Uzi
**************
elsif($index == 6)
{
# Drag
}
----------------------------------------------------------------
The Loaded Uzi
**************
#IMG# uzi.jpg
#HTML#
Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi. #HTML#
(Larry Wall) ---------------------------------------------------------------- The Loaded Uzi ************** * I've seen this before, but never on a regular array * Usually, you see something like this: $input = _something_; for $key (%hash) { if ($key eq $input) { do_something($hash{$key}); } } * Which should of course be: $input = _something_; do_something($hash{$input}) if exists $hash{$input}; ---------------------------------------------------------------- The Loaded Uzi ************** * In extreme cases you see: for $key (keys %hash) { if ($key eq 'name') { check_name($hash{$key}); elsif ($key eq 'address') { check_address($hash{$key}); } elsif ... } * Of course, it should be: check_name($hash{name}); check_address($hash{address}); ... ---------------------------------------------------------------- The Loaded Uzi ************** * Analogously, we can replace this: for (@$items) { if ($index == 0) { _action_0 } elsif ($index == 1) { _action_1 } elsif ($index == 2) { _action_2 ... } elsif ($index == 16) { _action_16 } $index++; } * With this: _action_0 _action_1 _action_2 ... _action_16 ---------------------------------------------------------------- #IMG# redflags-crossed [R[Loop counter variables]R] **************************** * But surprisingly, this very weird example actually demonstrates a red flag * And a really nice one * A variable whose only job is to track the number of iterations through a loop * This is a good example * Because it's obvious that it isn't always the wrong thing to do * For instance: for my $item (@items) { $counter++; print "$counter: $item\n"; } * But it's _usually_ the wrong thing to do ---------------------------------------------------------------- #IMG# redflags-crossed [R[C-style for loop]R] ********************** * Most frequently, you see these in the code of ex-C-programmers: Subject: Re: How to compare two files and get the differences ? Message-Id: <7o5128$cp5$1@news1.kornet.net> for ($i=0;$i<$sizelines;$i++) { ($f1, $categorybackup, $f2, $f3, $f4, $f5, $f6, $f7) = split ('\|', $lines[$i]); ... } * Better: for $line (@lines) { my ($categorybackup) = (split /\|/, $line)[1]; ... } ---------------------------------------------------------------- Declarations ************ * Speaking of ex-C-programmers, here's another example you folks sent: sub pre_process_file { my $dbh = shift; my $file = shift; my $provider_id = shift; my $prv = shift; my $sql; my $sth; my $cmd; my %mail; my $provider_name; my $provider; my $internal_file; my $file_type; my $raw_layout; my $incoming_layout; my $valid_layout; my $zip; my @zip_members; my $member; my $data_file; my $data_out_path; my $zip_file; # input zip filename path information my $bad_file; # bad records returned from sort my $dup_file; # contains any duplicate records from exclude my $unq_file; # contains unique records from exclude my $inv_file; # contains records with invalid phone numbers my $good_file; # contains records that are ready to be loaded my $in_file; # basename of the input file my $num_in = 0; my $num_bad = 0; my $num_dupe = 0; my $num_good = 0; my $file_info_id; my @flds = qw(provider_name provider internal_file file_type raw_layout incoming_layout valid_layout); ($provider_name, $provider, $internal_file, $file_type, $raw_layout, $incoming_layout, $valid_layout) = @{$prv->{$provider_id}}{@flds}; # If file is zipped, then unzip it if ($file =~ /\.zip$/i) * Wow! * That's *42* lines of declarations ---------------------------------------------------------------- Declarations ************ * Even in C, we could have done this better * In C, you can declare variables at the head of _any_ block: /* C */ if (buf = malloc(bufsize)) { char *p = buf; ... } * In Perl, you can declare variables _anywhere_ ---------------------------------------------------------------- Declarations ************ * [G[Declare variables near where they are used]G] * This tends to keep the scope small * Easier on the maintenance programmer * So for example, instead of: my $inv_file; ... 83 lines omitted $inv_file = $data_out_path . ".inv"; * Just use: [C[my]C] $inv_file = $data_out_path . ".inv"; * This also eliminates a line of useless clutter ---------------------------------------------------------------- * That clutter is often quite easy to eliminate * This program has: my $provider_name; my $provider; my $internal_file; my $file_type; my $raw_layout; my $incoming_layout; my $valid_layout; # ... 24 lines omitted ($provider_name, $provider, $internal_file, $file_type, $raw_layout, $incoming_layout, $valid_layout) = @{$prv->{$provider_id}}{@flds}; * Why not trim this? [C[my]C] ($provider_name, $provider, $internal_file, $file_type, $raw_layout, $incoming_layout, $valid_layout) = @{$prv->{$provider_id}}{@flds}; ---------------------------------------------------------------- Declarations ************ * More trimming is possible * For example: my $member; * This variable is not used anywhere * Maybe it was used in the past, but its code was removed * The declaration lingers on, like a cold sore #IMG# longshoreman.jpg * or the Longshoremen's unions * If the declaration is near the use, it is likely to be removed at the appropriate time ---------------------------------------------------------------- Declaration near use ******************** * Here's another part of ther same program: my @now = localtime(); my $YY = strftime( "%y", @now ); my $YYYY = strftime( "%Y", @now ); my $MM = strftime( "%m", @now ); my $MMM = uc( strftime( "%b", @now ) ); my $DD = strftime( "%d", @now ); ... sub archive_file { ... my $dir; $dir = "$archive_dir/$YYYY/$MM/$DD/$provider"; * [[$YYYY]], [[$MM]], and [[$DD]] are not used anywhere else: my @now = localtime(); my $YY = strftime( "%y", @now ); my $MMM = uc( strftime( "%b", @now ) ); ... sub archive_file { ... #* my $dir = strftime "$archive_dir/%Y/%m/%d/$provider", localtime(); ---------------------------------------------------------------- Declaration near use ******************** * What about these? my @now = localtime(); my $YY = strftime( "%y", @now ); my $MMM = uc( strftime( "%b", @now ) ); * Oh, they're not used anywhere at all * Three more variables gone ---------------------------------------------------------------- Redirections ************ * I said I thought this code had been written by an ex-C-programmer * Why not an ex-shell programmer? * Well, mainly because shell programmers never declare _anything_ * But also: system("unzip -v '$incoming_dir/$file' 2>&1 >/dev/null"); * Here the intent was to discard standard output and standard error * But it only discards stdout * [[2>&1]] means "send [[stderr]] to wherever [[stdout]] is going" * [[>/dev/null]] means "send [[stdout]] to [[/dev/null]]" * The order is important here * It should have been: system("unzip -v '$incoming_dir/$file' [C[>/dev/null 2>&1]C]"); * That's not a red flag; just a bug ---------------------------------------------------------------- Ouch **** map { @{ $prv->{$_->[0]} }{@flds} = @$_[1..$#{$_}] } @$ar; * The C programmer is drunk with power! for $rec (@$ar) { my $key = shift @$rec; my %h; @h{@flds} = @$rec; $prv->{$key} = \%h; } ---------------------------------------------------------------- Ex-C-programmers **************** * I said I thought this code had been written by an ex-C-programmer * That was exhibit A * Here's B: sub main { ... } main(); ---------------------------------------------------------------- Ex-C-programmers **************** * Here's exhibit C: * The last line in the program is: exit; * The state rests, Your Honor. ---------------------------------------------------------------- Repeated Code ************* * Here's another chunk from another program: sub filetype { my $filename = shift; return "cc" if $filename =~ /\.[ch](pp)?$/; return "perl" if $filename =~ /\.(pl|pm|pod|tt|ttml|t)$/; return "php" if $filename =~ /\.(phpt?|html?)$/; return "python" if $filename =~ /\.py$/; return "ruby" if $filename =~ /\.rb$/; return "shell" if $filename =~ /\.[ckz]?sh$/; return "sql" if $filename =~ /\.(sql|ctl)$/; ... * Perl has a really nice idiom for this sort of thing: sub filetype { my $filename = shift; #* for ($filename) { return "cc" if /\.[ch](pp)?$/; return "perl" if /\.(pl|pm|pod|tt|ttml|t)$/; return "php" if /\.(phpt?|html?)$/; return "python" if /\.py$/; return "ruby" if /\.rb$/; return "shell" if /\.[ckz]?sh$/; return "sql" if /\.(sql|ctl)$/; ---------------------------------------------------------------- Repeated code ************* * Here's another such, from yet another program: my %stationInfo = shift; my $callsign = $stationInfo{Callsign}; my $timestamp =$stationInfo{LastHeard}; my $position = $stationInfo{Position}; my $status = $stationInfo{Status}; * Better: my %stationInfo = shift; my ($callsign, $timestamp, $position, $status) = @stationInfo{qw( Callsign LastHeard Position Status)}; ---------------------------------------------------------------- Unnecessary variables ********************* my %stationInfo = shift; my ($callsign, $timestamp, $position, $status) = @stationInfo{qw( Callsign LastHeard Position Status)}; * But actually, in this case we can do better * This code is immediately followed by: print $output "Callsign: $callsign\n"; print $output "Timestamp: $timestamp\n"; print $output "Position (Latitude & Longitude), Region: $position\n"; print $output "Status: $status\n"; #HTML#
* [R[Use immediately follows assignment]R]
print $output "Callsign: [C[$stationInfo{Callsign}]C]\n";
print $output "Timestamp: [C[$stationInfo{LastHeard}]C]\n";
print $output "Position (Latitude & Longitude), Region: [C[$stationInfo{Position]C]\n";
print $output "Status: [C[$stationInfo{Status}]C]\n";
* Then we can lose the four extraneous variables and four lines of
code
----------------------------------------------------------------
Unnecessary variables
*********************
* Similarly, we have:
my $userAgent = LWP::UserAgent->new;
my $request = $userAgent->request($uable);
* This becomes:
my $request = LWP::UserAgent->new->request($uable);
----------------------------------------------------------------
Unnecessary variables
*********************
* And similarly:
sub FindUXml
{
my $queryCallsign = shift;
my %aprsInfo = FindU($queryCallsign);
my $xmlTree = CreateStationXml(%aprsInfo)->as_XML;
return $xmlTree;
}
* Perhaps:
sub FindUXml
{
my $queryCallsign = shift;
return CreateStationXml(FindU($queryCallsign))->as_XML;
}
* We could get rid of [[$queryCallsign]] too:
sub FindUXml
{
return CreateStationXml(FindU([C[shift]C]))->as_XML;
}
* What do you think?
----------------------------------------------------------------
Darn!
*****
* That's all I had time to prepare for tonight
* The rest concerns other people's code
----------------------------------------------------------------
Some Miscellaneous Red Flags
****************************
* [[\"]]
* [[print]] [[print]] [[print]] [[print]] [[print]]
* Many very long strings
* C-style [[for]] loop
* File-Scope [[my]] Variables
* Excessively decorated comments
* Loop counter variables
* Single scalar variable in quotes
* Array Length variables
* Unnecessary Shell Calls
* The [[swswsw]] problem
----------------------------------------------------------------
#IMG# redflags-crossed
[R[\"]R]
********
#HTML#
Subject: Please Help !!!
Message-Id: <357F9D31.652AA817@thnet.com>
print "Content-type: text/html\n\n";
print "\n\n";
print " \n";
...
#HTML#
### .. this cesspit ...
* This continued for *63* lines
* There are two red flags here
----------------------------------------------------------------
#IMG# redflags-crossed
[R[\"]R]
********
#HTML#
print "\n";
print "
| ||||||||||||