#HTML#
The Perl Hardware Store
***********************
Tools You Didn't Know You Needed
================================
M-J. Dominus
++++++++++++
Plover Systems Co.
++++++++++++++++++
mjd-tpc-hardware@plover.com
20 March 2002
+++++++++++++
#HTML#
#HTML#
--------------------------------
Teflon Tape
***********
Teflon tape?
--------------------------------
#HTML#
1. Schwartzian Transform
************************
o Sorry if everyone knows this already
o Too useful to omit
o Named after Randal Schwartz
o co-author of _Learning Perl_, _Programming Perl_, etc.
--------------------------------
1. Schwartzian Transform
************************
o Sort list of items by some _non-apparent_ feature
o Example: Sort filenames by last-modified date
o Obvious method is very wasteful:
sort { -M $b <=> -M $a } (readdir D);
o Calls [[-M]] over and over on the same files
o Idea:
1. Construct data structure with both names and dates
2. Sort by date
3. Throw away dates
--------------------------------
1. Schwartzian Transform
************************
@names = readdir D;
@names_and_dates =
map { { NAME => $_, DATE => -M $_ } }
@names;
@sorted_names_and_dates =
sort { $b->{DATE} <=> $a->{DATE} }
@names_and_dates;
@sorted_names =
map { $_->{NAME} }
@sorted_names_and_dates;
--------------------------------
1. Schwartzian Transform
************************
o We can get an easy speed improvement here:
o Replace hashes [[ { NAME => $_, DATE => -M $_ } ]]
o ...with arrays [[ [ $_, -M $_ ] ]]
o Also, we usually eliminate the temporary variables:
@sorted_names =
map { $_->[0] }
sort { $b->[1] <=> $a->[1] }
map { [ $_, -M $_ ] }
readdir D;
o This is the Schwartzian Transform
----------------------------------------------------------------
1. Schwartzian Transform
************************
o [R[Warning]R]: Do not optimize without benchmarking!
User System Total
5.11 + 6.83 = 11.94 Naive sort
7.37 + 0.82 = 8.19 Schwartzian transform
o Donald Knuth (noted wizard) says:
#HTML# Premature optimization is the root of all evil.
--------------------------------
1. Schwartzian Transform
************************
o Well-known to Unix shell programmers:
# Sort file names by file size
ls -l | sort -n +4 | awk '{print $NF}'
# Sort output of SOMETHING from most frequent to least
SOMETHING | uniq -c
| sort -nr | awk '{$1=""; print}'
================================================================
#HTML#
o I first gave this talk at Perl Conference 2 in 1998
o At last, I have an audience that will appreciate this slide
--------------------------------
2. Manual Importing and Exporting
*********************************
o There is a commonly-used standard perl module called [[Exporter]]
o [[Exporter]] arranges for functions from one package to appear in another
o Some people actually know how this works
o In ten minutes, you will too
o The [[Exporter]] is big and heavy and featureful
o It can be useful to avoid it
--------------------------------
2. Manual Importing and Exporting
*********************************
Simple Example
==============
o The [[main]] package has a [[LOG]] function
o You want all your modules to have access to [[LOG]]
o This won't work:
package MyModule;
LOG("That had to hurt!");
o [[Undefined subroutine &MyModule::LOG called at...]]
o Non-exporting solution:
package MyModule;
main::LOG("That had to hurt!");
--------------------------------
2. Manual Importing and Exporting
*********************************
Simple Example
==============
o Importing [[LOG]] into [[MyModule]] is a one-liner:
package MyModule;
*LOG = \&main::LOG; # Import
# This works now:
LOG("I slaughtered that guy!");
# (It calls main::LOG)
o Simple! Convenient!
--------------------------------
2. Manual Importing and Exporting
*********************************
A Module That Exports a Function
================================
o More usual is to have the [[LOG]] function _provided_ by a module
o If some part of your program wants to issue log messages, it will do:
use MyLogger;
LOG("Try a bigger gun!");
o Where did [[LOG]] come from?
package MyLogger;
# Always export `LOG':
sub import {
# Get name of calling package
my $caller = caller;
*{$caller . '::LOG'} = \&LOG;
}
sub LOG { ... }
o [[use MyLogger]] automatically calls [[MyLogger::import]] if it exists
o This is how [[Exporter]] works
--------------------------------
2. Manual Importing and Exporting
*********************************
o Suppose you want to use a module that exports a function
o But the function has the wrong name
use LWP::Simple; # exports 'get' function
sub get { ... } # which is globbered by this 'get'
my $html = [R[get(...)]R]; # oops, wrong 'get()'
o You can still call [[LWP::Simple::get]] if you like
my $html = LWP::Simple::get(...);
o But this will save typing:
use LWP::Simple [C[()]C]; # exports nothing
#* BEGIN { *webget = \&LWP::Simple::get }
sub get { ... }
my $html = [C[webget]C](...);
o Here [[webget]] actually calls [[LWP::Simple::get]]
--------------------------------
2. Manual Importing and Exporting
*********************************
Constants
=========
o Now we'll see how [[use constant]] works
sub import {
my $name = shift or return;
my $value = shift;
my $caller = caller;
*{$caller . "::$name"} = sub () { $value };
}
o The caller can say
use constant PI => 3;
$circumference = 2 * PI * $r;
o Real module has error checking, list constants, etc.
o Real module also loads *1289* lines of code!
o (Up 18% since 1998)
--------------------------------
2. Manual Importing and Exporting
*********************************
Exporting Variables
===================
o You can export anything, not just functions
o Useful example: [[DBI]] puts error messages in [[$DBI::errstr]]
o Maybe you wish it had a shorter name
*ERR = \$DBI::errstr;
o Now you can use [[$ERR]] instead
--------------------------------
2. Manual Importing and Exporting
*********************************
Provide a synonym for a function
================================
* Your module provides
package MyModule;
sub no_elements { ... }
* Later you realize that [[num_elements]] would have been clearer
package MyModule;
sub num_elements { ... }
#* BEGIN { *no_elements = \&num_elements; }
* Now both names will do exactly the same thing
* You can change the documentation to say
num_elements Returns the number of pending items in the
flapdoodle queue
no_elements Obsolete synonym for 'num_elements'
* Similarly
sub unimplemented { ... }
for (qw(FIRSTKEY NEXTKEY CLEAR)) {
*$_ = \&unimplemented;
}
--------------------------------
3. Adding a new Method to a Package
***********************************
o Suppose you're using someone else's package
o There's a method you wish it has, but it doesn't
o For example:
my $sth = $dbh->prepare($query)
or die "Couldn't prepare '$query': " . $dbh->errmsg;
$sth->execute(...)
or die "Couldn't execute '$query': " . $sth->errmsg;
my @data;
while (my $rec = $sth->fetchrow_hashref) {
push @data, [@$rec];
}
$sth->finish;
# Do something with @data
o You're sick of repeating the same code over and over
o You wish that [[Msql]] supported a [[myquery]] method:
@data = $dbh->myquery($query, ...); # dies if necessary
# Do something with @data
--------------------------------
3. Adding a new Method to a Package
***********************************
o You wish that [[$dbh]] supported a [[myquery]] method
o Subclassing is a pain in the butt
o Do this instead:
sub Msql::myquery {
my $dbh = shift;
# Just stick the method here, dummy.
}
o Now you can say
@list_of_hashes = $dbh->myquery(...);
o [[myquery]] is inherited by [[Msql]]'s subclasses just the way you'd want.
o Downside: The programmer in the next cubicle might do the same thing
o If you add a lot of new methods, consider making a subclass
o Downside: People from those _other_ languages may be appalled
o Solution: Ignore them
--------------------------------
4. Semaphore Files
******************
o Let's write a program that will either update or print out a file's contents:
$FILE = '/data/hitcounter';
print "Content-type: text/html\n\n";
if (something()) {
open F, $FILE or die ...;
print while ;
} else {
open F, "> $FILE" or die ...;
my $data = qx{some command};
print F $data;
}
close F;
o Oops, it has a race condition
o Better slap a little file locking on it
--------------------------------
#HTML#
4. Semaphore Files
******************
#* use Fcntl ':flock';
if (something()) {
open F, $FILE or die ...;
#* flock F, LOCK_SH;
print while ;
} else {
open F, "> $FILE" or die ...;
#* flock F, LOCK_EX;
my $data = qx{some command};
print F $data;
}
================================================================
o Oops, it's still wrong
o The [[open]] clobbers the file contents *before* the file is locked
o Maybe we can lock the file before we open it? ... Nope
o Maybe the writer can open in append mode, lock the file, seek to the beginning, overwrite the contents, then call [[truncate]] ...
o Bah!
--------------------------------
4. Semaphore Files
******************
#HTML#
$FILE = '/data/hitcounter';
#* $SEMAPHORE = $FILE . '.lck';
print "Content-type: text/html\n\n";
#* open S, "> $SEMAPHORE" or die ...;
if (something()) {
flock [C[S]C], LOCK_SH;
open F, $FILE or die ...;
print while ;
} else {
flock [C[S]C], LOCK_EX;
open F, "> $FILE" or die ...;
my $data = qx{some command};
print F $data;
}
close F; [C[close S;]C]
* This solves all sorts of problems
* I had a long conversation last week with a guy trying to lock a [[DB_File]] database portably
* Believe me, you _don't_ want to hear the details
* The upshot: This trick works, and nothing else does
----------------------------------------------------------------
5. Caching
**********
o Makes programs faster
o Exchanges space for time
o When a cached function is called, its return value is saved
o When called again with same arguments, the saved value is returned
o Very commonly used
o Example: Your CPU caches data it gets from main memory
o Example: The Linux kernel caches data it read from the disk
o Example: Your web browser caches the pages it gets from the network
----------------------------------------------------------------
5. Caching
**********
o Another example: Converting RGB values to CMYK values:
sub cmyk {
my ($r, $g, $b) = @_;
my ($c, $m, $y) = (1-$r, 1-$g, 1-$b);
my $k = $c < $m ? ($c < $y ? $c : $y)
: ($m < $y ? $m : $y); # Minimum
for ($c, $m, $y) { $_ -= $k }
[$c, $m, $y, $k];
}
o Many image formats (including GIF) have many pixels that are the same color.
o This recomputes the same CMYK values over and over.
---------------------------------------------------------------
5. Caching
**********
o Faster version:
{
my %cmyk;
sub cmyk {
my $key = join ',' , @_;
return $cmyk{$key} if exists $cmyk{$key};
$cmyk{$key} = real_cmyk(@_);
}
sub real_cmyk {
# as before...
}
}
o Code in C, C++, Java, etc. will be similar
-----------------------------------------------------------------
5. Caching
**********
Memoizing
=========
o That wasn't much work
o But it was still more work than it needs to be
o Here's how you do it:
use Memoize; # Available from CPAN, standard from 5.7.2
memoize 'cmyk';
sub cmyk { ... no changes required ... }
o That's all!
o I would love to tell you how it works, but I don't have time
o In many languages (C, C++, etc.) this is impossible
----------------------------------------------------------------
5. Caching
**********
Memoizing
=========
o Suppose you have a function that will take a long time to compute
o Cache the results in a database
o Run an overnight batch job to populate the database
o In the morning, the function is fast
o Perl's [[tie]] feature and the [[Memoize]] module make this easy:
use DB_File;
tie %h, 'DB_File', "cmyk.db", O_RDWR|O_CREAT, 0666 or die ...;
use Memoize;
memoize 'cmyk', SCALAR_CACHE => [ HASH => \%h ];
for my $r (0 .. 255) {
for my $g (0 .. 255) {
for my $b (0 .. 255) {
cmyk($r, $g, $b);
}
}
}
-----------------------------------------------------------------
5. Caching
**********
Memoizing
=========
o Memoizing is a really useful tool to have in your toolbox
o Program too slow? Try sprinkling in a little memoization. It's cheap and easy.
o Need to profile? Try memoizing. If it works, rewrite the function you memoized; if not, try another function.
o Worried about recursion inefficiencies? Memoization is often a cheap and effective alternative to rewriting in iterative style.
o Continued...
----------------------------------------------------------------
5. Caching
**********
Memoizing
=========
o Memoize slow functions like [[gethostbyname]].
o Memoize to a permanent database and speed up your function _forever_.
o Same technique can be adapted to make a simple profiler
* Or call counter
* Or call-graph generator
* See Philippe Verdret's [[Hook::PrePostCall]] module
o Memoization is _impossible_ in C
----------------------------------------------------------------
6. Returning a False Value
**************************
sub foo {
...
return undef; # False
}
if (@result = foo(...)) { ... }
o Oops. [[undef]] is _not_ false in a list context!
o [[@result]] has one element, which is [[undef]]
if (@result) { ... } # Yes!
----------------------------------------------------------------
6. Returning a False Value
**************************
o Solution:
sub foo {
...
return;
}
o Returns [[undef]] in scalar context.
o Returns empty list in list context.
----------------------------------------------------------------
7. Debug Printing of Strings
****************************
if (/Kool-Aid$/) { die }
o But it didn't die! Why not?
o Try the debugger:
DB<119> p $_;
I want to drink the Gnu/Linux Kool-Aid
o Pull your hair out.
o Or, instead:
DB<119> p "<$_>";
o Oho.
o The _terminal program_ should have taken care of this!
----------------------------------------------------------------
8. Debug Printing of Lists
**************************
@t = ('x', ' ', '=', ' ', '3.4', '& ', 'y', '', '=', '', '5')
o Now [[print @t]] yields
x = 3.4& y=5
o Hard to tell what the list elements are!
o [[print "@t"]] is even worse!
x = 3.4 & y = 5
o Solution:
$" = ')(';
print "(@t)";
(x)( )(=)( )(3.4)(& )(y)()(=)()(5)
--------------------------------
9. Selecting n Different Things
*******************************
while (keys %h < $n) {
$h{select_thing()}++;
}
@things = keys %h;
o In scalar context, [[keys %h]] is super-efficient.
o No, it does not count the keys one at a time.
--------------------------------
Ouch
****
o (Note added 2002: This last slide was a blast from the past for me...)
================================================================
o Did you know that if you don't like Perl's OO semantics you can roll your own?
o But I don't have time to show you how
o Be sure to nag the conference folks to invite me back if you want to hear about it next year.
================================================================
o Because I liked this talk so much I decided to expand it into a book
o In the book, I _will_ show how to build an OO programming system from scratch
o And a lot of variations on memoization
o And a lot of other useful stuff that didn't fit in the talk
o Ask me about the book
#HTML# "Please."
--------------------------------
Thank You!
**********
#IMG# beer.jpg
--------------------------------
References
**********
o Original version of this talk (with detailed prose notes)
http://perl.plover.com/yak/hw1/
o Part 2, from the following year:
http://perl.plover.com/yak/hw2/
o I also expanded it to a three-hour "Tricks of the Wizards" class:
http://perl.plover.com/yak/tricks/
o More about memoization:
http://perl.plover.com/MiniMemoize/
http://perl.plover.com/Memoize/
o About the book:
http://perl.plover.com/book/
--------------------------------END