#HTML# The Perl Hardware Store *********************** Tools You Didn't Know You Needed ================================ M-J. Dominus ++++++++++++ Plover Systems Co. ++++++++++++++++++ mjd-tpc-hardware@plover.com 20 March 2002 +++++++++++++ #HTML#
#HTML#

-------------------------------- Teflon Tape *********** Teflon tape? -------------------------------- #HTML# 1. Schwartzian Transform ************************ o Sorry if everyone knows this already o Too useful to omit o Named after Randal Schwartz o co-author of _Learning Perl_, _Programming Perl_, etc. -------------------------------- 1. Schwartzian Transform ************************ o Sort list of items by some _non-apparent_ feature o Example: Sort filenames by last-modified date o Obvious method is very wasteful: sort { -M $b <=> -M $a } (readdir D); o Calls [[-M]] over and over on the same files o Idea: 1. Construct data structure with both names and dates 2. Sort by date 3. Throw away dates -------------------------------- 1. Schwartzian Transform ************************ @names = readdir D; @names_and_dates = map { { NAME => $_, DATE => -M $_ } } @names; @sorted_names_and_dates = sort { $b->{DATE} <=> $a->{DATE} } @names_and_dates; @sorted_names = map { $_->{NAME} } @sorted_names_and_dates; -------------------------------- 1. Schwartzian Transform ************************ o We can get an easy speed improvement here: o Replace hashes [[ { NAME => $_, DATE => -M $_ } ]] o ...with arrays [[ [ $_, -M $_ ] ]] o Also, we usually eliminate the temporary variables: @sorted_names = map { $_->[0] } sort { $b->[1] <=> $a->[1] } map { [ $_, -M $_ ] } readdir D; o This is the Schwartzian Transform ---------------------------------------------------------------- 1. Schwartzian Transform ************************ o [R[Warning]R]: Do not optimize without benchmarking! User System Total 5.11 + 6.83 = 11.94 Naive sort 7.37 + 0.82 = 8.19 Schwartzian transform o Donald Knuth (noted wizard) says: #HTML#

Premature optimization is the root of all evil.

-------------------------------- 1. Schwartzian Transform ************************ o Well-known to Unix shell programmers: # Sort file names by file size ls -l | sort -n +4 | awk '{print $NF}' # Sort output of SOMETHING from most frequent to least SOMETHING | uniq -c | sort -nr | awk '{$1=""; print}' ================================================================ #HTML# o I first gave this talk at Perl Conference 2 in 1998 o At last, I have an audience that will appreciate this slide -------------------------------- 2. Manual Importing and Exporting ********************************* o There is a commonly-used standard perl module called [[Exporter]] o [[Exporter]] arranges for functions from one package to appear in another o Some people actually know how this works o In ten minutes, you will too o The [[Exporter]] is big and heavy and featureful o It can be useful to avoid it -------------------------------- 2. Manual Importing and Exporting ********************************* Simple Example ============== o The [[main]] package has a [[LOG]] function o You want all your modules to have access to [[LOG]] o This won't work: package MyModule; LOG("That had to hurt!"); o [[Undefined subroutine &MyModule::LOG called at...]] o Non-exporting solution: package MyModule; main::LOG("That had to hurt!"); -------------------------------- 2. Manual Importing and Exporting ********************************* Simple Example ============== o Importing [[LOG]] into [[MyModule]] is a one-liner: package MyModule; *LOG = \&main::LOG; # Import # This works now: LOG("I slaughtered that guy!"); # (It calls main::LOG) o Simple! Convenient! -------------------------------- 2. Manual Importing and Exporting ********************************* A Module That Exports a Function ================================ o More usual is to have the [[LOG]] function _provided_ by a module o If some part of your program wants to issue log messages, it will do: use MyLogger; LOG("Try a bigger gun!"); o Where did [[LOG]] come from? package MyLogger; # Always export `LOG': sub import { # Get name of calling package my $caller = caller; *{$caller . '::LOG'} = \&LOG; } sub LOG { ... } o [[use MyLogger]] automatically calls [[MyLogger::import]] if it exists o This is how [[Exporter]] works -------------------------------- 2. Manual Importing and Exporting ********************************* o Suppose you want to use a module that exports a function o But the function has the wrong name use LWP::Simple; # exports 'get' function sub get { ... } # which is globbered by this 'get' my $html = [R[get(...)]R]; # oops, wrong 'get()' o You can still call [[LWP::Simple::get]] if you like my $html = LWP::Simple::get(...); o But this will save typing: use LWP::Simple [C[()]C]; # exports nothing #* BEGIN { *webget = \&LWP::Simple::get } sub get { ... } my $html = [C[webget]C](...); o Here [[webget]] actually calls [[LWP::Simple::get]] -------------------------------- 2. Manual Importing and Exporting ********************************* Constants ========= o Now we'll see how [[use constant]] works sub import { my $name = shift or return; my $value = shift; my $caller = caller; *{$caller . "::$name"} = sub () { $value }; } o The caller can say use constant PI => 3; $circumference = 2 * PI * $r; o Real module has error checking, list constants, etc. o Real module also loads *1289* lines of code! o (Up 18% since 1998) -------------------------------- 2. Manual Importing and Exporting ********************************* Exporting Variables =================== o You can export anything, not just functions o Useful example: [[DBI]] puts error messages in [[$DBI::errstr]] o Maybe you wish it had a shorter name *ERR = \$DBI::errstr; o Now you can use [[$ERR]] instead -------------------------------- 2. Manual Importing and Exporting ********************************* Provide a synonym for a function ================================ * Your module provides package MyModule; sub no_elements { ... } * Later you realize that [[num_elements]] would have been clearer package MyModule; sub num_elements { ... } #* BEGIN { *no_elements = \&num_elements; } * Now both names will do exactly the same thing * You can change the documentation to say num_elements Returns the number of pending items in the flapdoodle queue no_elements Obsolete synonym for 'num_elements' * Similarly sub unimplemented { ... } for (qw(FIRSTKEY NEXTKEY CLEAR)) { *$_ = \&unimplemented; } -------------------------------- 3. Adding a new Method to a Package *********************************** o Suppose you're using someone else's package o There's a method you wish it has, but it doesn't o For example: my $sth = $dbh->prepare($query) or die "Couldn't prepare '$query': " . $dbh->errmsg; $sth->execute(...) or die "Couldn't execute '$query': " . $sth->errmsg; my @data; while (my $rec = $sth->fetchrow_hashref) { push @data, [@$rec]; } $sth->finish; # Do something with @data o You're sick of repeating the same code over and over o You wish that [[Msql]] supported a [[myquery]] method: @data = $dbh->myquery($query, ...); # dies if necessary # Do something with @data -------------------------------- 3. Adding a new Method to a Package *********************************** o You wish that [[$dbh]] supported a [[myquery]] method o Subclassing is a pain in the butt o Do this instead: sub Msql::myquery { my $dbh = shift; # Just stick the method here, dummy. } o Now you can say @list_of_hashes = $dbh->myquery(...); o [[myquery]] is inherited by [[Msql]]'s subclasses just the way you'd want. o Downside: The programmer in the next cubicle might do the same thing o If you add a lot of new methods, consider making a subclass o Downside: People from those _other_ languages may be appalled o Solution: Ignore them -------------------------------- 4. Semaphore Files ****************** o Let's write a program that will either update or print out a file's contents: $FILE = '/data/hitcounter'; print "Content-type: text/html\n\n"; if (something()) { open F, $FILE or die ...; print while ; } else { open F, "> $FILE" or die ...; my $data = qx{some command}; print F $data; } close F; o Oops, it has a race condition o Better slap a little file locking on it -------------------------------- #HTML# 4. Semaphore Files ****************** #* use Fcntl ':flock'; if (something()) { open F, $FILE or die ...; #* flock F, LOCK_SH; print while ; } else { open F, "> $FILE" or die ...; #* flock F, LOCK_EX; my $data = qx{some command}; print F $data; } ================================================================ o Oops, it's still wrong o The [[open]] clobbers the file contents *before* the file is locked o Maybe we can lock the file before we open it? ... Nope o Maybe the writer can open in append mode, lock the file, seek to the beginning, overwrite the contents, then call [[truncate]] ... o Bah! -------------------------------- 4. Semaphore Files ****************** #HTML# $FILE = '/data/hitcounter'; #* $SEMAPHORE = $FILE . '.lck'; print "Content-type: text/html\n\n"; #* open S, "> $SEMAPHORE" or die ...; if (something()) { flock [C[S]C], LOCK_SH; open F, $FILE or die ...; print while ; } else { flock [C[S]C], LOCK_EX; open F, "> $FILE" or die ...; my $data = qx{some command}; print F $data; } close F; [C[close S;]C] * This solves all sorts of problems * I had a long conversation last week with a guy trying to lock a [[DB_File]] database portably * Believe me, you _don't_ want to hear the details * The upshot: This trick works, and nothing else does ---------------------------------------------------------------- 5. Caching ********** o Makes programs faster o Exchanges space for time o When a cached function is called, its return value is saved o When called again with same arguments, the saved value is returned o Very commonly used o Example: Your CPU caches data it gets from main memory o Example: The Linux kernel caches data it read from the disk o Example: Your web browser caches the pages it gets from the network ---------------------------------------------------------------- 5. Caching ********** o Another example: Converting RGB values to CMYK values: sub cmyk { my ($r, $g, $b) = @_; my ($c, $m, $y) = (1-$r, 1-$g, 1-$b); my $k = $c < $m ? ($c < $y ? $c : $y) : ($m < $y ? $m : $y); # Minimum for ($c, $m, $y) { $_ -= $k } [$c, $m, $y, $k]; } o Many image formats (including GIF) have many pixels that are the same color. o This recomputes the same CMYK values over and over. --------------------------------------------------------------- 5. Caching ********** o Faster version: { my %cmyk; sub cmyk { my $key = join ',' , @_; return $cmyk{$key} if exists $cmyk{$key}; $cmyk{$key} = real_cmyk(@_); } sub real_cmyk { # as before... } } o Code in C, C++, Java, etc. will be similar ----------------------------------------------------------------- 5. Caching ********** Memoizing ========= o That wasn't much work o But it was still more work than it needs to be o Here's how you do it: use Memoize; # Available from CPAN, standard from 5.7.2 memoize 'cmyk'; sub cmyk { ... no changes required ... } o That's all! o I would love to tell you how it works, but I don't have time o In many languages (C, C++, etc.) this is impossible ---------------------------------------------------------------- 5. Caching ********** Memoizing ========= o Suppose you have a function that will take a long time to compute o Cache the results in a database o Run an overnight batch job to populate the database o In the morning, the function is fast o Perl's [[tie]] feature and the [[Memoize]] module make this easy: use DB_File; tie %h, 'DB_File', "cmyk.db", O_RDWR|O_CREAT, 0666 or die ...; use Memoize; memoize 'cmyk', SCALAR_CACHE => [ HASH => \%h ]; for my $r (0 .. 255) { for my $g (0 .. 255) { for my $b (0 .. 255) { cmyk($r, $g, $b); } } } ----------------------------------------------------------------- 5. Caching ********** Memoizing ========= o Memoizing is a really useful tool to have in your toolbox o Program too slow? Try sprinkling in a little memoization. It's cheap and easy. o Need to profile? Try memoizing. If it works, rewrite the function you memoized; if not, try another function. o Worried about recursion inefficiencies? Memoization is often a cheap and effective alternative to rewriting in iterative style. o Continued... ---------------------------------------------------------------- 5. Caching ********** Memoizing ========= o Memoize slow functions like [[gethostbyname]]. o Memoize to a permanent database and speed up your function _forever_. o Same technique can be adapted to make a simple profiler * Or call counter * Or call-graph generator * See Philippe Verdret's [[Hook::PrePostCall]] module o Memoization is _impossible_ in C ---------------------------------------------------------------- 6. Returning a False Value ************************** sub foo { ... return undef; # False } if (@result = foo(...)) { ... } o Oops. [[undef]] is _not_ false in a list context! o [[@result]] has one element, which is [[undef]] if (@result) { ... } # Yes! ---------------------------------------------------------------- 6. Returning a False Value ************************** o Solution: sub foo { ... return; } o Returns [[undef]] in scalar context. o Returns empty list in list context. ---------------------------------------------------------------- 7. Debug Printing of Strings **************************** if (/Kool-Aid$/) { die } o But it didn't die! Why not? o Try the debugger: DB<119> p $_; I want to drink the Gnu/Linux Kool-Aid o Pull your hair out. o Or, instead: DB<119> p "<$_>"; o Oho. o The _terminal program_ should have taken care of this! ---------------------------------------------------------------- 8. Debug Printing of Lists ************************** @t = ('x', ' ', '=', ' ', '3.4', '& ', 'y', '', '=', '', '5') o Now [[print @t]] yields x = 3.4& y=5 o Hard to tell what the list elements are! o [[print "@t"]] is even worse! x = 3.4 & y = 5 o Solution: $" = ')('; print "(@t)"; (x)( )(=)( )(3.4)(& )(y)()(=)()(5) -------------------------------- 9. Selecting n Different Things ******************************* while (keys %h < $n) { $h{select_thing()}++; } @things = keys %h; o In scalar context, [[keys %h]] is super-efficient. o No, it does not count the keys one at a time. -------------------------------- Ouch **** o (Note added 2002: This last slide was a blast from the past for me...) ================================================================ o Did you know that if you don't like Perl's OO semantics you can roll your own? o But I don't have time to show you how o Be sure to nag the conference folks to invite me back if you want to hear about it next year. ================================================================ o Because I liked this talk so much I decided to expand it into a book o In the book, I _will_ show how to build an OO programming system from scratch o And a lot of variations on memoization o And a lot of other useful stuff that didn't fit in the talk o Ask me about the book #HTML# "Please." -------------------------------- Thank You! ********** #IMG# beer.jpg -------------------------------- References ********** o Original version of this talk (with detailed prose notes) http://perl.plover.com/yak/hw1/ o Part 2, from the following year: http://perl.plover.com/yak/hw2/ o I also expanded it to a three-hour "Tricks of the Wizards" class: http://perl.plover.com/yak/tricks/ o More about memoization: http://perl.plover.com/MiniMemoize/ http://perl.plover.com/Memoize/ o About the book: http://perl.plover.com/book/ --------------------------------END