Perl Lines of the Day from September, 1998

28 September, 1998


  $selected = 'selected' if  ($in{State} || $uinfo{state}) eq $state ;

A program is generating an HTML page that includes a menu where they can select their home state. The page should have the appropriate state pre-checked, as a convenience to the user. The page might be displayed as a result of subbmitting a similar page (or even the same page, if there was an error) or it might not. In the firmer case, the program has a hash %in of form input data; in any case, it has a hash %uinfo of user information from a database. The context of the Line of the day is something like this:

  print qq{<SELECT NAME="state">\n};
  foreach $state (@states) {
    $selected = 'selected' if  ($in{State} || $uinfo{state}) eq $state;
    print qq{  <OPTION VALUE="$state" $selected>\n};  
  }
  print qq{</SELECT>\n};

We're generating a line of HTML that looks like this:

  <option value="PA" selected>

Where selected will be omitted for all but one of the options. How do we decide whether to include the selected attribute? The Line of the Day is a peculiar way to write this, although I didn't realize that until after I had written it. A more normal way would be:

  print qq{<SELECT NAME="state">\n};
  my $selected_state = $in{State} || $uinfo{state};
  foreach $state (@states) {
    $selected = 'selected' if  $selected_state eq $state;
    print qq{  <OPTION VALUE="$state" $selected>\n};  
  }
  print qq{</SELECT>\n};

If there is a prior form submission that includes the state, we look at that to decide which item to select; if not we default to the user information from the database and select the item we find there.

The version without the explicit $selected_state variable is the one that occurred to me first, and I was surprised by how much more difficult it was to explain and to understand.

24 September, 1998


  0 + ($ao ? ($bo ? $ao * $bo * ($ao-$bo) : $ao) : $bo);

I once wrote a CGI program that needed to display items in a certain order. Actually, what the client asked for was to be able to specify that a particular item, that month's featured product, would be displayed first. I knew it wouldn't stop at that, and that they'd probably want to be able to specify the order of the first several products, followed by the rest of the products in no particular order.

I had the client make a new column in their product database called `order'; I said ``To make an item appear first, put in `1' as the order; to make it appear second, put in `2', and so on.'' If there was no order, the item would appear after all the ordered items.

While I was writing that, it occurred to me that they might one day want to be able to force an item to appear last. Just leaving out the order number wasn't enough, because those items all appeared together at the end in a random order. And specifying an order of 1,000,000 wouldn't work because even an item with order 1,000,000 would appear before the items with no order at all. So I added the feature that negative order numbers specified a position from the end.

The Line of the Day is the code from the sort comparison function that sorts items into order. The complete function is:

  sub compare_products {
    my ($ao, $bo) = ($a->{order} + 0, $b->{order} + 0);
    $ao ? ($bo ? $ao * $bo * ($ao - $bo) : -$ao) : $bo;
  }

A `product' here is a reference to a hash that includes the product's order, name, price, and other items. THis comparison function imposes the following order on the products:

1 2 3 4 ... 0 ... -4 -3 -2 -1

It's interesting for at least two reasons. One is the use of an arithmetic computation as a return from a sort comparator. Arithmetic computations in sort comparators are more common in C than in Perl, because C has no <=> operator. You never need to have <=>, because simple subtraction does the same thing, and that is what the C programmers do:

   /* Compare two numbers for sort order */
   int num_cmp(void *n1, void *n2) {
     return *(double *)n1 - *(double *)n2;  /* Just like Perl's <=>. */
   }

But in Perl, because there is a special <=>, we tend to forget that what it does is basically subtraction.

The other interesting thing here is that I have to explicitly add 0 to the orders. Otherwise, if a product had no order at all, and I returned it without converting it to a number first, I might be return the empty string to sort, and Perl's sort, unlike the rest of Perl, does not recognize the empty string as a synonym for 0.

15 September, 1998


  $line =~ s/${\ quotemeta '$a'}/b/;

Is this line a joke, or not? I'm not sure.

Someone showed up in comp.lang.perl.misc and wanted to know how to write a s/// to replace the literal string $a with something else, say b. He'd tried all sorts of plausible things, and none of them worked. For example, this obviously doesn't work:

  s/$a/b/;

Because patterns are interpolated like double-quoted strings, so the $a is expanded instead of being taken literally. The guy asking the question had tried all sorts of things:

  s/$a/b/;
  s'$a'b';
  s/\Q$a\E/b/;
  s'\Q$a\E'b';

None of these work! s''' prevents double-quote-like interpolation on the right-hand side, in the subsitition string rather than in the pattern. And the ones with \Q...\E don't work because the value of $a is interpolated before the \Q...\E escaping takes place.

Of course, what he wanted was just this:

  s/\$a/b/;

That's what you get if you apply the \Q...\E in your brain, and then insert the result from your brain into the program.

Anyway, that raises the question: What if you can't (or won't) do quotemeta in your head? What then? In that case, you fall back on the very general solution:

  $pat = quotemeta '$a';
  s/$pat/b/;

The recipe here is: ``To match any literal string, first quotemeta it and store the result in a variable. Then substitute the contents of the variable.''

Of course, you always get some whiner who wants to do it in one line. There's a general recipe for inlining any code at all into a double-quoted string, and that's what I used here to get the Line of the Day.

5 September, 1998


  if (defined $eq
      ? $eq->($a->[-$i], $b->[-$j])
      : $a->[-$i] eq $b->[-$j])
     {

I included this because I liked the way the condition of the if was itself a conditional ?: expression.

This line appears in my implementation of diff. The function it's in accepts an optional parameter, which is a reference to a function, $eq. $eq is used to compare certain items if you specify it, and it defaults to the builtin eq function.

One way to implement this would have been by doing something like this:

  my $eq = shift || \&default_eq;

  ...

  sub default_eq { $_[0] eq $_[1] }

But then the user would have to pay for the subroutine call for default_eq even if they didn't specify their own $eq function, and that seemed like a shame.

So instead, I inlined the test for $eq. If it's there, the if condition invokes it on the appropriate arguments, and otherwise, it just uses the builtin eq without calling any subroutines.

Whether this is actually a performance improvement over the subroutine version is not clear.

3 September, 1998


  @header_lines  = split /^(?!\s)/m, $HEADER;

I once asked Abigail `What use are negative lookahead assertions, anyway?' And she immediately came up with this extremely useful use. I was impressed.

What does this do? If $HEADER contains the header of a mail message, it splits the header into lines. That may seem easy, but mail headers are a little more complicated than you might think. Just doing split /\n/m is not enough, because mail headers sometimes include continuation lines like this:

	Return-Path: <atteson+@jkim2.biology.yale.edu>
	Received: from jkim2.biology.yale.edu (atteson+@130.132.32.49)
	  by plover.com with SMTP; 3 Sep 1998 01:43:33 -0000
	Received: (from atteson@localhost)
		by jkim2.biology.yale.edu (8.8.7/8.8.7) id VAA11540;
		Wed, 2 Sep 1998 21:41:48 -0400
	Date: Wed, 2 Sep 1998 21:41:48 -0400
	Message-Id: <199809030141.VAA11540@jkim2.biology.yale.edu>
	From: Kevin Atteson <atteson+@jkim2.biology.yale.edu>
	To: mjd@plover.com
	Subject: going live

Those indented lines aren't new headers; the indentation indicates that they are continuations of the previous lines. The first Received: header doesn't end at the ...130.132.32.49); it continues onto the next line, and ends at the -0000. Similarly, the second Received header spans three lines and ends at -0400.

Reading in a mail message is very easy:

  { local $/ = "";
    $HEADER = <>;
    local $/ = undef;
    $BODY = <>;
  }

Now the header is all in one place, and you might like to split it up into lines. The Line of the Day is a good way to do that. It does a split. What is the split delimiter? It's a little subtle. Normally a split delimiter is a string like : or a pattern like \s+. In that case, split discards the delimiters and returns the stuff in between.

In this case, the delimiter is the empty string, so split won't discard anything. But it's not just any empty string; it's a pattern that will only match the empty string in certain places: ^ forces it to match only empty strings that are at the beginning of a line, and (?!\s) forces it to match only empty strings that are not followed by white space. Thus: A new header line begins whenever there's a new line that does not begin with white space.

What good was the ?! here? We could have gotten a similar effect with /^\S/ instead. But then the program would have the wrong behavior, although it would split in the right places. But the split delimiter wouldn't be the empty string any more; instead, it would be a single character at the beginning of each header. split throws away the delimiters, so this split would discard the first character from each line, leaving us with items like ubject: and ate: in your array.

1 September, 1998


  sub untaint {(keys%{{$_[0],0}})[0]}  # `The Dunwich Horror'

When you run Perl with the -T flag, it is in taint mode. This means that Perl keeps track of whether the data in your program is trustworthy, and diagnoses and aborts attempts to perform unsafe operations with untrustworthy data. Data that came originally from the user or which might be under the control of the user are called tainted, and are considered untrustworthy.

It sometimes happens that you need to use tainted data in an unsafe operation, presumably after validating and checking it carefully. To prevent Perl from aborting your program, you must first launder or untaint the data. The usual way to do that is something like this:

  sub untaint { $_[0] =~ /(.*)/;
                $1 }

Data that's copied to the $1, $2... variables is always untainted. According to the perlsec manual page, this is the only way to launder tainted data.

Well, not quite. The taintedness of data is stored in part of Perl's SV structure. But Perl's hash keys aren't SVs; they're regular strings. They can't be numbers, or references, or undef, or blessed, or have any of the other special properties that Perl scalars have, and in particular they can't be tainted.

That leads us to the Line of the Day: An alternative recipe for taint laundering: Install the data to be laundered as a key in a hash, and then retrieve it again with `keys':

  sub untaint {
    my $data = shift;
    my %h = ($data => 0);
    my ($untainted) = keys %h;
    $untainted;
  }

The line of the day is a compressed version of the same thing.

Note: When I asked on p5p about this, Larry said not to use it, because hash keys might become SVs someday. But this is worth remembering anyway, because some day you might use untrustworthy data by accident in a way you hadn't meant to, and Perl might not catch your mistake because you inadvertently laundered the data by using it as a hash key.

Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia | Line of the Day

mjd-perl-lotd@plover.com