Sample solutions and discussion
Perl Expert Quiz of The Week #1 (20021016)

        Write a subroutine, 'subst', which gets a string argument, $s.
        It should search $s and replace any occurrences of "$1" with
        the current value of $1, any occurrences of "$2" with the
        current value of $2, and so on.

        For example, if $1, $2, and $3 happen to be "dogs", "fish" and
        "carrots", then

                subst('$2, $1, and $3')

        should return

                "fish, dogs, and carrots"

I thought this problem would be harder than it was. The obvious way
to analyze the argument string (let's call it the 'template') is to
use a regex, but if you use regexes, you'll destroy the contents of
$1, $2, etc., which you need for later.

Or so I thought; I had forgotten that $1 etc. are dynamically block
scoped, so that

        {
          mess up $1 in here
        }
        $1 is restored here

This makes the solution rather easier than I had expected.

The solution I worked out beforehand looked something like this:

        sub subst {
          my $pat = shift;
          my @pos;
          while ($pat =~ /\$(\d+)/g) {
            # Record [variable number, start offset, length].
            # The start offset is the end of the match minus the
            # length of the match ('$' plus the digits).
            push @pos, [$1, pos($pat) - length($1) - 1, 1 + length $1];
          }
          for (reverse @pos) {
            my ($n, $pos, $len) = @$_;
            substr($pat, $pos, $len) = $$n;
          }
          $pat;
        }

The 'while' loop scans over the template looking for escape
sequences; whenever it finds one, it records its position in the
string. For the example template '$2, $1, and $3', it builds this
structure:

        [2,  0, 2],    # this means that '$2' appears at position 0
        [1,  4, 2],    # this means that '$1' appears at position 4
        [3, 12, 2],    # this means that '$3' appears at position 12

Each element contains the number of the variable to be substituted,
and the position and length of the escape sequence it replaces in the
template. The position and length are just what 'substr' wants when
it does a string replacement, so it's easy to use 'substr' later.

The 'while' loop overwrites $1, but that's OK, because the original
$1 is restored when the loop finishes.

The 'for' loop then replaces the escape sequences in the template
with the appropriate values.
$$n is a symbolic reference, so if you are using 'strict', you need
"no strict 'refs'" just before the 'substr' line.

Doing the replacements in reverse order prevents the replacements
from messing up the string positions of the variables you will be
replacing later. In general, replacing part of a string might move
anything to the right of the place where you did the replacement. So
we just do the replacements from right to left. This is probably the
most generally applicable trick I used; if you only learn one thing
from this example, this should be it.

1. Several people wrote analogous solutions which began by recording
   the $1, $2... variables into an array, and then replaced the
   escape sequences with the values of the array elements. This seems
   to be simpler. The trick is that you then have to figure out how
   big the array should be. This works:

        sub subst {
          my $target = shift;
          my @n;
          for (1 .. $#+) {
            $n[$_] = $$_;
          }
          $target =~ s/\$(\d+)/$n[$1]/g;
          return $target;
        }

   It uses the Perl @+ variable to decide how many of $1, $2,
   etc. might be defined.

2. A few people tried a solution like the one in the previous
   section, but instead of using something like @+ to determine the
   number of $1, $2, ... variables, they just scanned through them
   until they found one that was undefined:

        sub subst {
          my $x=shift;
          my @p;
          for (my $i=1; eval 'defined($main::'.$i.')'; $i++) {
            $p[$i]=eval '$main::'.$i;
          }
          $x=~s/\$(\d+)/$p[$1]/g;
          $x;
        }

   This gentleman said:

        my expert solutions assumes there are no gaps between $'s,
        but i think it's quite reasonable. the only sane way to put
        values into this variables doesn't allow you to create gaps
        in numeration.

   Unfortunately, he's mistaken. Someone posted a complicated
   counterexample, but there's a really simple counterexample:

        "y" =~ /(x)|(y)/

   After this, $1 is undefined and $2 is "y". So the 'scan while
   defined' approach doesn't work, even for quite ordinary cases.

3. Quite a few people tried to use 'eval'.
   For example:

        sub subst($) {
          return eval( '"'.$_[0].'"' );
        }

   This works very badly. Among its defects:

        subst('I said, "Don\'t touch the $3!"');

   This should return

        I said, "Don't touch the carrots!"

   Instead, it returns undef and delivers the message

        Bareword found where operator expected at (eval 1) line 1, near ""I said, "Don't"
                (Missing operator before Don't?)

   on standard error. Euggh.

   Another defect:

        subst('\n$=')

   should return the template unchanged, since there are no
   appearances of $1, $2, $3, etc. The template is four characters
   long, consisting of <\> <n> <$> <=>. Instead, it returns the
   three-character string <newline> <6> <0>.

   A third defect:

        subst('".`cat /etc/passwd`."')

   inserts the password file into the return value. If you were to
   replace 'cat /etc/passwd' with 'rm -rf /', it would try to remove
   every file in the filesystem. This is not what you normally expect
   from a function whose job is to fill out templates.

   A good rule of thumb is that unless what you're trying to do is
   most clearly described as "compile and run arbitrary Perl code",
   it's probably a mistake to use 'eval' to do it.

4. Shlomi Fish pointed out that where my sample solution had \d+, it
   should have been [1-9]\d*, since \d+ allows variables like $0 and
   $003, which were not mentioned in the original problem statement.

5. A number of people preferred to use something like

        eval '$main::' . $n;

   to get the values of $1, $2, etc., rather than a symbolic
   reference like $$n. Usually the motivation was that '$$n' does not
   work under 'strict refs'.

   This is misguided. 'strict refs' compliance is not an end in
   itself. A 'strict refs' failure is a warning that you might be
   accidentally using a string as if it were a reference. Here, we
   are not doing it accidentally.

   Use of strings as references is a bad move for two reasons.
   First, it may be accidental, leading to bugs that are extremely
   difficult to find. That is not an issue here.
   Second, it's possible to make a mistake in the string and end up
   using a variable other than the one you meant to use. For example,
   consider

        ($var_name, $value) = get_input(...);
        $$var_name = $value;

   Here the intent is to ask the user for the name of a variable, and
   a value to assign to it, and then to perform the assignment. If
   'get_input' forgets to chomp $var_name before returning it, then
   the assignment is actually made to a variable with a very strange
   name: ${"x\n"} instead of $x, for example. Tracking down this sort
   of thing can be very difficult.

   But using 'eval' as a replacement for the symbolic reference here
   makes the problem worse, not better:

        eval "\$$var_name = $value";

   Similarly, the symbolic reference code has a potential problem in
   that it may allow the user to overwrite Perl's special variables
   like $= or $\ . But again, the 'eval' solution makes the problem
   worse; not only is the user still able to overwrite $\, they can
   also execute arbitrary code.

   It seems to me that over the past few years the Perl community has
   moved towards a rather bizarre
   "must-use-strict-everywhere-at-all-costs" position, even when the
   cost of satisfying 'strict' is to make the problems (the same
   problems it was put in to solve) even worse.

   The metaphor I use for this in my classes is that you have met
   someone who says "My solution is better than yours because I can
   leave the smoke detector switched on the whole time and it never
   makes any noise," and then you look at their solution and find out
   that it starts by putting a giant fan in the room to blow all the
   smoke away from the detector.

   This person remembers that the smoke detector is there to protect
   them, but they have forgotten what it is there to protect them
   from. They have confused the outward signs of safety (a quiet
   detector) with actual safety.

   Symbolic references are a perfectly good way to solve this
   problem. There are good ways to solve it without using symbolic
   references, but 'eval' isn't one of them.

6.
   The most popular of these ways was to use @- and @+, which tell
   you the start and end positions of $1, $2, etc. in the original
   target string. For example:

        sub subst {
          my $retVal = shift;
          my @matches = map { substr($&, $-[$_], $+[$_]-$-[$_]) } 1..$#-;
          $retVal =~ s!\$(\d+)!$matches[$1-1]!g;
          $retVal;
        }

   Unfortunately, the original target string isn't easily available
   to subst(). This example assumes it's in '$&'. But $& is only the
   *part* of the target string that matched, while the offsets in @-
   and @+ are measured from the start of the whole target string. So,
   for example, although this works properly:

        "fish dogs carrots" =~ /(\w+) (\w+) (\w+)/;
        print subst('$2, $1 and $3');

   This doesn't:

        "---fish dogs carrots" =~ /(\w+) (\w+) (\w+)/;
        print subst('$2, $1 and $3');

   It prints:

        s ca, h do and rots

   (This is another one of those odd cases where people on the list
   posted a very complicated counterexample, when a very simple one
   was available.)

   The problem goes away if you use

        my @matches = map { substr("$`$&$'", $-[$_], $+[$_]-$-[$_]) } 1..$#-;

   instead. The big drawback of all these solutions is that any use
   of $&, $`, or $' anywhere in the program slows down all the other
   regexes everywhere.

7. Another alternative to symbolic references, suggested by Yitzchak
   Scott-Thoennes, is to visit the Perl symbol table directly, using
   '${$::{$n}}' instead of $$n. Unfortunately, this doesn't work in
   this case, because for $1, etc. Perl only creates symbol table
   entries for the variables that actually appear by name in the
   program.

        "fish dogs carrots" =~ /(\w+) (\w+) (\w+)/;

        $x = "$1 $3";

        for (1..3) {
          print "$_: $ {$::{$_}}\n";
        }

   This prints

        1: fish
        2:
        3: carrots

   Where did $2 go? Well, you never used $2 in your program, so
   there's no symbol table entry for it. If you comment out the
   "$x = ..." line, the fish and carrots disappear too.

   But in any case, even when it works, you're still accessing the
   variable symbolically, and so it has all of the drawbacks of the
   simple symbolic reference version.
   In this case, I don't think those are really drawbacks at all, but
   if you think this is in any way superior to the symbolic
   references version just because it doesn't provoke a 'strict refs'
   failure, you're deceiving yourself.

8. Some people went further, to consider whether subst() itself
   should provide an escaping mechanism, so that, for example

        subst('The value of \$3 is "$3".')

   would return

        The value of $3 is "carrots".

   instead of

        The value of \carrots is "carrots".

   This is tricky to get right, because if you're going to support
   \$, you need to support \\ also, and then you start to have to
   deal with arguments like

        The value of \\\\\\\\\\\$3 is "\\\\\\\\\\$3".

   Some people presented regexes that do this:

        $s =~ s/(?[0]} if ref $_; } return join "", @result; }

   I was surprised and delighted to see how short this was.

   We have to be a little tricky here because while the regex
   matching is going on, we don't have access to the original $1, $2,
   etc. So instead we build up an array @result, which contains
   strings, which are inserted into the output literally, and little
   arrays, [1], [2], etc., which indicate that the value of the
   corresponding $1, $2, etc. variable should be inserted at that
   point.

10. Analogous to this, but smaller, probably more efficient, and
    almost certainly easier to understand, was a solution suggested
    by Jeff Pinyan:

        sub subst {
          my $template = shift;
          my @vals;
          for (1 .. $#+) {
            $vals[$_] = $$_;
          }
          $template =~ s{ \\(.) | \$([1-9]\d*) }{
            if (defined $1) { $1 } else { $vals[$2] }
          }gex;
          $template;
        }

    Instead of using the rather obscure '/\G.../gc' feature to find
    parts of the target string we want, we just use 's///', which
    everyone understands. The s/// is looking for two kinds of things
    to substitute: a backslash followed by a single character, in
    which case the backslash is removed, or a $ followed by a
    numeral, in which case the appropriate value is substituted.
    Everything else stays the same.
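To check how the escape handling in the s///gex approach behaves, here
is a small self-contained sketch that can be run as a script. It is
only an illustration, not one of the submitted solutions: the name
'subst_escaped' is made up, the 'no strict' block is there so the file
can run under 'use strict', and substituting an empty string for a
group that did not take part in the match is an assumption of this
sketch, not something the quiz specified.

```perl
use strict;
use warnings;

# Hypothetical name; same idea as the s///gex solution above.
sub subst_escaped {
    my $template = shift;

    # Copy the caller's $1, $2, ... before s/// overwrites them.
    # $#+ is the number of groups in the last successful match.
    my @vals;
    {
        no strict 'refs';    # $$_ is a deliberate symbolic reference
        $vals[$_] = $$_ for 1 .. $#+;
    }

    # Either a backslash plus one character (keep the character),
    # or $N (insert the saved value; '' if that group was unset,
    # which is an assumption made for this sketch).
    $template =~ s{ \\(.) | \$([1-9]\d*) }{
        defined $1 ? $1 : ( defined $vals[$2] ? $vals[$2] : '' )
    }gex;

    return $template;
}

"fish dogs carrots" =~ /(\w+) (\w+) (\w+)/ or die "no match";
print subst_escaped('The value of \$3 is "$3".'), "\n";
# prints: The value of $3 is "carrots".
```

Because the backslash alternative comes first and s///g scans left to
right, a doubled backslash is consumed as a pair before the following
'$' is looked at, so '\\$1' comes out as one backslash followed by the
value of $1.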
Thanks again to everyone who subscribed, and especially to those who participated in the discussion. I'll send another quiz on Wednesday.