Sample solutions and discussion
Perl 'Expert' Quiz of The Week #24 (20040923)

If you've ever written any modules then you are probably familiar with the
concept of requirements or prerequisites. These are dependencies that must be
met in order for your module to work correctly. For example, if your module
uses another module, then that module must be present on the user's system in
order for your module to function correctly.

If you've used MakeMaker then you've probably written something like:

    use ExtUtils::MakeMaker;
    WriteMakefile(
        NAME         => 'Your::Module',
        VERSION_FROM => 'lib/Your/Module.pm',
        PREREQ_PM    => {
            'Test::More' => 0,
            'File::Spec' => 0.82,
        },
    );

or with Module::Build:

    use Module::Build;
    my $build = Module::Build->new(
        module_name    => 'Your::Module',
        license        => 'perl',
        requires       => {
            'File::Spec' => 0.82,
        },
        build_requires => {
            'Test::More' => 0,
        },
    );
    $build->create_build_script;

Both of the above tell their respective build tools that File::Spec version
0.82 or above and any version of Test::More are required.

Module::Build is a little more flexible in that it lets you indicate that a
module is required only during the build phase (build_requires) or that a
module is recommended but not required (recommends). Further, Module::Build
lets you be more specific about versions, using comparison operators. For
example:

    requires => {
        'Some::Module' => '>= 0.7, != 1.0, < 2.0',
    }

says that a version of Some::Module of 0.7 or greater, excluding version 1.0
and less than version 2.0, is required.

However, even with Module::Build's greater flexibility, there are many
requirements that still cannot be described. Some modules go to great lengths
to check such requirements dynamically. The problem is that this makes the
information difficult for tools like CPAN.pm, CPANPLUS, and other automated
tools to take advantage of without running the Makefile.PL or Build.PL file.
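To make the comparison-operator spec concrete, here is a minimal sketch of
checking an installed version against a string like '>= 0.7, != 1.0, < 2.0'.
The helper name check_version_spec() is my own invention (not part of
Module::Build), and versions are treated as plain numbers, as the quiz
allows.

```perl
use strict;
use warnings;

# Check a numeric version against a comma-separated list of clauses such
# as '>= 0.7, != 1.0, < 2.0'.  Every clause must hold.  A bare number
# (e.g. '0.8') means "this version or later", matching PREREQ_PM style.
sub check_version_spec {
    my ($version, $spec) = @_;
    for my $clause (split /\s*,\s*/, $spec) {
        my ($op, $want) = $clause =~ /^\s*(>=|<=|!=|==|>|<)?\s*([\d.]+)\s*$/
            or die "Can't parse clause '$clause'\n";
        $op ||= '>=';
        my %cmp = (
            '>=' => sub { $_[0] >= $_[1] },
            '<=' => sub { $_[0] <= $_[1] },
            '>'  => sub { $_[0] >  $_[1] },
            '<'  => sub { $_[0] <  $_[1] },
            '==' => sub { $_[0] == $_[1] },
            '!=' => sub { $_[0] != $_[1] },
        );
        return 0 unless $cmp{$op}->($version, $want);
    }
    return 1;    # every clause was satisfied
}

print check_version_spec(1.5, '>= 0.7, != 1.0, < 2.0')
    ? "version accepted\n"
    : "version rejected\n";    # prints "version accepted"
```

A real implementation would also have to cope with non-numeric version
strings, which is exactly the complication the quiz sets aside.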
Ideally, it would be nice to specify complex requirements in the now-standard
META.yml file, which contains metadata about a distribution in YAML format
(http://yaml.org/). Then we could have a standard module that can read those
requirements and validate them. This module could be used by Module::Build,
ExtUtils::MakeMaker, CPAN.pm, CPANPLUS, and any other tool that needs to
validate requirements.

The kinds of things we would like to handle in the requirements specification
are boolean expressions with && (and), || (or), and ^^ (xor); grouping with
parentheses; and macro definition and expansion.

An example of boolean expressions (suggested by a discussion with David
Wheeler and Ken Williams on the module-build list) would be:

    requires => q[
        (DBD::Pg && DateTime::Format::Pg)
        ||
        (DBD::mysql && DateTime::Format::mysql)
    ]

This says that we need any version of either of these two sets of modules.
If we need to, we can also include version specifications:

    requires => q[
        ( DBD::Pg > 1.1 && DateTime::Format::Pg )
        ||
        ( DBD::mysql <= 1.2 && DateTime::Format::mysql )
    ]

Note that when a branch of the 'or' expression evaluates to true, it is not
necessary to evaluate the remaining branches - short-circuit evaluation.
However, all branches of an 'and' or 'xor' expression must be evaluated for
correct error reporting. Of course, we also want to remain as compatible as
possible with the old specifications.

For the macros, we're mostly interested in the use of predefined macros. For
example:

    requires => q[
        ( Term::Readline::Gnu )
        ||
        ( {OSNAME} == MSWin32 && Term::Readline::Perl )
    ]

(A possible extension would be to add a set operator (in) for versions and
any other values, such as '{OSNAME} in [VMS MSWin32]' or 'Some::Module in
[1.0..1.9 !1.7]'.) Other useful macros might be {MULTITHREADED},
{LARGEFILES}, etc.

Finally, it can be useful to allow definition of macros to simplify
expressions.
For example:

    requires => q[
        def Pg    = DBD::Pg    && DateTime::Format::Pg;
        def mysql = DBD::mysql && DateTime::Format::mysql;
        {Pg} || {mysql}
    ]

Feel free to experiment with different syntax. The only hard requirements
are that it supports boolean expressions, grouping, and predefined macros.
For example, my original suggestion to Ken Williams, Module::Build's author,
was something of the form:

    requires => {
        'db_driver' => q[ {postgresql} || {mysql} ],
        '{postgresql}' => {
            'DBD::Pg'              => 0,
            'DateTime::Format::Pg' => 0,
        },
        '{mysql}' => {
            'DBD::mysql'              => 0,
            'DateTime::Format::mysql' => 0,
        },
    }

This syntax makes parsing a little simpler, but otherwise allows the same
features. The keys with braces around the names are not evaluated directly;
they are definitions of macros that are only evaluated when they appear in
the value of another key.

This week's quiz is to write a module (Prereq::Expr) that can take a
specification of the type described above and evaluate it to determine
whether the requirements are satisfied. The specification is the value
assigned to the 'requires' key in the examples above
(Prereq::Expr->eval( $dist{requires} )). It can be either a hash, string, or
array, whichever makes more sense. But it should be able to handle the
old-style requirements.

In order to determine the version of an installed module, I've extracted the
routines from Module::Build and put them in a module at:

    http://perl.plover.com/qotw/misc/e024/Versions.pm

Use it like this:

    use Versions;
    my $version = Versions->from_module('File::Spec');

or

    my $version = Versions->from_file('/path/to/module.pm');

To simplify comparisons, you can assume that versions are real numbers and
just compare them with Perl's built-in numeric comparison operators. For the
purposes of this quiz, it's not necessary to worry about alpha versions and
all the complications of comparing versions. If you are interested, you
might check out John Peacock's version and version::alphabeta modules.
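To make the hash form concrete, here is a minimal sketch of evaluating it,
with a hardcoded table of installed modules standing in for Versions.pm. The
function names (macro_ok, requirement_ok) are my own invention, not part of
any real Prereq::Expr, and the sketch handles only the '||'-of-macros shape
used in this example. It also demonstrates the evaluation rule described
earlier: '||' alternatives short-circuit, while every module inside a macro
is checked so that all missing prerequisites can be reported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Versions->from_module(): what's installed here.
my %installed = ( 'DBD::Pg' => 1.21, 'DateTime::Format::Pg' => 0.03 );

my %requires = (
    'db_driver'    => q[ {postgresql} || {mysql} ],
    '{postgresql}' => { 'DBD::Pg'    => 0, 'DateTime::Format::Pg'    => 0 },
    '{mysql}'      => { 'DBD::mysql' => 0, 'DateTime::Format::mysql' => 0 },
);

# A macro is satisfied when every module in its hash meets its minimum
# version.  Every module is checked (no short circuit), so that all
# failures within the macro can be collected for error reporting.
sub macro_ok {
    my ($name, $errors) = @_;
    my $mods = $requires{$name} or die "unknown macro $name\n";
    my $ok = 1;
    for my $mod (sort keys %$mods) {
        next if defined $installed{$mod} && $installed{$mod} >= $mods->{$mod};
        push @$errors, "need $mod >= $mods->{$mod}";
        $ok = 0;
    }
    return $ok;
}

# Evaluate a plain key's value: try each '||' alternative in turn and stop
# at the first one that is satisfied (short-circuit evaluation).
sub requirement_ok {
    my ($key, $errors) = @_;
    for my $alt (split /\|\|/, $requires{$key}) {
        $alt =~ s/^\s+|\s+$//g;
        my @branch_errors;
        return 1 if macro_ok($alt, \@branch_errors);
        push @$errors, @branch_errors;
    }
    return 0;
}

my @errors;
print requirement_ok('db_driver', \@errors)
    ? "satisfied\n"                      # prints "satisfied" here
    : map { "$_\n" } @errors;
```

With the Pg modules "installed", the '{postgresql}' branch succeeds and the
'{mysql}' branch is never examined; remove them from %installed and both
branches fail, yielding one error line per missing module.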
One of the more interesting problems with this quiz, I think, is the problem
of reporting missing requirements. When you can have arbitrarily large sets
of alternative modules, it's not obvious (to me) how best to notify the user
in an orderly and comprehensible way.

I do, of course, have an ulterior motive in this quiz. This form of complex
requirements has been on the Module::Build TODO list for a long time. I'd
like to submit the solutions posted, with the permission of their respective
authors, to Ken Williams for possible inclusion in some form or another in a
future version of Module::Build.

If you want to implement this in a language other than Perl, you can simply
replace Versions.pm with a module/class that contains a list of hardcoded
versions, or whatever makes sense for your language of choice. The important
part is the expression parsing and evaluation.

----------------------------------------------------------------

I apologize for getting this summary out late. As I told Mark on Monday,
it's been a rough week, with my car being stolen and then finally returned a
little banged up and with the upholstery cut up a bit. That's not much of an
excuse, though, since I've had a couple of months to prepare.

I had started a solution on Saturday, but when I went to finish it up on
Sunday I noticed that no responses had been posted yet. I waited to see what
would happen, and on Monday, with no responses to the quiz, I guessed that
the problem was either that (A) no one was interested, or (B) no one was
familiar with expression parsing. Since option (A) made me feel bad, I chose
to believe the lack of responses was due to (B). ;-)

So, on Monday I decided to take my solution in a different direction.
Originally, I had rolled the lexer and parser into a single loop with the
grammar mixed in and hard-coded. This would have worked nicely, but I
thought it would be better to provide a brief introduction to parsing
expressions.
I scrapped my previous attempt and went with a slightly more formal
approach. It's not as complete as what I would have liked to post, but it
introduces the basic concepts.

I've always thought that expression parsing was a great exercise. It makes
use of tree structures and stacks, and it's a great example of how OO
programming can take a complex task and partition it into smaller, more
manageable pieces.

The basic components of an expression parser are the lexer, the parser, and
the syntax tree. The lexer takes a stream of characters and produces a
stream of tokens. The tokens are the smallest bits of meaningful data in a
"language"; they are the operators, the variable names, and the literals.
The parser performs a basic interpretation of the tokens and stores the
result in a syntax tree; it basically turns a sequence of tokens into a
sequence of sub-expressions. The syntax tree provides a means of storing the
tokens and their structure in a way that makes them easy to analyze
(semantic analysis) and evaluate to produce a solution.

It's more interesting and helpful to take a look at the syntax tree first.
Take, for example, the expression '1 + 2 * 3'. This can be represented as a
tree like:

      +
     / \
    1   *
       / \
      2   3

To evaluate the expression, you start with the root node, and for each sub
node you evaluate the lhs (left hand side), then the rhs (right hand side),
finally performing the operation on the results.

There are two types of nodes: the inner nodes, which represent expressions,
and the outer, leaf nodes, which represent literal values, variables, etc.
Here is the code for the inner node, which represents a binary operation.
    package Expr::BinaryOp;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $self = {
            lhs => shift,
            op  => shift,
            rhs => shift,
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;

        my $lhs  = $self->{lhs};
        my $lval = $lhs->eval();
        my $rhs  = $self->{rhs};
        my $rval = $rhs->eval();
        my $op   = $self->{op};

        # string conversion
        if ( $op eq '==' ) {
            $op = 'eq' if $lhs->isa( 'Expr::Literal::String' )
                       || $rhs->isa( 'Expr::Literal::String' );
        }
        elsif ( $op eq '!=' ) {
            $op = 'ne' if $lhs->isa( 'Expr::Literal::String' )
                       || $rhs->isa( 'Expr::Literal::String' );
        }

        if ( $op eq '^^' ) {    # special case: logical xor
            return ($lval || $rval) && !($lval && $rval);
        }
        else {
            my $result;
            eval "\$result = \$lval $op \$rval";
            return $result;
        }
    }

    1;

The constructor just initializes the state with the lhs & rhs terms and the
operator. The eval method evaluates the lhs & rhs expressions, changes the
operator to a string operator if necessary, and applies the operator to the
results.

To make things simple, everything is an expression with an eval() method.
Even literals:

    package Expr::Literal;
    use strict;
    use warnings;

    sub new {
        my $class  = shift;
        my $symbol = shift;
        return bless( \$symbol, $class );
    }

    sub eval {
        my $self = shift;
        return $$self;
    }

    1;

    ############################################################

    package Expr::Literal::String;
    use base qw(Expr::Literal);
    1;

    ############################################################

    package Expr::Literal::Number;
    use base qw(Expr::Literal);
    1;

As you can see, literal expressions just return their value. The reason for
the separate subclasses for String and Number is to maintain type
information. The class for package names is just as simple.
When evaluated, it looks up the version using the Versions.pm file provided:

    package Expr::Symbol;

    sub new {
        my $class  = shift;
        my $symbol = shift;
        return bless( \$symbol, $class );
    }

    sub eval { die "Not implemented.\n" }

    1;

    ############################################################

    package Expr::Symbol::Name;
    use base qw(Expr::Symbol);
    use strict;
    use warnings;

    sub eval {
        my $self = shift;
        require Versions;
        my $vers;
        eval { $vers = Versions->from_module( $$self ) };
        return $vers ? $vers : 0;
    }

    1;

Again, we're using class hierarchies to maintain type information.

The final expression type is for variables. It uses a helper class to
maintain a list of variable names that can persist across multiple
expressions.

    package Expr::Symbol::Variable;
    use base qw(Expr::Symbol);
    use strict;
    use warnings;

    sub eval {
        my $self   = shift;
        my $symtab = Expr::SymbolTable->instance();
        return $symtab->lookup( $$self );
    }

    1;

    ############################################################

    package Expr::SymbolTable;
    use strict;
    use warnings;

    my $singleton;

    sub instance {
        my $class = shift;
        return $singleton if defined( $singleton );
        my $self = {
            OSNAME => $^O,
            # Add more built-in vars here
        };
        $singleton = bless( $self, $class );
        return $singleton;
    }

    sub lookup {
        my $self = shift;
        my $name = shift;
        return $self->{$name};
    }

    sub set {
        my $self  = shift;
        my $name  = shift;
        my $value = shift;
        return $self->{$name} = $value;
    }

    sub unset {
        my $self = shift;
        my $name = shift;
        return delete $self->{$name};
    }

    1;

With what we have, we can manually assemble arbitrarily complex expressions:

    # File::Spec > 0.80
    my $module_version = Expr::Symbol::Name->new( 'File::Spec' );
    my $wanted_version = Expr::Literal::Number->new( 0.80 );
    my $expr1 = Expr::BinaryOp->new( $module_version, '>', $wanted_version );

    # {OSNAME} == 'linux'
    my $osname    = Expr::Symbol::Variable->new( 'OSNAME' );
    my $os_wanted = Expr::Literal::String->new( 'linux' );
    my $expr2 = Expr::BinaryOp->new( $osname, '==', $os_wanted );

    # File::Spec > 0.80 && {OSNAME} == 'linux'
    my $expr = Expr::BinaryOp->new( $expr1, '&&', $expr2 );

And evaluate them as simply as:

    print $expr->eval() ? 'true' : 'false';

That's a cumbersome way of crafting an expression, though. This is where the
lexer and parser come in. The lexer is actually quite straightforward:

    package Expr::Lexer;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $expr  = shift;
        $expr =~ s/^\s+//;
        $expr =~ s/\s+$//;
        my $self = {
            expr   => $expr,
            tokens => [],
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;

        my @token_desc = (
            { re => qr/(\()/,   ttype => 'BEGIN_GROUP' },
            { re => qr/(\))/,   ttype => 'END_GROUP'   },
            { re => qr/(>=)/,   ttype => 'RELOP'       },
            { re => qr/(>)/,    ttype => 'RELOP'       },
            { re => qr/(<=)/,   ttype => 'RELOP'       },
            { re => qr/(<)/,    ttype => 'RELOP'       },
            { re => qr/(==)/,   ttype => 'RELOP'       },
            { re => qr/(!=)/,   ttype => 'RELOP'       },
            { re => qr/(&&)/,   ttype => 'LOP'         },
            { re => qr/(\|\|)/, ttype => 'LOP'         },
            { re => qr/(\^\^)/, ttype => 'LOP'         },
            { re => qr/([[:alpha:]]\w+(?:::\w+)*(?=\W|$))/, ttype => 'NAME' },
            { re => qr/{(\w+)}/,                     ttype => 'VARIABLE' },
            { re => qr/(\d+(?:\.\d+)*|\d+\.|\.\d+)/, ttype => 'NUMBER'   },
            { re => qr/'([^']*)'/,                   ttype => 'STRING'   },
            { re => qr/"([^"]*)"/,                   ttype => 'STRING'   },
        );

        my $text = $self->{expr};
        while ( length( $text ) ) {
            foreach my $token ( @token_desc ) {
                if ( $text =~ /^$token->{re}/ ) {
                    push( @{$self->{tokens}}, [ $1, $token->{ttype} ] );
                    $text =~ s/^$token->{re}\s*//;
                }
            }
        }
        push( @{$self->{tokens}}, [ 'EOF', 'EOF' ] );
    }

    sub next {
        my $self = shift;
        return @{$self->{tokens}} ? @{ shift( @{$self->{tokens}} ) } : ();
    }

    1;

The lexer takes an expression in its constructor. When its eval() method is
called, it iterates over a list of token types. As matches are made, the
tokens are extracted and placed in an array. They can later be examined
through the next() method, which is provided as a forward-only, destructive
iterator.
For example:

    my $lexer = Expr::Lexer->new(
        q[ File::Spec > 0.80 && {OSNAME} == 'linux' ] );
    $lexer->eval();
    while ( my( $token, $ttype ) = $lexer->next() ) {
        print "$token ($ttype)\n";
    }

outputs:

    File::Spec (NAME)
    > (RELOP)
    0.80 (NUMBER)
    && (LOP)
    OSNAME (VARIABLE)
    == (RELOP)
    linux (STRING)
    EOF (EOF)

The (simplified) parser is actually pretty straightforward also, once you
understand the algorithm. If you look at an expression like:

    1 * 2 + 3 + 4

you know that you must multiply '1 * 2' before you can perform the
additions. Operator precedence tells us in which order operators are
evaluated. The trick with the parser is evaluating operators in the correct
order with as little context info as possible, while looking at each token
in sequence from left to right and without looking ahead.

Scanning left-to-right, notice:

    1 * 2 + 3 + 4
      ^

At the first op (operator), you don't have enough context information to
correctly evaluate the sub-expression, so it gets pushed on a stack.
However, at the next op:

    1 * 2 + 3 + 4
          ^

we can see that this op has a lower precedence. This means that everything
to the left is isolated from the unknown part of the remaining expression,
so it is safe to evaluate:

    2 + 3 + 4
      ^

Again, we move forward to the next op:

    2 + 3 + 4
          ^

This time the operator to the left is of the same precedence, so we can
evaluate it, giving us the property of left-associativity:

    5 + 4
      ^

Finally, we reach the end, where we perform the final evaluation, yielding
the final result:

    9

That example covered the case of the current operator having equal or lower
precedence. Let's look at one more example that shows the remaining case:

    1 + 2 * 3
      ^

At the first op, there is not enough context information:

    1 + 2 * 3
          ^

At the next op, we know that the previous op has lower precedence; however,
there could be an op of greater precedence to the right, so we have to
advance:

    1 + 2 * 3
            ^

Now, we are at the end.
At this point we know that however many sub-expressions remain on the stack,
they are in reverse order of precedence. All we need to do is evaluate the
stack from top to bottom (or right-to-left, looking at the examples):

    1 + 6
    7

I'm not sure how good my description is. I googled and found another pretty
good description at (click on the link for the PDF).

The code that implements the above looks like:

    package Expr;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $expr  = shift;
        my $lexer = Expr::Lexer->new( $expr );
        $lexer->eval();
        my $self = {
            lexer => $lexer,
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;
        my $expr = $self->parse();
        return $expr->eval();
    }

    sub parse {
        my $self = shift;

        my %op_precedence = (
            RELOP => 20,
            LOP   => 10,
        );

        my( @term_stack, @op_stack );
        while ( my($token, $ttype) = $self->{lexer}->next() ) {
            if ( $ttype eq 'NAME' ) {
                push( @term_stack, Expr::Symbol::Name->new( $token ) );
            }
            elsif ( $ttype eq 'VARIABLE' ) {
                push( @term_stack, Expr::Symbol::Variable->new( $token ) );
            }
            elsif ( $ttype eq 'STRING' ) {
                push( @term_stack, Expr::Literal::String->new( $token ) );
            }
            elsif ( $ttype eq 'NUMBER' ) {
                push( @term_stack, Expr::Literal::Number->new( $token ) );
            }
            elsif ( $ttype eq 'RELOP' || $ttype eq 'LOP' ) {
                if ( @op_stack ) {
                    my $prev_prec = $op_precedence{$op_stack[-1]->[1]};
                    my $curr_prec = $op_precedence{$ttype};
                    if ( $prev_prec == $curr_prec ) {
                        $self->reduce( \@term_stack, \@op_stack );
                        push( @op_stack, [ $token, $ttype ] );
                    }
                    elsif ( $prev_prec > $curr_prec ) {
                        $self->reduce( \@term_stack, \@op_stack )
                            while @op_stack;
                        push( @op_stack, [ $token, $ttype ] );
                    }
                    else {
                        push( @op_stack, [ $token, $ttype ] );
                    }
                }
                else {
                    push( @op_stack, [ $token, $ttype ] );
                }
            }
            elsif ( $ttype eq 'EOF' ) {
                $self->reduce( \@term_stack, \@op_stack ) while @op_stack;
            }
            elsif ( $ttype eq 'BEGIN_GROUP' ) {
                push( @term_stack, $self->parse() );
            }
            elsif ( $ttype eq 'END_GROUP' ) {
                $self->reduce( \@term_stack, \@op_stack ) while @op_stack;
                return pop( @term_stack );
            }
            else {
                die "Unexpected token type: $ttype\n";
            }
        }
        return pop( @term_stack );
    }

    sub reduce {
        my $self       = shift;
        my $term_stack = shift;
        my $op_stack   = shift;
        my $op  = pop( @$op_stack )->[0];
        my $rhs = pop( @$term_stack );
        my $lhs = pop( @$term_stack );
        push( @$term_stack, Expr::BinaryOp->new( $lhs, $op, $rhs ) );
    }

    1;

Finally, we can evaluate expressions painlessly:

    my $expr = Expr->new( q[ File::Spec > 0.80 && {OSNAME} == 'linux' ] );
    print $expr->eval() ? 'true' : 'false';

There is more to it than the above, but it shows the basics. The biggest
thing missing is semantic analysis, which could be implemented as check()
methods. Semantic analysis makes sure that the expressions you write make
sense. For example, the code above will happily evaluate expressions like:

    {OSNAME} > 'MSWin32'
    File::Spec > Cwd

It's easy to check for this by verifying that the types of the operands and
the operator fit together, and throwing an exception if not.

The code can be downloaded in a single file from:

    http://perl.plover.com/qotw/misc/e024/expr.pl

As you can see from the example, expression parsing is not rocket science,
but neither is it simple. This is one of the reasons I suggested that
alternatives might make things simpler.

Tassilo von Parseval's solution eliminates almost all of the complexity of
parsing while providing the same functionality:

    http://perl.plover.com/~alias/list.cgi?1:mss:2276

It uses Perl's built-in expression evaluator, executing the expression
inside a "Safe" compartment to avoid unwanted side effects. IMO, it's a
very elegant solution to the problem.

Marc Prewitt's solution uses Parse::RecDescent; it was educational:

    http://perl.plover.com/~alias/list.cgi?1:mss:2279

I've known about Parse::RecDescent for a long time, but I had never had
occasion to use it before.
I didn't think of it this time because I ASSumed the result would depend on
Parse::RecDescent, but it doesn't; Parse::RecDescent will generate a parser
that runs independently of Parse::RecDescent itself. I also thought I could
code a manual parser more quickly than I could learn to use it. However,
since it can be used without the dependency, this is probably the best
solution to use if full expression parsing is desired for Module::Build.

Mark Dominus's solution was cool:

    http://perl.plover.com/~alias/list.cgi?1:mss:2277

It uses his "py" utility to generate a parser from GNU bison's debugging
output. The idea of callbacks was interesting and gave me some ideas:
instead of building in support for versioning, his evaluator refers
everything of the form "{text}" to a callback function, which is
responsible for version and configuration checking. His solution is more
correct than mine, as I took some shortcuts.

Ron Isaacson brought up some interesting thoughts on versioning:

    http://perl.plover.com/~alias/list.cgi?1:mss:2278

Versioning has always been problematic. Software is always changing, and
there is no way to anticipate all future changes and enhancements. I think
the more flexible version specifications discussed here are a step in the
right direction. Brian Ingerson's 'only' module and Module::Build's support
for it (`./Build versioninstall`) are also part of that solution. IIRC,
Perl 6 will also support something similar natively.

Thanks to all who contributed. I hope this summary is useful.

Randy.