Sample solutions and discussion
Perl 'Expert' Quiz of The Week #24 (20040923)

If you've ever written any modules then you are probably familiar with the
concept of requirements or prerequisites. These are dependencies that must be
met in order for your module to work correctly. For example, if your module
uses another module, then that module must be present on the user's system in
order for your module to function correctly.

If you've used MakeMaker then you've probably written something like:

    use ExtUtils::MakeMaker;
    WriteMakefile(
        NAME         => 'Your::Module',
        VERSION_FROM => 'lib/Your/Module.pm',
        PREREQ_PM    => {
            'Test::More' => 0,
            'File::Spec' => 0.82,
        },
    );

or with Module::Build:

    use Module::Build;
    my $build = Module::Build->new(
        module_name    => 'Your::Module',
        license        => 'perl',
        requires       => {
            'File::Spec' => 0.82,
        },
        build_requires => {
            'Test::More' => 0,
        },
    );
    $build->create_build_script;

Both of the above tell their respective build tools that File::Spec version
0.82 or above and any version of Test::More are required.

Module::Build is a little more flexible in that it lets you indicate that a
module is required only during the build phase (build_requires) or that a
module is recommended but not required (recommends). Further, Module::Build
lets you be more specific about versions, using comparison operators. For
example:

    requires => {
        'Some::Module' => '>= 0.7, != 1.0, < 2.0',
    }

says that a version of Some::Module of 0.7 or greater, excluding version 1.0
and less than version 2.0, is required.

However, even with Module::Build's greater flexibility, there are many
requirements that still cannot be described. Some modules go to great lengths
to check such requirements dynamically. The problem is that this makes the
information difficult for tools like CPAN.pm, CPANPLUS, and other automated
tools to take advantage of without running the Makefile.PL or Build.PL file.
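To make the comparison-operator spec concrete, here is a minimal sketch of
checking an installed version against a string like '>= 0.7, != 1.0, < 2.0'.
The helper name check_version_spec() is my own invention (not part of
Module::Build), and versions are treated as plain numbers, as the quiz
allows.

```perl
use strict;
use warnings;

# Check a numeric version against a comma-separated list of clauses such
# as '>= 0.7, != 1.0, < 2.0'.  Every clause must hold.  A bare number
# (e.g. '0.8') means "this version or later", matching PREREQ_PM style.
sub check_version_spec {
    my ($version, $spec) = @_;
    for my $clause (split /\s*,\s*/, $spec) {
        my ($op, $want) = $clause =~ /^\s*(>=|<=|!=|==|>|<)?\s*([\d.]+)\s*$/
            or die "Can't parse clause '$clause'\n";
        $op ||= '>=';
        my %cmp = (
            '>=' => sub { $_[0] >= $_[1] },
            '<=' => sub { $_[0] <= $_[1] },
            '>'  => sub { $_[0] >  $_[1] },
            '<'  => sub { $_[0] <  $_[1] },
            '==' => sub { $_[0] == $_[1] },
            '!=' => sub { $_[0] != $_[1] },
        );
        return 0 unless $cmp{$op}->($version, $want);
    }
    return 1;    # every clause was satisfied
}

print check_version_spec(1.5, '>= 0.7, != 1.0, < 2.0')
    ? "version accepted\n"
    : "version rejected\n";    # prints "version accepted"
```

A real implementation would also have to cope with non-numeric version
strings, which is exactly the complication the quiz sets aside.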
Ideally, it would be nice to specify complex requirements in the now-standard
META.yml file, which contains metadata about a distribution in YAML format
(http://yaml.org/). Then we could have a standard module that can read those
requirements and validate them. This module could be used by Module::Build,
ExtUtils::MakeMaker, CPAN.pm, CPANPLUS, and any other tool that needs to
validate requirements.

The kinds of things we would like to handle in the requirements specification
are boolean expressions with && (and), || (or), and ^^ (xor); grouping with
parentheses; and macro definition and expansion.

An example of boolean expressions (suggested by a discussion with David
Wheeler and Ken Williams on the module-build list) would be:

    requires => q[
        (DBD::Pg && DateTime::Format::Pg)
        ||
        (DBD::mysql && DateTime::Format::mysql)
    ]

This says that we need any version of either of these two sets of modules.
If we need to, we can also include version specifications:

    requires => q[
        ( DBD::Pg > 1.1 && DateTime::Format::Pg )
        ||
        ( DBD::mysql <= 1.2 && DateTime::Format::mysql )
    ]

Note that when a branch of the 'or' expression evaluates to true, it is not
necessary to evaluate the remaining branches - short-circuit evaluation.
However, all branches of an 'and' or 'xor' expression must be evaluated for
correct error reporting. Of course, we also want to remain as compatible as
possible with the old specifications.

For the macros, we're mostly interested in the use of predefined macros. For
example:

    requires => q[
        ( Term::Readline::Gnu )
        ||
        ( {OSNAME} == MSWin32 && Term::Readline::Perl )
    ]

(A possible extension would be to add a set operator (in) for versions and
any other values, such as '{OSNAME} in [VMS MSWin32]' or 'Some::Module in
[1.0..1.9 !1.7]'.) Other useful macros might be {MULTITHREADED},
{LARGEFILES}, etc.

Finally, it can be useful to allow definition of macros to simplify
expressions.
For example:

    requires => q[
        def Pg    = DBD::Pg    && DateTime::Format::Pg;
        def mysql = DBD::mysql && DateTime::Format::mysql;
        {Pg} || {mysql}
    ]

Feel free to experiment with different syntax. The only hard requirements
are that it supports boolean expressions, grouping, and predefined macros.
For example, my original suggestion to Ken Williams, Module::Build's author,
was something of the form:

    requires => {
        'db_driver' => q[ {postgresql} || {mysql} ],
        '{postgresql}' => {
            'DBD::Pg'              => 0,
            'DateTime::Format::Pg' => 0,
        },
        '{mysql}' => {
            'DBD::mysql'              => 0,
            'DateTime::Format::mysql' => 0,
        },
    }

This syntax makes parsing a little simpler, but otherwise allows the same
features. The keys with braces around the names are not evaluated directly;
they are definitions of macros that are only evaluated when they appear in
the value of another key.

This week's quiz is to write a module (Prereq::Expr) that can take a
specification of the type described above and evaluate it to determine
whether the requirements are satisfied. The specification is the value
assigned to the 'requires' key in the examples above
(Prereq::Expr->eval( $dist{requires} )). It can be either a hash, string, or
array, whichever makes more sense. But it should be able to handle the
old-style requirements.

In order to determine the version of an installed module, I've extracted the
routines from Module::Build and put them in a module at:

    http://perl.plover.com/qotw/misc/e024/Versions.pm

Use it like this:

    use Versions;
    my $version = Versions->from_module('File::Spec');

or

    my $version = Versions->from_file('/path/to/module.pm');

To simplify comparisons, you can assume that versions are real numbers and
just compare them with Perl's built-in numeric comparison operators. For the
purposes of this quiz, it's not necessary to worry about alpha versions and
all the complications of comparing versions. If you are interested, you
might check out John Peacock's version and version::alphabeta modules.
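To make the hash form concrete, here is a minimal sketch of evaluating it,
with a hardcoded table of installed modules standing in for Versions.pm. The
function names (macro_ok, requirement_ok) are my own invention, not part of
any real Prereq::Expr, and the sketch handles only the '||'-of-macros shape
used in this example. It also demonstrates the evaluation rule described
earlier: '||' alternatives short-circuit, while every module inside a macro
is checked so that all missing prerequisites can be reported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Versions->from_module(): what's installed here.
my %installed = ( 'DBD::Pg' => 1.21, 'DateTime::Format::Pg' => 0.03 );

my %requires = (
    'db_driver'    => q[ {postgresql} || {mysql} ],
    '{postgresql}' => { 'DBD::Pg'    => 0, 'DateTime::Format::Pg'    => 0 },
    '{mysql}'      => { 'DBD::mysql' => 0, 'DateTime::Format::mysql' => 0 },
);

# A macro is satisfied when every module in its hash meets its minimum
# version.  Every module is checked (no short circuit), so that all
# failures within the macro can be collected for error reporting.
sub macro_ok {
    my ($name, $errors) = @_;
    my $mods = $requires{$name} or die "unknown macro $name\n";
    my $ok = 1;
    for my $mod (sort keys %$mods) {
        next if defined $installed{$mod} && $installed{$mod} >= $mods->{$mod};
        push @$errors, "need $mod >= $mods->{$mod}";
        $ok = 0;
    }
    return $ok;
}

# Evaluate a plain key's value: try each '||' alternative in turn and stop
# at the first one that is satisfied (short-circuit evaluation).
sub requirement_ok {
    my ($key, $errors) = @_;
    for my $alt (split /\|\|/, $requires{$key}) {
        $alt =~ s/^\s+|\s+$//g;
        my @branch_errors;
        return 1 if macro_ok($alt, \@branch_errors);
        push @$errors, @branch_errors;
    }
    return 0;
}

my @errors;
print requirement_ok('db_driver', \@errors)
    ? "satisfied\n"                      # prints "satisfied" here
    : map { "$_\n" } @errors;
```

With the Pg modules "installed", the '{postgresql}' branch succeeds and the
'{mysql}' branch is never examined; remove them from %installed and both
branches fail, yielding one error line per missing module.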
One of the more interesting problems with this quiz, I think, is the problem
of reporting missing requirements. When you can have arbitrarily large sets
of alternative modules, it's not obvious (to me) how best to notify the user
in an orderly and comprehensible way.

I do, of course, have an ulterior motive in this quiz. This form of complex
requirements has been on the Module::Build TODO list for a long time. I'd
like to submit the solutions posted, with the permission of their respective
authors, to Ken Williams for possible inclusion in some form or another in a
future version of Module::Build.

If you want to implement this in a language other than Perl, you can simply
replace Versions.pm with a module/class that contains a list of hardcoded
versions, or whatever makes sense for your language of choice. The important
part is the expression parsing and evaluation.

----------------------------------------------------------------

I apologize for getting this summary out late. As I told Mark on Monday,
it's been a rough week, with my car being stolen and then finally returned a
little banged up and with the upholstery cut up a bit. That's not much of an
excuse, though, since I've had a couple of months to prepare.

I had started a solution on Saturday, but when I went to finish it up on
Sunday I noticed that no responses had been posted yet. I waited to see what
would happen, and on Monday, with no responses to the quiz, I guessed that
the problem was either that (A) no one was interested, or (B) no one was
familiar with expression parsing. Since option (A) made me feel bad, I chose
to believe the lack of responses was due to (B). ;-)

So, on Monday I decided to take my solution in a different direction.
Originally, I had rolled the lexer and parser into a single loop with the
grammar mixed in and hard-coded. This would have worked nicely, but I
thought it would be better to provide a brief introduction to parsing
expressions.
I scrapped my previous attempt and went with a slightly more formal
approach. It's not as complete as what I would have liked to post, but it
introduces the basic concepts.

I've always thought that expression parsing was a great exercise. It makes
use of tree structures and stacks, and it's a great example of how OO
programming can take a complex task and partition it into smaller, more
manageable pieces.

The basic components of an expression parser are the lexer, the parser, and
the syntax tree. The lexer takes a stream of characters and produces a
stream of tokens. The tokens are the smallest bits of meaningful data in a
"language"; they are the operators, the variable names, and the literals.
The parser performs a basic interpretation of the tokens and stores the
result in a syntax tree; it basically turns a sequence of tokens into a
sequence of sub-expressions. The syntax tree provides a means of storing the
tokens and their structure in a way that makes them easy to analyze
(semantic analysis) and evaluate to produce a solution.

It's more interesting and helpful to take a look at the syntax tree first.
Take, for example, the expression '1 + 2 * 3'. This can be represented as a
tree like:

      +
     / \
    1   *
       / \
      2   3

To evaluate the expression, you start with the root node, and for each sub
node you evaluate the lhs (left hand side), then the rhs (right hand side),
finally performing the operation on the results.

There are two types of nodes: the inner nodes, which represent expressions,
and the outer, leaf nodes, which represent literal values, variables, etc.
Here is the code for the inner node, which represents a binary operation.
    package Expr::BinaryOp;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $self = {
            lhs => shift,
            op  => shift,
            rhs => shift,
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;

        my $lhs  = $self->{lhs};
        my $lval = $lhs->eval();
        my $rhs  = $self->{rhs};
        my $rval = $rhs->eval();
        my $op   = $self->{op};

        # string conversion
        if ( $op eq '==' ) {
            $op = 'eq' if $lhs->isa( 'Expr::Literal::String' )
                       || $rhs->isa( 'Expr::Literal::String' );
        }
        elsif ( $op eq '!=' ) {
            $op = 'ne' if $lhs->isa( 'Expr::Literal::String' )
                       || $rhs->isa( 'Expr::Literal::String' );
        }

        if ( $op eq '^^' ) {    # special case: logical xor
            return ($lval || $rval) && !($lval && $rval);
        }
        else {
            my $result;
            eval "\$result = \$lval $op \$rval";
            return $result;
        }
    }

    1;

The constructor just initializes the state with the lhs & rhs terms and the
operator. The eval method evaluates the lhs & rhs expressions, changes the
operator to a string operator if necessary, and applies the operator to the
results.

To make things simple, everything is an expression with an eval() method.
Even literals:

    package Expr::Literal;
    use strict;
    use warnings;

    sub new {
        my $class  = shift;
        my $symbol = shift;
        return bless( \$symbol, $class );
    }

    sub eval {
        my $self = shift;
        return $$self;
    }

    1;

    ############################################################

    package Expr::Literal::String;
    use base qw(Expr::Literal);
    1;

    ############################################################

    package Expr::Literal::Number;
    use base qw(Expr::Literal);
    1;

As you can see, literal expressions just return their value. The reason for
the separate subclasses for String and Number is to maintain type
information. The class for package names is just as simple.
When evaluated, it looks up the version using the Versions.pm file provided:

    package Expr::Symbol;

    sub new {
        my $class  = shift;
        my $symbol = shift;
        return bless( \$symbol, $class );
    }

    sub eval { die "Not implemented.\n" }

    1;

    ############################################################

    package Expr::Symbol::Name;
    use base qw(Expr::Symbol);
    use strict;
    use warnings;

    sub eval {
        my $self = shift;
        require Versions;
        my $vers;
        eval { $vers = Versions->from_module( $$self ) };
        return $vers ? $vers : 0;
    }

    1;

Again, we're using class hierarchies to maintain type information.

The final expression type is for variables. It uses a helper class to
maintain a list of variable names that can persist across multiple
expressions.

    package Expr::Symbol::Variable;
    use base qw(Expr::Symbol);
    use strict;
    use warnings;

    sub eval {
        my $self   = shift;
        my $symtab = Expr::SymbolTable->instance();
        return $symtab->lookup( $$self );
    }

    1;

    ############################################################

    package Expr::SymbolTable;
    use strict;
    use warnings;

    my $singleton;

    sub instance {
        my $class = shift;
        return $singleton if defined( $singleton );
        my $self = {
            OSNAME => $^O,
            # Add more built-in vars here
        };
        $singleton = bless( $self, $class );
        return $singleton;
    }

    sub lookup {
        my $self = shift;
        my $name = shift;
        return $self->{$name};
    }

    sub set {
        my $self  = shift;
        my $name  = shift;
        my $value = shift;
        return $self->{$name} = $value;
    }

    sub unset {
        my $self = shift;
        my $name = shift;
        return delete $self->{$name};
    }

    1;

With what we have, we can manually assemble arbitrarily complex expressions:

    # File::Spec > 0.80
    my $module_version = Expr::Symbol::Name->new( 'File::Spec' );
    my $wanted_version = Expr::Literal::Number->new( 0.80 );
    my $expr1 = Expr::BinaryOp->new( $module_version, '>', $wanted_version );

    # {OSNAME} == 'linux'
    my $osname    = Expr::Symbol::Variable->new( 'OSNAME' );
    my $os_wanted = Expr::Literal::String->new( 'linux' );
    my $expr2 = Expr::BinaryOp->new( $osname, '==', $os_wanted );

    # File::Spec > 0.80 && {OSNAME} == 'linux'
    my $expr = Expr::BinaryOp->new( $expr1, '&&', $expr2 );

And evaluate them as simply as:

    print $expr->eval() ? 'true' : 'false';

That's a cumbersome way of crafting an expression, though. This is where the
lexer and parser come in. The lexer is actually quite straightforward:

    package Expr::Lexer;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $expr  = shift;
        $expr =~ s/^\s+//;
        $expr =~ s/\s+$//;
        my $self = {
            expr   => $expr,
            tokens => [],
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;

        my @token_desc = (
            { re => qr/(\()/,   ttype => 'BEGIN_GROUP' },
            { re => qr/(\))/,   ttype => 'END_GROUP'   },
            { re => qr/(>=)/,   ttype => 'RELOP'       },
            { re => qr/(>)/,    ttype => 'RELOP'       },
            { re => qr/(<=)/,   ttype => 'RELOP'       },
            { re => qr/(<)/,    ttype => 'RELOP'       },
            { re => qr/(==)/,   ttype => 'RELOP'       },
            { re => qr/(!=)/,   ttype => 'RELOP'       },
            { re => qr/(&&)/,   ttype => 'LOP'         },
            { re => qr/(\|\|)/, ttype => 'LOP'         },
            { re => qr/(\^\^)/, ttype => 'LOP'         },
            { re => qr/([[:alpha:]]\w+(?:::\w+)*(?=\W|$))/, ttype => 'NAME' },
            { re => qr/{(\w+)}/,                     ttype => 'VARIABLE' },
            { re => qr/(\d+(?:\.\d+)*|\d+\.|\.\d+)/, ttype => 'NUMBER'   },
            { re => qr/'([^']*)'/,                   ttype => 'STRING'   },
            { re => qr/"([^"]*)"/,                   ttype => 'STRING'   },
        );

        my $text = $self->{expr};
        while ( length( $text ) ) {
            foreach my $token ( @token_desc ) {
                if ( $text =~ /^$token->{re}/ ) {
                    push( @{$self->{tokens}}, [ $1, $token->{ttype} ] );
                    $text =~ s/^$token->{re}\s*//;
                }
            }
        }
        push( @{$self->{tokens}}, [ 'EOF', 'EOF' ] );
    }

    sub next {
        my $self = shift;
        return @{$self->{tokens}} ? @{ shift( @{$self->{tokens}} ) } : ();
    }

    1;

The lexer takes an expression in its constructor. When its eval() method is
called, it iterates over a list of token types. As matches are made, the
tokens are extracted and placed in an array. They can later be examined
through the next() method, which is provided as a forward-only, destructive
iterator.
For example:

    my $lexer = Expr::Lexer->new(
        q[ File::Spec > 0.80 && {OSNAME} == 'linux' ] );
    $lexer->eval();
    while ( my( $token, $ttype ) = $lexer->next() ) {
        print "$token ($ttype)\n";
    }

outputs:

    File::Spec (NAME)
    > (RELOP)
    0.80 (NUMBER)
    && (LOP)
    OSNAME (VARIABLE)
    == (RELOP)
    linux (STRING)
    EOF (EOF)

The (simplified) parser is actually pretty straightforward also, once you
understand the algorithm. If you look at an expression like:

    1 * 2 + 3 + 4

you know that you must multiply '1 * 2' before you can perform the
additions. Operator precedence tells us in which order operators are
evaluated. The trick with the parser is evaluating operators in the correct
order with as little context info as possible, while looking at each token
in sequence from left to right and without looking ahead.

Scanning left-to-right, notice:

    1 * 2 + 3 + 4
      ^

At the first op (operator), you don't have enough context information to
correctly evaluate the sub-expression, so it gets pushed on a stack.
However, at the next op:

    1 * 2 + 3 + 4
          ^

we can see that this op has a lower precedence. This means that everything
to the left is isolated from the unknown part of the remaining expression,
so it is safe to evaluate:

    2 + 3 + 4
      ^

Again, we move forward to the next op:

    2 + 3 + 4
          ^

This time the operator to the left is of the same precedence, so we can
evaluate it, giving us the property of left-associativity:

    5 + 4
      ^

Finally, we reach the end, where we perform the final evaluation, yielding
the final result:

    9

That example covered the case of the current operator having equal or lower
precedence. Let's look at one more example that shows the remaining case:

    1 + 2 * 3
      ^

At the first op, there is not enough context information:

    1 + 2 * 3
          ^

At the next op, we know that the previous op has lower precedence; however,
there could be an op of greater precedence to the right, so we have to
advance:

    1 + 2 * 3
            ^

Now, we are at the end.
At this point we know that however many sub-expressions remain on the stack,
they are in reverse order of precedence. All we need to do is evaluate the
stack from top to bottom (or right-to-left, looking at the examples):

    1 + 6
    7

I'm not sure how good my description is. I googled and found another pretty
good description at (click on the link for the PDF).

The code that implements the above looks like:

    package Expr;
    use strict;
    use warnings;

    sub new {
        my $class = shift;
        my $expr  = shift;
        my $lexer = Expr::Lexer->new( $expr );
        $lexer->eval();
        my $self = {
            lexer => $lexer,
        };
        return bless( $self, $class );
    }

    sub eval {
        my $self = shift;
        my $expr = $self->parse();
        return $expr->eval();
    }

    sub parse {
        my $self = shift;

        my %op_precedence = (
            RELOP => 20,
            LOP   => 10,
        );

        my( @term_stack, @op_stack );
        while ( my($token, $ttype) = $self->{lexer}->next() ) {
            if ( $ttype eq 'NAME' ) {
                push( @term_stack, Expr::Symbol::Name->new( $token ) );
            }
            elsif ( $ttype eq 'VARIABLE' ) {
                push( @term_stack, Expr::Symbol::Variable->new( $token ) );
            }
            elsif ( $ttype eq 'STRING' ) {
                push( @term_stack, Expr::Literal::String->new( $token ) );
            }
            elsif ( $ttype eq 'NUMBER' ) {
                push( @term_stack, Expr::Literal::Number->new( $token ) );
            }
            elsif ( $ttype eq 'RELOP' || $ttype eq 'LOP' ) {
                if ( @op_stack ) {
                    my $prev_prec = $op_precedence{$op_stack[-1]->[1]};
                    my $curr_prec = $op_precedence{$ttype};
                    if ( $prev_prec == $curr_prec ) {
                        $self->reduce( \@term_stack, \@op_stack );
                        push( @op_stack, [ $token, $ttype ] );
                    }
                    elsif ( $prev_prec > $curr_prec ) {
                        $self->reduce( \@term_stack, \@op_stack )
                            while @op_stack;
                        push( @op_stack, [ $token, $ttype ] );
                    }
                    else {
                        push( @op_stack, [ $token, $ttype ] );
                    }
                }
                else {
                    push( @op_stack, [ $token, $ttype ] );
                }
            }
            elsif ( $ttype eq 'EOF' ) {
                $self->reduce( \@term_stack, \@op_stack ) while @op_stack;
            }
            elsif ( $ttype eq 'BEGIN_GROUP' ) {
                push( @term_stack, $self->parse() );
            }
            elsif ( $ttype eq 'END_GROUP' ) {
                $self->reduce( \@term_stack, \@op_stack ) while @op_stack;
                return pop( @term_stack );
            }
            else {
                die "Unexpected token type: $ttype\n";
            }
        }
        return pop( @term_stack );
    }

    sub reduce {
        my $self       = shift;
        my $term_stack = shift;
        my $op_stack   = shift;
        my $op  = pop( @$op_stack )->[0];
        my $rhs = pop( @$term_stack );
        my $lhs = pop( @$term_stack );
        push( @$term_stack, Expr::BinaryOp->new( $lhs, $op, $rhs ) );
    }

    1;

Finally, we can evaluate expressions painlessly:

    my $expr = Expr->new( q[ File::Spec > 0.80 && {OSNAME} == 'linux' ] );
    print $expr->eval() ? 'true' : 'false';

There is more to it than the above, but it shows the basics. The biggest
thing missing is semantic analysis, which could be implemented as check()
methods. Semantic analysis makes sure that the expressions you write make
sense. For example, the code above will happily evaluate expressions like:

    {OSNAME} > 'MSWin32'
    File::Spec > Cwd

It's easy to check for this by verifying that the types of the operands and
the operator fit together, and throwing an exception if not.

The code can be downloaded in a single file from:

    http://perl.plover.com/qotw/misc/e024/expr.pl

As you can see from the example, expression parsing is not rocket science,
but neither is it simple. This is one of the reasons I suggested that
alternatives might make things simpler.

Tassilo von Parseval's solution eliminates almost all of the complexity of
parsing while providing the same functionality:

    http://perl.plover.com/~alias/list.cgi?1:mss:2276

It uses Perl's built-in expression evaluator, executing the expression
inside a "Safe" compartment to avoid unwanted side effects. IMO, it's a
very elegant solution to the problem.

Marc Prewitt's solution uses Parse::RecDescent; it was educational:

    http://perl.plover.com/~alias/list.cgi?1:mss:2279

I've known about Parse::RecDescent for a long time, but I had never had
occasion to use it before.
I didn't think of it this time because I ASSumed the result would depend on
Parse::RecDescent, but it doesn't; Parse::RecDescent will generate a parser
that runs independently of Parse::RecDescent itself. I also thought I could
code a manual parser more quickly than I could learn to use it. However,
since it can be used without the dependency, this is probably the best
solution to use if full expression parsing is desired for Module::Build.

Mark Dominus's solution was cool:

    http://perl.plover.com/~alias/list.cgi?1:mss:2277

It uses his "py" utility to generate a parser from GNU bison's debugging
output. The idea of callbacks was interesting and gave me some ideas:
instead of building in support for versioning, his evaluator refers
everything of the form "{text}" to a callback function, which is
responsible for version and configuration checking. His solution is more
correct than mine, as I took some shortcuts.

Ron Isaacson brought up some interesting thoughts on versioning:

    http://perl.plover.com/~alias/list.cgi?1:mss:2278

Versioning has always been problematic. Software is always changing, and
there is no way to anticipate all future changes and enhancements. I think
the more flexible version specifications discussed here are a step in the
right direction. Brian Ingerson's 'only' module and Module::Build's support
for it (`./Build versioninstall`) are also part of that solution. IIRC,
Perl 6 will also support something similar natively.

Thanks to all who contributed. I hope this summary is useful.

Randy.