Regular Expression Mastery | 77
The tokenizer:
        sub tokens {
          my @tokens = 
            split m{(
                     \*\* | :=  # ** or := operator 
                    |
                     [-+*/^()=]  # some other operator
                    |
                     [A-Za-z]\w* # Identifier
                    |
                     \d*\.\d+(?:[Ee]\d+)?  # Decimal number
                    |
                     \d+         # Integer
                    )}x, shift();   # capturing parens: split returns the matched tokens too
          grep /\S/, @tokens;       # drop empty and whitespace-only pieces
        }
Easy to understand and to change, efficient, predictable.
Behaves very much like a similar lex-generated scanner.
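As a quick check of how the tokenizer behaves, here is a minimal usage sketch. The sub is repeated from above so the snippet runs on its own; the input string is a made-up example, not taken from the talk.

```perl
use strict;
use warnings;

# Same tokenizer as above, repeated so this snippet is self-contained.
sub tokens {
  my @tokens =
    split m{(
             \*\* | :=              # ** or := operator
            |
             [-+*/^()=]             # some other operator
            |
             [A-Za-z]\w*            # Identifier
            |
             \d*\.\d+(?:[Ee]\d+)?   # Decimal number
            |
             \d+                    # Integer
            )}x, shift();
  grep /\S/, @tokens;               # keep only pieces with a non-space character
}

# Hypothetical input, chosen to exercise each kind of token.
my @t = tokens("x := 3 ** 2 + 12.23");
print join(" | ", @t), "\n";        # prints: x | := | 3 | ** | 2 | + | 12.23
```

Because the pattern is wrapped in capturing parentheses, split returns the separators (the tokens) along with the text between them (the whitespace), and the grep throws the whitespace away.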
This is why we need /x. Here is the same pattern without it:
     split 
       m{(\*\*|:=|[-+*/^()=]|[A-Za-z]\w*|\d*\.\d+(?:[Ee]\d+)?|\d+)}, 
       shift();
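Note that the order of the | alternatives is important: Perl tries them left to right and takes the first one that matches, not the longest.
Is ** one token or two? What about 12.23? With the order above, each is a single token, because ** is tried before the single-character operators and the decimal pattern before the integer pattern.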
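The effect of alternative order can be seen with a small experiment. This is only a sketch: `tokens_with`, `$good`, and `$bad` are hypothetical names introduced for illustration, and the patterns are cut-down versions of the one above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the talk): split on the given pattern,
# keeping the matched tokens and discarding empty or whitespace-only pieces.
sub tokens_with {
  my ($pat, $text) = @_;
  grep /\S/, split /($pat)/, $text;
}

# Same alternatives in both patterns; only the order differs.
my $good = qr{\*\*|[-+*/^()=]|\d*\.\d+|\d+};   # longer alternatives first
my $bad  = qr{[-+*/^()=]|\*\*|\d+|\d*\.\d+};   # shorter alternatives first

print join(" | ", tokens_with($good, "2**12.23")), "\n";   # prints: 2 | ** | 12.23
print join(" | ", tokens_with($bad,  "2**12.23")), "\n";   # prints: 2 | * | * | 12 | .23
```

With the shorter alternatives first, `*` matches before `**` is ever tried, and `\d+` takes `12` before the decimal pattern gets a chance, so both `**` and `12.23` are broken in two.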
Copyright © 2002 M. J. Dominus