Regular Expression Mastery | 77
The tokenizer:
    sub tokens {
      my @tokens = split m{(
          \*\* | :=                # ** or := operator
        | [-+*/^()=]               # some other operator
        | [A-Za-z]\w*              # Identifier
        | \d*\.\d+(?:[Ee]\d+)?     # Decimal number
        | \d+                      # Integer
      )}x, shift();
      grep /\S/, @tokens;
    }
Easy to understand and to change, efficient, predictable.
Behaves very much like a comparable lex-generated scanner.
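A minimal usage sketch, to show why the tokenizer works: when the pattern given to split contains capturing parentheses, the separators are returned along with the text between them, so the matched tokens come back in the result list. The grep then discards the empty and whitespace-only pieces that lay between tokens. The input strings here are made up for illustration.

```perl
use strict;
use warnings;

# split with a capturing group returns the separators too.
my @raw = split /(\d+)/, 'a1 b22';
print join('|', @raw), "\n";        # prints: a|1| b|22

# The tokenizer from the slide.
sub tokens {
    my @tokens = split m{(
          \*\* | :=                # ** or := operator
        | [-+*/^()=]               # some other operator
        | [A-Za-z]\w*              # Identifier
        | \d*\.\d+(?:[Ee]\d+)?     # Decimal number
        | \d+                      # Integer
    )}x, shift();
    grep /\S/, @tokens;            # drop empty and whitespace-only pieces
}

# Hypothetical input expression, chosen only for the demonstration.
print join(' ', tokens('area := 3.14 * r ** 2')), "\n";
# prints: area := 3.14 * r ** 2
```

Anything that is neither whitespace nor matched by one of the alternatives (a stray `@`, say) passes through the grep as a token of its own, so a caller that needs error checking would have to look for such tokens itself.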
This is why we need /x. Without it, the same pattern looks like this:
split m{(\*\*|:=|[-+*/^()=]|[A-Za-z]\w*|\d*\.\d+(?:[Ee]\d+)?|\d+)}, shift();
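Note that the order of the | alternatives is important: at each position, Perl tries them left to right and takes the first one that matches.
Is ** one token or two? One, because \*\* is tried before the single-character operators.
What about 12.23? One token, because the decimal-number alternative is tried before the integer one.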
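The effect of the ordering can be sketched by running the tokenizer next to a deliberately misordered copy. The function `tokens_misordered` and the sample input are hypothetical, written only for this comparison; it puts the single-character operators ahead of `**` and the integer alternative ahead of the decimal one.

```perl
use strict;
use warnings;

# The tokenizer from the slide: longer alternatives come first.
sub tokens {
    my @tokens = split m{(
          \*\* | :=                # ** or := operator
        | [-+*/^()=]               # some other operator
        | [A-Za-z]\w*              # Identifier
        | \d*\.\d+(?:[Ee]\d+)?     # Decimal number
        | \d+                      # Integer
    )}x, shift();
    grep /\S/, @tokens;
}

# Hypothetical variant with the same alternatives in a bad order.
sub tokens_misordered {
    my @tokens = split m{(
          [-+*/^()=]               # single-character operators tried first
        | \*\* | :=
        | [A-Za-z]\w*
        | \d+                      # integer tried before decimal
        | \d*\.\d+(?:[Ee]\d+)?
    )}x, shift();
    grep /\S/, @tokens;
}

my $input = 'x := 2**3 + 12.23';
print join(' | ', tokens($input)), "\n";
# prints: x | := | 2 | ** | 3 | + | 12.23
print join(' | ', tokens_misordered($input)), "\n";
# prints: x | := | 2 | * | * | 3 | + | 12 | .23
```

In the misordered version `**` splits into two `*` tokens and `12.23` splits into `12` and `.23`, because the shorter alternative matches first and the regex engine never gets to try the longer one.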
Copyright © 2002 M. J. Dominus