Subject: TPF grant proposal: Lexical Pragmas
Organization: Plover Systems
Date: Sat, 27 Sep 2003 19:14:38 -0400
From: Mark Jason Dominus

* Introduction

On September 2, I posted a 'trial balloon' patch to implement
lexically scoped declarations in Perl. I propose to finish
implementing this feature for release in Perl 5.8.2. This will take
one month, and I will do it for $750.

* The feature

Several Perl 'pragma' declarations should be lexically scoped, but
aren't. The best example is the 'sort' pragma:

        use sort 'stable';        # guarantee stability
        use sort '_quicksort';    # use a quicksort algorithm
        use sort '_mergesort';    # use a mergesort algorithm
        use sort '_qsort';        # alias for quicksort

The manual says:

        CAVEATS
            This pragma is not lexically scoped: its effect is global
            to the program it appears in.

Since it's global, it's of very limited usefulness. A module author
can't safely use this declaration, because it will affect every sort
in every program that uses the module.

This limitation applies to a number of other declarations. I believe
that lexically-scoped declarations are an extremely useful general
feature, and we would see a lot more of them, except there is no good
way to implement them at present. The few lexically-scoped pragmas
that Perl does support each use up a bit in the PL_hints variable,
which only contains 32 bits. People are reluctant to use up this
nonrenewable resource except for the most important purposes.

Early this month I delivered a trial patch for Perl that enabled
lexically-scoped pragmas to be written in pure Perl or in XS. The
core patch itself was tiny: only a few lines long.

* Possible applications

I don't mean to suggest that all of the following applications would
be valuable, desirable, or practical. I'm just trying to give an idea
of the very large world of possible applications of this feature.

** sort.pm, as noted above.

** warnings.pm.
I believe the implementation of lexical warnings could be simplified
using the lexical pragma feature. It might be possible to move a lot
of lexical warning-related code out of the core. I think anything
that makes the core smaller is worth considering.

** B::Deparse presently doesn't report on 'strict vars' or 'strict
subs' correctly:

        % perl -MO=Deparse -e 'use strict; { no strict "vars"; my $x = 1 } my $y = 1; '
        use strict 'refs';
        {
            my $x = 1;
        }
        my $y = 1;

The lexical pragma feature could be used to fix this.

** diagnostics.pm.

There is no way to turn the effect on or off for a single block. The
manual says:

        Not being able to say "no diagnostics" is annoying, but may
        not be insurmountable.

The lexical pragma patch surmounts this obstacle.

** strict.pm.

At present, 'strict subs' and 'strict vars' are processed entirely at
compile time, leaving modules like B::Deparse with no way to find out
if they were enabled for some particular block of code. 'strict'
could use the lexical pragma patch to tag the appropriate blocks with
notices that B::Deparse could use to correctly decompile the blocks.

** Memoize.pm.

At present, memoization of a function is global. With the lexical
pragma feature, one would be able to specify that only calls from
inside certain blocks would be subject to the memoization
optimization; or, conversely, one would be able to say

        use Memoize;
        memoize 'somefunc';

        somefunc(...);        # Uses the cache
        {
            no Memoize 'somefunc';
            somefunc(...);    # Really call somefunc(); ignore cache
        }
        somefunc(...);        # Uses the cache

** Many modules introduce effects that should be lexically scoped,
but aren't.

Damian Conway's Hook::LexWrap module is an example of this. The idea
of Hook::LexWrap is that one can say

        sub doit { ... }

        doit(...);
        {
            wrap 'doit', pre => sub { ... };
            doit(...);
        }

and the calls inside the scope of the 'wrap' declaration will invoke
the 'pre' wrapper function before actually calling doit().
To avoid unexpected action-at-a-distance effects, one usually wants
the effect of 'wrap' to be lexically scoped. The Hook::LexWrap manual
says:

        Lexically scoped wrappers

        Normally, any wrappers installed by "wrap" remain attached to
        the subroutine until it is undefined. However, it is possible
        to make specific wrappers lexically bound, so that they
        operate only until the end of the scope in which they're
        created (or until some other specific point in the code).

Unfortunately, it is lying:

        {
            wrap 'doit', pre => sub { ... };
            foo();
        }

        sub foo { doit(...); }

foo() might be in another file, written by another author, but it
inherits the wrapping behavior that was set up in the first block
above. The name of the module promises lexical wrapping, but doesn't
deliver it. Let's save Damian from this embarrassment. With the
lexical pragma feature, wrappers can be made truly lexical.

** %^H

Perl contains a special hash, %^H, which was introduced in an attempt
to provide a lexical pragma feature. The implementation idea was
flawed, and it doesn't work properly. The manual says:

        The %^H hash provides the same scoping semantic as $^H. This
        makes it useful for implementation of lexically scoped
        pragmas.

It is lying, but %^H could be made to work using the new lexical
pragma feature, and might provide an alternative interface to it.

According to Yitzchak Scott-Thoennes, %^H is used by the following
standard modules: 'charnames', 'overload::constant', and 'vmsish'.
It is also used by 'open' when PerlIO's layers feature is used, and
it used to be used for 'sort.pm', but 'sort.pm' was changed when it
was discovered that %^H didn't work properly.

** $&

Perl has a useful built-in variable, $&, which is set after a regex
match operation to contain the matching portion of the target string.
If the variable is used anywhere, every regex match in the entire
program must maintain this information, resulting in a slowdown of
all regex matching. As a result, the feature is little used.
It could have been a valuable feature, but the cost is too high.

With the lexical pragma feature, it would be possible to solve this
problem. One could develop a declaration that was used like this:

        {
            use matchvars;
            ...
            $target =~ /pattern/
            ...
            now do something with $&;
        }
        ...
        $target2 =~ /pattern2/

Regex matches in the scope of 'use matchvars' would populate $&, but
would *not* set the flag that tells Perl that $& should be populated
by all other matching operations. In the example above, /pattern/
would populate $&, but /pattern2/ wouldn't, and so only the match
operations in the scope of the 'matchvars' declaration would pay the
performance penalty.

** no tainting;

Programs running in taint mode would be able to use a 'no tainting'
declaration which would declare that code inside the current block
was guaranteed safe and exempt from taint checking.

** Debugging

It becomes trivial to write a debugging function whose effect can be
turned on and off per block. Consider:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
            debug "user ID $uid > 1000";
            ...
            debug "couldn't remove user";
            ...
        }
        debug "Finished dealing with user";

With the lexical pragma feature, it's easy to implement a debug()
function that does nothing, unless it is called in the scope of a
'use mydebugging' pragma. The example code above would produce no
debugging messages. If the programmer wanted to debug the 'if' block,
they would insert a declaration:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
            use mydebugging;
            debug "user ID $uid > 1000";
            ...
            debug "couldn't remove user";
            ...
        }
        debug "Finished dealing with user";

Debugging messages are now enabled, but only inside the block of
interest. Alternatively:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
            use mydebugging 'VERBOSE';
            debug "user ID $uid > 1000";
            ...
            use mydebugging 'normal';
            debug "couldn't remove user";
            ...
        }
        debug "Finished dealing with user";

Verbose debugging messages are enabled from the top of the block up
to the following declaration.

** $SIG{__WARN__}

Paul Marquess said:

        For example, I would like to define a pragma that allows a
        lexically scoped equivalent of $SIG{__WARN__}.

The lexical pragma feature makes this easy.

** -i option

When 'perl -i' encounters a file that can't be renamed, it issues a
warning message and skips the file. There is no way to change this
behavior. A lexical pragma is just what is needed.

** Module behavioral changes

Module authors would be able to use the feature to introduce their
own lexical declarations that changed the effects of their library
functions.

To pick the first example that comes to mind, consider DBI. Most
small- to medium-sized programs run DBI in 'RaiseError' mode, in
which all DBI errors throw exceptions. A program might be written
with RaiseError turned on, since database errors represent unexpected
programming mistakes. But later, a feature is added to the program to
accept a query from a CGI form and display the results of the query.
If the query produces an error, the program shouldn't die; it should
trap the error and report it back to the web user:

        use DBI 'RaiseError';
        ...
        sub do_user_query {
            no DBI 'RaiseError';
            ...
            some_other_function();
            ...
        }

        sub some_other_function {
            ...
        }

It's important that the effect of the declaration be confined to the
'do_user_query' function, because the other functions that it calls,
such as 'some_other_function', were written to assume that RaiseError
would be true, and aren't prepared to handle error returns from DBI
calls.

* Current status

I submitted a trial patch on September 2.
The patch included:

    * The core changes necessary to implement the new feature

    * A new core XS module, called 'pragma', which allows pure Perl
      modules to access the new features

    * A pure-perl example pragma module, called pragma::Demo, which
      demonstrates how to use 'pragma' to implement pragmas

    * Incomplete documentation

    * A test suite based on pragma::Demo

All the old tests passed.

There are a few remaining technical issues to resolve:

    * Behavior of pragmas with respect to 'eval' and file boundaries.
      In particular, the following doesn't work at present:

            use some_lexical_pragma;
            eval 'is_the_lexical_pragma_set()';

      The setting of the lexical pragma is not propagated into the
      code inside 'eval'. This needs to be fixed.

    * The test suite and documentation should be completed.

    * Garbage collection of pragma data.

    * Miscellaneous packaging issues; where the new functions should
      be added, etc.

* Deliverables

    * Resolve technical issues noted above, including propagation
      into string 'eval'; solve potential garbage collection problem;
      complete test suite and documentation.

    * Convert sort.pm to use the new feature, as discussed above.

    * Enhance strict.pm and B::Deparse to use the new feature, as
      discussed above.

    * Fix %^H. Investigate core modules that use it and see if they
      should be converted to use the lexical pragma feature directly.

    * Investigate feasibility of using lexical pragma features to
      implement warnings.pm; report.

    * Identify other core modules that would benefit from lexical
      pragma features.
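
As a rough illustration of the per-block debug() function described
under 'Debugging', here is a minimal sketch. It does not use the
'pragma' module from the trial patch, whose interface is not shown in
this proposal. Instead it assumes a later Perl (5.10 or newer), in
which entries stored in %^H at compile time can be read back at run
time through element 10 of caller(). The 'mydebugging/enabled' key,
the @log array, and the helper() function are made up for the
example.

```perl
use strict;
use warnings;

my @log;    # messages that debug() actually emitted

BEGIN {
    package mydebugging;            # pragma name borrowed from the Debugging example
    $INC{'mydebugging.pm'} = 1;     # lets 'use mydebugging' work without a separate file

    # import/unimport run while the enclosing block is being compiled,
    # so the %^H entry is recorded for that lexical scope only.
    # (Assumption: Perl 5.10+ semantics for %^H.)
    sub import   { $^H{'mydebugging/enabled'} = 1 }
    sub unimport { $^H{'mydebugging/enabled'} = 0 }
}

# Emit a message only if the calling code was compiled under 'use mydebugging'.
sub debug {
    my $hints = (caller(0))[10];    # compile-time %^H of the call site
    return unless $hints && $hints->{'mydebugging/enabled'};
    push @log, "@_";
    print STDERR "DEBUG: @_\n";
}

my $uid = 1500;
debug("Located user in database");          # silent: pragma not in scope
if ($uid > 1000) {
    use mydebugging;
    debug("user ID $uid > 1000");           # emitted
    helper();                               # helper's own debug() stays silent
    {
        no mydebugging;
        debug("couldn't remove user");      # silent: switched off for this block
    }
}
debug("Finished dealing with user");        # silent again

sub helper { debug("inside helper") }       # compiled outside the pragma's scope
```

Only the message from inside the 'use mydebugging' block is emitted.
The call made from helper() stays silent because helper() was
compiled outside that scope, which is the difference between lexical
and dynamic scoping that the Hook::LexWrap example turns on.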
* References

** Older discussions of %^H feature, lexical pragmas in general, and
sort.pm:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02392.html
        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02238.html
        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02069.html
        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-12/msg00327.html

** Original proposal and technical discussion of the lexical pragma
feature I implemented:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02106.html

** My trial balloon patch:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-09/msg00112.html

* Contact

        Mark Jason Dominus
        mjd@plover.com
        Voice  215 978 5986
        Fax    215 978 7197
        Mobile 215 964 2014