Subject: TPF grant proposal: Lexical Pragmas
Organization: Plover Systems
Date: Sat, 27 Sep 2003 19:14:38 -0400
From: Mark Jason Dominus <mjd@plover.com>


* Introduction

On September 2, I posted a 'trial balloon' patch to implement
lexically scoped declarations in Perl.  I propose to finish
implementing this feature for release in Perl 5.8.2.  This will take
one month, and I will do it for $750.

* The feature

Several Perl 'pragma' declarations should be lexically scoped, but
aren't.  The best example is the 'sort' pragma:

        
           use sort 'stable';          # guarantee stability
           use sort '_quicksort';      # use a quicksort algorithm
           use sort '_mergesort';      # use a mergesort algorithm

           use sort '_qsort';          # alias for quicksort

The manual says:

        CAVEATS
               This pragma is not lexically scoped : its effect is
               global to the program it appears in.

Since it's global, it's of very limited usefulness.  A module author
can't safely use this declaration, because it will affect every sort
in every program that uses the module.

This limitation applies to a number of other declarations.  I believe
that lexically-scoped declarations are an extremely useful general
feature, and we would see a lot more of them, except there is no good
way to implement them at present.  The few lexically-scoped pragmas
that Perl does support each use up a bit in the PL_hints variable,
which only contains 32 bits.  People are reluctant to use up this
nonrenewable resource except for the most important purposes.

Early this month I delivered a trial patch for Perl that enabled
lexically-scoped pragmas to be written in pure Perl or in XS.  The
core patch itself was tiny, only a few lines long.

* Possible applications

I don't mean to suggest that all of the following applications would
be valuable, desirable, or practical.  I'm just trying to give an idea
of the very large world of possible applications of this feature.

** sort.pm, as noted above.

** warnings.pm.  I believe the implementation of lexical warnings
   could be simplified using the lexical pragma feature.  It might be
   possible to move a lot of lexical warning-related code out of the
   core.  I think anything that makes the core smaller is worth
   considering.  

** B::Deparse presently doesn't report on 'strict vars' or 'strict
   subs' correctly:

        % perl -MO=Deparse -e 'use strict; { no strict "vars"; my $x = 1 } my $y = 1; '

        use strict 'refs';
        {
            my $x = 1;
        }
        my $y = 1;

   The lexical pragma feature could be used to fix this. 

** diagnostics.pm. There is no way to turn the effect on or off for a
   single block.  The manual says:

       Not being able to say "no diagnostics" is annoying, but may not be
       insurmountable.

   The lexical pragma patch surmounts this obstacle.

** strict.pm.  At present, 'strict subs' and 'strict vars' are
   processed entirely at compile time, leaving modules like B::Deparse
   with no way to find out if they were enabled for some particular
   block of code.  'strict' could use the lexical pragma patch to tag
   the appropriate blocks with notices that B::Deparse could use to
   correctly decompile the blocks.

** Memoize.pm.  At present, memoization of a function is global.  With
   the lexical pragma feature, one would be able to specify that only
   calls from inside certain blocks would be subject to the
   memoization optimization; or, conversely, one would be able to say

        use Memoize;
        memoize 'somefunc';

        somefunc(...);  # Uses the cache
        { no Memoize 'somefunc';
          somefunc(...); # Really call somefunc(); ignore cache
        }
        somefunc(...);  # Uses the cache
        

** Many modules introduce behavior that should be lexically scoped,
   but isn't.  Damian Conway's Hook::LexWrap module is an example of
   this.  The idea of Hook::LexWrap is that one can say

        sub doit { ... }
        doit(...);

        {
          wrap 'doit', pre => sub { ... };

          doit(...);
        }

   and the calls inside the scope of the 'wrap' declaration will
   invoke the 'pre' wrapper function before actually calling doit().
   To avoid unexpected action-at-a-distance effects, one usually wants
   the effect of 'wrap' to be lexically scoped.  The Hook::LexWrap
   manual says:

       Lexically scoped wrappers

       Normally, any wrappers installed by "wrap" remain attached to
       the subroutine until it is undefined. However, it is possible
       to make specific wrappers lexically bound, so that they operate
       only until the end of the scope in which they're created (or
       until some other specific point in the code).

   Unfortunately, it is lying:

        {
          wrap 'doit', pre => sub { ... };

          foo();
        }

        sub foo { 
          doit(...);
        }

   foo() might be in another file, written by another author, but it
   inherits the wrapping behavior that was set up in the first block
   above.  The name of the module promises lexical wrapping, but
   doesn't deliver it.
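
   The difference between the two kinds of scoping can be seen
   without Hook::LexWrap at all.  The following is a minimal sketch,
   not Hook::LexWrap's implementation: it assumes that 'local' on the
   glob is a fair stand-in for a wrapper that is removed when the
   block exits, and the @TRACE array is introduced only to record
   what ran.

```perl
use strict;
use warnings;

our @TRACE;                      # records what ran, in order

sub doit { push @TRACE, 'doit' }
sub foo  { doit() }              # defined outside the block below

{
    # Stand-in for a scope-limited wrapper (an assumption, not the
    # module's code): replace doit() with a version that runs a
    # 'pre' step first.  'local' restores the original at block exit.
    my $orig = \&doit;
    local *doit = sub { push @TRACE, 'pre'; $orig->(@_) };

    foo();                       # foo() is not lexically inside this block
}

foo();                           # the block has exited; wrapper is gone

print "@TRACE\n";                # prints "pre doit doit"
```

   The first call to foo() picks up the wrapper even though foo() was
   written outside the block, so the scoping here is dynamic rather
   than lexical.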

   Let's save Damian from this embarrassment.  With the lexical pragma
   feature, wrappers can be made truly lexical.

** %^H

   Perl contains a special hash, %^H, which was introduced in an
   attempt to provide a lexical pragma feature.  The implementation
   idea was flawed, and it doesn't work properly.  The manual says:

           The %^H hash provides the same scoping semantic as $^H.
           This makes it useful for implementation of lexically scoped
           pragmas.

   It is lying, but %^H could be made to work using the new lexical
   pragma feature, and might provide an alternative interface to it.
   According to Yitzchak Scott-Thoennes, %^H is used by the following
   standard modules: 'charnames', 'overload::constant', and 'vmsish'.
   It is also used by 'open' when PerlIO's layers feature is used, and
   it used to be used for 'sort.pm', but 'sort.pm' was changed when it
   was discovered that %^H didn't work properly.


** $&

   Perl has a useful built-in variable, $&, which is set after a regex
   match operation to contain the matching portion of the target
   string.  If the variable is used anywhere, every regex match in the
   entire program must maintain this information, resulting in a
   slowdown of all regex matching.  As a result, the feature is little
   used.  It could have been a valuable feature, but the cost is too
   high.

   With the lexical pragma feature, it would be possible to solve this
   problem.  One could develop a declaration that was used like this:

        {
          use matchvars;
          ... $target =~ /pattern1/ ...

          # now do something with $&
        }

        ... $target2 =~ /pattern2/

   Regex matches in the scope of 'use matchvars' would populate $&,
   but would *not* set the flag that tells Perl that $& should be
   populated by all other matching operations.  In the example above,
   /pattern1/ would populate $&, but /pattern2/ wouldn't, and so only
   the match operations in the scope of the 'matchvars' declaration
   would pay the performance penalty.

** no tainting;

   Programs running in taint mode would be able to use a 'no tainting'
   declaration which would declare that code inside the current block
   was guaranteed safe and exempt from taint checking.

** Debugging

   It becomes trivial to write a debugging function whose effect can
   be turned on and off per block.  Consider:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
          debug "user ID $uid > 1000";
          ...
          debug "couldn't remove user";
          ...
        }

        debug "Finished dealing with user";

   With the lexical pragma feature, it's easy to implement a debug()
   function that does nothing, unless it is called in the scope of a
   'use mydebugging' pragma.  The example code above would produce no
   debugging messages.  If the programmer wanted to debug the 'if'
   block, they would insert a declaration:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
          use mydebugging;
          debug "user ID $uid > 1000";
          ...
          debug "couldn't remove user";
          ...
        }

        debug "Finished dealing with user";

   Debugging messages are now enabled, but only inside the block of
   interest.  Alternatively:

        ...
        debug "Located user in database";
        ...
        if ($uid > 1000) {
          use mydebugging 'VERBOSE';
          debug "user ID $uid > 1000";
          ...
          use mydebugging 'normal';
          debug "couldn't remove user";
          ...
        }

        debug "Finished dealing with user";

   Verbose debugging messages are enabled from the top of the block up
   to the following declaration, after which normal debugging
   messages are in effect until the end of the block.
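
   A per-block debug() of this kind might be sketched as follows.
   This is not the interface of the trial patch: it assumes a Perl in
   which the compile-time contents of %^H can be read at run time
   through caller() (true of Perl 5.10 and later).  The hint key
   'mydebugging/on', the @LOG array, and handle_user() are
   hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

# Hypothetical pragma, defined inline so the example is a single file.
BEGIN {
    package mydebugging;
    $INC{'mydebugging.pm'} = __FILE__;   # lets "use mydebugging" succeed

    # Run at compile time by "use"/"no"; %^H is the hints hash of
    # the scope that is currently being compiled.
    sub import   { $^H{'mydebugging/on'} = 1 }
    sub unimport { $^H{'mydebugging/on'} = 0 }
}

our @LOG;   # collected messages, so the effect is easy to inspect

sub debug {
    # Element 10 of caller() is the hints hash that was in effect
    # at the place where debug() was called.
    my $hints = (caller(0))[10] || {};
    push @LOG, "@_" if $hints->{'mydebugging/on'};
}

sub handle_user {
    my ($uid) = @_;
    debug "Located user in database";
    if ($uid > 1000) {
        use mydebugging;
        debug "user ID $uid > 1000";
    }
    debug "Finished dealing with user";
}

handle_user(1500);
print "$_\n" for @LOG;   # prints only "user ID 1500 > 1000"
```

   Only the call inside the 'if' block is logged; the calls before
   and after it see no hint and do nothing.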

** $SIG{__WARN__}

   Paul Marquess said:


        For example, I would like to define a pragma that allows a
        lexically scoped equivalent of $SIG{__WARN__}.

   The lexical pragma feature makes this easy.  

** -i option

   When 'perl -i' encounters a file that can't be renamed, it issues
   a warning message and skips the file.  There is no way to change
   this behavior.  A lexical pragma is just what is needed.  

** Module behavioral changes

   Module authors would be able to use the feature to introduce their
   own lexical declarations that changed the effects of their library
   functions.  To pick the first example that comes to mind, consider
   DBI.  Most small- to medium-sized programs run DBI in 'RaiseError'
   mode, in which all DBI errors throw exceptions.  

   A program might be written with RaiseError turned on, since
   database errors represent unexpected programming mistakes.  But
   later, a feature is added to the program to accept a query from a
   CGI form and display the results of the query.  If the query
   produces an error, the program shouldn't die; it should trap the
   error and report it back to the web user:

        use DBI 'RaiseError';
        ...

        sub do_user_query {
          no DBI 'RaiseError';
          ...
          some_other_function();
          ...
        }

        sub some_other_function {
          ...
        }

        
   It's important that the effect of the declaration be confined to
   the 'do_user_query' function, because the other functions that it
   calls, such as 'some_other_function', were written to assume that
   RaiseError would be true, and aren't prepared to handle error
   returns from DBI calls.
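
   The confinement described above can be sketched with a stand-in
   library.  Nothing here is DBI's API: 'MyDB', its query() function
   (which always fails), and the hint key 'MyDB/RaiseError' are
   hypothetical, and the sketch assumes a Perl (5.10 or later) in
   which a function can read its caller's compile-time hints through
   caller().

```perl
use strict;
use warnings;

# Hypothetical library with a lexically scoped 'RaiseError' switch.
BEGIN {
    package MyDB;
    $INC{'MyDB.pm'} = __FILE__;          # lets "use MyDB" succeed

    sub import   { $^H{'MyDB/RaiseError'} = 1 if grep { $_ eq 'RaiseError' } @_ }
    sub unimport { $^H{'MyDB/RaiseError'} = 0 if grep { $_ eq 'RaiseError' } @_ }

    # A query that always fails.  How it fails depends on the hints
    # in effect at the call site, not on any global setting.
    sub query {
        my $hints = (caller(0))[10] || {};
        die "query failed\n" if $hints->{'MyDB/RaiseError'};
        return;                          # plain error return
    }
}

use MyDB 'RaiseError';

sub some_other_function {
    # Written on the assumption that errors raise exceptions.
    return eval { MyDB::query(); 1 } ? 'returned' : 'died';
}

sub do_user_query {
    no MyDB 'RaiseError';
    my $row = MyDB::query();             # does not die in this scope
    my $own = defined $row ? 'ok' : 'error return';
    return ($own, some_other_function());
}

my @result = do_user_query();
print "@result\n";                       # prints "error return died"
```

   The call made directly in do_user_query() gets an error return,
   while the call made from some_other_function() still throws, even
   though it happens while do_user_query() is running.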

* Current status

I submitted a trial patch on September 2.  The patch included:

  * The core changes necessary to implement the new feature

  * A new core XS module, called 'pragma', which allows pure Perl
    modules to access the new features

  * A pure-Perl example pragma module, called pragma::Demo, which
    demonstrates how to use 'pragma' to implement pragmas

  * Incomplete documentation

  * A test suite based on pragma::Demo

All the old tests passed.  

There are a few remaining technical issues to resolve:

  * Behavior of pragmas with respect to 'eval' and file boundaries.  In
    particular, the following doesn't work at present:

        use some_lexical_pragma;
        eval 'is_the_lexical_pragma_set()';

    The setting of the lexical pragma is not propagated into the code
    inside 'eval'.  This needs to be fixed.

  * The test suite and documentation should be completed.

  * Garbage collection of pragma data.

  * Miscellaneous packaging issues; where the new functions should be
    added, etc.
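
The intended behavior for string 'eval' in the first item can be
illustrated as follows.  This sketch does not use the trial patch; it
assumes a Perl (5.10 or later) in which string 'eval' inherits the
enclosing scope's %^H and in which caller() exposes those hints at run
time.  The hint key 'some_lexical_pragma/on' is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical pragma, defined inline so the example is a single file.
BEGIN {
    package some_lexical_pragma;
    $INC{'some_lexical_pragma.pm'} = __FILE__;
    sub import   { $^H{'some_lexical_pragma/on'} = 1 }
    sub unimport { $^H{'some_lexical_pragma/on'} = 0 }
}

sub is_the_lexical_pragma_set {
    # Look at the hints in effect where this function was called.
    my $hints = (caller(0))[10] || {};
    return $hints->{'some_lexical_pragma/on'} ? 1 : 0;
}

my ($inside, $outside);
{
    use some_lexical_pragma;
    # The code in the string is compiled later, at run time, but
    # should still see the pragma of the enclosing block.
    $inside = eval 'is_the_lexical_pragma_set()';
}
$outside = eval 'is_the_lexical_pragma_set()';

print "inside=$inside outside=$outside\n";
```

Under those assumptions the first 'eval' reports the pragma as set and
the second does not.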

* Deliverables

  * Resolve technical issues noted above, including propagation into
    string 'eval'; solve potential garbage collection problem;
    complete test suite and documentation.

  * Convert sort.pm to use the new feature, as discussed above.

  * Enhance strict.pm and B::Deparse to use the new feature, as
    discussed above.

  * Fix %^H.  Investigate core modules that use it and see if they should
    be converted to use the lexical pragma feature directly.

  * Investigate feasibility of using lexical pragma features to
    implement warnings.pm; report.

  * Identify other core modules that would benefit from lexical pragma
    features.

* References

** Older discussions of %^H feature, lexical pragmas in general, and
   sort.pm:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02392.html
        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02238.html

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02069.html

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-12/msg00327.html

** Original proposal and technical discussion of the lexical pragma
   feature I implemented:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02106.html


** My trial balloon patch:

        http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-09/msg00112.html


* Contact

        Mark Jason Dominus

        mjd@plover.com

        Voice  215 978 5986
        Fax    215 978 7197
        Mobile 215 964 2014