Date: 04 Oct 2001 09:18:09 -0400
From: Joe Schaefer <joe+usenet@sunstarsys.com>
Subject: Re: Efficient code?
Message-Id: <m3hetfk0ha.fsf@mumonkan.sunstarsys.com>

"S Warhurst" <s.warhurst@rl.ac.uk> writes:

> "Joe Schaefer" <joe+usenet@sunstarsys.com> wrote in message
> news:m33d51n4rw.fsf@mumonkan.sunstarsys.com...
> 
> <snip>
> > The line marked XXX might be accelerated by first inverting @line_break
> > outside the loops:

[...]

> Hmmm.. that's quite a different way of doing it than otherwise
> suggested. I can imagine how it might speed things up though. I'll
> have to do some benchmarking on this when I get more time. Thanks for
> writing all that out.. this thread is all going into a reference file
> I'm keeping :) 

After a little testing, the original code seems to have some off-by-one
errors.  Here's a cleaner version (with study() omitted, since it does
slow things down):


  read TEXTFILE, $_, -s TEXTFILE;

  # offsets at which each line starts
  my @line_break = (0);
  push @line_break, pos while /\n/g;

  # just in case the last char in $_ isn't a newline:
  push @line_break, length $_ unless $line_break[-1] == length $_;

  my %line_num; # inverts @line_break
  @line_num{@line_break} = 0..$#line_break;

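  # just in case last char isn't "\n": index() in the loop below then
  # returns -1 on the last line, so key -1 + 1 == 0 must map to that line
  $line_num{ 0 } = $line_num{ $line_break[-1] };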

  my @re = map qr/$_/, qw/list of 50 regexps/;

  foreach my $r (@re) {

    while (/$r/g) {

        # first get $line: assumes $r won't match across lines,
        # i.e. be careful with "\s", which also matches "\n"

        my $n = $line_num{ index($_, "\n", pos) + 1 };
        my $line = substr $_, $line_break[$n-1],
                              $line_break[$n] - $line_break[$n-1];

        # avoids multiple matches on the same line by advancing pos
        pos = $line_break[$n];

        # now do something
    }
  }
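
To see the bookkeeping in action without a file, here is a minimal,
self-contained sketch of the same approach run against an in-memory
string.  The sample text, the three patterns, and the @hits array are
made up for illustration; they are not from the original script.  The
last line deliberately has no trailing newline, so the $line_num{0}
fallback gets exercised.

```perl
use strict;
use warnings;

# Made-up sample text; the last line deliberately lacks a trailing "\n".
$_ = "alpha one\nbeta two\ngamma three\ndelta four";

# Offsets at which each line starts, plus the end of the text.
my @line_break = (0);
push @line_break, pos() while /\n/g;
push @line_break, length($_) unless $line_break[-1] == length($_);

# Invert @line_break: offset => index into @line_break.
my %line_num;
@line_num{@line_break} = 0 .. $#line_break;

# index() returns -1 when no "\n" follows pos, so -1 + 1 == 0
# has to map to the last line.
$line_num{0} = $line_num{ $line_break[-1] };

# Hypothetical patterns standing in for the real list of regexps.
my @re = map { qr/$_/ } qw/one t[wh] four/;

my @hits;
foreach my $r (@re) {
    while (/$r/g) {
        # 1-based number of the line containing the match
        my $n    = $line_num{ index($_, "\n", pos()) + 1 };
        my $line = substr $_, $line_break[$n-1],
                              $line_break[$n] - $line_break[$n-1];
        chomp $line;
        push @hits, "$n: $line";

        # skip the rest of this line so it is reported only once
        pos($_) = $line_break[$n];
    }
}

print "$_\n" for @hits;
```

Run as is, this should print each of the four lines once, prefixed with
its 1-based line number ("1: alpha one" through "4: delta four").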


-- 
Joe Schaefer    "Few things are harder to put up with than the annoyance of a
                                        good example."
                                               --Mark Twain