Return-Path: owner-perl5-porters@perl.org
Delivered-To: mjd-p5p@plover.com
Message-ID: <19980416052352.2111.qmail@plover.com>
To: perl5-porters@perl.com
cc: mjd@plover.com
Subject: Pattern matching in SNOBOL4 (long, digression)
Organization: Plover Systems
Date: Thu, 16 Apr 1998 01:23:52 -0400
From: Mark-Jason Dominus
Sender: owner-perl5-porters@perl.org
Precedence: bulk
X-Loop: Perl5-Porters

This note started out as an analysis of SNOBOL4's `FAIL' pattern, and
turned into a huge ramble about SNOBOL in general. If it has a point,
the point is only that SNOBOL's pattern matching is *still* a lot
better than Perl's, and that it is worth studying, because we could
learn a lot from it.

The canonical SNOBOL reference is

        The SNOBOL4 Programming Language
        R. E. Griswold, J. F. Poage, I. P. Polonsky
        Prentice-Hall, 1971

----------------------------------------------------------------

Someone showed up in clpm today wanting to find the longest substring
common to two strings $s and $t, and while I was thinking about that I
got sidetracked into thinking about how to transform "abc" into

        ('abc', 'ab', 'a', 'bc', 'b', 'c', '')

One way is to write an explicit loop and use substr, of course. But
it was late, and my mind went down the m//g path, and I realized that
m/.*/g wouldn't do it, of course.

But this reminded me of a feature in SNOBOL that was useful for
similar purposes, and I dug out my SNOBOL book and got sucked in, as I
always do, and I came to the same conclusion that I always come to,
which is that SNOBOL4 was a remarkably usable language, especially for
1971, and that people should pay more attention to it. It had
associative arrays, recursive functions with locally scoped variables,
and pattern matching that was better than Perl's is now. On the other
hand, SNOBOL's control flow and syntax are hopelessly 1971.

SNOBOL4 has a backtracking pattern matcher very similar to Perl's.
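The explicit substr loop mentioned above might look like the following minimal Perl sketch; the sub name all_substrings is hypothetical. It lists the substrings in the same order as the example list (longest first from each starting position, then a single empty string), and for contrast shows that m/.*/g produces only ('abc', ''), because each greedy match consumes everything it can before the next search resumes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every non-empty substring of $s, longest
# first from each starting position, followed by one empty string.
# "abc" -> ('abc', 'ab', 'a', 'bc', 'b', 'c', '')
sub all_substrings {
    my ($s) = @_;
    my $n = length $s;
    my @subs;
    for my $start (0 .. $n - 1) {
        # Longest substring at this position first, then shorter ones.
        for my $len (reverse 1 .. $n - $start) {
            push @subs, substr($s, $start, $len);
        }
    }
    push @subs, '';    # the empty string, listed once at the end
    return @subs;
}

print join(', ', map { "'$_'" } all_substrings("abc")), "\n";

# By contrast, m/.*/g grabs 'abc' at position 0, then can only find
# the empty string at the end, so it yields just ('abc', '').
my @greedy = ("abc" =~ /.*/g);
print join(', ', map { "'$_'" } @greedy), "\n";
```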
The feature that I remembered was a pattern-matching primitive called
FAIL, which causes the pattern matcher to fail and backtrack. It
doesn't cause a complete failure of the entire match (ABORT does
that); rather, it just causes a failure of the current backtracking
alternative, so that the pattern matcher backs up and tries the next
alternative. The example from the SNOBOL book is

        &ANCHOR = 0
        'MISSISSIPPI' ('IS' | 'SI' | 'IP' | 'PI') $OUTPUT FAIL

Matching in SNOBOL is anchored at the beginning and the end unless you
specify &ANCHOR=0; this enables matching substrings. 'MISSISSIPPI' is
the string to be matched; the rest of the line is the pattern.
('IS' | 'SI' | 'IP' | 'PI') is just like (IS|SI|IP|PI) in Perl. $ is
like a backreference; it says that whatever was matched by the
previous pattern component should be stored into the variable OUTPUT
immediately, even if the pattern match as a whole later fails. In
SNOBOL, the OUTPUT variable is special: anything `stored' in OUTPUT is
actually printed instead.

Without the FAIL, the IS would match and be output, and then the
pattern would succeed, and that would be the end of it. The FAIL
forces the matcher to backtrack and try the next alternative, and,
when the alternatives at one starting position are used up, the next
starting position, and so on, so that this pattern ends up printing:

        IS
        SI
        IS
        SI
        IP
        PI

The SNOBOL manual has several examples of how this might be useful:

        ``In general, the behavior of the scanner during any pattern
        match may be observed using a statement of the form

                STR PAT $OUTPUT FAIL ''

This feature alone makes FAIL valuable.

You may want to stop reading at this point; an extended example
follows, in which FAIL figures only peripherally.

----------------------------------------------------------------

[ Extended yakking about how to write a parser in SNOBOL omitted here.
-D ]

Mark-Jason Dominus                                      mjd@plover.com

Return-Path: owner-perl5-porters@perl.org
Delivered-To: mjd-p5p@plover.com
Message-ID: <19980416142034.2992.qmail@plover.com>
To: perl5-porters@perl.com
cc: "Moore, Paul", ilya@math.ohio-state.edu
Subject: Re: Pattern matching in SNOBOL4 (long, digression)
In-reply-to: Your message of "Thu, 16 Apr 1998 10:49:54 BST."
Date: Thu, 16 Apr 1998 10:20:34 -0400
From: Mark-Jason Dominus
Sender: owner-perl5-porters@perl.org
Precedence: bulk
X-Loop: Perl5-Porters

Paul:
> Can the regexp in (?!regexp) be omitted, then? [I just tested and it
> looks like it!] So that would mean that a(?!) matches any occurrence of
> a which "isn't followed by " (paraphrasing perlre.pod) - ie, a
> then FAIL.
>
> Brilliant. Maybe a special note in the pod to mention this usage (for
> SNOBOLers) would help.

I don't think so, because at present it's useless. I even tried (?!)
yesterday while I was messing around, but because of the other missing
features, I didn't recognize that it was what I wanted.

Ilya:
> >When we have a patch-receptive pumpking (will we ever?), $& and
> >friends will work in (?e ), so
> >
> >        'MISSISSIPPI' =~ /(is|si|ip|pi)(?e print $1 )(?!)/
> >
> >will be the Perlian way.

Oh, that's good.

[ More SNOBOL, and a recounting of my dream about Ilya, omitted here. -D. ]
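For comparison, here is a sketch of the book's MISSISSIPPI example in the syntax that later versions of Perl provide (not the Perl current when these messages were written): the (?{ ... }) code block stands in for $OUTPUT, and the empty negative lookahead (?!) stands in for FAIL. Perl 5.10 also added (*FAIL) as a more readable synonym for (?!). The array names @seen and @first are illustrative choices. Note that the alternatives are written in uppercase here; the lowercase (is|si|ip|pi) in the quoted proposal would not match 'MISSISSIPPI' without the /i modifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perl analogue of the SNOBOL statements
#     &ANCHOR = 0
#     'MISSISSIPPI' ('IS' | 'SI' | 'IP' | 'PI') $OUTPUT FAIL
# Requires a Perl that supports (?{ ... }) code blocks.
#
#   (IS|SI|IP|PI)            the alternation
#   (?{ push @seen, $1 })    like $OUTPUT: record each match as it happens
#   (?!)                     like FAIL: can never succeed, so the engine
#                            backtracks into the next alternative or the
#                            next starting position
my @seen;
'MISSISSIPPI' =~ /(IS|SI|IP|PI)(?{ push @seen, $1 })(?!)/;
print "$_\n" for @seen;

# Without the (?!), the match succeeds at the first IS and stops there.
my @first;
'MISSISSIPPI' =~ /(IS|SI|IP|PI)(?{ push @first, $1 })/;
print "without FAIL: @first\n";
```

Because the pushes are not localized, they survive the backtracking, so @seen ends up holding every intermediate match even though the overall match fails.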