=head1 Lambda Calculus When computer scientists want to study what is computable, they need a model of computation that is simpler than real computers are. The usual model they use involves a I, which has the following parts: =over =item 1 One I which can hold a single number, called the I; the state register has a maximum size specified in advance. =item 2 An infinite tape of memory cells, each of which can hold a single character, and a read-write head that examines a single square at any given time. =item 3 A finite program, which is just a big table. For any possible number I in the register, and any character in the currently-scanned memory cell, the table says to do three things: It has a number to put into the register, replacing what was there before,; it has a new character to write into the current memory cell, replacing what was there before, and it has an instruction to the read-write head to move to the next cell, the previous cell, or to stay still, =back This may not seem like a very reasonable model of computation, but computer scientists have exhibited Turing machines that can do all the things you usually want computers to be able to do, such as performing arithmetic computations and running interpreter programs to simulate the behavior of other computers. They've also showed that a lot of obvious `improvements' to the Turing machine model, such as adding more memory tapes, random-access memory, more read-write heads, more registers, or whatever, don't actually add any power at all; anything that could be computed by such an extended machine could also have been computed by the original machine, although perhaps slowly. Finally, a lot of other totally different models for computation turn out to be equivalent in power to the Turing machine model. Each of these models has some feature about it that suggests that it really does correspond well to our intuitive idea of what is computable. One such model is the I, invented by Alonzo Church. Lambda Calculus is intended to capture, in a very simple way, the idea of creating a function and then calling it on an argument. Miraculously, the Lambda Calculus model is both simpler than the Turing machine model and more practical for real computation. The programming language Lisp, for example, is little more than Lambda Calculus in disguise---and not a very disguising disguise, either. The Lambda Calculus model of computation is very, very simple. There are only two operations. You can create a fucntion of one argument with a specified body, and you can apply one of these functions to an argument. The first operation is denoted like this: %x.B where B is the body and x is the formal parameter of the function. The % is usually written with a lowercase letter lambda, but my typewriter has no lambda, so I'll use % instead. B here is some arbitrary expression. When the function is invoked on some argument A, you locate the occurrences of the formal parameter x inside the body B, and you replace each X with a copy of the argument A. This is just what any language does when it calls a function: It looks in the body of the function, and replaces copies of the formal parameter with the actual argument. For example, here's a function: %x.(x x y) If we apply this function to the argument (p q), we get ((p q) (p q) y) To denote function application, we just juxtapose the function and its argument. So the function application we just discussed is denoted this way: (%x.(x x y))(p q) We say that this expression I to the simpler expression ((p q) (p q) y) Function application associates to the left, so that a b c is an abbreviation for (a b) c here we apply the function a to the argument b, and then apply the result to the argument c. If an expression has no applications in it, it can't be reduced any further, and is said to be in I. There might be several ways to reach a normal form. For example, consider the expression: (%x.p x)((%y.b) q) we can reduce the %y part first, yielding (%x.p x)b and then p b or we can reduce the %x part first, yielding p((%y.b) q) p b In this case, both paths led to the same place. In 193X, Church and Rosser proved that no expression has more than one normal form, and so any two sequences of reductions that end with a normal form give you the same normal form. This means that we can consider the normal form to be the `value' of an expression: It's what's left when we finish evaluating it, and it doesn't matter how we carry out the steps in the evaluation. If two expressions have the same normal form, we say they're I. We'll denote equivalence by C<-=->. Some expressions have no normal form: (%x.x x)(%x.x x) This expression reduces to itself; you never get to a normal form. Now what does all this have to do with real computation? For computation we need several things: We need boolean constants and an if-then test. We need numbers, and arithmetic, and it would be nice to have some data structures too. =head1 Currying At first it's not clear that we're going to be able to accomplish this. We only have functions of one argument; how are we going to express addition, which is a function of two arguments? It turns out that the restriction to one-argument functions is no restriction at all. That's because we can play a trick called I. First, imagine the function f3 that takes one argument and adds 3 to it. Now imagine the similar function f4 that takes one argument and adds 4 to it. Now imagine all the other functions in this family. certainly they're all functions of one argument. Our `add' function is going to take one argument. If that argument is 3, it'll return f3. If the argument is 4, it'll return f4. if it's some other number, add will do the analogous thing. Now let's see what happens when we try to evaluate (add 3 4) This is just short for ((add 3) 4) We already said that (add 3) will produce f3: (f3 4) f3 is the function that adds 3 to its argument, so the result of this is 7 that's just what we wanted---we put in 3 and 4 and got 7. In Perl, we could express it this way: sub add { my $x = shift; my $f = sub { my $y = shift; return $x + $y; } return $f; } C gets an argument C<$x>, and constructs a function, C<$f>. C<$f> is the function that gets a number C<$y>, adds C<$x> to it, and returns the result. C itself returns C<$f>. When you apply C<$f> to a number C<$y>, it adds C<$x> to it; when you apply C to a number C<$x>, you get C<$f>. In perl the notation is a little cumbersome; C actually returns a code reference, and to call it you have to use the C<&{...}(...)> syntax, so it looks like this: $sum = &{add(3)}(4); # Sum is 7 In perl 5.004 and later, there's a tidier notation, which we'll use from now on: $sum = add(3)->(4); # Sum is 7 If $add is a reference to the add function itself, it looks like this: $sum = $add->(3)->(4); # Sum is 7 As promised, we have an addition function that is made of functions of only one argument. =head1 Logic Now back to general programming. Our goals are to develop Lambda Calculus versions of if-then tests, boolean and integer constants, arithmetic operators, and some simple data structures. The if-then test is most fundamental. We cna imagine that `if' is a function of three arguments. The first is a boolean value; the second is the `then' clause, and the third is the `else' clause. The value of this expression IF b THEN x ELSE y is x if b is true, and y if b is false. So what we'd really like is to find three functions, IF, TRUE, and FALSE, such that IF TRUE x y -=- x IF FALSE x y -=- y for all x and y. Then we can declare that an expression has a boolean value if it is equivalent to TRUE or FALSE, and the Church-Rosser theorem tells us that if E x y will evaluate to x whenever the expression E has a true value and to y whenever E has a false value. This is just what we want. The definitions turn out to be very simple: IF %b.(%x.(%y.b x y)) TRUE %a.(%b.a) FALSE %a.(%b.b) Let's see if those identities are satsified: IF TRUE A B we'd like this to reduce to A, and it does: %b.(%x.(%y.b x y)) %a.(%b.a) A B %x.(%y.(%a.(%b.a))x y) A B %y.(%a.(%b.a))A y) B %a.(%b.a) A B %b.A B A We win. The demonstration that FALSE works is almost the same. =head1 Data Structures Now let's try for a small data structure. The simplest data structure is the ordered pair. We'll define a C function that takes two values and combines them into one, from which the original values can still be extracted. In lisp this operation is called I, although the object produced by I is in fact called a `pair'. We want to have PAIR, FIRST, and SECOND functions such that FIRST (PAIR A B) -=- A SECOND (PAIR A B) -=- B The insight here is that we can use a partially-evaluated C construction. To make the pair EA,BE, we'll construct %f.IF f A B Then C will just supply a true value for f to select A, and C will supply a false value to select B: PAIR = %a.(%b.(%f.f a b)) FIRST = %p.(p TRUE) SECOND = %p.(p FALSE) Let's do a quick example: We want SECOND (PAIR A B) to evaluate to C: SECOND (PAIR A B) %p.(p FALSE) (PAIR A B) %p.(p FALSE) (%a.(%b.(%f.f a b)) A B) %p.(p FALSE) ( %b.(%f.f A b) B) %p.(p FALSE) ( %f.f A B ) (%f.f A B) FALSE FALSE A B %a.(%b.b) A B %b.b B B Which is what we wanted. As Lisp programmers know, with pairs you can contruct almost any interesting data structure. For example, a linked list is a sequence of pairs, each of whose C elements is the pair that represents the `rest' of the list. =head1 Numbers Now we're in good shape to do numbers. Integers have a few fundamental properties: There's a C element, and an C function to recognize it. Every number has a successor (the next number), which is generated by the C function. And finally, every number except zero has a predecessor, computed by the C function. These four values must satisfy the following properties: IS_ZERO ZERO -=- TRUE IS_ZERO (SUCC x) -=- FALSE PRED (SUCC x) -=- x There are a few other properties they should satisfy: if C and C are different numbers, then C<(SUCC x)> and C<(SUCC y)> should be different also, and so on. But these three are enough for now. Here's our technique: We'll represent each number I as a list of length I. This means that a number I will be a pair; the C part of the pair will be the number I. The C part of the pair can be arbitrary, so we might as well choose it usefully: The C part of C will be C, and the C part of every other number will be C. With that definition, it's easy to recognize C. ZERO = PAIR TRUE TRUE SUCC = %x. PAIR FALSE x IS_ZERO = %x. FIRST x PRED = %x. SECOND x Now, for example, we can construct the number C: ONE = SUCC ZERO TWO = SUCC ONE and so on. It turns out that these few operations are enough to define any arithmetic function on the natural numbers. For example, here's the addition function: sub add { my ($m, $n) = @_; if (IZERO($m)) { return $n } else { return add(PRED($m),SUCC($n)) } } All the tools we need here are available: We can construct functions that bind arguments and return values; we have an if-then-else test; we have C and C and C. Similarly, here's the function that computes whether or not two numbers are equal: sub equal { my ($m, $n) = @_; if (IS_ZERO($m)) { return IS_ZERO($n); } else { if (IS_ZERO($n)) { return FALSE; } else { return equal(PRED($m), PRED($n)); } } } Let's stick with C because it's simpler. Rendered into the compact Lambda Calculus notation, it looks like this: ADD = %m.(%n.IF (IS_ZERO m) n (ADD (PRED m) (SUCC n))) There's a problem here, and it's big one: We used C in its own definition. We know what the definition is, but if we try to insert it, the defintiion gets longer and longer and we never get rid of the reference to C. We need some way to express recursion so that we don't have to write infinitely long function definitions, which are impractical. It turns out that we I express recursion in Lambda Calculus, and the method used to do it is really brilliant. =head1 Recursion First we need to discuss the idea of a I. Consider the function I. If you put in 3, you get 9. But if you put in 1, you get 1 back out again. We say that 1 is a `fixed point' of the function I. You can think of I as shuffling the numbers around, sending 2 to 4, 3 to 9, and soforth, but it leaves 1 fixed in the same place. Now let's consider a function where you put in one function and get another one out. Specifically, let's consider this function, which we'll call R: R = %g.(%m.(%n.IF (IS_ZERO m) n (g (PRED m) (SUCC n)))) You put in a function I, and you get out a new function. For example, if we put in the identity function C<%q.q>, we get out %m.(%n.IF (IS_ZERO m) n (%q.q (PRED m) (SUCC n))) %m.(%n.IF (IS_ZERO m) n ((PRED m) (SUCC n))) which doesn't actually make much sense. But it is a function. Now, we wanted to define the C function like this: ADD = %m.(%n.IF (IS_ZERO m) n (ADD (PRED m) (SUCC n))) Now, it's easy to see that R ADD = %m.(%n.IF (IS_ZERO m) n (ADD (PRED m) (SUCC n))) just expand the left-hand side once and you get the right-hand side. So if C is the addition function we are looking for, then ADD = R ADD This means that if we were able to find the C we wanted, it would be a fixed point of C, and conversely, if we could find a fixed point of R, it would have to be the addition function we were looking for. How does this help us? We have no idea how to find fixed points, especially fixed points of big hairy functions like C. Here's the magic: There is a way to find a fiuxed point for any function in Lambda Calculus. In fact, there's a function, C, that computes one. Given any function C, C C is a fixed point of C. That is, for any C, Y f = f (Y f) because C<(Y f)> is a fixed point of C. It's not at all obvious that such a function C could possibly exist, and yet one does exist. All that remains is to exhibit this magical fixed point function C that computes the fixed point for any function whatsoever. And here it is: Y = %f.((%x.f (x x))(%x.f (x x))) let's try applying C to some function C and see what we get. We hope that C<(Y f)> will be a fixed point for C, so that C<(Y f)> = C. Let's see: Y f = %f.((%x.f (x x))(%x.f (x x))) f = (%x.f (x x)) (%x.f (x x)) = f ((%x.f (x x)) (%x.f (x x))) = f (Y f) Miraculously, this is exactly what we wanted! C will find a fixed point for any function we give it. Since the addition function we want is just the fixed point of C, we have ADD = Y R and we're done. And yes, if you expand the definitions and grovel over the substitutions, you do indeed find that (ADD ONE ONE) reduces to C, as follows: ((Y R) ONE ONE) (((%x.R (x x))(%x.R (x x))) ONE ONE) ((R (%x.(R (x x)) %x.(R (x x)))) ONE ONE) ((%m.(%n.IF (IS_ZERO m) n ((%x.(R (x x)) %x.(R (x x))) (PRED m) (SUCC n)))) ONE ONE) (IF (IS_ZERO ONE) ONE ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE))) (IF ((%n.(FIRST n)) (SUCC ZERO)) ONE ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE))) (IF (FIRST (SUCC ZERO)) ONE ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE))) (IF (FIRST (PAIR (FALSE ZERO))) ONE ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE))) (IF FALSE ONE ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE))) ((%x.(R (x x)) %x.(R (x x))) (PRED ONE) (SUCC ONE)) ((%x.(R (x x)) %x.(R (x x))) (SECOND (PAIR FALSE ZERO)) TWO) ((%x.(R (x x)) %x.(R (x x))) ZERO TWO) (R ((%x.(R (x x)) %x.(R (x x)))) ZERO TWO) ((%m.(%n.IF (IS_ZERO m) n ((%x.(R (x x)) %x.(R (x x))) (PRED m) (SUCC n)))) ZERO TWO) (IF (IS_ZERO ZERO) TWO ((%x.(R (x x)) %x.(R (x x))) (PRED ZERO) (SUCC TWO))) (IF (FIRST (PAIR TRUE TRUE)) TWO ((%x.(R (x x)) %x.(R (x x))) (PRED ZERO) (SUCC TWO))) (IF TRUE TWO ((%x.(R (x x)) %x.(R (x x))) (PRED ZERO) (SUCC TWO))) TWO =head1 Function Call Semantics Before we go on to the punch line of the article, there's one subtle matter that we have to see in detail. The problem is that at any particular step, there might be any number of ways to reduce a given Lambda Calculus expression. and we have to pick the right one. The Church-Rosser theorem says that no matter how we do it, the normal form that we get at the end will be the same one---I we actually reach a normal form. But if we follow the wrong path, we might never reach any normal form at all. For example, consider this: (%x.Q) ((%y.y y)(%y.y y)) Here we have two choices: We can apply the leftmost function C<%x.Q> to its argument, yielding the normal form, C, immediately. But if instead we try to evaluate the argument first, we don't get anywhere. A Perl analogue of the same thing would be something like this: sub loop_forever { while (1) { 1; } } sub return_Q { my $arg = shift; return "Q" } In Perl, the expression return_Q(loop_forever()) does indeed loop forever, even though C never uses its argument, because Perl has I semantics: This means that a function's arguments are always fully evaluated before the function is called. The other option is I, in which the argument is passed to the function I being evaluated, and is only evaluated later if it is required. You might think that since call-by-value is workable for Perl, it will work in practice in Lambda Calculus also. But you'd be wrong: call by value I always work in Perl; there are some essential places where Perl (and every other language) uses call by name instead. To see why, suppose you were to try to write your own function to replace Perl's C construction: sub my_if { my ($condition, $then_part, $else_part) = @_; if ($condition) { $then_part } else { $else_part } } Then you'd write $max = my_if($x > $y, $x, $y); and in fact this works, for this example. But it doesn't work in general: sub factorial { my $n = shift; return my_if($n == 0, 1, $n*factorial($n-1)); } =for MJD A better example here is to use something like my_if(..., 1, die); This function loops forever on I input. When I=0, it's important that we I make another recursive call---but this function I try to evaluate C<0*factorial(-1)> and pass this result to C; it doesn't matter that C would eventually have chosen the other branch, which was 1; with call by value we have to compute both branches I we can ask which one we want. And that defeats the whole point of C which is to compute one or the other, but not both. =head Delayed Action What we'd like to do is somehow express C in a way that delays the evaluation of I and I until after the C has selected one or the other of them. As we saw in my lazy streams article back in TPJ #7, the simplest way to delay evaluation is to wrap up C<$x> and C<$y> into functions: %q.x does I evaluate C, but rather creates a function that will evaluate it sometime in the future, when we apply the function to an argument; the argument is ignored, and out pops C. Instead of writing C<(IF b x y)> this way: (%b.(%x.(%y.b x y))) b x y as we have been doing, we'll write it this way: (%b.(%x.(%y.b x y))) b (%q.x) (%q.y) q The C part will yield either C<%q.x> or C<%q.y>, and then when we apply that to the dummy argument C, we'll have our result. To see how this works, let's suppose C is C: (%b.(%x.(%y.b x y))) FALSE (%q.x) (%q.y) q (%x.(%y.FALSE x y)) (%q.x) (%q.y) q (%y.FALSE (%q.x) y) (%q.y) q (FALSE (%q.x) (%q.y)) q ((%x.(%y.y)) (%q.x) (%q.y)) q ( (%y.y) (%q.y)) q ( (%q.y)) q y =head1 Punch Line OK, now we're finally ready for the punch line of the entire article. By now I bet you're expecting me to write a Perl program that evaluates Lambda Calculus expressions. Nope, wrong. Here's the punch line: Perl already has that feature built-in. If you wanted to investigate Turing machines, you'd have to start by writing a simulator; then you'd have to write Turing machine programs to perform functions like addition and multiplication, and that soon becomes impossibly difficult. But to investigate Lambda Calculus, you don't need a simulator. Lambda calculus isn't a machine; it's more like a programming language. In fact, it I a programming language. It's programming language with a syntax for defining functions and invoking them, and nothing else. The language is so simple that you can view it as a subset of Perl. =head2 Lambda-expressions in Perl If we want to write C<%x.B>, we'll just say sub { my $x = shift; B } C<%x.B> means to create a function, which, when applied to an argument I, substitutes it into C in place of C, and yields the result. And that's exactly what C does do. In Lambda Calculus, C<(p q)> means to apply a function C

to an argument C; Perl has that too. We'll just say: $p->($q) That's all we need; we can translate our Lambda Calculus expressions directly into Perl and have the Perl interpreter evaluate them immediately. Only the syntax is different, and not even very different at that. So, for example, the straightforward Perl translation of IF = %b.(%x.(%y.(b x) y)) is: $IF = sub { my $b = shift; sub { my $x = shift; sub { my $y = shift; $b->($x)->($y); } } } (Provided you also use the trick above to delay evaluation of C and C.) Here are a few other basic definitions: # TRUE = %x.(%y.x) $TRUE = sub { my $x = shift; sub { my $y = shift; $x; } } # FALSE = %x.(%y.y) $FALSE = sub { my $x = shift; sub { my $y = shift; $y; } } Let's try that out to see that it works right: print $IF->($TRUE)->("then")->("else"); (prints "then") print $IF->($FALSE)->("then")->("else"); (prints "else") We didn't need to use the delayed-evaluation trick here because it didn't matter that the strings C and C were evaluated before the C was called. Here are some other definitions, translated into Perl directly from the Lambda Calculus: # PAIR = %a.(%b.(%f.f a b)) $PAIR = sub { my $a = shift; sub { my $b = shift; sub { my $f = shift; $f->($a)->($b); } } } ; # FIRST = %p.(p TRUE) $FIRST = sub { my $p = shift; $p->($TRUE); } ; # SECOND = %p.(p FALSE) $SECOND = sub { my $p = shift; $p->($FALSE); } ; # ZERO = PAIR TRUE TRUE $ZERO = $PAIR->($TRUE)->($TRUE); # IS_ZERO = %n.(FIRST n) $IS_ZERO = sub { my $n = shift; $FIRST->($n); } ; =head2 Debugging Lambda Calculus Programs in Perl Let's check to make sure that this works: print $IS_ZERO->($ZERO); # prints CODE(0x81179cc) Oops! This anonymous function that we got might be the C function C<%x(%y.x)>, or it might be something else, and there's no simple way to peek inside to see. Let's define a utility function that analyzes a Lambda Calculus boolean and tells us which it is: sub print_bool { my $f = shift; print $IF->($f)->("It's true")->("It's false"); } print_bool($IS_ZERO->($ZERO)); # prints "It's true" Now we go on with the arithmetic: # SUCC = %n.(PAIR FALSE n) $SUCC = sub { my $n = shift; $PAIR->($FALSE)->($n); } ; # ONE = SUCC ZERO $ONE = $SUCC->($ZERO); # TWO = SUCC ONE $TWO = $SUCC->($ONE); print_bool($IS_ZERO->($ONE)); # prints "It's false" # PRED = %n.(SECOND n) $PRED = sub { my $n = shift; $SECOND->($n); } ; Here's a utility function that converts Lambda Calculus numbers into Perl numbers: sub convert_to_perl_number { my $n = shift; return "OOPS($n)" unless ref $n eq 'CODE'; my $o = 0; until ($IF->($IS_ZERO->($n))->(1)->(undef)) { $o++; $n = $PRED->($n); } $o; } sub print_number { print "If this is a number, its value is ", convert_to_perl_number(shift()), "\n" ; } print_number($SUCC->($TWO)); # prints: "If this is a number, its value is 2." The peculiar C<($IF-E($IS_ZERO-E($n))-E(1)-E(undef))> expression tests a Lambda Calculus number with C<$IS_ZERO>, then converts the Lambda Calculus-style boolean result into a Perl-style boolean value, either 1 or C, for use in the C loop. =head2 Lambda Arithmetic in Perl Now we're ready to try addition. Addition is recursive, and we need a version of the C operator that will work under a call-by-value semantics. The existing one won't, because it wants to make the recursive call right away; we need to delay this. We can solve this problem the same way we did for C: Instead of Y = %f.((%x.f (x x))(%x.f (x x))) we'll use YM = %f.(%x.(%y.f (x x)))(%x.(%y.f (x x))) This C operator is a little different from the C operator. Instead of generating the fixed point immediately, as C does: Y f = f (Y f) It delays the computation, and performs it on demand: YM f = %y.f (YM f) Then when you actually want to make the recursive call to C, you apply the result to some dummy argument C, which is thrown away. $YM = sub { my $f = shift; (sub { my $x = shift; sub { my $y = shift; $f->($x->($x)); }; })-> (sub { my $x = shift; sub { my $y = shift; $f->($x->($x)); }; }) } ; =head2 Forcing a Delayed Computation For documentation purposes, we'll name the dummy argument C. It doesn't matter what it is, so let's use the simplest thing we can: C<%x.x>. $FORCE = sub { my $x = shift; $x }; Now here's our addition function again, with delays and forces to make it work with call-by-value: # R = %g.(%m.(%n.( # (IF (IS_ZERO m) # (%q.n) # (%q.((g FORCE) (PRED m) (SUCC n)))) # FORCE # ) # )) $R = sub { my $g = shift; sub { my $m = shift; sub { my $n = shift; $IF->($IS_ZERO->($m)) ->(sub { my $q = shift; $n} ) ->(sub { my $q = shift; ($g->($FORCE))-> ($PRED->($m))-> ($SUCC->($n)); })->($FORCE); } } } ; If we apply C to C, the result is a recursive addition function; we have to force the result because C always delays delivering its result: $ADD = $YM->($R)->($FORCE); =head2 We Win! Now let's give it a try: print_number($ADD->($ONE)->($ONE)); # prints: "This number is actually 2." You are now invited to execute the program in C and verify that this is indeed what it prints out.