Class notes for Wednesday, February 8, 2006

The Y combinator

----------------------------------------------------------------------
Consider the standard recursive factorial function:

(define fact
  (lambda (n)
    (if (= n 0)
      1
      (* n (fact (- n 1))))))

(fact 5) => 120

(define zebra fact)
(zebra 5) => 120
(fact 5) => 120

No surprises here.  All we've done is give the factorial function the
alternative name zebra.  Suppose that now we redefine fact.  We ought
to still be able to use zebra to compute factorials.

(define fact '(apples are red))
(zebra 5) => CRASH!!!!

The (lambda (n) (if (= n 0) ...)) expression above must be named
"fact" in order to work correctly.  "Zebra" won't do.  But why should
functions care about their names, even if they're recursive?  After
all, functions are abstract mathematical objects that are independent
of the mundane world of humans and our proclivity for giving names to
everything we see...

----------------------------------------------------------------------

Step 0...  We wish to write the recursive factorial function without
having to give it a name.

(lambda (n)
  (if (= n 0)
      1
      (* n (??? (- n 1)))))

The problem, of course, is that we can't plug in the function
expression itself directly in place of the ??? because that
immediately leads to an infinite regress.  It's like trying to quote
an entire sentence inside the sentence itself.

----------------------------------------------------------------------

Step 1...  This is the only step in the entire derivation that
requires any real thought.  We can get around the above problem by
passing in a *copy* of the factorial function as an extra argument and
then using that in place of the ???.  But we must ensure that the copy
of our function is identical to the function itself at all times.  We
will adhere to this requirement in every step that follows.

The lambda expression below does not refer to the name fact.  Notice
that since f is a copy of the function, and the function takes a copy
of itself as a second argument, we must also pass f to f.  Passing a
copy of a function to itself is the whole secret of the Y combinator.

(define fact
  (lambda (n f)
    (if (= n 0)
	1
	(* n (f (- n 1) f)))))

(fact 5 fact) => 120

------------------------------------------------------------------------

Step 2...  We just switch the order of the function arguments, n and f.
(f (- n 1) f) changes to (f f (- n 1)), and the arguments to the top-level
invocation also switch places:

(define fact
  (lambda (f n)
    (if (= n 0)
	1
	(* n (f f (- n 1))))))

(fact fact 5) => 120

------------------------------------------------------------------------

Step 3...  We curry the function so that it takes its two arguments
one at a time.  Note the extra set of ()'s around the top-level
invocation as well, which still evaluates to 120.

(define fact
  (lambda (f)
    (lambda (n)
      (if (= n 0)
	  1
	  (* n ((f f) (- n 1)))))))

((fact fact) 5) => 120

-------------------------------------------------------------------------

Step 4...  Notice that fact above isn't really the factorial function
anymore.  It's the function that, when applied to itself, *produces* the
factorial function.  Let's make it the true factorial function again by
applying it to itself:

(define fact
  ((lambda (f)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n ((f f) (- n 1))))))
   (lambda (f)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n ((f f) (- n 1))))))))

(fact 5) => 120

The above application expression is now a self-contained recursive
version of factorial, although still in a clumsy form.

-------------------------------------------------------------------------

Step 5...  Now let's give the subexpression (f f) the name "r" using a
let expression.  Notice how every change to our function requires an
identical change to the copy of the function, as mentioned earlier.

(define fact
  ((lambda (f)
     (let ((r (f f)))
       (lambda (n)
	 (if (= n 0)
	     1
	     (* n (r (- n 1)))))))
   (lambda (f)
     (let ((r (f f)))
       (lambda (n)
	 (if (= n 0)
	     1
	     (* n (r (- n 1)))))))))

Unfortunately, the above expression will produce an infinite loop.
To see why, look at its structure:

(define fact
  ((lambda (f)
     (let ((r (f f)))
       ...))
   <copy>))

We are applying (lambda (f) (let ((r (f f))) ...)) to a copy of itself.
So <copy> gets passed in and bound to f, and then the let expression
immediately executes (f f).  That is, the copy of the function gets
applied to itself again, which causes the let expression to immediately
execute (f f) again, which causes ... an infinite loop.

-------------------------------------------------------------------------

Step 6...  We can stop the madness with a clever trick.  In the same way
that the car function is equivalent to the function (lambda (a) (car a)),
the "(f f)" function is equivalent to the function (lambda (a) ((f f) a)).
The let expression now simply creates the function (lambda (a) ((f f) a))
and binds it to r, without actually executing (f f).  This effectively
delays the evaluation of (f f) until sometime later when r gets invoked.

(define fact
  ((lambda (f)
     (let ((r (lambda (a) ((f f) a))))
       (lambda (n)
	 (if (= n 0)
	     1
	     (* n (r (- n 1)))))))
   (lambda (f)
     (let ((r (lambda (a) ((f f) a))))
       (lambda (n)
	 (if (= n 0)
	     1
	     (* n (r (- n 1)))))))))

Now we're ok again:
(fact 5) => 120

-------------------------------------------------------------------------

Step 7...  Next, we expand the let expressions into their equivalent
lambda forms.  In general, (let ((x val)) body) is equivalent to
((lambda (x) body) val).  This may look complicated but it's not.
Just match up the ()'s carefully.

(define fact
  ((lambda (f)
     ((lambda (r)
	(lambda (n)
	  (if (= n 0)
	      1
	      (* n (r (- n 1))))))
      (lambda (a) ((f f) a))))
   (lambda (f)
     ((lambda (r)
	(lambda (n)
	  (if (= n 0)
	      1
	      (* n (r (- n 1))))))
      (lambda (a) ((f f) a))))))

(fact 5) => 120

-------------------------------------------------------------------------

Step 8...  Now let's give the (lambda (r) (lambda (n) ...)) subexpression
the name "g" using another let:

(define fact
  (let ((g (lambda (r)
	     (lambda (n)
	       (if (= n 0)
		   1
		   (* n (r (- n 1))))))))
    ((lambda (f)
       (g (lambda (a) ((f f) a))))
     (lambda (f)
       (g (lambda (a) ((f f) a)))))))

(fact 5) => 120

-------------------------------------------------------------------------

Step 9...  Next, we replace the let expression for g by its equivalent
lambda form, just like in Step 7, and out pops the applicative-order
Y combinator!  The expression below still represents the self-contained
recursive factorial function, but now it's in a nicer form.  In particular,
the (lambda (g) ...) subexpression is Y:

(define fact
  ((lambda (g)
     ((lambda (f) (g (lambda (a) ((f f) a))))
      (lambda (f) (g (lambda (a) ((f f) a))))))
   (lambda (r)
     (lambda (n)
       (if (= n 0)
	   1
	   (* n (r (- n 1))))))))

(fact 5) => 120

------------------------------------------------------------------------

Step 10...  We just pull out the (lambda (g) ...) subexpression and
call it Y, since it has no free variables (it's a combinator, after all).
Then the expression above for the recursive factorial function can be
rewritten as shown below.  The expression passed to Y is a "template"
for the recursive factorial function.  Instead of "???", we call the
recursive invocation "r", wrap the whole thing with (lambda (r) ...),
and hand it over to Y, which returns a self-contained recursive function.
You can give it a name with define if you want, but you don't have to.

(define Y
  (lambda (g)
    ((lambda (f) (g (lambda (a) ((f f) a))))
     (lambda (f) (g (lambda (a) ((f f) a)))))))

(define fact
  (Y (lambda (r)
       (lambda (n)
	 (if (= n 0)
	     1
	     (* n (r (- n 1))))))))

(fact 5) => 120

Y is a "fixed point" combinator, meaning that it takes a function f
and creates a new function that is a fixed point of f.  A fixed point is
a point such that f(p) = p.  For example, 1 is a fixed point of the
function x^2.  The function e^x is a fixed point of the "differentiate"
function in calculus, which takes a function and returns its derivative.

In general, (f (Y f)) = (Y f).

Whew!

-------------------------------------------------------------------------