Wednesday, April 22, 2009

Python-style generators in Scheme

I have finally gotten back to working on interface injection in OpenJDK. As before I get a few breaks when I wait for stuff to compile. These breaks are too long to just sit around and wait, but too short to context switch to another project. So what I end up doing instead are small hacks. And since I, once again, have decided to try and post stuff here more regularly, I decided to write a post about a hack I did yesterday.

A long time ago I tried to implement Python style generators in Scheme, and obviously I am a better programmer now since I was able to just sit down and write it.

The aim is to be able to write code like this:

(define-generator (iterate lst)
(define (iter lst)
(if (not (null? lst))
(begin (yield (car lst))
(iter (cdr lst)))))
(iter lst))

And then use such a generator in a Python-like for-loop, like so:

(for (element (iterate '(a b c))
(print element))

The code that enables this is at it's core two syntax definitions. The generator definition syntax, and the for-loop syntax:

(define-syntax explicit-generator-definition
(syntax-rules (yield)
((explicit-generator-definition (yield exit) params . body)
(lambda params
(lambda (done loop)
(define (exit . values)
(call/cc (lambda (next) (apply loop (cons next values)))))
. body)))))

(define-syntax for
(syntax-rules (break continue)
; multiple valued generator version
((for (break breaker) (continue continuation)
((variables ...) generator) . body)
(lambda (breaker) (generator breaker
(lambda (continuation variables ...)
. body)))))
; single valued generator version
((for (break breaker) (continue continuation)
(variable generator) . body)
(lambda (breaker) (generator breaker
(lambda (continuation variable)
. body)))))
; enable ignoring break and continue
((for (continue continuation) (break breaker) (target ...) . body)
(for (break breaker) (continue continuation) (target ...) . body))
((for (break breaker) (targets ...) . body)
(for (break breaker) (continue ignored) (targets ...) . body))
((for (continue continuation) (targets ...) . body)
(for (break ignored) (continue continuation) (targets ...) . body))
((for (targets ...) . body)
(for (break _break) (continue _continue) (targets ...) . body))))

As you can see the loop construct has the ability to break the loop and jump to the next iteration by explicitly naming the break and continue continuations. The fact that these have to be named explicitly is a good thing since it enables you to define different names for the break/continue continuations for different loops when you nest loops. You can of course name them break and continue, respectively, like so:

(for (break break) (continue continue) (element (iterate '(a b c))
(print element))

The generator definition also needs the yield continuation to be named explicitly. This is usually not what you want, so instead we define a convenience syntax for defining generators in the form of a "e;unhygienic"e; macro (this is the syntax used for unhygienic macros by most scheme implementations, PLT scheme that I used for example):

(define-macro (generator arguments . body)
`(explicit-generator-definition (yield yield) ,arguments ,@body))

(define-macro (define-generator signature . body)
`(define ,(car signature)
(explicit-generator-definition (yield yield) ,(cdr signature) ,@body)))

These macros was what prevented me from completing this when I first tried a few years ago. I started with them, and wanted them to be portable across all Scheme implementations, and unhygienic macros are not defined in the R5RS standard. What I should have done is what I did now, start with the interesting stuff and add just a little implementation dependent code for the final touches.

Now we can use these macros to define a range function that behaves in about the same way as the xrange function in Python:

(define-generator (range . params)
(define (range start stop step)
(if (and (> start stop) (> step 0))
(values start (lambda (current) (<= current stop)) (- step))
(values start (lambda (current) (>= current stop)) step)))
; parse the variables based on the number of them
(lambda () (cond
; 0 arguments ->
((null? params)
; infinite generator from 0
(values 0 (lambda (current) #f) 1))
; 1 argument: count ->
((null? (cdr params))
; generator with count elements, starting at 0
(range 0 (car params) 1))
; 2 arguments: start and stop ->
((null? (cddr params))
; generator over [start, stop[
(range (car params) (cadr params) 1))
; 3 arguments: start, stop and step ->
((null? (cdddr params))
; generator over [start, stop[ with a given step length
(range (car params) (cadr params) (caddr params)))
; more arguments - error (by passing too few arguments to internal range
(#t (values))))
(lambda (start stop? step)
(do ((current start (+ current step)))
((stop? current))
(yield current)))))

As a final touch we add a convenience syntax for nested for-loops as well:

(define-syntax for*
(syntax-rules (break continue)
; base case - one generator - expands to single for
((for* (((break breaker) (continue continuation) targets ...)) . body)
(for (break breaker) (continue continuation) (targets ...) . body))
((for* (((continue continuation) (break breaker) targets ...)) . body)
(for (break breaker) (continue continuation) (targets ...) . body))
((for* (((break breaker) targets ...)) . body)
(for (break breaker) (targets ...) . body))
((for* (((continue continuation) targets ...)) . body)
(for (continue continuation) (targets ...) . body))
((for* ((targets ...)) . body)
(for (targets ...) . body))
; recursive case - more than one generator - expands to nested for*
; for* is used instead of for to avoid repeating break and continue cases
((for* (first more ...) . body)
(for* (first) (for* (more ...) . body)))))

To understand how the generators work let's look at what a generator definition and for-loop construct expands to:

; A generator expands to a function that accepts the defined parameters
(lambda params
; The generator function, when called, returns a function to which
; the for-construct passes the break continuation and a closure
; representing the body of the loop
(lambda (done loop)
; yield is defined as a function that takes arbitrary parameters
(define (exit . values)
; when invoked it retrieves the current continuation to be
; able to resume then calls the loop body with the continuation
; and the passed in parameters as arguments
(call/cc (lambda (next) (apply loop (cons next values)))))
; The actual body of the generator is defined by the user,
; this executes arbitrary code and calls yield.
. body))

; The for-construct starts by retrieving the current conitunuation
; this is used for breaking the loop
; It then calls the generator with two arguments:
; the break continuation and the body of the loop.
(lambda (breaker) (generator breaker
; The body of the loop accepts two arguments:
; the continue continuation and the argument that was passed to yield.
(lambda (continuation variable)
; All it does is execute the body
. body))))

Happy Hacking!