Continuations for Curmudgeons
This essay is for people who, in web years, are older than dirt. More specifically, if there was a period of time in which you programmed in a language which did not have garbage collection, then I mean you. For most people these days, that means that you had some experience with a language named C.
It is a pretty safe bet that — despite your deep technical background — you find that you have some genetic defect that makes it completely impossible for you to understand articles with titles like Continuations Made Simple and Illustrated. And for some reason, that bothers you.
If so, then maybe this article is for you. Think of yourself as a frog. Prepare to be boiled.
Local Variables
Let me start with something far, far, away from continuations. Something that you should be comfortable with. It should bring back memories of battles you heroically fought, and mostly won, long ago. This example is in C:
#include <stdio.h> int* f(int n) { int result = n + 1; return &result; } main() { int *n1 = f(1); int *n2 = f(2); printf("%d\n", *n1); }
OK, now let’s walk through this program. First a
function named f
is defined. It takes in one
argument, an integer named n
. It adds one to
that value, and stores the result in a local variable named
result
. Finally, f
returns the
address of result
.
At this point, any self respecting compiler should issue a warning. In fact, gcc does exactly that:
test.c: In function 'f': test.c:5: warning: function returns address of local variable
The problem is that local variables are stored on the
stack. And when f
returns, it pops the stack,
making this memory available for other uses.
In this program, main
calls f
twice. Once with a value of 1
and once with a
value of 2
. Quite likely, the second call will
use the exact same storage layout as the first call, meaning that
the first result
will be overlaid.
In fact, on my machine, this is exactly what happens. When I compile and run this example, I get the following result:
3
If you now look around, you will notice that all the tadpoles that never really programmed in C have jumped out of the pond.
ByValue vs ByReference
The above example could be fixed by using
malloc
. Of course, this would introduce a leak,
so calls to free
would also have to be added.
Aren’t you glad that you are now programming in a language
with garbage collection?
Even so, there still are quirks that will snare the unsuspecting programmer. For this pair of examples, I will use JavaScript. If you like, you can run these in your favorite web browser.
<script> var v = 5; function f() { v = v + 1; return v; } var x = f(); x = x - 2; document.write(f()); </script>
In this script, v
starts out at 5. Function
f
is called, and v
gets incremented, and
the result gets returned and stored in a variable named x.
That variable is then decremented by two. Then f
is called again, v
get incremented again, and the
result is printed.
When I run this I get
7
Now, lets change this example slightly.
<script> var v = [5]; function f() { v[0] = v[0] + 1; return v; } var x = f(); x[0] = x[0] - 2; document.write(f()[0]); </script>
The only difference between this script and the first script is
that v
is an array of length one. However, this
difference is significant, because arrays are allocated on the
heap, so v
is really just a reference to the
array. And after the first call, x
is a
reference to the same array, so when the value at index
0
is decremented by two, this affects the value
referenced by v
.
Sure enough, when I run the new script, I get
5
Inconsistent? Yes. Strange? Yes. But generally, this doesn’t cause too much problems. In any case, you avoid SEGVs as things allocated on the heap never go out of scope. They simply become eligible to be reclaimed when there is nothing left which references them.
If you are still with me, you don’t realize it yet, but you are about to become dinner.
Frames
Let’s recap.
While a stack can be seen as a consecutive sequence of bytes, in most languages there are either conventions (which are not enforced) or policies (which are) which prevent a function from “popping” off more data than they pushed onto the stack. In other words, each active scope (e.g., a function) is separated from one another much in the same way as the little bars that you put in between groceries on the conveyor belt in line separate each person’s purchases.
A procedure call pushes the return address (the address of the instruction after the call instruction) onto the stack, along with any arguments, and creates a new frame. A return statement uses this information to set the Instruction Pointer and Stack Pointer back to what it was as the point of call.
Within the procedure, arguments and local variables may be located based on their offset from the start of the stack segment.
Hashes
Everything so far has been very traditional computer science. Now, let’s start to diverge from that a bit. Instead of viewing a stack as a contiguous sequence of bytes, let’s view a stack as a linked list of frames. And instead of viewing a frame as a contiguous sequence of bytes, let’s substitute a dictionary or hashtable.
Now instead of knowing that this data resides at a predetermined
offset in the frame, you would look up x
simply by
looking up “x
” in the frame’s hash
table.
In case you didn’t notice, this organization of frames is more oriented towards dynamic languages.
callcc
Now, let’s revisit the notion of a procedure call with
this organization of frames. A procedure call creates a new
frame, and stores the current context (consisting of both the
instruction pointer and the stack pointer) in the new frame.
The syntax for creating a new frame and storing the current context
into a global variable named $cc
in Ruby is:
callcc{|$cc|}
The syntax for returning to this exact point, restoring both the instruction pointer and stack pointer, is:
$cc.call
This is naturally called a continuation, as you are continuing where you left off.
At this point, you are probably feeling a little bit disoriented. But go back and reread the above few paragraphs and examples. Other than the last examples being in a foreign language, all the words make sense. You are probably wondering “is that all there is?” and “what’s the big deal?”. If so, the answers are “pretty much yes”, and “read on”.
One thing to note here is that as long as $cc
retains its current value, the stack frame it refers to can not be
reclaimed by garbage collection. Frames themselves are not
automatically freed upon encountering return
statements; instead they are subject to garbage collection.
Simply by “continuing” at a prior point in the
execution, and creating a new continuation, call
“stacks” become arbitrary trees.
Generators
At this point, discussions generally veer off into either contrived or complicated examples. Let’s instead take a simple example where the machinery for continuations is not quite so raw and exposed: Python generators. Here is an example that prints Fibonacci numbers less than 1000:
def fib(): yield 0 i,j = 0,1 while True: yield j i,j = j,i+j for n in fib(): if n>1000: break print n
Instead of return
statements, the fib function uses
yield
statements. This yields one
number at a time out of the inexhaustible supply of Fibonacci
numbers. The caller simply invokes that continuation as many
times as desired, and all program state (instruction pointer, local
variables) are restored, and the fib
function starts
back up where it left off.
Once the loop is exited, the state if the fib
function is eligible for reclaiming.
Cool, eh?
If you try to recode this without continuations, you will find that you will need to return a “token” which contains all the relevant state, and that token will need to be passed back in on subsequent calls. That’s pretty much what is happening under the covers. (Note: you may be tempted to use global variables instead. Don’t. Your code would no longer be reentrant, and if there were two callers, they would get confused).
Cocoon
Now let’s look at a
web application, this time in JavaScript. A single
function
implements the entire form based application
that involves the display of three (or four, depending on the
options selected) web pages. This is accomplished via the
cocoon.sendPageAndWait
primitive.
This style of programming is quite different than the resource oriented approach advocated by REST. Some, however, find it to be more intuitive.
This hearkens back to the days in Basic when you coded things like
10 print "Enter your name" 20 input a$ 30 print "Hello, "; a$; "!"
When the computer got to the input
statement, the
operating system would suspend your program until the user provided
an input. At which point, your program would be paged back
into memory and would pick up exactly where it left off.
SAX
A SAX parser is an
example of a task that could be recast as a generator. It
consumes an InputSource
byte by byte and occasionally
emits startElement
, endElement
,
characters
and other events. After each event,
the parser picks back up where it left off.
While this organization is convenient for parser writers, many consumers find XML Pull Parsers easier to deal with.
Judicious application of continuations allows both constituencies to be satisfied. From the user of the library’s perspective, they are using a pull parser. From the person who codes up the SAX handler, each event simply invokes a continuation. There is no memory bloat like you see with DOM style parsers. Everybody is happy.
UI
Event programming, in general, is for wizards. These wizards live and breathe events. But even they regress every once and a while. You can tell when this happens when you see a modal dialog box.
With User Interface Continuations, even scripters can become £337 h4x0r UI gods. Without a modal dialog box in sight.
Closures
Consider the following JavaScript:
var msg = "Hello, world!"; setTimeout(function(){alert(msg)}, 1000);
The use of setTimeout
is common in JavaScript based
animation. The first parameter is a function to be
invoked. The second parameter is the number of
miliseconds to wait until this string is executed.
The question you should be asking yourself is “in which
context is this function to be evaluated?” In particular,
how does JavaScript go about looking for the value of
msg
?
The answer is simple: in the current context. This, it turns out, is a closure, continuation’s more general cousin. Wrapped up in a nice neat little package.
Continuations are bookmarks — they tell you where to pick up after you last left off. Closures are simply the current lexical stack — they represent the current local context and its immediate parent, recursively.
Classes
If the notion of a nested set of scopes residing on the heap seems foreign to you, it really shouldn’t. Here’s an example in Java:
public class Scopes { class Outer { class Inner { void alert() {System.out.println(msg);} } String msg = "Schweet!"; Inner inner = new Inner(); } Outer outer = new Outer(); public static void main(String[] args) { String msg = "Bogus!"; (new Scopes()).outer.inner.alert(); } }
Epilogue
One thing that people don’t often tell you about continuations is that they don’t actually resume exactly where you left off. After all, if they did, you would end up in an infinite loop. The only thing that is really restored is a few pointers. Everything that resides on the heap, or is stored externally in places like files and databases, remains unaffected when you invoke a continuation. But like the ByValue or ByReference discussion above, this doesn’t cause problems very often. In fact, it generally works out pretty much the way you would like it to, without you having to worry about it.
When you realize this, you realize that there really is no magic in continuations. Sure, more things are put on the heap, but there are ways to optimize this.
Oh, and by the way, you probably taste like chicken right about now.