About this blog

Hi, I'm Peter Bex, a Scheme and free software enthusiast from the Netherlands. See my user page on the CHICKEN wiki for some of my projects.

Are you in need of support for CHICKEN Scheme? Or maybe you want clear tech writing like on this very blog? Then you're in luck! I am now available as a freelance consultant!


Maybe you've noticed the little Flattr icons that I've added at the bottom of each post, and now you're wondering if, and why, you should support my blogging efforts with a donation.

Let me state this first and foremost: I enjoy writing posts about subjects that interest me and I learn a lot from the research I put into my posts. So, even without donations I'd continue doing so. But, it's also a fact that many of my posts demand a lot of effort. Researching, writing and illustrating an in-depth post can take weeks, depending on my energy levels and other activities.

I hope you now understand why this blog is updated so slowly. I'm convinced that a few informative, long posts are better than frequent 100-word status updates about my projects; those would be a waste of your time! But because it takes so much time and effort, sometimes I struggle to find the motivation for writing new posts.

This is where Flattr comes in: a small donation is a wonderful way to express how much you liked a post, and it can really lift my spirits and motivate me to write more!

As an added advantage, by seeing which posts get Flattr'd the most I can get a better idea of what you, the readership of my blog, like and what you don't like. This will help me choose which topics to write about more.

Why I refuse to put ads on my blog

Of course, I know that placing ads on my blog would be the easiest way to get some returns from the hard work that I put into my posts. This would easily repay my hosting costs. However, in my opinion, the advertising industry is the most toxic influence on the web today. It is completely absurd that it's considered normal for companies (and individuals with blogs...) to be rewarded for filling up our screens with obnoxious garbage, and for eroding our privacy through various tracking mechanisms. They don't care that all this digital pollution is only going to be acted on by a tiny fraction of all visitors (if any). It is downright disrespectful; personally I think it's as bad as spam e-mail.

In my opinion, Flattr is a much more benign way of receiving money for my efforts. The company was created by a founder of The Pirate Bay, as a better and more direct alternative for rewarding creators than the commercial exploitation of copyright law by middlemen. It's also unobtrusive: just a small icon in each post. The integration that I'm using is a simple hyperlink, so no nasty tracking cookies or web beacons; I'm even hosting the icon image myself.

Besides that, Flattr is completely voluntary on the visitor's part. This means that any rewards are directly linked to people's enjoyment of my posts. This encourages informative posts of substance rather than click-bait titles with disappointing content which would draw more traffic, just to get more ad impressions.

These are my views on the matter. If you're using ads on your blog, I hope I've managed to convince you to switch away from ads to visitor-supported donations. And if you like my posts, I hope you'll consider making a small donation.

Integrating Flattr into Hyde

To make this post at least a little bit on-topic, I'd like to share how I added the Flattr button to my Hyde configuration. I added the following definition to hyde.scm (slightly modified):

(use hyde uri-common)

(define (flattr-button #!optional (page (current-page)))
  (let* ((base-uri (uri-reference "https://flattr.com/submit/auto"))
         (attrs `((user_id . "YOUR FLATTR ACCOUNT NAME")
                  ;; A bit ugly, but Hyde does it like this, too
                  (url . ,(string-append ($ 'base-uri page) (page-path page)))
                  (category . "text")
                  (language . ,(or ($ 'lang page) "en_GB"))
                  (title . ,($ 'title page))))
         ;; Flattr doesn't grok semicolon as query param separator...
         (flattr-page-uri
          (parameterize ((form-urlencoded-separator "&"))
            (uri->string (update-uri base-uri query: attrs)))))

    `(a (@ (class "flattr-button")
           (href ,flattr-page-uri))
        (img (@ (src "/pics/flattr-button.svg")
                (alt "Flattr this!")
                (title "Flattr this post!"))))))

Then, in my SXML definition for article layouts which I call, surprise, surprise! article.sxml:

;; -*- Scheme -*-

`(div (@ (class "article"))
      (h1 (@ (class "article-title"))
          ,($ 'title) " "
          (small (@ (class "date"))
                 "Posted on " ,(format-seconds (page-updated))))
      (div (@ (class "article-body"))
           (inject ,contents)
           ,(flattr-button)))

The button immediately follows the contents of the blog post, which makes sense because only after reading a blog post, you'll know whether it was worth something to you.


In my earlier post about the garbage collector, I lied a little bit about the data representation that CHICKEN uses. At the end of the post I briefly mentioned how CHICKEN really stores objects. If you want to fully understand the way CHICKEN works, it is important to have a good grasp on how it stores data internally.

Basic idea

CHICKEN attempts to store data in the most "native" way it can. Even though it's written in C, it tries hard to use machine words everywhere. So on a 32-bit machine, the native code that's eventually generated will use 32-bit wide integers and pointers. On a 64-bit machine it will use 64-bit wide integers and pointers.

This is known as a C_word, which is usually defined as an int or a long, depending on the platform. By the way, the C_ prefix stands for CHICKEN, not the C language. Every Scheme value is represented as a C_word internally. To understand how this can work, you need to know that there are roughly two kinds of objects.
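
Roughly speaking, that boils down to something like the following (a simplification, not the literal chicken.h definition; C_SIXTY_FOUR is the macro CHICKEN defines on 64-bit builds):

/* Simplified sketch of the idea, not the exact chicken.h code: */
#ifdef C_SIXTY_FOUR
# define C_word long            /* a 64-bit machine word */
#else
# define C_word int             /* a 32-bit machine word */
#endif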

Immediate values

First, there are the immediate values. These are the typical "atomic" values that come up a lot in computations. It is important to represent these as efficiently as possible, so they are packed directly in a C_word. This includes booleans, the empty list, small integers (these are called fixnums), characters and a few other special values.

Because these values are represented directly by a C_word, they can be compared in one instruction: eq? in Scheme. These values do not need to be heap-allocated: they fit directly in a register, and can be passed around "by value" in C. This also means they don't need to be tracked by the garbage collector!

At a high enough level, these values simply look like this:

This doesn't really show anything, does it? Well, bear with me...

Block objects

The other kind of value is the block object. This is a value that is represented as a pointer to a structure that contains a header and a variable-length data block.

Conceptually, the data block can be one of two kinds. In the case of a string or srfi-4 object, the data block is simply an opaque "blob" or byte-vector. In most other cases, the block is a compound value consisting of other Scheme objects. Typical examples are pairs, vectors and records.

Because these values are heap-allocated, two distinct objects are not stored at the same memory address, even if they store the same value. That's why comparing their values is a complex operation. This operation is either equal? for deep structural comparison, or eqv? for value comparisons of numbers and symbols.

The R5RS specification explains that the difference between eq? and eqv? is not necessarily the same across Scheme implementations. For example, in CHICKEN, eq? can be used to compare characters and fixnums, because they are stored as immediate values. Portable programs should not rely on that. If you use eq? on block objects, their pointers will be compared. That means it checks whether they are one and the same object. This can be a useful operation in its own right.

Objects represented by data blocks also have to be tracked by the garbage collector: if there are still references to the block, its data must be copied (recursively) to keep it alive across GC events.

Here are some "high-level" examples of block objects:

This picture should look somewhat familiar to students of SICP: it is reminiscent of the box-and-pointer notation used to illustrate the structure of lists. The boxes containing green text represent the object headers. The header indicates the type of object and the object's size. It also determines whether the object's data block is a byte block or a block containing Scheme objects: if it contains Scheme objects, the header tells us how many slots (locations for storing Scheme objects) the object has. Byte blocks, on the other hand, are opaque and can contain any data. Their size is stored as a byte count.

From top to bottom, left to right, these represent the following values:

  • (#\a . #\b) is a pair containing the character "a" in its car and "b" in its cdr.
  • #(#f 123 456 #f 42) is a regular Scheme vector containing fixnums and false values.
  • "hello" is a string consisting of 5 characters (strings are treated as byte vectors in CHICKEN).
  • 12.5 is an inexact representation of the number twelve and a half (a "flonum"). This is a byte block storing the raw byte value of a C double.
  • ("hello" . (12.5 . ())) is the first pair of a proper list which contains a string and a flonum.
  • (12.5 . ()) is the cdr of that list; a pair containing a number and the end-of-list marker.

The final two pair objects show that slots (like any C_word) can hold not only immediate values, but also pointers to block objects. This leads us to the question: how to differentiate between a pointer to an object and an immediate object?

Bit fiddling

Most platforms require pointers to words to be aligned on a word boundary. Thus, on a 32-bit machine, memory addresses will always have zero in the lower 2 bits, because we can only point to multiples of 4 bytes. On a 64-bit machine, word addresses will have zero in the lower 3 bits, because words are multiples of 8 bytes.

Because the lower two bits are never used, we can perform a simple trick: any value that has either of the lower two bits set cannot be a word pointer, so we enforce immediate objects to have either bit set. It may feel like a gross hack to people who are used to working with "clean", high-level C code, but it is a technique which goes back a long way: Orbit, one of the earliest optimising compilers for Scheme, did exactly the same thing. Other modern Schemes like Larceny and Gambit do the same thing. Even Scheme48, which is probably the cleanest Scheme implementation, uses tagged words. Other Lisps use this representation as well. See Steel Bank Common Lisp, for example.

Many other dynamic languages don't use a packed data representation like this. Many prefer the simpler but bulkier struct representation. At the other end of the spectrum, we have statically typed, non-garbage collected languages. They generally don't need to store the type of a value along with it. Instead, they can directly store the "un-boxed" value in memory. This, and the relation to garbage collection, is explained rather well in Appel's 1989 paper "Runtime Tags Aren't Necessary" (sorry, this is in PostScript format).

Representation of objects

We've learned how CHICKEN distinguishes between pointers to (block) objects and immediate values. Now we will look into the nitty-gritty details of the object representation.

We can make the following breakdown of bit patterns (assuming a 32-bit platform):

This shows that the lower two bits can be used to distinguish between block objects (zero) and immediate objects (nonzero). For immediate objects, the low bit can be used to distinguish between fixnum objects and other kinds of immediate objects. The colouring indicates which bits are used for tagging objects of that kind. The uncoloured bits are used for representing the object being stored.
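
In C, these checks are nothing more than a bit mask and a comparison. As a sketch (the macros CHICKEN actually uses for this, C_immediatep and C_fixnump, live in chicken.h; the definitions below are just an illustration of the idea):

/* Lower two bits zero?  Then it's a word-aligned pointer to a block object. */
#define is_block_object(x)  (((x) & 0x03) == 0)

/* Any of the lower two bits set?  Then it's an immediate object... */
#define is_immediate(x)     (((x) & 0x03) != 0)

/* ...and if the lowest bit is set, it's a fixnum. */
#define is_fixnum(x)        (((x) & 0x01) != 0)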

Fixnums are distinguished from "other immediate" values because fixnums are so incredibly common: they are used for indexing into strings, loop counters and many calculations. These have to be represented as efficiently as possible while storing the widest possible range of values. Run time type checking for fixnums should use as few CPU instructions as possible.

The "other immediate" types are further differentiated through the top two bits of the lower nibble:

The unused "other immediate" type of 0010 is reserved for future use. To get a good feel for the representation of immediates, let us look at a few example bit patterns. I'll also show you how to construct them in C.

Bit patterns of immediate values

Fixnums

These small integer values are stored in regular old two's complement representation, like the CPU uses. The lowest bit is always 1, due to the fixnum tag bit. The highest bit is used to determine the sign of the number.

The C_fix() macro shifts its argument one bit to the left, and sets the lower bit through a bit-wise OR with 1. To convert a Scheme fixnum back to a C integer, you can use the C_unfix() macro. This shifts its argument one bit to the right.
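
In other words, the two macros morally do the following (a simplified sketch; the real definitions are in terms of C_FIXNUM_SHIFT and C_FIXNUM_BIT):

#define my_fix(n)    (((C_word)(n) << 1) | 1)  /* make room for the tag bit and set it */
#define my_unfix(x)  ((x) >> 1)                /* drop the tag bit again */

/* my_fix(3)   ==  binary ...00111  ==  7 as a raw C_word
   my_unfix(7) ==  3                                        */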

You might wonder what happens when you calculate or enter a very large integer. In CHICKEN 4, it will be coerced to a flonum. In CHICKEN 5, it will be stored as a bignum. Bignums are block objects, not immediates, because they may be arbitrarily large.

Booleans

That's a very large bit space for only two values. However, reserving a special type tag just for booleans simplifies type detection code: we only have to compare the lower four bits with 0110 to check whether an object is a boolean.
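
As a sketch, such a check could look like this (CHICKEN's own macro for it is called C_booleanp):

/* An object is a boolean iff its lower nibble is 0110. */
#define my_booleanp(x)  (((x) & 0x0f) == 0x06)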

Characters

Characters do not make full use of the available bits, because the lower byte's high nibble is always 0000. This means that only 24 bits are available for representing the character on 32-bit platforms. Luckily, this is enough for representing the full Unicode range. If Unicode ever starts using up a bigger code space, we can always sneak in 4 more bits.
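
Constructing and taking apart a character is again just shifting and masking. Simplified versions of CHICKEN's C_make_character and C_character_code would look roughly like this:

#define my_make_character(c)  ((((C_word)(c)) << 8) | 0x0a)  /* code point up, 1010 tag in */
#define my_character_code(x)  (((x) >> 8) & 0xffffff)        /* keep only the 24 code bits */

/* my_make_character('A')  ==  (65 << 8) | 0x0a  ==  0x410a */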

Special objects

This list is exhaustive: currently there are only four special objects. There is a lot of room for adding other special objects, if that ever becomes necessary.
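
Slightly simplified, they are defined along these lines: the 1110 "special" tag in the low nibble, with a small counter in the nibble above it (check chicken.h for the authoritative values):

#define C_SCHEME_END_OF_LIST  ((C_word)0x0000000e)   /* the empty list, ()       */
#define C_SCHEME_UNDEFINED    ((C_word)0x0000001e)   /* the "unspecified" value  */
#define C_SCHEME_UNBOUND      ((C_word)0x0000002e)   /* unbound variable marker  */
#define C_SCHEME_END_OF_FILE  ((C_word)0x0000003e)   /* the end-of-file object   */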

The "unbound variable" representation cannot be captured by a program: when it is evaluated, it immediately raises an exception. This is its intended function.

A closer look at block objects

Now that we know all about immediate values, let's turn to block objects. These are represented by a pointer to a C structure with a header and a data block. Slightly simplified, it looks like this:

#define C_uword  unsigned C_word
#define C_header C_uword

typedef struct
{
  C_header header;
  C_word data[];    /* Variable-length array: header determines length */
} C_SCHEME_BLOCK;

The header's bit pattern is broken up into three parts:

The bottom 24 bits encode the size of the object. On 64-bit machines, the bottom 56 bits are used for the size. The middle 4 bits encode the type of the object. The top 4 bits encode special properties to make the garbage collector's work easier:

  • C_GC_FORWARDING_BIT indicates this object has been forwarded elsewhere. To find the object at its new location, the entire header is shifted to the left (which shifts out this bit). Then, the value is reinterpreted as a pointer. Remember, the lowest two bits of word pointers are always zero, so we can do this with impunity!
  • C_BYTEBLOCK_BIT indicates this is a byte blob (size bits are interpreted in bytes, not words).
  • C_SPECIALBLOCK_BIT indicates that the first slot is special and should be skipped by the GC.
  • C_8ALIGN_BIT indicates that for this object, alignment must be maintained at an 8-byte boundary.
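
Pulling these three parts out of a header is once more just bit masking. Conceptually, on a 32-bit platform (C_header_size, which we'll meet again in the GC code below, is the real CHICKEN macro for the first of these; the other names are made up for illustration):

#define my_header_size(h)     ((h) & 0x00ffffff)  /* bottom 24 bits: the size     */
#define my_header_type(h)     ((h) & 0x0f000000)  /* middle 4 bits:  the type     */
#define my_header_gc_bits(h)  ((h) & 0xf0000000)  /* top 4 bits:     the GC flags */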

The type bits are assigned incrementally. There is room for 16 types, only 2 of which are currently unused. Let's look at the definitions, which should also help to explain the practical use of the latter 3 GC bits:

#define C_VECTOR_TYPE            (0x00000000L)
#define C_SYMBOL_TYPE            (0x01000000L)
#define C_STRING_TYPE            (0x02000000L | C_BYTEBLOCK_BIT)
#define C_PAIR_TYPE              (0x03000000L)
#define C_CLOSURE_TYPE           (0x04000000L | C_SPECIALBLOCK_BIT)
#define C_FLONUM_TYPE            (0x05000000L | C_BYTEBLOCK_BIT | C_8ALIGN_BIT)
/*      unused                   (0x06000000L ...) */
#define C_PORT_TYPE              (0x07000000L | C_SPECIALBLOCK_BIT)
#define C_STRUCTURE_TYPE         (0x08000000L)
#define C_POINTER_TYPE           (0x09000000L | C_SPECIALBLOCK_BIT)
#define C_LOCATIVE_TYPE          (0x0a000000L | C_SPECIALBLOCK_BIT)
#define C_TAGGED_POINTER_TYPE    (0x0b000000L | C_SPECIALBLOCK_BIT)
#define C_SWIG_POINTER_TYPE      (0x0c000000L | C_SPECIALBLOCK_BIT)
#define C_LAMBDA_INFO_TYPE       (0x0d000000L | C_BYTEBLOCK_BIT)
/*      unused                   (0x0e000000L ...) */
#define C_BUCKET_TYPE            (0x0f000000L)
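
As an aside, this also shows what a type check costs: mask the header down to its tag bits and compare. A sketch (the helper names are made up; CHICKEN's real predicates, such as C_pairp, are spelled a little differently):

#define my_type_of(x)  (((C_SCHEME_BLOCK *)(x))->header & 0xff000000)

#define my_pairp(x)    (my_type_of(x) == C_PAIR_TYPE)
#define my_stringp(x)  (my_type_of(x) == C_STRING_TYPE)   /* tag includes C_BYTEBLOCK_BIT */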

Most of the types should be self-explanatory to a seasoned Schemer, but a few things deserve further explanation.

You'll note that in the STRING type tag, C_BYTEBLOCK_BIT is also set, for obvious reasons: strings do not consist of slots containing Scheme values, but of bytes, which are opaque. Because the header's size bits store the length in bytes instead of in words, we can spot a very important limitation: CHICKEN strings can only hold 16 MiB of data on a 32-bit machine (on a 64-bit machine, strings are "limited" to 65536 TiB).

The CLOSURE type uses C_SPECIALBLOCK_BIT. This indicates to the garbage collector that the first slot contains a raw non-Scheme value. In the case of a closure, it contains a pointer to a C function. The other slots contain free variables that were closed over ("captured") by the lambda, which are normal Scheme objects. The compiled C function "knows" which variable lives in which slot.

The FLONUM type uses C_BYTEBLOCK_BIT, because an un-boxed C double value is not a Scheme object: we want to treat the data as an opaque blob. On a 32-bit system, the double will take up two machine words, so we can't use C_SPECIALBLOCK_BIT. The header will therefore hold the value 8 as its size. It also has another GC bit: C_8ALIGN_BIT. This ensures that the 64-bit double is aligned on a 8-byte boundary, to avoid unaligned access on 32-bit systems. This adds some complexity to garbage collection and memory allocation.

The STRUCTURE type refers to a SRFI-9 type of record object. Its slots hold the record's fields, and the accessors and constructors "know" which field is stored at which index.

The POINTER type holds a raw C pointer inside a Scheme object. Again, because C pointers are not Scheme objects, the object's first (and only) slot is treated specially, via C_SPECIALBLOCK_BIT.

The LOCATIVE type represents a rather complicated object. It acts a bit like a pointer into a slab of memory. You can use it as a single value which represents a location inside another block object. This can then be used as an argument to a foreign function that expects a pointer. Its first slot holds a raw pointer. The other slots hold the offset, the type of pointer (encoded as fixnum) and the original object, unless it is a weak reference.

The TAGGED_POINTER type is exactly like POINTER, but it has an extra user-defined tag. This can make it easier for code to identify the pointer's type. The tag is a Scheme value held in its second slot.

The SWIG_POINTER type was used for compatibility with SWIG; it is basically the same as POINTER, with additional SWIG data added to it. It has been removed in CHICKEN 5.

The LAMBDA_INFO type stores procedure introspection information (mostly for debugging).

The BUCKET type is a special internal pair-like object which is used in the linked list of symbols under a hash table bucket in the symbol table. It does not count as a reference, so that symbols can be garbage collected when only the symbol table still refers to them.

So far, the only numeric types we've seen are fixnums and flonums. What about the other numeric types? After all, CHICKEN 5 will (finally) have a full numeric tower!

In CHICKEN 5, rational and complex numbers are viewed as two simpler numbers stuck together. They're stored as records with a special tag, which the run-time system recognises. Bignums are a different story altogether. When I first implemented them, they used one of the two unused header types in the list above. For various reasons I won't go into now, they are now also represented as a record with a special tag and a slot that refers to the byte blob containing the actual bignum value. Perhaps this is something for a later blog post.

Putting it all together in the garbage collector

So far, all of this perhaps sounds rather arbitrary and complex. The data representation is finely tuned to fit the garbage collector, and vice versa, so it may help to see how this simplifies the garbage collector.

The way the data representation is set up, the garbage collector only has to perform a few very basic checks. It does not need to know about any of the data types at all, it only needs to look at the special GC bits, and the size of an object!

Now we're finally ready to understand the heart of the garbage collector, which scans the live data and marks nested objects. This part of CHICKEN implements the Cheney algorithm. It's only 22 lines of code, without any simplifications. This is taken directly from runtime.c, with comments added for exposition:

/* Mark nested values in already moved (marked) blocks
   in breadth-first manner: */
while(heap_scan_top < (gc_mode == GC_MINOR ? C_fromspace_top : tospace_top)) {
  bp = (C_SCHEME_BLOCK *)heap_scan_top; /* Get next object from queue */

  /* If this word is an alignment hole marker, skip it */
  if(*((C_word *)bp) == ALIGNMENT_HOLE_MARKER)
    bp = (C_SCHEME_BLOCK *)((C_word *)bp + 1);

  n = C_header_size(bp);  /* Extract size bits from header */
  h = bp->header;         /* Remember header for masking other bits */
  bytes = (h & C_BYTEBLOCK_BIT) ? n : n * sizeof(C_word);  /* Size in bytes */
  p = bp->data;           /* Data block (first slot) */

  if(n > 0 && (h & C_BYTEBLOCK_BIT) == 0) { /* Contains slots, not bytes? */
    if(h & C_SPECIALBLOCK_BIT) { /* Skip first word (not a Scheme object) */
      --n;
      ++p;
    }

    while(n--) mark(p++); /* Mark Scheme objects in data slots */
  }

  /* Advance onto next word just after object */
  heap_scan_top = (C_byte *)bp + C_align(bytes) + sizeof(C_word);
}

The comment at the start refers to the fact that the "tip of the iceberg" of live data has already been copied; this code scans that set for nested objects referred to by those live objects. See my post about the garbage collector for more about how the GC and Cheney's algorithm work.

If we're in a minor GC, this code scans over the fromspace, which is the memory area into which the nursery objects will be copied. If we're in a major GC, we're scanning over tospace, which is the other half of the heap, to which the fromspace will be copied.

The code above simply advances the heap_scan_top pointer over the objects we need to look at until we hit the end of this space. It then checks for an ALIGNMENT_HOLE_MARKER, which is a magic value that gets used as a placeholder to indicate that this machine word should be skipped. This placeholder may get inserted when allocating a C_8ALIGN_BIT object, to avoid unaligned access.

Next, the size (in bytes) of the object is determined, based on the C_BYTEBLOCK_BIT. Finally, if it's a data block (C_BYTEBLOCK_BIT is not set), we loop over the data slots. The first word is skipped if it's indicated as "special" via C_SPECIALBLOCK_BIT.

The mark() call hides the hairy part. It performs the following steps:

  • Check that the word contains a block object. Otherwise, return because it's an immediate value.
  • Check that the word points to memory that's being moved, otherwise return. This avoids copying already copied or evicted data.
  • If the object has the C_GC_FORWARDING_BIT set, just update the marked slot with the new location the object was forwarded to, and return.
  • If we're on a 32-bit machine, the object to be copied has the C_8ALIGN_BIT set, and the current top of the target heap area is not aligned, insert an ALIGNMENT_HOLE_MARKER.
  • In case the target area is too small to hold the object, interrupt the current GC and trigger the "next" GC type. This will be a major collection if we're currently doing a minor collection, or a heap reallocating major collection if we're in a regular major collection.
  • Finally, copy the object via a simple memcpy().
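
To make these steps a bit more concrete, here is a stripped-down model of that logic. To be clear, this is not CHICKEN's actual mark() (which is a macro wrapping a helper, and which also handles the alignment holes and the "GC interrupt" case from the list above); every name and all of the surrounding state below is made up purely for illustration:

#include <stdint.h>
#include <string.h>

typedef uintptr_t uword;
typedef struct { uword header; uword data[]; } block;

#define IMMEDIATE(x)     (((x) & 3) != 0)
#define FORWARDING_BIT   ((uword)1 << (sizeof(uword) * 8 - 1))
#define BYTEBLOCK_BIT    ((uword)1 << (sizeof(uword) * 8 - 4))
#define SIZE_MASK        (~(uword)0 >> 8)

static char *from_start, *from_end;  /* the space being evacuated */
static char *target_top;             /* allocation pointer in the target space */

static void mark_model(uword *slot)
{
  uword val = *slot, h, bytes;
  block *b, *copy;

  if(IMMEDIATE(val)) return;                       /* immediate value: nothing to copy */

  b = (block *)val;
  if((char *)b < from_start || (char *)b >= from_end)
    return;                                        /* not in the space being moved */

  h = b->header;
  if(h & FORWARDING_BIT) {                         /* already copied earlier: */
    *slot = h << 1;                                /* shift out the bit to get the new address */
    return;
  }

  /* (The real code would insert an alignment hole marker or interrupt the
   * GC here if needed; both cases are left out of this model, as is the
   * word-alignment of byte block sizes.) */
  bytes = (h & BYTEBLOCK_BIT) ? (h & SIZE_MASK)
                              : (h & SIZE_MASK) * sizeof(uword);
  copy = (block *)target_top;
  memcpy(copy, b, sizeof(uword) + bytes);          /* header plus data */
  target_top += sizeof(uword) + bytes;

  b->header = ((uword)copy >> 1) | FORWARDING_BIT; /* leave a forwarding pointer behind */
  *slot = (uword)copy;                             /* update the slot we were marking */
}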

Because this is done by mark() and not by the scanning code shown above, all this is only performed if the object in question is a block object which needs to be copied (the mark() macro inlines the first check). Just scanning the live data is extremely fast. We can thank the data representation's simplicity for that speed!

Further reading

There is a lot of information about this stuff, but it can be a little hard to find. Here are a few links I found during my research for this blog post:


When you're writing Scheme code in CHICKEN it's sometimes necessary to make a little excursion to C. For example, you're trying to call a C library, you're writing extremely performance-critical code, or you're working on something that's best expressed in C, such as code that requires a lot of bit-twiddling.

This post contains a lot of code, including generated C code. If you get too tired to absorb it, it's probably best to stop reading and pick it up again later.

A basic example of invoking C code from CHICKEN

This is one of CHICKEN's strengths: the ability to quickly drop down to C for a small bit of code, and return its result to Scheme:

(import foreign)

(define ilen
  (foreign-lambda* long ((unsigned-long x))
    "unsigned long y;\n"
    "long n = 0;\n"
    "#ifdef C_SIXTY_FOUR\n"
    "y = x >> 32; if (y != 0) { n += 32; x = y; }\n"
    "y = x >> 16; if (y != 0) { n += 16; x = y; }\n"
    "y = x >>  8; if (y != 0) { n +=  8; x = y; }\n"
    "y = x >>  4; if (y != 0) { n +=  4; x = y; }\n"
    "y = x >>  2; if (y != 0) { n +=  2; x = y; }\n"
    "y = x >>  1; if (y != 0) C_return(n + 2);\n"
    "C_return(n + x);"))

(print "Please enter a number")
(print "The length of your integer in bits is " (ilen (read)))

This example is taken from a wonderful little book called "Hacker's Delight", by Henry S. Warren. It calculates the number of bits required to represent an unsigned integer (its "length"). By the way, this procedure is provided by the numbers egg as integer-length. The algorithm is implementable in Scheme, but a direct translation to Scheme is nowhere near as readable as it is in C:

(define (ilen x)
  (let ((y 0) (n 0))
    (cond-expand                ; Mirrors the #ifdef C_SIXTY_FOUR above
      (64bit
       (set! y (arithmetic-shift x -32))
       (unless (zero? y) (set! n (+ n 32)) (set! x y)))
      (else #f))
    (set! y (arithmetic-shift x -16))
    (unless (zero? y) (set! n (+ n 16)) (set! x y))
    (set! y (arithmetic-shift x -8))
    (unless (zero? y) (set! n (+ n 8)) (set! x y))
    (set! y (arithmetic-shift x -4))
    (unless (zero? y) (set! n (+ n 4)) (set! x y))
    (set! y (arithmetic-shift x -2))
    (unless (zero? y) (set! n (+ n 2)) (set! x y))
    (set! y (arithmetic-shift x -1))
    (if (not (zero? y)) (+ n 2) (+ n x))))

The performance of the Scheme version is also going to be less than that of the C version. All in all, plenty of good reasons to prefer integration with C. There's no shame in that: most fast languages forego "pure" implementations in favour of C for performance reasons. The only difference is that calling C in other languages is often a bit more work.

Analysing the generated code

In this section we'll unveil the internal magic which makes C so easily integrated with Scheme. You can skip this section if you aren't interested in low-level details.

As you might have noticed, the C code in the example above contains one unfamiliar construct: It uses C_return() to return the result. If you inspect the code generated by CHICKEN after compiling it via csc -k test.scm, you'll see that it inserts some magic to convert the C number to a Scheme object. I've added some annotations and indented for readability:

/* Local macro definition to convert returned long to a Scheme object. */
#define return(x) \
  C_cblock C_r = (C_long_to_num(&C_a,(x))); goto C_ret; C_cblockend

/* Prototype declaring the stub procedure as static, returning a
 * C_word (Scheme object) and passing arguments through registers.
 * It's not strictly necessary in this case. */
static C_word C_fcall stub7(C_word C_buf, C_word C_a0) C_regparm;

/* The stub function: it gets passed a buffer in which Scheme objects get
 * allocated (C_buf) and the numbered arguments C_a0, C_a1, ... C_an. */
C_regparm static C_word C_fcall stub7(C_word C_buf, C_word C_a0)
{
  C_word C_r = C_SCHEME_UNDEFINED, /* Return value, mutated by return() macro */
        *C_a=(C_word*)C_buf;     /* Allocation pointer used by return() macro */

  /* Conversion of input argument from Scheme to C */
  unsigned long x = (unsigned long )C_num_to_unsigned_long(C_a0);

  /* Start of our own code from the foreign-lambda* body, as-is */
  unsigned long y;
  long n = 0;
  y = x >> 32; if (y != 0) { n += 32; x = y; }
  y = x >> 16; if (y != 0) { n += 16; x = y; }
  y = x >>  8; if (y != 0) { n +=  8; x = y; }
  y = x >>  4; if (y != 0) { n +=  4; x = y; }
  y = x >>  2; if (y != 0) { n +=  2; x = y; }
  y = x >>  1; if (y != 0) C_return(n + 2);
  C_return(n + x);

C_ret: /* Label for goto in the return() macro */
#undef return
  return C_r; /* Regular C return */
}

/* chicken.h contains the following: */
#define C_return(x)              return(x)
#define C_cblock                 do{
#define C_cblockend              }while(0)

In the foreign-lambda*, I used C_return for clarity: I could have just used return with parentheses, which will get expanded by the C preprocessor. This is somewhat confusing: return n + x; will result in an error, whereas return(n+x); will do the same as C_return(n+x);.

The return macro calls C_long_to_num, which will construct a Scheme object, which is either a fixnum (small exact integer) or a flonum (floating-point inexact number), depending on the platform and the size of the returned value. Hopefully, in CHICKEN 5 it will be either a fixnum or a bignum - that way, it'll always be an exact integer.

Because these number objects need to get allocated on the stack to integrate with the garbage collector, the calling code needs to set aside enough memory on the stack to fit these objects. That's what the C_buf argument is for: it's a pointer to this area. In CHICKEN, a whole lot of type punning is going on, so it's passed as a regular C_word rather than as a proper pointer, but let's ignore that for now.

The stub function above is used to do the actual work, but in order to integrate it into CHICKEN's calling conventions and garbage collector, an additional wrapper function is generated. It corresponds to the actual Scheme "ilen" procedure, and looks like this:

/* ilen in k197 in k194 in k191 */
static void C_ccall f_201(C_word c, C_word t0, C_word t1, C_word t2)
{
  C_word tmp /* Unused */; C_word t3; C_word t4; C_word t5;  /* Temporaries */
  C_word ab[6], *a=ab;     /* Memory area set aside on stack for allocation */

  if(c != 3) C_bad_argc_2(c, 3, t0);     /* Check argument count is correct */

  C_check_for_interrupt; /* Check pending POSIX signals, and thread timeout */

  if(!C_stack_probe(&a)) {   /* Stack full?  Then perform GC and try again. */
    C_save_and_reclaim((void*)tr3, (void*)f_201, 3, t0, t1, t2);
  }

  t3 = C_a_i_bytevector(&a,1,C_fix(4));   /* Needed to have a proper object */
  t4 = C_i_foreign_unsigned_integer_argumentp(t2);   /* Check argument type */
  t5 = t1;                          /* The continuation of the call to ilen */
  /* Call stub7 inline, and pass result to continuation: */
  ((C_proc2)(void*)(*((C_word*)t5+1)))(2, t5, stub7(t3, t4));
}

The comment at the top indicates the name of the Scheme procedure and its location in the CPS-converted Scheme code. The k197 in k194 etc indicate the nesting in the generated continuations, which can sometimes be useful for debugging. These continuations can be seen in the CPS-converted code by compiling with csc -debug 3 test.scm.

Much of the code you might sort-of recognise from the code in my article about the CHICKEN garbage collector: The C_stack_probe() corresponds to that post's fits_on_stack(), and C_save_and_reclaim() combines that post's SCM_save_call() and SCM_minor_GC().

All Scheme procedures get compiled down to C functions which receive their argument count (c) and the closure/continuation from which they're invoked (t0), so that they can access local closure variables (not used here) and so that the closure can be re-invoked after a GC. They also receive the continuation of the call (t1) and any procedure arguments (everything after it; here only t2). If a procedure has a variable number of arguments, it uses C's varargs mechanism, which is why passing the argument count to every function is important.

If a function is called with too many or too few arguments, the call will "just work", even if the arguments are declared in the function prototype like here: the function is invoked correctly, but the stack will contain rubbish instead of the expected arguments. That's why it's important to first check the argument count and only then check whether a GC needs to be performed; otherwise, this rubbish gets saved by C_save_and_reclaim and the GC will attempt to traverse it as if it contained proper Scheme objects, resulting in segfaults or other nasty business.

The variable t3 will contain the buffer in which the return value is stored. It is wrapped in a byte vector, because this makes it a first-class object understood by the garbage collector. That's not necessary here, but this code is pretty generic and is also used in cases where it is necessary. The C_word ab[6] declaration sets aside enough memory space to hold a flonum or a fixnum, which need at most 4 words, plus 2 words for the bytevector wrapper. I will explain these details later in a separate post, but let's assume it's OK for now.

The argument type gets checked just before calling the C function. If the argument is not of the correct type, an error is signalled and the function will be aborted. The returned value is simply the input, so t4 will contain the same value as t2. Similarly, t1 gets copied as-is to t5. Finally, the continuation gets cast to the correct procedure type (again: a lot of type punning. I will explain this in another post), and invoked with the correct argument count (2), the continuation closure itself, and the return value of the stub function.

Returning complex Scheme objects from C

I've tried to explain above how the basic C types get converted to Scheme objects, but what if we want to get crazy and allocate Scheme objects in C? A simple foreign-lambda* won't suffice, because the compiler has no way of knowing how large a buffer to allocate, and the C function will return, so we'll lose what's on the stack.

To fix that, we have foreign-safe-lambda*, which will allow us to allocate any object on the stack. Before such a function is invoked, a minor garbage collection is triggered to clean the stack and ensure we have plenty of allocation room. Let's look at a simple example. This program displays the list of available network interfaces on a UNIX-like system:

(import foreign)

(foreign-declare "#include \"sys/types.h\"")
(foreign-declare "#include \"sys/socket.h\"")
(foreign-declare "#include \"ifaddrs.h\"")

(define interfaces
  (foreign-safe-lambda* scheme-object ()
    "C_word lst = C_SCHEME_END_OF_LIST, len, str, *a;\n"
    "struct ifaddrs *ifa, *i;\n"
    "if (getifaddrs(&ifa) != 0)\n"
    "  C_return(C_SCHEME_FALSE);\n"
    "for (i = ifa; i != NULL; i = i->ifa_next) {\n"
    "  len = strlen(i->ifa_name);\n"
    "  a = C_alloc(C_SIZEOF_PAIR + C_SIZEOF_STRING(len));\n"
    "  str = C_string(&a, len, i->ifa_name);\n"
    "  lst = C_a_pair(&a, str, lst);\n"

(print "The following interfaces are available: " (interfaces))

This functionality is not available in CHICKEN because it's not very portable (it's not in POSIX), so it's a good example of something you might want to use C for. Please excuse the unSchemely way of error handling by returning #f for now. We'll fix that in the next chapter.

Looking at our definition, the interfaces procedure has no arguments, and it returns a scheme-object. This type indicates to CHICKEN that the returned value is not to be converted but simply used as-is: we'll handle its creation ourselves.

We declare the return value lst, which gets initialised to the empty list, and two temporary variables: len and str, to keep an intermediate string length and to hold the actual CHICKEN string. The variable a is an allocation pointer. Then we have the two variables which hold the start of the linked list of interfaces, ifa, and the current iterator through this list, i.

We retrieve the linked list (if it fails, returning #f), and scan through it until we hit the end. For each entry, we simply check the length of the interface name string, we allocate enough room on the stack to hold a pair and a CHICKEN string of the same length (C_alloc() is really just alloca()). The C_SIZEOF... macros are very convenient to help us calculate the size of an object without having to know its exact representation in memory. We then create the CHICKEN string using C_string, which is put into the allocated space stored in a, and we create a pair which holds the string in the car and the previous list as its cdr.

These allocating C_a_pair and C_string functions accept a pointer to the allocated space (which itself is a pointer). This means they can advance the pointer's value beyond the object, to the next free position. This is quite nice, because it allows us to call several allocating functions in a row, with the same pointer, and at the end the pointer points past the object that was allocated last. Finally, we release the memory used by the linked list and return the constructed list.

Analysing the generated code

Like before, if you're not interested in the details, feel free to skip this section.

The interfaces foreign code itself compiles down to this function:

/* Like before, but no conversion because we "return" a native object: */
#define return(x) C_cblock C_r = (((C_word)(x))); goto C_ret; C_cblockend

/* The prototype _is_ necessary in this case: it declares the function
 * as never returning via C_noret, which maps to __attribute__((noreturn)). */
static void C_ccall stub6(C_word C_c, C_word C_self,
                          C_word C_k, C_word C_buf) C_noret;

/* The arguments to the stub function now include the argument count,
 * the closure itself and the continuation in addition to the buffer
 * and arguments (none here).  This is a truly "native" CHICKEN function! */
static void C_ccall stub6(C_word C_c, C_word C_self, C_word C_k, C_word C_buf)
{
  C_word C_r = C_SCHEME_UNDEFINED,
        *C_a = (C_word *)C_buf;

  /* Save callback depth; needed if we want to call Scheme functions */
  int C_level = C_save_callback_continuation(&C_a, C_k);

  /* Start of our own code, as-is: */
  struct ifaddrs *ifa, *i;
  C_word lst = C_SCHEME_END_OF_LIST, len, str, *a;

  if (getifaddrs(&ifa) != 0)
    C_return(C_SCHEME_FALSE);

  for (i = ifa; i != NULL; i = i->ifa_next) {
    len = strlen(i->ifa_name);
    a = C_alloc(C_SIZEOF_PAIR + C_SIZEOF_STRING(len));
    str = C_string(&a, len, i->ifa_name);
    lst = C_a_pair(&a, str, lst);
  }
  freeifaddrs(ifa);
  C_return(lst);

C_ret: /* Label for goto in the return() macro */
#undef return

  /* Pop continuation off callback stack. */
  C_k = C_restore_callback_continuation2(C_level);

  C_kontinue(C_k, C_r); /* Pass return value to continuation. */
}

This is not much different from the foreign-lambda* example, but notice that the arguments are different: this stub looks exactly like the C code generated from an actual Scheme continuation: it gets passed the argument count, its own closure and its continuation. Instead of ending with a regular return from C, it invokes a continuation. This is the crucial difference which integrates our code with the garbage collector: by passing it to the next continuation's C function, the "returned" value is preserved on the stack. In other words, it is allocated directly in the nursery.

Even though the stub is a "native" Scheme procedure, a wrapper is still generated: if the foreign-safe-lambda is defined to accept C arguments, it'll still need to convert from Scheme objects, it needs to check the argument count, and it needs to invoke the GC before the procedure can be called:

/* interfaces in k197 in k194 in k191 */
static void C_ccall f_201(C_word c, C_word t0, C_word t1){
  /* This is the function that corresponds to the Scheme procedure.
   * This is the first stage of the procedure: we invoke the GC with
   * a continuation which will do conversions and call the C stub. */
  C_word tmp; C_word t2; C_word t3;
  C_word ab[3], *a = ab;

  /* As before: */
  if (c!=2) C_bad_argc_2(c, 2, t0);


  if (!C_stack_probe(&a)) {
    C_save_and_reclaim((void*)tr2, (void*)f_201, 2, t0, t1);
  }

  /* Create the continuation which will be invoked after GC: */
  t2 = (*a = C_CLOSURE_TYPE|2, /* A closure of size two: */
        a[1] = (C_word)f_205,  /* Second stage function of our wrapper, */
	a[2] = t1,             /* and continuation of call to (interfaces). */
	tmp = (C_word)a,       /* Current value of "a" must be stored in t2...*/
	a += 3,                /* ... but "a" itself gets advanced... */
	tmp);                  /* ... luckily tmp holds original value of a. */

  C_trace("test.scm:8: ##sys#gc"); /* Trace call chain */

  /* lf[1] contains the symbol ##sys#gc.  This invokes its procedure. */
  ((C_proc3)C_fast_retrieve_symbol_proc(lf[1]))(3, *((C_word*)lf[1]+1),
                                                t2, C_SCHEME_FALSE);
}

/* k203 in interfaces in k197 in k194 in k191 */
static void C_ccall f_205(C_word c, C_word t0, C_word t1)
{
  /* This function gets invoked from the GC triggered by the above function,
   * and is the second stage of our wrapper function.  It is similar to the
   * wrapper from the first example of a regular foreign-lambda. */
  C_word tmp; C_word t2; C_word t3; C_word t4;
  /* Enough room for a closure of 2 words (total size 3) and a bytevector
   * of 3 words (total size 4).  This adds up to 7; The missing 1 is to
   * make room for a possible alignment of the bytevector on 32-bit platforms. */
  C_word ab[8], *a=ab;


  if (!C_stack_probe(&a)) {
    C_save_and_reclaim((void*)tr2, (void*)f_205, 2, t0, t1);
  }

  t2 = C_a_i_bytevector(&a, 1, C_fix(3)); /* Room for one pair */

  t3 = (*a = C_CLOSURE_TYPE|2, /* Create a closure of size 2: */
        a[1] = (C_word)stub6,  /* Our foreign-safe-lambda stub function, */
	a[2] = ((C_word)li0),  /* and static lambda-info for same (unused). */
	tmp = (C_word)a,       /* Update "a" and return original value, */
        a += 3,                /* exactly like we did in f_201. */
        tmp);                  /* ...and again, tmp holds the original value of a. */

  /* Trace procedure name generated by (gensym). Kind of useless :) */
  C_trace("test.scm:8: g9");

  t4 = t3; /* Compilation artefact; don't worry about it */

  /* Retrieve procedure from closure we just created, and call it,
   * with 3 arguments: itself (t4), the continuation of the call
   * to "interfaces" (t0[2]), and the bytevector buffer (t2).
  ((C_proc3)C_fast_retrieve_proc(t4))(3, t4, ((C_word*)t0)[2], t2);

Our foreign-lambda's wrapper function now consists of two stages. The first stage creates a continuation for the usual wrapper function. Then it calls the garbage collector to clear the stack, after which this wrapper-continuation is invoked. This wrapper is the second function here, and it corresponds closely to the wrapper function we saw in the ilen example. However, this wrapper constructs a closure around the C stub function instead of simply calling it. This closure is then called: C_fast_retrieve_proc simply extracts the function from the closure object we just created; that function is then cast to a 3-argument procedure type and invoked with the continuation of the interfaces call site.

You can see how closures are created in CHICKEN. I will explain this in depth in a future blog post, but the basic approach is pretty clever: the whole thing is one big C expression which stores successive words at the free slots in the allocated space a, while ensuring that after the expression a will point at the next free word. The dance with tmp ensures that the whole expression which allocates the closure results in the initial value of a. That initial value was the first free slot before we executed the expression, and afterwards it holds the closure. Don't worry if this confuses you :)

Calling Scheme from C

Now, with the basics out of the way, let's do something funkier: instead of calling C from Scheme, we call Scheme from C! There is a C API for embedding CHICKEN in a larger C program, but that's not what you should use when calling Scheme from C code that was itself called from Scheme.

The "easy" way

Our little interfaces-listing program has one theoretical flaw: the list of interfaces could be very long (or the names could be long), so we may theoretically run out of stack space. So, we should avoid allocating unbounded lists directly on the stack without checking for overflow. Instead, let's pass the allocated objects to a callback procedure which prints the interface, in a "streaming" fashion.

As I explained before, a regular foreign-lambda uses the C stack in the regular way, it doesn't know about continuations or the Cheney on the MTA garbage collection style, and there's no way to call CHICKEN functions from there, because the GC would "collect" away the C function by longjmp()ing past it. However, the foreign-safe-lambda has a special provision for that: it can "lock" the current live data by putting a barrier between this C function and the Scheme code it calls:

(import foreign)

(foreign-declare "#include \"sys/types.h\"")
(foreign-declare "#include \"sys/socket.h\"")
(foreign-declare "#include \"ifaddrs.h\"")

(define interfaces
  (foreign-safe-lambda* scheme-object ((scheme-object receiver))
    "C_word len, str, *a;\n"
    "struct ifaddrs *ifa, *i;\n"
    "if (getifaddrs(&ifa) != 0)\n"
    "  C_return(C_SCHEME_UNDEFINED);\n"
    "for (i = ifa; i != NULL; i = i->ifa_next) {\n"
    "  len = strlen(i->ifa_name);\n"
    "  a = C_alloc(C_SIZEOF_STRING(len));\n"
    "  str = C_string(&a, len, i->ifa_name);\n"
    "  C_save(str);\n"
    "  C_callback(receiver, 1);\n"

(print "The following interfaces are available: ")
(interfaces print)

This will display the interfaces one line at a time, by using CHICKEN's print procedure as the callback.

We won't look at the compiled source code for this implementation, because it is identical to the earlier one, except for the changed foreign-lambda body. The implementation of C_callback() is of interest, but it is a little hairy, so I'll leave it to you to explore yourself.

The basic idea is rather simple, though: it simply calls setjmp() to establish a new garbage collection trampoline. This means that the foreign-lambda will always remain on the stack. The callback is then invoked with a continuation which sets a flag to indicate that the callback has returned normally, in which case its result will be returned to the foreign-lambda. If it didn't return normally, we arrived at the trampoline because a GC was triggered. This means the remembered continuation will be re-invoked, like usual.

However, when the callback did return normally, we can simply return the returned value because the foreign-lambda's stack frame is still available due to the GC barrier we set up.

The C_save macro simply saves the callback's arguments on a special stack which is read by C_do_apply. It is also used by callback_return_continuation: it saves the value and triggers a GC to force the returned value into the heap. That way, we can return it safely to the previous stack frame without it getting clobbered by the next allocation.

A harder way

The above code has another flaw: if the callback raises an exception, the current exception handler will be invoked with the continuation where it was established. However, that might never return to the callback, which means we have a memory leak on our hands!

If the callback doesn't return normally, the foreign-lambda will remain on the stack forever. How do we avoid that little problem? The simplest fix is of course to wrap the callback's code in handle-exceptions or condition-case. However, that's no fun at all.

Besides, in real-world code we want to avoid the overhead of a GC every single time we invoke a C function, so foreign-safe-lambda is not really suitable for functions that are called in a tight loop. In such cases, there is only one way: to deeply integrate in CHICKEN and write a completely native procedure! Because truly native procedures must call a continuation when they want to pass a result somewhere, we'll have to chop up the functionality into three procedures:

(import foreign)
(use lolevel)     ; For "location"

(foreign-declare "#include \"sys/types.h\"")
(foreign-declare "#include \"sys/socket.h\"")
(foreign-declare "#include \"ifaddrs.h\"")

(define grab-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "if (getifaddrs(ifa) != 0)\n"
    "  *ifa = NULL;\n"))

(define free-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "freeifaddrs(*ifa);"))

(define next-ifa
  (foreign-primitive (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "C_word len, str, *a;\n"
    "if (*ifa) {\n"
    "  len = strlen((*ifa)->ifa_name);\n"
    "  a = C_alloc(C_SIZEOF_STRING(len));\n"
    "  str = C_string(&a, len, (*ifa)->ifa_name);\n"
    "  *ifa = (*ifa)->ifa_next;\n"
    "  C_kontinue(C_k, str);\n"
    "} else {\n"
    "  C_kontinue(C_k, C_SCHEME_FALSE);\n"

(define (interfaces)
  ;; Use a pointer which the C function mutates.  We could also
  ;; return two values(!) from the "next-ifa" foreign-primitive,
  ;; but that complicates the code flow a little bit more.
  ;; Sorry about the ugliness of this!
  (let-location ((ifa (c-pointer "struct ifaddrs"))
                 (i (c-pointer "struct ifaddrs")))
    (grab-ifa! (location ifa))
    (unless ifa (error "Could not allocate ifaddrs"))
    (set! i ifa)

    (handle-exceptions exn
      (begin (free-ifa! (location ifa))      ; Prevent memory leak, and
             (signal exn))                   ; re-raise the exception
      (let lp ((result '()))
        (cond ((next-ifa (location i)) =>
               (lambda (iface)
                 (lp (cons iface result))))
              (else
               (free-ifa! (location ifa))
               result))))))

;; We're once again back to constructing a list!
(print "The following interfaces are available: " (interfaces))

This compiles to something very similar to the code behind a foreign-safe-lambda, but it's obviously going to be a lot bigger due to it being cut up, so I won't duplicate the C code here. Remember, you can always inspect it yourself with csc -k.

Anyway, this is like the foreign-safe-lambda, but without the implicit GC. Also, instead of "returning" the value through C_return() we explicitly call the continuation C_k through the C_kontinue() macro, with the value we want to pass on to the cond. If we wanted to return two values, we could simply use the C_values() macro instead; we're free to do whatever Scheme can do, so we can even return multiple values, as long as the continuation accepts them.

If an exception happens anywhere in this code, we won't get a memory leak due to the stack being blown up. However, like in any C code, we need to free up the memory behind the interface addresses. So we can't really escape our cleanup duty!

You might think that there's one more problem with foreign-primitive: because it doesn't force a GC before calling the C function, there's no guarantee about how much space you still have on the stack. Luckily, CHICKEN has a C_STACK_RESERVE, which defines how much space is guaranteed to be left on the stack after each C_demand(). Its value is currently 0x10000 (i.e., 64 KiB), which means you have some headroom to do basic allocations like we do here, but you shouldn't allocate too many huge objects. There are ways around that, but unfortunately not using the "official" FFI (that I'm aware of, anyway). For now we'll stick with the official Scheme API.

The die-hard way: calling Scheme closures from C

So far, we've discussed pretty much only things you can find in the CHICKEN manual's section on the FFI. Let's take a look at how we can do things a little differently, and instead of passing the string or #f to a continuation, we pass the callback as a procedure again, just like we did for the "easy" way:

(import foreign)
(use lolevel)

(foreign-declare "#include \"sys/types.h\"")
(foreign-declare "#include \"sys/socket.h\"")
(foreign-declare "#include \"ifaddrs.h\"")

(define grab-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "if (getifaddrs(ifa) != 0)\n"
    "  *ifa = NULL;\n"))

(define free-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "freeifaddrs(*ifa);"))

(define next-ifa
  (foreign-primitive (((c-pointer (c-pointer "struct ifaddrs")) ifa)
                      (scheme-object more) (scheme-object done))
    "C_word len, str, *a;\n"
    "if (*ifa) {\n"
    "  len = strlen((*ifa)->ifa_name);\n"
    "  a = C_alloc(C_SIZEOF_STRING(len));\n"
    "  str = C_string(&a, len, (*ifa)->ifa_name);\n"
    "  *ifa = (*ifa)->ifa_next;\n"
    "  ((C_proc3)C_fast_retrieve_proc(more))(3, more, C_k, str);\n"
    ;; Alternatively:
    ;; "  C_save(str); \n"
    ;; "  C_do_apply(2, more, C_k); \n"
    ;; Or, if we want to call Scheme's APPLY directly (slower):
    ;; "  C_apply(5, C_SCHEME_UNDEFINED, C_k, more, \n"
    ;; "          str, C_SCHEME_END_OF_LIST); \n"
    "} else {\n"
    "  ((C_proc2)C_fast_retrieve_proc(done))(2, done, C_k);\n"
    ;; Alternatively:
    ;; "  C_do_apply(0, done, C_k); \n"
    ;; Or:
    ;; "  C_apply(4, C_SCHEME_UNDEFINED, C_k, done, C_SCHEME_END_OF_LIST);\n"

(define (interfaces)
  (let-location ((ifa (c-pointer "struct ifaddrs"))
                 (i (c-pointer "struct ifaddrs")))
    (grab-ifa! (location ifa))
    (unless ifa (error "Could not allocate ifaddrs"))
    (set! i ifa)

    (handle-exceptions exn
      (begin (free-ifa! (location ifa))
             (signal exn))
      (let lp ((result '()))
        (next-ifa (location i)
                  (lambda (iface)               ; more
                    (lp (cons iface result)))
                  (lambda ()                    ; done
                    (free-ifa! (location ifa))
                    result))))))

(print "The following interfaces are available: " (interfaces))

The magic lies in the expression ((C_proc3)C_fast_retrieve_proc(more))(3, more, C_k, str). We've seen something like it before in generated C code snippets: first, it extracts the C function pointer from the closure object in more. Then, the function pointer is cast to the correct type; C_proc3 refers to a procedure which accepts three arguments. This excludes the argument count, which actually is the first argument in the call. The next argument is the closure itself, which is needed when the closure has local variables it refers to (like result and lp in the example). The argument after the closure is its continuation. We just pass on C_k: the final continuation of both more and done is the continuation of lp, which is also the continuation of next-ifa. Finally, the arguments following the continuation are the values passed as arguments: iface for the more closure.

The done closure is invoked as C_proc2 with only itself and the continuation, but no further arguments. This corresponds to the fact that done is just a thunk.

I've shown two alternative ways to call the closure. The first is to call the closure through the C_do_apply function. This is basically a dispatcher which checks the argument count and uses the correct C_proc<n> cast and then calls it with the arguments, taken from a temporary stack on which C_save places the arguments. The implementation behind it is positively insane, and worth checking out for the sheer madness of it.

The second alternative is to use C_apply, which is the C implementation of Scheme's apply procedure. It's a bit awkward to call from C, because this procedure is a true Scheme procedure. That means it accepts an argument count, itself and its continuation and only then its arguments, which are the closure and the arguments to pass to the closure, with the final argument being a list:

(apply + 1 2 '(3 4)) => 10

In C this would be:

C_apply(6, C_SCHEME_UNDEFINED, C_k, C_closure(&a, 1, C_plus),
        C_fix(1), C_fix(2), C_list2(C_fix(3), C_fix(4)));

It also checks its arguments, so if you pass something that's not a list as its final argument, it raises a nice exception:

(import foreign)
((foreign-primitive ()
   "C_word ab[C_SIZEOF_CLOSURE(1)], *a = ab; \n"
   "C_apply(4, C_SCHEME_UNDEFINED, C_k, "
   "        C_closure(&a, 1, (C_word)C_plus), C_fix(1));"))

This program prints the following when executed:

 Error: (apply) bad argument type: 1
         Call history:
         test.scm:2: g11         <--

And this brings us to our final example, where we go absolutely crazy.

The guru way: Calling Scheme closures you didn't receive

You might have noticed that the error message above appears even though we never passed an error procedure to +, and if you had wrapped the call in an exception handler, that handler's continuation would have been invoked, again without us passing it to the procedure. In some situations you may want to spare the user from having to pass in some procedure just to handle an exceptional situation. Let's see if we can do something like that ourselves!

It turns out to be pretty easy:

(import foreign)
(use lolevel)

(foreign-declare "#include \"sys/types.h\"")
(foreign-declare "#include \"sys/socket.h\"")
(foreign-declare "#include \"ifaddrs.h\"")

(define grab-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "if (getifaddrs(ifa) != 0)\n"
    "  *ifa = NULL;\n"))

(define free-ifa!
  (foreign-lambda* void (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "freeifaddrs(*ifa);"))

(define (show-iface-name x)
  (print x))

(define next-ifa
  (foreign-primitive (((c-pointer (c-pointer "struct ifaddrs")) ifa))
    "C_word len, str, *a, show_sym, show_proc;\n"
    "if (*ifa) {\n"
    "  len = strlen((*ifa)->ifa_name);\n"
    "  a = C_alloc(C_SIZEOF_INTERNED_SYMBOL(15) + C_SIZEOF_STRING(len));\n"
    "  str = C_string(&a, len, (*ifa)->ifa_name);\n"
    "  *ifa = (*ifa)->ifa_next;\n"
    ;; The new bit:
    "  show_sym = C_intern2(&a, C_text(\"show-iface-name\"));\n"
    "  show_proc = C_block_item(show_sym, 0);\n"
    "  ((C_proc3)C_fast_retrieve_proc(show_proc))(3, show_proc, C_k, str);\n"
    "} else {\n"
    "  C_kontinue(C_k, C_SCHEME_FALSE);\n"

(define (interfaces)
  (let-location ((ifa (c-pointer "struct ifaddrs"))
                 (i (c-pointer "struct ifaddrs")))
    (grab-ifa! (location ifa))
    (unless ifa (error "Could not allocate ifaddrs"))
    (set! i ifa)

    (handle-exceptions exn
      (begin (free-ifa! (location ifa))
             (signal exn))
      (let lp ()
        ;; next-ifa now returns true if it printed an interface and is
        ;; ready to get the next one, or false if it reached the end.
        (if (next-ifa (location i))
            (lp)
            (free-ifa! (location ifa)))))))

(print "The following interfaces are available: ")

This uses C_intern2 to look up the symbol for "show-iface-name" in the symbol table (or intern it if it didn't exist yet). We store this in show_sym. Then, we look at the symbol's first slot, where the value is stored for the global variable identified by the symbol. The value slot always exists, but if it is undefined, the value is C_SCHEME_UNDEFINED. Anyway, we assume it's defined and we call it like we did in the example before this one: extract the first slot from the closure and call it.
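
For completeness, here is a minimal sketch of how you could guard against the global being unbound before calling it, written in the same style as the C fragment above. It assumes a, C_k and str are in scope as they are inside the foreign-primitive body, and falling back to C_kontinue with #f is just my choice for the example, not the only option:

C_word show_sym, show_proc;

show_sym  = C_intern2(&a, C_text("show-iface-name"));
show_proc = C_block_item(show_sym, 0);           /* the symbol's value slot */

if (show_proc == C_SCHEME_UNDEFINED) {
  C_kontinue(C_k, C_SCHEME_FALSE);               /* global not defined: just return #f */
} else {
  ((C_proc3)C_fast_retrieve_proc(show_proc))(3, show_proc, C_k, str);
}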

This particular example isn't very useful, but the technique can be used to invoke hook procedures, and in fact the core itself uses it from barf() when it invokes ##sys#error-hook to construct and raise an exception when an error situation occurs in the C runtime.

Flattr this!  (why?)

One of CHICKEN's coolest features has to be its unique approach to garbage collection. When someone asked about implementation details (hi, Arthur!), I knew this would make for an interesting blog post. This post is going to be long and technical, so hold on to your hats! Don't worry if you don't get through this in one sitting.


There's a whole lot of stuff that we'll need to explain before we get to the actual garbage collector. CHICKEN's garbage collection (GC) strategy is deeply intertwined with its compilation strategy, so I'll start by explaining the basics of that, before we can continue (pun intended) with the actual GC stuff.

A short introduction to continuation-passing style

The essence of CHICKEN's design is a simple yet brilliant idea by Henry Baker, described in his paper CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.. The paper is pretty terse, but it's well-written, so I recommend you check it out before reading on. If you grok everything in it, you probably won't get much out of my blog post and you can stop reading now. If you don't grok it, it's probably a good idea to re-read it later.

Baker's approach assumes a Scheme to C compiler which uses continuation-passing style (CPS) as an internal representation. This is the quintessential internal representation of Scheme programs, going all the way back to the first proper Scheme compiler, RABBIT.

Guy L. Steele (RABBIT's author) did not use CPS to make garbage collection easier. In fact, RABBIT had no GC of its own, as it relied on MacLISP as a target language (which compiled to PDP-10 machine code and had its own garbage collector). Instead, continuations allowed for efficient implementation of nested procedure calls. It eliminated the need for a stack to keep track of this nesting by simply returning the "next thing to do" to a driver loop which took care of invoking it. This made it possible to write down iterative algorithms as recursive functions without causing a stack overflow.

Let's consider a silly program which sums up all the numbers in a list, and shows the result multiplied by two:

(define (calculate-sum lst result)
  (if (null? lst)
      result
      (calculate-sum (cdr lst) (+ result (car lst)))))

(define (show-sum lst)
  (print-number (* 2 (calculate-sum lst 0))))

(show-sum '(1 2 3))

A naive compilation to C would look something like this (brutally simplified):

void entry_point() {
  toplevel();
  exit(0); /* Assume exit(1) is explicitly called elsewhere in case of error. */
}

void toplevel() {
  /* SCM_make_list() & SCM_fx() allocate memory.  "fx" stands for "fixnum". */
  SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
  show_sum(lst);
}

SCM_obj* show_sum(SCM_obj *lst) {
  SCM_obj *result = calculate_sum(lst, SCM_fx(0));
  /* SCM_fx_times() allocates memory. */
  return SCM_print_number(SCM_fx_times(SCM_fx(2), result));
}

SCM_obj* calculate_sum(SCM_obj *lst, SCM_obj *result) {
  if (lst == SCM_NIL) { /* Optimised */
    return result;
  } else {
    /* SCM_fx_plus() allocates memory. */
    SCM_obj *tmp = SCM_cdr(lst);
    SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
    return calculate_sum(tmp, tmp2); /* Recur */
  }
}

SCM_obj *SCM_print_number(SCM_obj *data) {
  printf("%d\n", SCM_fx_to_integer(data));
  return SCM_VOID;
}

This particular implementation probably can't use a copying garbage collector like CHICKEN uses, because the SCM_obj pointers which store the Scheme objects' locations would all become invalid. But let's ignore that for now.

Due to the recursive call in calculate_sum(), the stack just keeps growing, and eventually we'll get a stack overflow if the list is too long. Steele argued that this is a silly limitation which results in the proliferation of special-purpose "iteration" constructs found in most languages. Also, he was convinced that this just cramps the programmer's style: we shouldn't have to think about implementation details like the stack size. In his time people often used goto instead of function calls as a performance hack. This annoyed him enough to write a rant about it, which should be required reading for all would-be language designers!

Anyway, a compiler can transparently convert our Scheme program into CPS, which would look something like this after translation to C:

/* Set up initial continuation & toplevel call. */
void entry_point() {
  SCM_cont *cont = SCM_make_cont(1, &toplevel, SCM_exit_continuation);
  SCM_call *call = SCM_make_call(0, cont);
  SCM_driver_loop(call);
}

void SCM_driver_loop(SCM_call *call) {
  /* The trampoline to which every function returns its continuation. */
  while(1)
    call = SCM_perform_continuation_call(call);
}

SCM_call *toplevel(SCM_cont *cont) {
  SCM_cont *next = SCM_make_cont(1, &show_sum, cont);
  SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
  return SCM_make_call(1, next, lst);
}

SCM_call *show_sum(SCM_cont *cont, SCM_obj *lst) {
  SCM_cont *next = SCM_make_cont(1, &show_sum_continued, cont);
  SCM_cont *now = SCM_make_cont(2, &calculate_sum, next);
  return SCM_make_call(2, now, lst, SCM_fx(0));
}

SCM_call *calculate_sum(SCM_cont *cont, SCM_obj *lst, SCM_obj *result) {
  if (lst == SCM_NIL) { /* Optimised */
    return SCM_make_call(1, cont, result);
  } else {
    SCM_obj *tmp = SCM_cdr(lst);
    SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
    SCM_cont *now = SCM_make_cont(2, &calculate_sum, cont);
    return SCM_make_call(2, now, tmp, tmp2); /* "Recur" */
  }
}

SCM_call *show_sum_continued(SCM_cont *cont, SCM_obj *result) {
  SCM_cont *now = SCM_make_cont(1, &SCM_print_number, cont);
  SCM_obj *tmp = SCM_fx_times(SCM_fx(2), result);
  return SCM_make_call(1, now, tmp);
}

SCM_call *SCM_print_number(SCM_cont *cont, SCM_obj *data) {
  printf("%d\n", SCM_fx_to_integer(data));
  return SCM_make_call(1, cont, SCM_VOID);
}

In the above code, there are two new data types: SCM_cont and SCM_call.

An SCM_cont represents a Scheme continuation as a C function's address, the number of arguments which it expects (minus one) and another continuation, which indicates where to continue after the C function has finished. This sounds recursive, but as you can see the very first continuation created by entry_point() is a specially prepared one which will cause the process to exit.

An SCM_call is returned to the driver loop by every generated C function: this holds a continuation and the arguments with which to invoke it. SCM_perform_continuation_call() extracts the SCM_cont from the SCM_call and invokes its C function with its continuation and the arguments from the SCM_call. We won't dwell on the details of its implementation now, but assume this is some magic which just works.
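
The post doesn't spell these helper types out, so here is a minimal sketch of what they might look like. All names and fields here are my own invention for illustration; MAX_ARGS and the arity dispatch in particular are assumptions, not how the real CHICKEN runtime does it:

/* Hypothetical layout; purely illustrative, not CHICKEN's actual runtime. */
typedef struct SCM_obj SCM_obj;        /* opaque Scheme object, as before */

typedef struct SCM_cont {
  void *fn;                            /* address of the C function to call   */
  int argc;                            /* number of args it expects (minus 1) */
  struct SCM_cont *next;               /* where to continue after fn is done  */
} SCM_cont;

#define MAX_ARGS 8                     /* made-up bound, for the sketch only */

typedef struct SCM_call {
  SCM_cont *cont;                      /* continuation to invoke          */
  int argc;                            /* number of arguments passed      */
  SCM_obj *args[MAX_ARGS];             /* the arguments themselves        */
} SCM_call;

/* One way SCM_perform_continuation_call() could dispatch on arity: */
SCM_call *SCM_perform_continuation_call(SCM_call *call) {
  SCM_cont *c = call->cont;
  switch (call->argc) {
  case 0: return ((SCM_call *(*)(SCM_cont *))c->fn)(c);
  case 1: return ((SCM_call *(*)(SCM_cont *, SCM_obj *))c->fn)(c, call->args[0]);
  case 2: return ((SCM_call *(*)(SCM_cont *, SCM_obj *, SCM_obj *))c->fn)
                 (c, call->args[0], call->args[1]);
  default: return NULL;                /* extend for higher arities */
  }
}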

You'll also note that the primitives SCM_car(), SCM_cdr(), SCM_fx_plus() and SCM_fx_times() do not accept a continuation. This is a typical optimisation: some primitives can be inlined by the compiler. However, this is not required: you can make them accept a continuation as well, at the cost of further splintering the C functions into small sections; the calculate_sum() function would be split up into 4 separate functions if we did that.

Anyway, going back to the big picture we can see that this continuation-based approach consumes a more or less constant amount of stack space, because each function returns to driver_loop. Baker's fundamental insight was that the stack is there anyway (and it will be used by C), and if we don't need it for tracking function call nesting, why not use it for something else? He proposed to allocate all newly created objects on the stack. Because the stack would hopefully fit the CPU's cache in its entirety, this could give quite a performance benefit.

Generational collection

To understand why keeping new data together on the stack can be faster, it's important to know that most objects are quite short-lived. Most algorithms involve intermediate values, which are accessed quite a bit during a calculation but are no longer needed afterwards. These values need to be stored somewhere in memory. Normally you would store them together with all other objects in the main heap, which may cause fragmentation of said heap. Fragmentation means that memory references may cross page boundaries. This is slow, because it will clear out the CPU's memory cache and may even require swapping it in, if the machine is low on memory.

On top of that, generating a lot of intermediate values means generating a lot of garbage, which will trigger many GCs during which a lot of these temporary objects will be cleaned up. However, during these GCs, the remaining longer-lived objects must also be analysed before it can be decided they can stick around.

This is rather wasteful, and it turns out we can avoid doing so much work by categorising objects by age. Objects that have just been created belong to the first generation and are stored in their own space (called the nursery - I'm not kidding!), while those that have survived several GC events belong to older generations, which each have their own space reserved for them. By keeping different generations separated, you do not have to examine long-lived objects of older generations (which are unlikely to be collected) when collecting garbage in a younger generation. This can save us a lot of wasted time.

Managing data on the stack

The Cheney on the M.T.A. algorithm as used by CHICKEN involves only two generations; one generation consisting of newly created objects and the other generation consisting of older objects. In this algorithm, new objects get immediately promoted (or tenured) to the old generation after a GC of the nursery (or stack). Such a GC is called a minor GC, whereas a GC of the heap is called a major GC.

This minor GC is where the novelty lies: objects are allocated on the stack. You might wonder how that can possibly work, considering the lengths I just went through to explain how CPS conversion gets rid of the stack. Besides, by returning to the trampoline function whenever a new continuation is invoked, anything you'd store on the stack would need to get purged (that's how the C calling convention works).

That's right! The way to make this work is pretty counter-intuitive: we go all the way back to the first Scheme to C conversion I showed you and make it even worse. Whenever we want to invoke a continuation, we just call its function. That means that the example program we started out with would compile to this:

/* Same as before */
void entry_point() {
  SCM_cont *cont = SCM_make_cont(1, &toplevel, SCM_exit_continuation);
  SCM_call *call = SCM_make_call(0, cont);
  SCM_driver_loop(call);
}

SCM_call *saved_cont_call; /* Set by SCM_save_call, read by driver_loop */
jmp_buf empty_stack_state; /* Set by driver_loop, read by minor_GC */

void SCM_driver_loop(SCM_call *call) {
  /* Save registers (including stack depth and address in this function) */
  if (setjmp(empty_stack_state))
    call = saved_cont_call; /* Got here via longjmp()? Use stored call */

  SCM_perform_continuation_call(call);
}

void SCM_minor_GC() {
  /* ...
     Copy live data from stack to heap, which is a minor GC.  Described later.
     ... */
  longjmp(empty_stack_state, 1); /* Restore registers (jump back to driver_loop) */
}

void toplevel(SCM_cont *cont) {
  if (!fits_on_stack(SCM_CONT_SIZE(0) + SCM_CALL_SIZE(1) +
                     SCM_FIXNUM_SIZE * 3 + SCM_PAIR_SIZE * 3)) {
    SCM_save_call(0, &toplevel, cont); /* Mutates saved_cont_call */
    SCM_minor_GC(); /* Will re-invoke this function from the start */
  } else {
    /* The below stuff will all fit on the stack, as calculated in the if() */
    SCM_cont *next = SCM_make_cont(1, &show_sum, cont);
    SCM_obj *lst = SCM_make_list(3, SCM_fx(1), SCM_fx(2), SCM_fx(3));
    SCM_call *call = SCM_make_call(1, next, lst);
    SCM_perform_continuation_call(call);
  }
}

void show_sum(SCM_cont *cont, SCM_obj *lst) {
  if (!fits_on_stack(SCM_CONT_SIZE(0) * 2 +
                     SCM_CALL_SIZE(2) + SCM_FIXNUM_SIZE)) {
    SCM_save_call(1, &show_sum, cont, lst);
    SCM_minor_GC();
  } else {
    SCM_cont *next = SCM_make_cont(1, &show_sum_continued, cont);
    SCM_cont *now = SCM_make_cont(2, &calculate_sum, next);
    SCM_call *call = SCM_make_call(2, now, lst, SCM_fx(0));
    SCM_perform_continuation_call(call);
  }
}

void calculate_sum(SCM_cont *cont, SCM_obj *lst, SCM_obj *result) {
  /* This calculation is overly pessimistic as it counts both arms
     of the if(), but this is acceptable */
  if (!fits_on_stack(SCM_CALL_SIZE(1) + SCM_FIXNUM_SIZE +
                     SCM_CONT_SIZE(1) + SCM_CALL_SIZE(2))) {
    SCM_save_call(2, &calculate_sum, cont, lst, result);
    SCM_minor_GC();
  } else {
    if (lst == SCM_NIL) {
      SCM_call *call = SCM_make_call(1, cont, result);
      SCM_perform_continuation_call(call);
    } else {
      SCM_obj *tmp = SCM_cdr(lst);
      SCM_obj *tmp2 = SCM_fx_plus(result, SCM_car(lst));
      SCM_cont *now = SCM_make_cont(2, &calculate_sum, cont);
      SCM_call *call = SCM_make_call(2, now, tmp, tmp2);
      SCM_perform_continuation_call(call); /* "Recur" */
    }
  }
}

void show_sum_continued(SCM_cont *cont, SCM_obj *result) {
  if (!fits_on_stack(SCM_CONT_SIZE(1) + SCM_CALL_SIZE(1) + SCM_FIXNUM_SIZE)) {
    SCM_save_call(1, &show_sum_continued, cont, result);
    SCM_minor_GC();
  } else {
    SCM_cont *now = SCM_make_cont(1, &SCM_print_number, cont);
    SCM_obj *tmp = SCM_fx_times(SCM_fx(2), result);
    SCM_call *call = SCM_make_call(1, now, tmp);
    SCM_perform_continuation_call(call);
  }
}

void SCM_print_number(SCM_cont *cont, SCM_obj *data) {
  if (!fits_on_stack(SCM_CALL_SIZE(1))) {
    SCM_save_call(1, &SCM_print_number, cont, data);
    SCM_minor_GC();
  } else {
    printf("%d\n", SCM_fx_to_integer(data));
    SCM_call *call = SCM_make_call(1, cont, SCM_VOID);
    SCM_perform_continuation_call(call);
  }
}

Whew! This program is quite a bit longer, but it isn't that different from the second program I showed you. The main change is that none of the continuation functions return anything. In fact, these functions, like Charlie in the M.T.A. song, never return. In the earlier version every function ended with a return statement, now they end with an invocation of SCM_perform_continuation_call().

To make things worse, allocating functions now also use alloca() to place objects on the stack. That means that the stack just keeps filling up like the first compilation I showed you, so we're back to where we started! However, this program is a lot longer due to one important thing: At the start of each continuation function we first check to see if there's enough space left on the stack to accommodate the objects this function will allocate.

If there's not enough space, we re-create the SCM_call with which this continuation function was invoked using SCM_save_call(). This differs from SCM_make_call() in that it will not allocate on the stack, but will use a separate area to set aside the call object. The pointer to that area is stored in saved_cont_call.

SCM_save_call() can't allocate on the stack for a few reasons: The first and most obvious reason is that the saved call wouldn't fit on the stack because we just concluded it is already full. Second, the arguments to the call must be kept around even when the stack is blown away after the GC has finished. Third, this stored call contains the "tip" of the iceberg of live data from which the GC will start its trace. This is described in the next section.

After the minor GC has finished, we can jump back to the trampoline again. We use the setjmp() and longjmp() functions for that. When the first call to SCM_driver_loop() is made, it will call setjmp() to save all the CPU's registers to a buffer. This includes the stack and instruction pointers. Then, when the minor GC finishes, it calls longjmp() to restore those registers. Because the stack and instruction pointer are restored, this means execution "restarts" at the place in driver_loop() where setjmp() was invoked. The setjmp() then returns again, but now with a nonzero value (it was zero the first time). The return value is checked and the call is fetched from the special save area to get back to where we were just before we performed the GC.
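
If the setjmp()/longjmp() dance is hard to picture, the following tiny stand-alone C program (not CHICKEN code, just an illustration I wrote for this pattern) shows how a deeply recursing function can repeatedly "reset" the stack and resume from saved state:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf empty_stack_state;
static int saved_n;                    /* stands in for saved_cont_call */

static void keep_calling(int n) {
  char padding[256];                   /* simulate stack-allocated data */
  padding[0] = (char)n;                /* (avoid unused-variable warnings) */
  if (n % 1000 == 0) {                 /* pretend the stack is now full */
    saved_n = n;
    longjmp(empty_stack_state, 1);     /* "minor GC": blow the stack away */
  }
  keep_calling(n + 1);                 /* like a CPS call, it never returns */
}

int main(void) {
  volatile int n = 1;
  if (setjmp(empty_stack_state))       /* nonzero: we returned via longjmp() */
    n = saved_n + 1;                   /* resume from the saved "call" */
  if (n > 3000) {                      /* stop after a few stack resets */
    printf("done at %d\n", n);
    return 0;
  }
  keep_calling(n);
  return 0;                            /* never reached */
}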

This is half the magic, so please make sure you understand this part!

The minor GC

The long story above served to set up all the context you need to know to dive into the GC itself, so let's take a closer look at it.

Picking the "live" data from the stack

As we've seen, the GC is invoked when the stack has completely filled up. At this point, the stack is a complete mess: it has many stack frames from all the function calls that happened between the previous GC and now. These stack frames consist of return addresses for the C function calls (which we're not even using), stack-allocated C data (which we don't need) and somewhere among that mess there are some Scheme objects. These objects themselves also belong to two separate categories: the "garbage" and the data that's still being used and needs to be kept around (the so-called live data). How on earth are we going to pick only the interesting bits from that mess?

Like I said before, the saved call contains the "tip of the iceberg" of live data. It turns out this is all we need to get at every single object which is reachable to the program. All you need to do is follow the pointers to the arguments and the continuation stored in the call. For each of these objects, you copy them to the heap and if they are compound objects you follow all the pointers to the objects stored within them, and so on. Let's take a look at a graphical representation of how this works. In the picture below I show the situation where a GC is triggered just after the second invocation of calculate-sum (i.e., the first recursive call of itself, with the list '(2 3)):

After the initial shock from seeing this cosmic horror has worn off, let's take a closer look. It's like a box of tangled cords: if you take the time to carefully untangle them, it's easy, but if you try to do it all at once, it'll leave you overwhelmed. Luckily, I'm going to talk you through it. (by the way: this is an SVG so you can zoom in on details as far as you like using your browser's zooming functionality).

Let's start with the big picture: On the left you see the stack, on the right the heap after copying and in the bottom centre there's a small area of statically allocated objects, which are not subject to GC. To get your bearings, check the left margin of the diagram. I have attempted to visualise C stack frames by writing each function's name above a line leading to the bottom of its frame.

Let's look at the most recently called function, at the top of the stack. This is the function which initiated the minor GC: the second call to calculate_sum(). The shaded area shows the pointers set aside by SCM_save_call(), which form the tip of the iceberg of live data. More on that later.

The next frame belongs to the first call to calculate_sum(). It has allocated a few things on the stack. The topmost element on the stack is the last thing that's allocated due to the way the stack grows upwards in this picture. This is a pointer to an SCM_call object, marked with "[call]", which is the name of the variable which is stored there. If you go back to the implementation of calculate_sum(), you can see that the last thing it does is allocate an SCM_call, and store its pointer in call. The object itself just precedes the variable on the stack, and is marked with a thick white border to group together the machine words from which it is composed. From bottom to top, these are:

  • A tag which indicates that this is a call containing 2 arguments,
  • a pointer to an SCM_cont object (taken from the now variable),
  • a pointer to an SCM_obj object (the cdr of lst, taken from tmp) and
  • a pointer to an SCM_obj object (a fixnum, taken from tmp2).

Other compound objects are indicated in the same way.

You'll also have noticed the green, white and dashed arcs with arrow tips. These connect pointers to their target addresses. The dashed ones on the right hand side of the stack indicate pointers that are used for local variables in C functions or SCM_call objects. These pointers are unimportant to the garbage collector. The ones on the left hand side of the stack are pointers from Scheme objects to other Scheme objects. These are important to the GC. The topmost pointer inside the call object we just looked at has a big dashed curve all the way down to the cdr of lst, and the one below it points at the value of result, which is the fixnum 1.

If you look further down the stack, you'll see the show_sum procedure which doesn't really allocate much: an SCM_call, the initial intermediate result (fixnum 0), and two continuations (next and now in the C code). The bulk of the allocation happens in toplevel, which contains the call to show_sum and allocates a list structure. This is on the stack in reverse order: first the pair X = (3 . ()), then the pair Y = (2 . <X>) and the pair Z = (1 . <Y>). The () is stored as SCM_NIL in the static area, to which the cdr of the bottom-most pair object on the stack points, which is represented by a long green line which swoops down to the static area.

Copying the live data to the heap

The green lines represent links from the saved call to the live data which we need to copy. You can consider the colour green "contagious": imagine everything is white initially, except for the saved call. Then, each line starting at the pointers of the call are painted green. The target object to which a line leads is also painted green. Then, we recursively follow lines from pointers in that object and paint those green, etc. The objects that were already in the heap or the static area are not traversed, so they stay white.

When an object is painted green, it is also copied to the heap, which is represented by a yellow line. The object is then overwritten by a special object which indicates that this object has been moved to the heap. This special object contains a forwarding pointer which indicates the new location of the object. This is useful when you have two objects which point to the same other object, like for example in this code:

(let* ((a (list 3 2 1))
       (b (cons 4 a))
       (c (cons 4 a)))
  ...)

Here you have two lists (4 3 2 1) which share a common tail. If both lists are live at the moment of GC, we don't want to copy this tail twice, because that would result in it being split into two distinct objects. Then, a set-car! on a might only be reflected in b but not c, for example. The forwarding pointers prevent this from happening by simply adjusting a copied object's constituent objects to point to their new locations. Finally, after all data has been copied, all the newly copied objects are checked again for references to objects which may have been relocated after the object was copied.

The precise algorithm that performs this operation is very clever. It requires only two pointers and a while loop, but it still handles cyclic data structures correctly. The idea is that you do the copying I described above in a breadth-first way: you only copy the objects stored in the saved call (without touching their pointers). Next, you loop from the start of the heap to the end, looking at each object in turn (initially, those are the objects we just copied). For these objects, you check their components, and see whether they exist in the heap or in the stack. If they exist in the stack, you copy them over to the end of the heap (again, without touching their pointers). Because they are appended to the heap, the end pointer gets moved to the end of the last object, so the while loop will also take the newly copied objects into consideration. When you reach the end of the heap, you're done. In C, that would look something like this:

SCM_obj *obj, *slot;
int i, bytes_copied;
char *scan_start = heap_start;

for(i = 0; i < saved_object_count(saved_call); ++i) {
  obj = get_saved_object(saved_call, i);
  /* copy_object() is called "mark()" in CHICKEN.
     It also sets up a forwarding pointer at the original location */
  bytes_copied = copy_object(obj, heap_end);
  heap_end += bytes_copied;
}

while(scan_start < heap_end) {
  obj = (SCM_obj *)scan_start;
  for(i = 0; i < object_size(obj); ++i) {
    slot = get_slot(obj, i);
    /* Nothing needs to be done if it's in the heap or static area */
    if (exists_in_stack(slot)) {
      if (is_forwarding_ptr(slot)) {
        set_slot(obj, i, forwarding_ptr_target(slot));
      } else {
        bytes_copied = copy_object(slot, heap_end);
        set_slot(obj, i, heap_end);
        heap_end += bytes_copied;
      }
    }
  }
  scan_start += object_size(obj);
}

This algorithm is the heart of our garbage collector. You can find it in runtime.c in the CHICKEN sources in C_reclaim(), under the rescan: label. The algorithm was invented in 1970 by C.J. Cheney, and is still used in the most "state of the art" implementations. Now you know why Henry Baker's paper is called "Cheney on the M.T.A." :)

After the data has been copied to the heap, the longjmp() in minor_GC() causes everything on the stack to be blown away. Then, the top stack frame is recreated from the saved call. This is illustrated below:

Everything in the shaded red area below the stack frame for driver_loop() is now unreachable because there are no more pointers from live data pointing into this region of the stack. Any live Scheme objects allocated here would have been copied to the heap, and all pointers which pointed there relayed to this new copy. Unfortunately, this stale copy of the data will permanently stick around on the stack, which means this data is forever irreclaimable. This means it is important that the entry point should consume as little stack space as possible.

The major GC

You might be wondering how garbage on the heap is collected. That's what the major GC is for. CHICKEN initially only allocates a small heap area. The heap consists of two halves: a fromspace and a tospace. The fromspace is the heap as we've seen it so far: in normal usage, this is the part that's used. The tospace is always empty.

When a minor GC is copying data from the stack to the fromspace, it may cause the fromspace to fill up. That's when a major GC is triggered: the data in the fromspace is copied to the tospace using Cheney's algorithm. Afterwards, the areas are flipped: the old fromspace is now called tospace and the old tospace is now called fromspace.

During a major GC, we have a slightly larger set of live data. It is not just the data from the saved call, because that's only the stuff directly used by the currently running continuation. We also need to consider global variables and literal objects compiled into the program, for example. These sorts of objects are also considered live data. Aside from this, a major collection is performed the same way as a minor collection.

The smart reader might have noticed a small problem here: what if the amount of garbage cleaned up is less than the data on the stack? Then, the stack data can't be copied to the new heap because it simply is too small. Well, this is when a third GC mode is triggered: a reallocating GC. This causes a new heap to be allocated, twice as big as the current heap. This is also split into a fromspace and a tospace. Then, Cheney's algorithm is performed on the old heap's fromspace, using one half of the new heap as tospace. When it's finished, the new tospace is called fromspace, and the other half of the new heap is called tospace. Then, the old heap is de-allocated.
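
In the invented SCM_* style of the earlier examples, the bookkeeping might look roughly like this. Every name here is made up for illustration; CHICKEN's real logic in runtime.c is considerably more involved and orders these steps differently:

/* Purely illustrative sketch of the semispace flip and heap growth. */
void SCM_major_GC(long nursery_live_bytes) {
  long survivors = cheney_copy(fromspace, tospace); /* the algorithm shown earlier */

  /* Flip: the filled tospace becomes the new fromspace and vice versa. */
  swap_spaces(&fromspace, &tospace);

  if (semispace_size - survivors < nursery_live_bytes) {
    /* Not enough was reclaimed to admit the stack's live data:
       allocate a heap twice as big, copy into one of its halves,
       and free the old heap (the "reallocating" GC). */
    grow_heap(2 * heap_size);
  }
}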

Some practical notes

The above situation is a pretty rough sketch of the way the GC works in CHICKEN. Many details have been omitted, and the actual implementation is extremely hairy. Below I'll briefly mention how a few important things are implemented.

Object representation

You might have noticed that the stack grows pretty quickly in the CPS-converted C code I showed you. That's because the SCM_obj representation requires allocating every object and storing a pointer to it, so that's a minimum of two machine words per object.

CHICKEN avoids this overhead for small, often-used objects like characters, booleans and fixnums. It ensures all allocated objects are word-aligned, so all pointers to objects have their lower bits set to zero. This means you can easily see whether something is a pointer to an object or something else.

All objects in CHICKEN are represented by a C_word type, which is the size of a machine word. So-called immediate values are stored directly inside the machine word, with nonzero lower bits. Non-immediate values are pointers to a C structure which contains the type tag and the object's other bits, much like the SCM_obj structure in my examples.
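
To make the tagging idea concrete, here is a small stand-alone sketch of how a fixnum can live inside the word itself. The exact bit patterns CHICKEN uses differ in the details, so treat this purely as an illustration:

#include <stdint.h>
#include <stdio.h>

typedef intptr_t word;                /* one machine word, like C_word */

/* Fixnums: shift the value left and set the lowest bit, so the result can
   never be confused with a word-aligned pointer (whose low bits are zero). */
#define FIX(n)        (((word)(n) << 1) | 1)
#define IS_FIX(x)     ((x) & 1)
#define FIX_TO_INT(x) ((x) >> 1)

int main(void) {
  word three = FIX(3);
  if (IS_FIX(three))
    printf("immediate fixnum, value %ld\n", (long)FIX_TO_INT(three));
  return 0;
}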

Calls are not represented by objects in CHICKEN. Instead, the C function is simply invoked directly from the continuation's caller. Continuations are represented like any other object. For didactic reasons, I used a separate C type to distinguish them from SCM_obj, but in Scheme continuations can be reified as first-class objects, so they shouldn't be represented in a fundamentally different way.


You might be wondering how closures are implemented, because this hasn't been discussed at all. The answer is pretty simple: in the example code, an SCM_cont object stored a plain C function's address. Instead, we could store a closure: a new type of object which holds a C function plus its local variables. Each C function receives this closure as an extra argument (in the CHICKEN sources this is called self). When it needs to access a closed-over value, it can be fetched from the closure object.
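
A hedged sketch of what such a closure object could look like, again in the invented SCM_* style of my examples rather than CHICKEN's actual representation:

/* Assumes the SCM_obj and SCM_cont types sketched earlier. */
typedef struct SCM_obj SCM_obj;
typedef struct SCM_cont SCM_cont;
typedef struct SCM_closure SCM_closure;

struct SCM_closure {
  /* The compiled code; it receives the closure itself ("self" in the
     CHICKEN sources) so it can reach its captured variables. */
  void (*fn)(SCM_closure *self, SCM_cont *cont, SCM_obj *arg);
  int nvars;                          /* number of captured variables        */
  SCM_obj *vars[];                    /* the captured values, stored inline  */
};

/* Inside the generated function, a closed-over variable is then simply
   self->vars[i] instead of a local C variable. */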


Another major oversight is the assumption that objects can only point from the stack into the heap. If Scheme was a purely functional language, this would be entirely accurate: new objects can refer to old objects, but there is no way that a preexisting object can be made to refer to a newly created object. For that, you need to support mutation.

But Scheme does support mutation! So what happens when you use vector-set! to store a newly created, stack-allocated value in an old, heap-allocated vector? If we used the above algorithm, either the newly created element would be part of the live set and get copied, but the vector's slot would still point at the stale stack location, or it wouldn't be part of the live set at all and the object would be lost in the stack reset.

The answer to this problem is also pretty simple: we add a so-called write barrier. Whenever a value is written to an object, it is remembered. Then, when performing a GC, these remembered values are considered to be part of the live set, just like the addresses in the saved call. This is also the reason CHICKEN always shows the number of mutations when you're asking for GC statistics: mutation may slow down a program because GCs might take longer.
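
A rough sketch of such a write barrier, in the same invented style as before (CHICKEN's real barrier lives in C_mutate() in runtime.c and differs in the details; the names and the fixed-size remembered set here are my own simplifications):

#define REMEMBERED_MAX 1024               /* made-up bound for this sketch */
static SCM_obj **remembered_slots[REMEMBERED_MAX];
static int remembered_count = 0;

/* Called for every mutation, e.g. by vector-set! */
void SCM_mutate(SCM_obj **slot, SCM_obj *value) {
  if (!exists_in_stack((SCM_obj *)slot) && exists_in_stack(value)) {
    /* An old (heap) object will now point at a nursery (stack) object:
       remember the slot so the next minor GC treats it as a root. */
    remembered_slots[remembered_count++] = slot;
  }
  *slot = value;
}

/* During a minor GC these remembered slots are then scanned just like
   the pointers in the saved call. */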

Stack size

How does CHICKEN know when the stack is filled up? It turns out that there is no portable way to detect how big the stack is, or whether it has a limit at all!

CHICKEN works around this simply by limiting its stack to a predetermined size. On 64-bit systems, this is 1MB, on 32-bit systems it's 256KB. There is also no portable way of obtaining the address of the stack itself. On some systems, it uses a small bit of assembly code to check the stack pointer. On other systems, it falls back on alloca(), allocating a trivial amount of data. The address of the allocated data is the current value of the stack pointer.

When initialising the runtime, just before the entry point is called, the stack's address is taken to determine the stack's bottom address. The top address is checked in the continuation functions, and the difference between the two is the current stack size.
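
Here is a sketch of what the alloca() fallback check could look like, with made-up names (CHICKEN's real check is a macro and the details vary per platform, including the direction in which the stack grows):

#include <alloca.h>                     /* where available; otherwise <stdlib.h> */

static char *stack_bottom;              /* recorded once at startup              */
#define STACK_LIMIT (1024L * 1024L)     /* e.g. 1MB on a 64-bit system           */

int fits_on_stack(long bytes_needed) {
  char *stack_top = alloca(1);          /* address ~ the current stack pointer   */
  long used = stack_bottom - stack_top; /* assumes a downward-growing stack      */
  return used + bytes_needed < STACK_LIMIT;
}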

A small rant

While doing the background research for this post, I wanted to read Cheney's original paper. It was very frustrating to find so many references to it, which all lead to a paywall on the ACM website.

I think it's absurd that the ACM charges $15 for a paper which is over forty years old, and only two measly pages long. What sane person would plunk down 15 bucks to read 2 pages, especially if it is possibly outdated, or not even the information they're looking for?

The ACM's motto is "Advancing Computing as a Science & Profession", but I don't see how putting essential papers behind a paywall is advancing the profession, especially considering how many innovations now happen as unpaid efforts in the open source/free software corner of the world. Putting such papers behind a paywall robs the industry from a sorely-needed historical perspective, and it stifles innovation by forcing us to keep reinventing the wheel.

Some might argue that the ACM needs to charge money to be able to host high-quality papers and maintain its high quality standard, but I don't buy it. You only need to look at USENIX, which is a similar association. They provide complete and perpetual access to all conference proceedings, and the authors maintain full rights to their work. The ACM, instead, has now come up with a new "protection" racket, requiring authors to give full control of their rights to the ACM, or pay for the privilege of keeping the rights on their own work, which is between $1,100 and $1,700 per article.

On a more positive note, authors are given permission to post drafts of their papers on their own website or through their "Author-izer" service. Unfortunately, this service only works when the link is followed directly from the domain on which the author's website is located (through the Referer header). This is not how the web works: it breaks links posted in e-mail as well as search engines.

Secondly, the ACM are also allowing their special interest groups to provide full access to conference papers of the most recent conference. However, this doesn't seem to be encouraged in any way, and only a few SIGs seem to do this.

Luckily, I found a copy of the Cheney paper on some course website. So do yourself a favour and get it before it's taken down :(

Update: If you are also concerned about this, please take a small moment to add your name to this petition.

Further reading

Garbage collection is a fascinating subject whose roots can be traced back all the way to the origins of LISP. There's plenty of information to be found.

Flattr this!  (why?)

Today I'd like to talk about how CHICKEN Scheme handles distribution of language extensions (which we call "eggs"). There are some unique features of our setup that might be interesting to users of other languages as well, and I think the way backwards compatibility was kept is rather interesting.

In the beginning

First, a little bit of history, so you know where we're coming from. CHICKEN was initially released in the year 2000, and the core system was available as a tarball on the website. In 2002 it was moved into CVS and in 2004 to Darcs (yes, there were good open source DVCSes before Git).

Throughout this period, eggs were simply stored as tarballs (curiously bearing a ".egg" extension) in some well-known directory on the CHICKEN website. The egg installation tool had this location built in. For example, the egg named foo would be fetched from http://www.call-with-current-continuation.org/eggs/foo.egg.

To contribute (or update!) an extension, you simply sent a tarball to Felix and he would upload it to the site. This was a very centralised way of working, creating a lot of work for Felix. So in 2005, he asked authors to put all eggs in a version control system: Subversion. At the time, every contributor was given write access to the entire repo! These were simpler times, when we had only a handful of contributors.

The switch to Subversion allowed for a neat trick: whenever an egg was modified, it triggered a "post-commit hook" which tarred up the egg and uploaded it to the website. This was a very simple addition which automated the work done by Felix, while ensuring the existing tools did not have to be modified. Egg authors now had the freedom to modify their code as they liked, and new releases would appear for download within seconds.

If an author used the conventional trunk/tags/branches layout, the post-commit hook automatically detected this and would upload the latest tag. In other words, we reached a level of automation where "making a release" was simply tagging your code!

Documentation for eggs originally lived on the same website as the eggs did, but this was eventually moved into svnwiki, one of the first wikis to use Subversion as a backing store. To make things even simpler, the core system was also moved into Subversion. Now everything was in one system, for everyone to hack on, using the same credentials everywhere. Life was good!

Start of the DVCS wars

This worked great for years, and the number of contributors steadily increased. Meanwhile, distributed version control systems were gaining mainstream popularity, and contributors started experimenting with Git, Mercurial, Bazaar and Fossil. People grumbled that CHICKEN was still on Subversion.

The next major release, CHICKEN 4.0, provided for a "clean slate", with the opportunity to rewrite the distribution system. This simplified things, replacing the brittle post-commit script with a CGI program called "henrietta", which would serve the eggs via HTTP. The download location for eggs was put into a configuration file, which allowed users to host their own mirror. This is useful if for example a company wants to set up a private deployment-server containing proprietary eggs. We also gained a mirror for general use, graciously provided by Alaric.

The difference was that now there was no static tarball, but when you downloaded an egg, its files would be served straight from either svn, a local directory tree or a website. If we ever decided to migrate the egg repository to a completely different version control system, we could simply add a new back-end to Henrietta. Nothing would have to be modified on the client.

The new system

In 2009, CHICKEN core was moved into a Git repository, as it looked like Git was winning the DVCS wars. New users were often complaining about having to use crusty old Subversion. By this time, people even used DVCSes exclusively, only synchronising to the svn repo. This meant it was no longer the "canonical" repository for all eggs. It was becoming nothing but a hassle for those who preferred other VCSes.

Another problem was maintenance: commit access on the svn repo is centrally managed, through one big mod_authz_svn configuration file, listing which users have access to which "sub-repositories". If someone wants to grant commit access to another developer, this has to be requested via the mailing list or the server's maintainer.


To solve these problems, we started to consider new ways to allow users to develop their eggs using their favorite VCS. The new system had a few strict requirements:

  • It had to be completely backwards-compatible. No changes should be made to CHICKEN core. New eggs published through this system should be available to older CHICKENs, too.
  • It had to be completely VCS-independent. We want to avoid extra work when the next VCS fad comes along. Furthermore, it should work with all popular code hosters, for maximum freedom of choice. Self-hosting should explicitly be an option.
  • The existing workflow of egg authors should not fundamentally change; especially the release procedure of making a tag should stay.
  • There should be a way to avoid broken links if someone takes down their repo.
  • Most of all, the system had to be simple.

A simple solution

The simplest way to make the distribution system VCS-independent is to ignore VCSes altogether! Instead, we download source files over HTTP and mirror them from the CHICKEN server.

This idea was rather natural: our Subversion setup had always allowed direct access to plain files over HTTP through mod_dav_svn. Most popular code hosting sites (Github, Bitbucket, Google Code etc) also allow this, either directly or via some web repo viewer's "download raw file" link, which can be constructed from a VCS tag and file name. Also, Henrietta already supported serving eggs from a local directory tree which meant we had to make almost no modifications to our existing tool chain.

To make this work, all that's needed is:

  • Some daemon which periodically fetches new eggs.
  • A "master list" of where each egg is hosted.
  • For each egg, a list of released versions for that egg.
  • A "base URI" where the files for that release can be downloaded.
  • A list of files for that release (or the name of a tarball, which is equivalent).

We already had a so-called ".meta-file", which contains info about the egg (author, license, name, category etc). In an earlier incarnation of the post-commit hook this file also contained a list of the files that the egg consisted of, so it made sense to re-use this facility.

We only needed to take care of the daemon, the master egg list and a way to communicate the base URI. This was simple, and I wrote the daemon (dubbed "henrietta-cache") over a weekend during a hackathon. It really is simple and consists of only 300+ lines of (rather ugly) Scheme code. At the hackathon, Moritz helped out by moving the existing eggs to this new scheme, and testing with various hosting providers.

But not the simplest solution

The clever reader has probably already noted that the setup could be simplified by putting the henrietta-cache logic into the client program. We chose not to do this because it would break two requirements: that of backwards compatibility and that of avoiding broken links.

Strictly speaking, the backwards compatibility problem could be solved by embedding the functionality into chicken-install and eventually removing henrietta-cache from the server.

Broken links are a bigger problem, though. Currently, if a repo becomes unavailable, this is no problem; we still have a cached copy on our servers. Even if the repo goes offline forever and nobody has a copy of it anymore, we can still import the cached files into a fresh repo and take over maintenance from there.

Some incremental improvements

Unfortunately, the new system made it easier for Github and Bitbucket users than for CHICKEN Subversion users to maintain their eggs, because these sites allow tarball downloads, while the Subversion users had to list each file in their egg in the meta file. Under the old system this was not required, because it simply offered the entire svn egg directory for download.

After some people complained about having to do this extra step, I wrote another simple "helper" egg with the tongue-in-cheek name "pseudo-meta-egg-info". This is a small (80 lines) Spiffy web application which can generate "pseudo" meta files containing a full list of all the files in a Subversion subdirectory, and a list of all the tags available. This all happens on-the-fly, which means that egg authors could now revert to their old workflow of simply tagging their egg to make a release!

Technically, this helper webapp can be extended and deployed for any hosting site, so if you decide to host your own repository it could generate the list of tags and files for that, too. CHICKEN isn't big enough to ask Google, Github or Bitbucket to run this on their servers, of course, so some helper plug-ins and shell scripts for svn, hg and git were made as well. These will generate the list of tags and file names and put them in the meta- and release-info files.

Current status

The new system has been in use for over two years (since March 29th, 2011) and it has been doing a good job, requiring only very little maintenance and few modifications after the initial release. We've already reaped the benefits of our setup: Github and Bitbucket both had several periods of downtime, during which eggs were still available, even if they were hosted there.

The following graph shows the number of available CHICKEN eggs, starting with the "release 4" branch (requires an SVG-capable browser). There's a small skew because the script I used to generate the graph only checked for existence, not whether the egg was released.

As you can see, Mercurial (hg) and Git took off almost simultaneously, but where git is still steadily increasing in popularity, hg mostly stagnated. Subversion (svn) saw a few drops from eggs that were moved into hg/git. You'd guess that most git users would use Github, but it turns out that Bitbucket is reasonably popular among Chicken users too. We also have three authors who have opted to host their own repositories. You can see this in the breakdown of today's eggs by host:

Hosting site VCSes # of eggs
code.call-cc.org svn 454
github.com git 85
bitbucket.org hg, git 41
gitorious.org git 5
chust.org fossil 5
kitten-technologies.co.uk fossil 3
code.stapelberg.de git 1

Finally, the graph shows that people are still releasing new eggs from svn, but most new development takes place in git. And yes, there are a few eggs in Fossil, too! Bazaar is currently not listed. One possible explanation is that Loggerhead (its web viewer) does not allow easy construction of stable URLs to raw files for a particular tag (or zip file/tarball), so serving up eggs straight from a repo is not possible. Another reason could be that bzr simply isn't that popular among CHICKEN users. If you're a bzr user and would like to use this distribution scheme, please have a look at Loggerhead issues #473691 and #739022. If you know a way around this, please share your knowledge on our release instructions page.

Things to improve

Needless to say, I'm rather happy that the system satisfied all the requirements we set for it, and that it saw such uptake. The majority of newly released eggs are using one of the new systems (too bad it's Git, but I guess that's inevitable).

However, as always, there is room for improvement. The current system has a few flaws, all of which are due to the fact that henrietta-cache simply copies code off an external site:

  • There's no "stable" tarball per egg release. This is required for OS package managers, which usually verify with a checksum whether the source package has not changed. Recently, Mario improved on this situation by providing tarballs, but these are merely tarballs of the henrietta-cache mirror on that particular server. However, these should be expected to be stable...
  • If an egg author moves tags around, nobody will know. Different henrietta-cache mirrors may then have an inconsistent view of the distributed repository. We have two egg mirrors, and so far this has happened once or twice. This requires some manual intervention: just blow away all the cached files and wait for it to re-synch, or trigger the synch manually.
  • Egg authors cannot sign their eggs; each egg is downloaded from a source that may not be trustworthy. This is tricky, especially because most people don't want to mess around with PGP keys anyway. CHICKEN core releases aren't signed either, so this isn't very high on our priority list.

I think some of these problems are a result of "going distributed", similar to the problem that you should not rewrite history that has already been pushed.

Flattr this!  (why?)

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License. All code fragments on this site are hereby put in the public domain.