CHICKEN's numeric tower: part 1

Originally, CHICKEN only supported fixnums (small integers) and flonums (floating-point numbers). The upcoming CHICKEN 5 will support a full numeric tower, adding arbitrary-length integers, rational numbers and complex numbers. This is the first in a series of blog posts about my journey to make this a reality. We'll start with a bit of background information. Later parts will dive into the technical details.

In the beginning, there were two numerical types

As I mentioned, CHICKEN originally only supported fixnums and flonums. This is still the case in CHICKEN 4. When a fixnum overflows, it is coerced into a flonum. On 32-bit systems, this buys us 52 bits of precision, which is more than the 30 bits of precision fixnums offer:

 #;1> most-positive-fixnum
 1073741823
 #;2> (+ most-positive-fixnum 1)
 1073741824.0
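
To make the overflow rule concrete, here is a minimal C sketch of an addition that falls back to a double when the sum leaves the fixnum range. It is a toy model, not CHICKEN's actual code: the toy_number type and every toy_ name are made up for illustration, and the negative bound of the range is an assumption (only most-positive-fixnum is shown above).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a number on 32-bit CHICKEN 4: either a fixnum
 * or a double-precision flonum. Not CHICKEN's real representation. */
typedef struct {
    bool is_flonum;
    int32_t fix;   /* valid when is_flonum is false */
    double flo;    /* valid when is_flonum is true */
} toy_number;

/* 2^30 - 1, the most-positive-fixnum printed by the REPL above. */
#define TOY_MOST_POSITIVE_FIXNUM INT32_C(1073741823)
/* Assumed lower bound (-2^30); the post does not show it. */
#define TOY_MOST_NEGATIVE_FIXNUM (-INT32_C(1073741823) - 1)

/* Add two values that are already in fixnum range. The sum is computed
 * in 64 bits so the C addition itself cannot overflow. */
static toy_number toy_add_fixnums(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    toy_number r;

    if (sum > TOY_MOST_POSITIVE_FIXNUM || sum < TOY_MOST_NEGATIVE_FIXNUM) {
        /* Out of fixnum range: coerce to a flonum. */
        r.is_flonum = true;
        r.fix = 0;
        r.flo = (double)sum;
    } else {
        r.is_flonum = false;
        r.fix = (int32_t)sum;
        r.flo = 0.0;
    }
    return r;
}
```

Calling toy_add_fixnums(TOY_MOST_POSITIVE_FIXNUM, 1) yields a flonum holding 1073741824.0, mirroring the REPL session above, while small sums stay fixnums.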

This works reasonably well, and is well-behaved until you go beyond what the floating-point representation can hold exactly (52 stored mantissa bits plus one implicit bit, so integers up to 2^53):

 #;3> (flonum-print-precision 100)
 #;4> (expt 2 53)
 9007199254740992.0
 #;5> (+ (expt 2 53) 1)
 9007199254740992.0
 #;6> (= (expt 2 53) (+ (expt 2 53) 1))
 #t

On a 64-bit machine, overflow of the 62 bits of a fixnum to the 52 bits of a flonum is rather weird:

 #;1> (= most-positive-fixnum (- (+ most-positive-fixnum 1) 2))
 #t
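
None of this is specific to CHICKEN: the same effects show up with plain IEEE 754 doubles in C. The sketch below reproduces both REPL sessions, assuming the platform's double is a 64-bit IEEE 754 value. The toy_ names are made up for illustration, and volatile is only there to force each intermediate result to be rounded to double precision.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^62 - 1, the most-positive-fixnum on 64-bit CHICKEN 4. */
#define TOY_MAX_FIXNUM_64 INT64_C(4611686018427387903)

/* Mirrors (= (expt 2 53) (+ (expt 2 53) 1)). */
static bool toy_two_pow_53_absorbs_one(void)
{
    volatile double big = 9007199254740992.0;   /* 2^53 */
    volatile double bigger = big + 1.0;         /* rounds back to 2^53 */
    return big == bigger;
}

/* Mirrors (= most-positive-fixnum (- (+ most-positive-fixnum 1) 2)). */
static bool toy_fixnum_overflow_roundtrip_is_equal(void)
{
    /* Near 2^62, adjacent doubles are hundreds of integers apart, so
     * 2^62 - 1 and 2^62 - 2 both round to 2^62. */
    volatile double original = (double)TOY_MAX_FIXNUM_64;
    volatile double overflowed = original + 1.0;
    volatile double back = overflowed - 2.0;
    return original == back;
}
```

Both functions return true, which is exactly the "weird" behaviour shown above: once a value has been coerced to a flonum, small differences simply vanish.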

Since we only have fixnums and flonums, any attempt to enter a rational number will result in a flonum:

 #;1> 1/2
 0.5
 #;2> 1/3
 0.333333333333333

Complex numbers are not supported at all:

 #;1> 1+2i
 
 Error: unbound variable: 1+2i

Of course, some people still needed to work with complex numbers, so a long time ago, Thomas Chust created the "complex" egg. This added complex number support to the reader, and the basic numeric operators were overridden to support complex numbers. About a year later, Felix Winkelmann created the original version of the "numbers" egg, using Thomas's code for complex numbers. This added arbitrarily large integers ("bignums" in Lisp parlance) and rational number support via the GNU MP library. Thus, CHICKEN finally had a full numeric tower, and it was completely optional, too. Pretty awesome!

Cracks start to show

Unfortunately, it's not as awesome as it sounds. There are some problems with having parts of the numeric tower as an add-on, instead of having it all in core:

In a Scheme with modules, + from the scheme module should always refer to the same procedure. So if a module imports that + instead of the one from the numbers module, it will not understand extended numeric types. This means that you can't easily combine a library that uses numbers with one that doesn't. If you pass a bignum to the library that does not use numbers, it will raise an exception. This is mostly a problem with Scheme itself, which doesn't have a clean way to define polymorphic procedures. This makes the numeric tower a built-in special case. It is possible to mutate procedures, but allowing for that implies a big performance hit on all code, even if you don't use the numbers egg.
The numbers egg extends the reader to support extended numeric literals. This means that if some code somewhere loads the numbers egg, the reader extension is active even though you didn't load numbers yourself. This can cause confusion because normal numeric operations don't accept these numbers. For an example, see this bug report.
Speaking of extended numeric literals: the compiler doesn't know how to serialise those into the generated C code. This means you can't compile Scheme code containing such literals. You'd have to use string->number everywhere, instead. I found a clever hack to make this work with the numbers egg, but it isn't fool-proof. For instance, it doesn't work when cross-compiling to a platform with different endianness, or if one platform is 32-bit and the other is 64-bit.
The compiler can optimise tight loops by using inline C functions for primitive operations such as the built-in numerical procedures. A current weak spot of CHICKEN is that (as far as I know), eggs can't add such inline C function replacements. So, any code that uses the numbers egg is doomed to have bad performance in critical loops. I think making inlining of C functions available for user code would be a great project (hint, hint!).
Because the FFI (foreign function interface) is built into the compiler, it doesn't support bignums. This means 64-bit integers returned from C are converted to flonums, losing precision. Eggs can't hook into the FFI deeply enough to override this.

One could argue that these are all language or implementation limitations. On the one hand, that's a fair argument. On the other hand, keeping everything "open" so it can be tweaked by the user prevents many optimisations. It also makes the implementation more complex. For instance, there are hooks in core specifically for the numbers egg, to support reading and writing extended literals. The numeric tower needs deeper integration than most other things because numbers are a basic type, much like symbols, strings or lists. So, it makes more sense to have this in the core system.

The start of my quest

Traditionally, Lisps have supported a full numeric tower. At least since the MacLISP days (the early 1970s; see also The History of Lisp), bignums have been pretty standard. Scheme formalises this in the standard, but it does not require full support for all numeric types. Still, in my opinion any serious Lisp or Scheme implementation should support the full numeric tower. It's one of those things that make Lisp unique, and it gives us Lisp weenies more cause for that famous smugness.

It is fantastic when a language supports arbitrarily large integers. Not having to worry about overflows helps prevent various nasty security bugs (luckily, overflowing into flonums, as CHICKEN does, mitigates most of these). Bignums can also make it much easier to interact with native code, because integer width is never a problem. It basically frees the programmer from having to think about "unimportant" low-level details. Rational numbers (i.e., fractions like 1/2 or 3/5) and complex numbers are just icing on the cake that adds a real feeling of "thoroughness" to Lisp.
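
As an illustration of the kind of overflow bug meant here (a generic C example, not something taken from CHICKEN), consider computing an allocation size as count * size. With fixed-width integers the product can silently wrap around, so careful C code has to check first. The helper name toy_checked_mul is made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two sizes, refusing to wrap around. Returns false (and leaves
 * *out untouched) when count * size does not fit in a size_t. */
static bool toy_checked_mul(size_t count, size_t size, size_t *out)
{
    if (count != 0 && size > SIZE_MAX / count) {
        return false;   /* the product would overflow */
    }
    *out = count * size;
    return true;
}
```

In a language with bignums the multiplication just produces a larger integer, so this whole class of checks (and the bugs caused by forgetting them) goes away.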

This idea, combined with the fact that other "proper" Scheme implementations support the full numeric tower out of the box, always frustrated me. I believe it makes people less likely to take CHICKEN seriously as a full Scheme implementation. New users in particular are often surprised when CHICKEN does not work as expected. Tutorials don't mention that the numeric tower is partly optional!

More experienced users were also frustrated with the limitations of having numbers as a separate egg, as you can see, for example, in this thread. It points out some of the problems, and it also makes clear why a GNU MP-based implementation should not be part of CHICKEN.

From all of this, I decided that the best way to get bignums into core would be to start with finding a good BSD-licensed implementation. Then I could replace GMP with this new implementation in the numbers egg, tweak it to use CHICKEN's naming conventions and finally integrate the new code into core. How hard could it be, really? Little did I suspect that 5 years later, the code would finally be introduced to core!

A very slow, but BSD-licensed implementation

Finding a BSD-licensed bignum implementation is not very difficult, and I quickly settled on the Scheme48 implementation, which was originally taken from MIT Scheme. I've always admired Scheme48 for its extremely clean and easy-to-understand code base, and CHICKEN core already used the syntax-rules implementation from Scheme48, so it made a lot of sense to use their code. Unfortunately, it turned out that the implementation was extremely inefficient, especially when dealing with rational numbers ("ratnums"). After a few weeks of intensive hacking to fix the worst problems, it was finally ready.

This new implementation was much more efficient than the GMP-based numbers egg, but that's only because the GMP-based version relied heavily on finalizers to clean up memory. The new version integrated properly with the CHICKEN garbage collector. This reduced a whole lot of overhead. Having said that, GMP itself is the fastest bignum implementation you'll ever find, so if you can at all get away with using it in your project, do so!

CHICKEN 5 is announced

The CHICKEN core team (of which I'm a member) decided that CHICKEN 5 should be a clean break, with no backwards compatibility. We wanted to finally restructure the core libraries, which had become rather messy, and change a few confusing aspects about modules. Doing this with backwards compatibility would sap too much development energy and possibly result in an even bigger mess. When this decision was made, I decided that this would be the perfect opportunity to finally integrate the numbers egg into core.

I had been working on the numbers egg on and off over the past years, hoping for a good moment to add it to core. When the opportunity presented itself, at first I naively thought a few tweaks would suffice to integrate it. I thought I only had to make some name changes and rearrange some functions. The Scheme48 code base used very descriptive and highly abstract naming, whereas CHICKEN uses terse names and has both inline and CPS variants for primitive operations. Besides, quite a bit of code in the numbers egg was purely in Scheme, whereas CHICKEN has a more-or-less official C API. So, I had to convert some of the functions to C. This would probably also result in some performance improvements.

Small changes lead to a total rewrite

During the conversion to C, I noticed various opportunities for performance improvements. For instance, the Scheme48 code still relied on malloc() to allocate temporary numbers in several places. Where this was done, the final result of an operation would then be allocated into GC-managed memory and the temporary buffer was immediately freed.

Rewriting the code to allocate directly in GC-able memory resulted in quite the restructuring of the code, because we'd need to have a restartable continuation at every point where an allocation would take place. For example, here's the code for negating a bignum:

static void big_neg(C_word c, C_word self, C_word k, C_word x)
{
  bignum_type big = big_of(x); /* Extract bignum data */
  C_word negated_big = bignum_new_sign(big, !(BIGNUM_NEGATIVE_P (big)));
  C_return_bignum(k, negated_big);
}

static bignum_type bignum_new_sign(bignum_type bignum, int negative_p)
{
  bignum_type result =
    (bignum_allocate ((BIGNUM_LENGTH (bignum)), negative_p));  /* mallocs */
  bignum_destructive_copy (bignum, result);  /* basically a manual memcpy */
  return (result);
}

It looks very simple, but a lot is going on under the hood. The C_return_bignum function contained all the hairy complexity. If the result fit in a fixnum, it would convert the bignum to a fixnum, deallocate the bignum and call the passed continuation. Otherwise, it would set up a continuation that copies the bignum into a heap-allocated object and deallocates the original, and pass that continuation to an allocation function.

This was changed into the following, which uses the core's _u_ naming convention to indicate that the function is unsafe, i.e. it doesn't check its arguments:

void C_ccall C_u_bignum_negate(C_word c, C_word self, C_word k, C_word x)
{
  C_word kab[C_SIZEOF_CLOSURE(3)], *ka = kab, k2, negp, size;

  /* Create continuation k2, to call after allocation */
  k2 = C_closure(&ka, 3, (C_word)bignum_negate_2, k, x);
  
  negp = C_i_not(C_u_i_bignum_negativep(x)); /* Toggle sign */
  size = C_u_i_bignum_size(x);
  C_allocate_bignum(3, (C_word)NULL, k2, size, negp, C_SCHEME_FALSE);
}

static void bignum_negate_2(C_word c, C_word self, C_word new_big)
{
  C_word k = C_block_item(self, 1), /* Extract original continuation */
         old_big = C_block_item(self, 2); /* Extract original bignum */

  /* Copy old bignum digits to newly allocated (negated) bignum */
  C_memcpy(C_bignum_digits(new_big), C_bignum_digits(old_big),
           C_header_size(C_internal_bignum(old_big))-C_wordstobytes(1));

  C_kontinue(k, new_big); /* "Return" the new bignum by calling k with it */
}

The new version looks hairier but does less, because it allocates the bignum directly into the nursery or the heap. Because this may require a GC, it needs to have a continuation, which can be invoked from the GC's trampoline. That's the reason this has to be cut into two separate C functions. There are functions that allocate 2 bignums or even more, which I had to cut up into 3 or more functions!
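
The shape of this split can be shown without any CHICKEN internals. The following is a toy model only: every toy_ name and the bignum layout are made up for illustration, the "allocator" is just calloc(), and no garbage collector is involved. What it does show is the control flow: the first function captures what it needs in a closure and asks for memory, and the second function does the real work once the memory arrives.

```c
#include <stdlib.h>
#include <string.h>

/* Toy bignum with a fixed maximum size. Hypothetical layout. */
typedef struct {
    int negative;
    size_t ndigits;            /* must be <= 4 in this sketch */
    unsigned long digits[4];
} toy_bignum;

/* A continuation: a function plus the values it closes over. */
typedef struct toy_closure toy_closure;
typedef void (*toy_cont_fn)(toy_closure *self, toy_bignum *value);
struct toy_closure {
    toy_cont_fn fn;
    toy_closure *k;   /* captured outer continuation, may be NULL */
    toy_bignum *x;    /* captured argument, may be NULL */
};

/* Stand-in for a GC-aware allocator. It hands the fresh object to the
 * continuation k instead of returning it; a real collector could run
 * here and invoke k afterwards. */
static void toy_allocate_bignum(toy_closure *k, size_t ndigits, int negative)
{
    toy_bignum *fresh = calloc(1, sizeof *fresh);
    if (fresh == NULL) abort();
    fresh->negative = negative;
    fresh->ndigits = ndigits;
    k->fn(k, fresh);
}

/* Second half: runs once the allocation has happened. */
static void toy_negate_2(toy_closure *self, toy_bignum *fresh)
{
    toy_bignum *old = self->x;   /* extract the original bignum */
    memcpy(fresh->digits, old->digits, old->ndigits * sizeof old->digits[0]);
    self->k->fn(self->k, fresh); /* "return" by calling the continuation */
}

/* First half: capture k and x, then request the allocation. */
static void toy_negate(toy_closure *k, toy_bignum *x)
{
    toy_closure k2 = { toy_negate_2, k, x };
    toy_allocate_bignum(&k2, x->ndigits, !x->negative);
}

/* A final continuation that just remembers the value it receives. */
static toy_bignum *toy_last_result;
static void toy_store_result(toy_closure *self, toy_bignum *value)
{
    (void)self;
    toy_last_result = value;
}
```

To use it, wrap toy_store_result in a toy_closure and pass that to toy_negate together with a bignum; afterwards toy_last_result points at the negated copy, which the caller must free(). A function that needs two allocations would need a third step in the chain, which is why the real code ended up cut into so many pieces.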

Besides using "native" naming conventions, this new version also gets rid of the unnecessary, un-CHICKENish bignum_type abstraction. Instead, it uses only C_word as its type. This also removed the need for some questionable type casts. Luckily, the final negating version that ended up in CHICKEN 5 is a lot simpler and again only one function, but that required a breakthrough in thinking that I hadn't had at this point yet. I will discuss this breakthrough in the final post in this series.

After having taken care of all the functions, very little remained of the original Scheme48 code. It was completely mutilated! Earlier I had to rewrite some of the Scheme code to improve performance, and now I was restructuring the C code. To top it off, after studying other bignum implementations, it became clear that the Scheme48 code was pretty slow when compared to other Schemes. It only implemented the "classical" algorithms, and it was optimised for readability, not speed.
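
For readers unfamiliar with the term, "classical" here refers to the pencil-and-paper methods. As a representative example, this is a minimal sketch of schoolbook multiplication on arrays of 32-bit digits. It is not the Scheme48 code; the function name and the little-endian digit layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply a (na digits) by b (nb digits), least significant digit
 * first. result must have room for na + nb digits and must not overlap
 * a or b. Takes na * nb digit multiplications. */
static void toy_schoolbook_mul(const uint32_t *a, size_t na,
                               const uint32_t *b, size_t nb,
                               uint32_t *result)
{
    for (size_t i = 0; i < na + nb; i++) {
        result[i] = 0;
    }

    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* digit * digit + old digit + carry always fits in 64 bits */
            uint64_t t = (uint64_t)a[i] * b[j] + result[i + j] + carry;
            result[i + j] = (uint32_t)t;   /* keep the low 32 bits */
            carry = t >> 32;               /* pass on the high 32 bits */
        }
        result[i + nb] = (uint32_t)carry;
    }
}
```

The quadratic cost is fine for small numbers, but it is the reason implementations that care about speed switch to asymptotically faster algorithms once the operands get large.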

So, I studied up on better algorithms to make it perform more acceptably. In the next few parts, I'll share with you a few things that I've learned.

Posted on 2016-10-10 on the "More magic" blog