About this blog

Hi, I'm Peter Bex, a Scheme and free software enthusiast from the Netherlands. See my user page on the CHICKEN wiki or my git server for some of my projects.


Are you in need of support for CHICKEN Scheme? Or maybe you want clear tech writing like on this very blog? Then you're in luck! I am now available as a freelance consultant!


Recently I joined bevuta IT, where I am now working on a big project written in Clojure. I'm very fortunate to be working in a Lisp for my day job!

As I've mostly worked with Scheme and have used other Lisps here and there, I would like to share my perspective on the language.

Overall design

At first glance, it is pretty clear that Clojure has been designed from scratch by (mostly) one person who is experienced with Lisps and as a language designer. It is quite clean and has a clear vision. Most of the standard library has a very consistent API. It's also nice that it's a Lisp-1, which obviously appeals to me as a Schemer.

My favourite aspect of the language is that everything is designed with a functional-first mindset. This means I can program in the same functional style as I tend to do in Scheme. Actually, it's even more functional, because for example its maps (what would be hash tables in Scheme) are much less clunky to deal with. In Scheme, SRFI-69 hash tables are quite imperative, with hash-table-set! and hash-table-update! being the ways to insert new entries, which of course mutate the existing object. Similarly, vectors can easily be extended (on either end!) functionally.
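To make the contrast with hash-table-set! concrete, here is a minimal sketch of functional updates using only clojure.core functions. The names m, m2, m3, v and v2 are made up for illustration, and only appending to a vector is shown.

```clojure
;; "Updating" a persistent collection returns a new value;
;; the original is left untouched.
(def m  {:a 1})
(def m2 (assoc m :b 2))    ; new map with an extra entry
(def m3 (dissoc m2 :a))    ; new map without :a

(def v  [1 2 3])
(def v2 (conj v 4))        ; conj appends to the end of a vector

(println m m2 m3)
(println v v2)
```

Because m and v never change, they can be shared freely between functions without defensive copying.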

The underlying design of Clojure's data structures must be different. It needs to efficiently support functional updates; you don't want to fully copy a hash table or vector whenever you add a new entry. I am not sure how efficient everything is, because the system I'm working on isn't in production yet. A quick look at the code implies that various data structures are used under the hood for what looks like one data structure in the language. That's a lot of complexity! I'm not sure that's a tradeoff I'd be happy to make. It makes it harder to reason about performance. You might just be using a completely different underlying data structure than expected, depending on which operations you've performed.
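One way to observe this from the REPL is to ask for the concrete class behind a map. The sketch below reflects what the JVM implementation of Clojure does as far as can be told from its source (small maps are array maps, larger ones are hash maps); treat the exact classes as an implementation detail, not a guarantee. The names small and big are illustrative.

```clojure
;; A literal map with a handful of entries.
(def small {:a 1 :b 2})

;; The same kind of value, but built up to 100 entries.
(def big (into {} (map (fn [i] [i (* i i)]) (range 100))))

;; Both behave as "a map" in the language...
(println (get small :a) (get big 7))

;; ...but they are backed by different classes.
(println (class small))
(println (class big))
```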

(non) Lispiness

To a seasoned Lisp or Scheme programmer, Clojure can appear positively bizarre. For example, while there is a cons function, there are no cons cells, and car and cdr don't exist. Instead, it has first and rest, which are definitely saner names for a language designed from scratch. It has "persistent lists", which are immutable lists, but in most day to day programming you will not even be using lists, as weird as that sounds!
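As a small illustration of the above, first, rest and cons work on lists and vectors alike. The name xs is just for the example.

```clojure
(def xs '(1 2 3))

(def head      (first xs))       ; what car would give you
(def tail      (rest xs))        ; what cdr would give you
(def extended  (cons 0 xs))      ; a sequence, not a mutable pair
(def from-vec  (rest [1 2 3]))   ; the same functions accept vectors

(println head tail extended from-vec)
```

Note that cons on a vector gives back a sequence rather than a vector, which is one reason conj is usually preferred in day to day code.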

Symbols and keywords

One thing that really surprised me is that symbols are not interned. This means that two symbols which are constructed on the fly, or when read from the same REPL, are not identical (as in eq or eq?) to one another:

user> (= 'foo 'foo)
true
user> (identical? 'foo 'foo)
false

Keywords seem to fulfil most "symbolic programming" use cases. For example, they're almost always used as "keys" in maps or when specifying options for functions. Keywords are interned:

user> (= :foo :foo)
true
user> (identical? :foo :foo)
true

Code is still (mostly) expressed as lists of symbols, though. When you're writing macros you'll deal with them a lot. But in "regular" code you will deal more with keywords, maps and vectors than lists and symbols.
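A short sketch of what that looks like in practice: keywords double as lookup functions on maps, and constructing them at run time still yields the interned object, unlike symbols. The map opts and its keys are invented for this example.

```clojure
(def opts {:verbose true :depth 3})

(def depth    (:depth opts))             ; a keyword is a function of a map
(def fallback (:missing opts :default))  ; optional default for absent keys

;; Keywords built on the fly are the same object as the literal...
(def kw-same?  (identical? (keyword "foo") :foo))
;; ...while two symbols with the same name are distinct objects.
(def sym-same? (identical? (symbol "foo") (symbol "foo")))

(println depth fallback kw-same? sym-same?)
```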

Numeric tower

A favorite gotcha of mine is that integers are not automatically promoted to bignums like in most Lisps that support bignums. If you need bignums, you have to use special-purpose operators like +' and -':

user> (* (bit-shift-left 1 62) 2)
Execution error (ArithmeticException) at user/eval51159 (REPL:263).
integer overflow
user> (*' (bit-shift-left 1 62) 2)
9223372036854775808N

user> (* (bit-shift-left 1 62) 2N) ; regular * supports BigInt inputs, though
9223372036854775808N
user> (* 1N 1) ; but small BigInts aren't normalized to Java Longs
1N

This could lead to better performance at the cost of more headaches when dealing with accidentally large numbers in code that was not prepared for them.

What about rationals, you ask? Well, those are just treated as "the unusual, slow case". So even though they do normalize to regular integers when simplifying, operations on those always return BigInts:

user> (+ 1/2 1/4)
3/4
user> (+ 1/2 1/2)
1N
user> (/ 1 2) ; division is the odd one out
1/2
user> (/ 4 2) ; it doesn't just punt and always produce bignums, either:
2

The sad part is, bitwise operators do not support bignums, at all:

user> (bit-shift-right 9223372036854775808N 62)
Execution error (IllegalArgumentException) at user/eval51167 (REPL:273).
bit operation not supported for: class clojure.lang.BigInt
user> (bit-shift-right' 9223372036854775808N 62) ; does not exist
Syntax error compiling at (*cider-repl test:localhost:46543(clj)*:276:7).
Unable to resolve symbol: bit-shift-right' in this context
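A possible workaround, sketched here under the assumption that dropping down to Java is acceptable, is to convert the BigInt to a java.math.BigInteger with biginteger and call its shiftRight method. The names big and shifted are illustrative.

```clojure
(def big 9223372036854775808N)   ; 2^63, a clojure.lang.BigInt

;; biginteger is in clojure.core; shiftRight is a method on
;; java.math.BigInteger, so this is plain Java interop.
(def shifted (.shiftRight (biginteger big) 62))

(println shifted (class shifted))
```

The result is a java.math.BigInteger rather than a BigInt or a Long, so it may need converting back (for example with bigint or long) before it goes into other arithmetic.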

There's one benefit to all of this: if you know the types of something going into numeric operators, you will typically know the type that comes out, because there is no automatic coercion. Like I mentioned, this may provide a performance benefit, but it also simplifies reasoning about types. Unfortunately, this does not work as well as you would hope because division may change the type, depending on whether the result divides cleanly or not.
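The following sketch makes the result types visible by collecting a few results in a map (the keys are made up for the example) and printing each value's class.

```clojure
(def results
  {:long-sum  (+ 1 2)                 ; Long in, Long out
   :promoted  (+' Long/MAX_VALUE 1)   ; +' promotes on overflow
   :clean-div (/ 4 2)                 ; divides cleanly: stays a Long
   :ratio-div (/ 1 2)})               ; does not: becomes a Ratio

(doseq [[k v] results]
  (println k v (class v)))
```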

Syntax

For many Lispers, this is the elephant in the room. Clojure certainly qualifies as a Lisp, but it is much heavier on syntax than most other Lisps. Let's look at a small contrived example:

(let [foo-value (+ 1 2)
      bar-value (* 3 4)]
  {:foo foo-value
   :bar bar-value})

This is a let just like in Common Lisp or Scheme. The bindings are put inside square brackets, which is literal syntax for vectors. Inside this vector, key-value pairs are interleaved, like in a Common Lisp property list.

The lack of extra sets of "grouping" parentheses is a bit jarring at first, but you get used to it rather quickly. I still mess up occasionally when I accidentally get an odd number of entries in a binding vector. Now, the {:foo foo-value :bar bar-value} syntax is a map, which acts like a hash table (more on that below).

There doesn't seem to be a good rationale about why vectors are used instead of regular lists, though. What I do really like is that all the binding forms (even function signatures!) support destructuring. The syntax for destructuring maps is a bit ugly, but having it available is super convenient.
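For readers who have not seen it, here is a minimal sketch of destructuring in a let and in a function signature. The function describe and all binding names are hypothetical.

```clojure
;; Sequential and map destructuring in a let.
(def result
  (let [[a b & more]            [1 2 3 4]
        {:keys [x y] :or {y 0}} {:x 10}]   ; y falls back to 0
    [a b more x y]))

;; The same syntax works directly in a function signature.
;; The :name entry is bound to animal-name to avoid shadowing
;; clojure.core/name, which is used in the body.
(defn describe
  [{:keys [kind] animal-name :name}]
  (str animal-name " is a " (name kind)))

(println result)
(println (describe {:name "Daffy" :kind :duck}))
```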

What I regard as a design mistake is the fact that Clojure allows for optional commas in lists and function calls. Commas are just whitespace to the reader. For example:

(= [1, 2, 3, 4] [1 2 3 4]) => true
(= '(1, 2, 3, 4) '(1 2 3 4)) => true
(= {:foo 1, :bar 2, :qux 3} {:foo 1 :bar 2 :qux 3}) => true
(= (foo 1, 2, 3, 4) (foo 1 2 3 4)) => true
;; A bit silly:
(= [,,,,,,1,,,2,3,4,,,,,,] [1 2 3 4]) => true

Maybe this is to make up for removing the extra grouping parentheses in let, cond and map literal syntax? With commas you can add back some clarity about which items belong together. Hardly anybody uses commas in real code, though. And since it's optional it doesn't make much sense.

This has an annoying ripple effect on quasiquotation. Due to this decision, a different character has to be used for unquote, because the comma was already taken:

`(1 2 ~(+ 1 2)) => (1 2 3)
`(1 2 ~@(list 3 4)) => (1 2 3 4)

This might seem like a small issue, but it is an unnecessary and stupid distraction.

Minimalism

One of the main reasons I enjoy Scheme so much is its goal of minimalism. This is achieved through elegant building blocks. This is embodied by the Prime Clingerism:

  Programming languages should be designed not by piling feature on
  top of feature, but by removing the weaknesses and restrictions
  that make additional features appear necessary.

Let's check the size of the clojure.core library. It clocks in at 640 identifiers (v1.10.1), which is a lot more than R5RS Scheme's 218 identifiers. It's not an entirely fair comparison as Scheme without SRFI-1 or SRFI-43 or an FFI has much less functionality as well. Therefore, I think Clojure's core library is fairly small but not exactly an exercise in minimalism.

Clojure reduces its API size considerably by having a "sequence abstraction". This is similar to Common Lisp's sequences: you can call map, filter or count on any sequence-type object: lists, vectors, strings and even maps (which are treated as key/value pairs). However, it is less hacky than in Common Lisp because for example with map you don't need to specify which kind of sequence you want to get back. I get the impression that in Common Lisp this abstraction is not very prominent or used often but in Clojure everything uses sequences. What I also liked is that sequences can be lazy, which removes the need for special operators as well.
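A brief sketch of the sequence abstraction at work, using only clojure.core. The var names are invented for the example.

```clojure
(def from-vector (map inc [1 2 3]))            ; vector in, sequence out
(def from-string (map str "abc"))              ; strings are sequences of chars
(def from-map    (set (map key {:a 1 :b 2})))  ; maps yield key/value entries
(def evens       (filter even? '(1 2 3 4)))

;; Laziness: (range) is infinite, but only five squares are realised.
(def squares (take 5 (map (fn [n] (* n n)) (range))))

(println from-vector from-string from-map evens squares)
```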

If you compare this to Scheme, you have special-purpose procedures for every concrete type: length, vector-length, string-length etc. And there's no vector-map in the standard, so you need vector-map from SRFI 43. Lazy lists are a separate type with its own set of specialized operators. And so on and so forth. Using concrete types everywhere provides for less abstract and confusing code and the performance characteristics of an algorithm tend to be clearer, but it also leads to a massive growth in library size.

After a while I really started noticing mistakes that make additional features appear necessary: for example, there's a special macro called loop to make tail recursive calls. This uses a keyword recur to call back into the loop. In Scheme, you would do that with a named let where you can choose your own identifier to recur. It's also not possible to nest such Clojure loops, because the identifier is hardcoded. So, this called for adding another feature, which is currently in proposal. Speaking of recur, it is also used for tail recursive self-calls. It relies on the programmer rather than the compiler to mark calls as tail recursive. I find this a bit of a cop-out, especially in a language that is so heavily functional. Especially since this doesn't work for mutually tail-recursive functions. The official way to do those is even more of a crutch.
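For reference, this is roughly what both mechanisms look like. The sketch assumes that trampoline from clojure.core is the mechanism meant for mutual recursion; sum-to, my-even? and my-odd? are hypothetical names.

```clojure
;; loop/recur: the programmer marks the tail call explicitly.
(defn sum-to [n]
  (loop [i 1, acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ acc i)))))

;; recur cannot jump to another function, so mutually recursive
;; functions return a thunk instead of calling each other...
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true (fn [] (my-odd? (dec n)))))

(defn my-odd? [n]
  (if (zero? n) false (fn [] (my-even? (dec n)))))

(println (sum-to 100))
;; ...and trampoline keeps calling the thunks until a non-function
;; value comes back, so the stack does not grow.
(println (trampoline my-even? 100000))
```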

I find the special syntax for one-off lambdas #(foo %) just as misguided as SRFI 26 (cut and cute). You often end up needing to tweak the code in such a way that you have to transform the lambda to a proper fn. And just like cut, it doesn't save that many characters anyway and makes the code less readable.
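To show what is being compared, here are the two notations side by side in a small sketch; the var names are illustrative.

```clojure
;; Reader shorthand: % is the (first) argument, %1 %2 ... for several.
(def doubled-short (map #(* 2 %) [1 2 3]))
(def summed-short  (reduce #(+ %1 %2) 0 [1 2 3]))

;; The equivalent "proper" fn forms, which can name and
;; destructure their arguments and can be nested.
(def doubled-fn (map (fn [x] (* 2 x)) [1 2 3]))
(def summed-fn  (reduce (fn [acc x] (+ acc x)) 0 [1 2 3]))

(println doubled-short doubled-fn summed-short summed-fn)
```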

The -> macro is a clever hack which allows you to "thread" values through expressions. It implicitly adds the value as the first argument to the first form, the result of that form as the first argument for the next, etc. Because the core library is quite well-designed, this works 90% of the time. Then the other 10% you need ->> which does the same but adds the implicit argument at the end of the forms. And that's not always enough either, so they decided to add a generic version called as-> which binds the value to a name so you can put it at any place in the forms. These macros also don't compose well. For example, sometimes you need a let in a -> chain to have a temporary binding. That doesn't work because you can't randomly insert forms into let, so you have to split things up again.
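A compact sketch of the three threading macros, with each step's position spelled out in comments. The var names are made up.

```clojure
;; ->  inserts the value as the FIRST argument of each form.
(def first-threaded
  (-> {:a 1}
      (assoc :b 2)       ; (assoc {:a 1} :b 2)
      (update :a inc)))  ; (update {:a 1 :b 2} :a inc)

;; ->> inserts the value as the LAST argument of each form.
(def last-threaded
  (->> (range 10)
       (filter even?)    ; (filter even? (range 10))
       (map inc)
       (reduce +)))

;; as-> binds the value to a name, so it can go anywhere.
(def named-threaded
  (as-> [1 2 3] v
    (conj v 4)           ; value as first argument
    (map inc v)          ; value as last argument
    (nth v 2)))

(println first-threaded last-threaded named-threaded)
```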

And as I note below, the minimalism is kind of "fake" because some essentials simply aren't provided; you have to rely on Java for that.

Java integration

Clojure was originally designed as a "hosted language", so it leverages the JVM. It does this admirably well; Java classes can be seamlessly invoked through Clojure, without any ceremony:

user> (java.util.UUID/randomUUID)
#uuid "bb788bae-5099-4a64-9c37-f6219d40a47f"

;; alternatively:
user> (import 'java.util.UUID)
java.util.UUID
user> (UUID/randomUUID)
#uuid "0bfd2092-14e1-4b88-a465-18698943ea4e"

The downside is that the above is the way to generate a random UUID. So even though uuids have literal syntax in Clojure (as #uuid "..."), there is no Lispy API for them in the Clojure standard library. This can be pretty frustrating, especially in the beginning. There's no clear indication where to look; sometimes you'll be poring over Java language docs for random stuff you thought would have a Clojure interface (like, say, creating temporary files or dealing with byte arrays). At those moments, you're basically programming Java with parentheses.
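As an illustration of "Java with parentheses", the sketch below creates a temporary file and a byte array through plain Java classes. The classes and methods are standard JDK ones; the var names and the "demo" prefix are arbitrary.

```clojure
(import '(java.io File) '(java.util UUID))

;; Temporary files: java.io.File/createTempFile, no Clojure wrapper.
(def tmp (File/createTempFile "demo" ".txt"))
(spit tmp "hello")                      ; spit and slurp accept a File

;; Byte arrays: String.getBytes from Java.
(def bytes-read (.getBytes (slurp tmp) "UTF-8"))
(def deleted?   (.delete tmp))          ; clean up after ourselves

;; Parsing a UUID also goes through the Java class.
(def id (UUID/fromString "bb788bae-5099-4a64-9c37-f6219d40a47f"))

(println (count bytes-read) deleted? id)
```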

Having said that, there will often be community-provided nicer APIs for many of those things, but then you need to decide whether a slightly nicer syntax is worth adding an extra dependency.

Development style

REPL-driven development

Speaking of Java, one thing that constantly bothers me is the slow startup times of the REPL. In my current project, it takes almost 30 seconds to boot up a development REPL. Half a minute!

Luckily, there's great Slime-like Emacs integration with CIDER. Basically, the only sane way to do iterative development is by connecting to a REPL first thing you do and then sending your code to it all the time.

Now, this may sound weird from a Scheme programmer, but I never fully bought into the REPL style of developing. Sure, I experiment all the time in the REPL to try out a new API design or to quickly iterate on some function I'm writing. But my general development style tends more towards the "save and then run the test suite from an xterm". Relying solely on the REPL just "feels" jarring to me. I also constantly run into issues where re-evaluating a buffer doesn't get rid of global state that was built up on a previous run. When this happens, I'm testing an old version of some function without realising it. Keeping track of the "live" state versus the textual code I'm looking at is a total mind fuck for me. I don't understand how others can do this.

Another thing I seem to constantly do is write some code, have the tests go all green, only to see the CI crash on some cyclic dependency in my namespaces. The REPL does not always see those, because reloading a buffer with a namespace declaration works just fine when you loaded the imported namespaces before, even though they refer to the namespace being re-evaluated.

One thing I really find very nice when you're using CIDER is that everything (and I do mean everything) from Clojure is just a "jump to source" away. Most of the builtin functions seem to be written in Clojure itself. For example, if you want to know how map is implemented, you can just press M-. to see it.

Maps and keywords for everything

One thing you'll really notice is that in idiomatic Clojure code, maps are used for everything. A map is a functionally updateable hash table. It looks like this:

{:key-1 "value 1"
 :key-2 "value 2"}

This lends itself to a very dynamic style of programming, very much like you would in (dare I say it?) PHP. A bit of a strange comparison, but PHP also makes dealing with arrays (which double as maps in a weird way) extremely ergonomic. There, missing nested keys are automatically created on the fly and because of a strange quirk in its developmental history, arrays are the only objects which are passed by value. This means you can program in a referentially transparent way, while still mutating them inside functions at will. Not exactly the same mechanism, but the end effect on programming style feels very similar: you reach for them whenever you want to bunch some stuff together. It is the go-to data structure when you need flexibility.

In other Lisps you'd use alists (or plists, or SRFI-69 hash tables) for this, but they don't deal so well with nested maps and the library is not as convenient. For example, you can easily select, drop and rename keys in a map:

(-> {:key-1 "value 1" :key-2 "value 2"}
    (set/rename-keys {:key-1 :key})
    (dissoc :key-2)
    (assoc :foo "bar")) => {:key "value 1" :foo "bar"}

This -> notation took me a while to get used to by the way, and I'm still not entirely comfortable with it. I explained how it works above. It's a macro for "threading" expressions. In Scheme, you'd probably use a let* for this, or something. In Clojure that would look like this:

(let [map {:key-1 "value 1" :key-2 "value 2"}
      map (set/rename-keys map {:key-1 :key})
      map (dissoc map :key-2)
      map (assoc map :foo "bar")]
  map) => {:key "value 1" :foo "bar"}

As you can see, the version with -> is much more convenient and less repetitive. Unfortunately, it doesn't compose that well (duh, it's a macro), but because of the way the standard library is designed it is more useful than it would seem at first glance.

Anyway, the way maps are typically used everywhere in a project means that there's a lot less "structure" to your data structures. It is extremely convenient to use maps, even though there are also things like records and protocols. Because of their convenience, you'll end up using maps for everything. As I've noticed in my refactorings, when you change the structure of maps, a lot of code is going to break without a clear indication of where it went wrong.

This is made extra painful by "nil punning". For example, when you look up something in a map that doesn't exist, nil is returned. In Clojure, many operations (like first or next) on nil just return nil instead of raising an error. So, when you think you are looking up something in a map, but the "map" is actually nil, it will not give an error, but it will return nil.

Now like I said, sometimes you may get an error on nil. It's a bit unclear which operations are nil-punning and which will give a proper error. So when you finally get a nil error, you will have a hell of a time trying to trace back where this nil got generated, as that may have been several function calls ago. This is an example where I really like the strictness of Scheme as compared to some other Lisps, as nil-punning is traditionally a dynamic Lisp thing; it's not unique to Clojure.
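A minimal sketch of both behaviours: lookups and sequence functions that quietly accept nil, and an arithmetic call that does not. The var names punned and arithmetic are illustrative.

```clojure
;; Lookups and sequence functions quietly accept nil...
(def punned
  [(get nil :a)                     ; looking something up in "no map"
   (:a nil)                         ; keyword lookup on nil
   (first nil)
   (get-in {:a nil} [:a :b :c])])   ; the missing level is silently skipped

;; ...but arithmetic does not: (inc nil) throws a NullPointerException,
;; possibly far away from where the nil was produced.
(def arithmetic
  (try
    (inc nil)
    (catch NullPointerException _ :npe)))

(println punned arithmetic)
```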

Multimethods with keywords

Initially, I was quite impressed by the way multimethods work; they're super simple and clean, yet powerful. First, you declare the multimethod and a "decision procedure", which returns a value that can be compared:

(defmulti say-hi :kind)

(defmethod say-hi :default [animal]
  (println (:name animal) "says hello"))

(defmethod say-hi :duck [animal]
  (println (:name animal) "says quack"))

(defmethod say-hi :dog [animal]
  (println (:name animal) "says woof"))

(say-hi {:name "Daffy" :kind :duck})  => "Daffy says quack"
(say-hi {:name "Pluto" :kind :dog})   => "Pluto says woof"
(say-hi {:name "Peter" :kind :human}) => "Peter says hello"

Using multimethods takes some care and taste, because it splits up your logic. So instead of having one place where you have decisions made with an if or cond tree, you have a function call and then depending on how the multimethod was defined, a different function will be called. This is basically what makes C++ so difficult to deal with in large projects: when people use function overloading, it can get really messy. You need to figure out which of the many things called "say-hi" is actually called in a situation, before you can dive into that implementation.

Compared to the insane amount of customizability that e.g. CLOS offers you, the design restraint shown in Clojure multimethods was nice to see, but then I realised this simplicity can be completely defeated by building hierarchies. That is, Clojure allows you to define a *hierarchy* on *keywords*. This was a huge wtf for me, because to me, keywords are just static entities that are unrelated to each other.

When you realise how Clojure keywords can be namespaced, it makes slightly more sense: this gives them some separation.

A keyword can appear in "bare" form like :foo. This is a globally scoped keyword that belongs to no particular code. It's definitely not smart to hang a hierarchy onto such a keyword, and you're also better off not adding any "meta attributes" to them.

The other form is ::foo, which puts the keyword in the current namespace: it is shorthand for :more-magic.net/foo if you are in the more-magic.net namespace.
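To show how hierarchies and namespaced keywords interact with multimethods, here is a small sketch using derive and isa? from clojure.core. The keywords ::duck, ::bird and ::animal and the multimethod sound are hypothetical.

```clojure
;; derive on the global hierarchy requires namespaced keywords.
(derive ::duck ::bird)
(derive ::bird ::animal)

(def duck-is-animal? (isa? ::duck ::animal))   ; relations are transitive

(defmulti sound :kind)
(defmethod sound ::bird    [_] "tweet")
(defmethod sound :default  [_] "...")

;; There is no method for ::duck, but dispatch consults the
;; hierarchy and finds the one for its ancestor ::bird.
(def duck-sound (sound {:kind ::duck}))
(def rock-sound (sound {:kind ::rock}))

(println duck-is-animal? duck-sound rock-sound)
```

This is the part that undoes the simplicity: which method runs now depends on derive calls that may live anywhere in the program.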

Conclusion

All in all, Clojure is a well-designed language with neat features and it's certainly a lot better than most other JVM languages. There are things in it that I wish Scheme had, and it's certainly functional and modern. As a general programming language, I just can't get over the JVM and all its Java trappings, which is just not my cup of tea.

Apart from the JVM, there are some gratuitous departures from traditional Lisps, especially the "rich syntax" and the extreme reliance and overloading of keywords and maps.

As always, such things are a matter of taste, so take my opinion with a large grain of salt.


As you may know, I co-maintain the uri-generic egg, together with Ivan Raikov. We had just been working on fixing a bug and porting it to CHICKEN 5 when I stumbled across the WHATWG URL specification, an evolution over RFC 3986. I found it hard to believe they dropped the formal grammar from the RFC, so I checked the issue queue and found a closed ticket from 2015.

They replaced the BNF with a series of steps which is several pages long and overly concerned with implementation-specific details.

It really got to me that such an important and basic part of the web stack is so informally specified. So I wrote an appeal to them to restore a formal grammar in this ticket. I think the reasons are worth being spread more widely, so I'm reproducing it here on my blog.

My request

I would like to offer my opinion from an implementor's perspective and hopefully convince the WG to restore a formal grammar. Let me start by providing some background on where I'm coming from. Feel free to skip this next section.

My background

I am the co-maintainer of the uri-generic egg for CHICKEN Scheme. This implementation attempts to follow RFC 3986 to the letter, and this has resulted in what IMO is a very high-quality implementation (at least, as far as parsing is concerned; URL construction still has some known issues). Oftentimes when we ran into issues, we've compared it with other implementations. It turns out that many of these are lacking in some way or another. I think the main reason is that they're not attempting to really implement the formal grammar (even if they claim to be RFC compliant), while we do. We even have a growing repository of alternative implementations using different parser generators which all pass the same test suite! (feel free to now call me a smug Lisp/Scheme weenie :) )

I wasn't aware of the WHATWG spec until I saw it mentioned in a libcurl post. It piqued my interest because I'm always looking for more test cases. The web platform test suite looks like a big, juicy set to start using in our egg's tests. I'd also consider implementing the WHATWG spec if this increases compatibility with other implementations.

What I expect from a spec

As an implementor, I routinely check the RFC's ABNF as a guide to determine what a valid URL should look like. If someone finds a certain URL our implementation doesn't parse, or if it parses an URL that it shouldn't, the first thing I do is go back to the ABNF in the RFC to verify the behaviour. It is compact, to the point and, for a trained eye, it is trivial to quickly determine if a parser should accept a given (sub)string or not.

The collected ABNF of RFC 3986 is a brief three screenfuls. In contrast, the algorithm in the WHATWG spec is roughly eighteen screenfuls. It is an overly detailed and nonstandard way of defining a grammar. This makes it harder to determine which language is accepted by this algorithm. It also makes it hard for me to determine what the changes are, compared to the RFC. Implementing the WHATWG spec would (for me) involve a complete rewrite.

The specification is so focused on the mechanics of a specific manual parsing technique that it almost precludes parser generators or other implementations. Parser generators have a long tradition in theory and practice, and can generate **efficient** language recognisers. Even today, it is an active research field; PEG grammars for example have been "discovered" as recently as 2004.

The way I think about it is that the purpose of this spec is to define what a URL "officially" looks like. So, as an implementor, I don't understand the hesitation to supply a formal grammar. Not having one will likely result in different people interpreting the spec differently. This results in _less_ interoperability, which defeats the point of a spec.

Other reasons why I think a formal grammar is important

Finally, I would like to emphasise the importance of parsers based on formal grammars over ad hoc ones for security reasons. Let's say you have a pipeline of multiple processors which use different URL parsers. For example, you might have an HTML parser on a comment form which cleans URLs by dropping JavaScript and data URLs, among other things, or a mail client which blocks intranet or file system-local URLs before invoking an HTML viewer. If these are all ad hoc "informal" parsers that try to "fix" syntactically invalid URLs, it is nigh-impossible to verify that filtering them for "safe" URLs is correct. That's because it's impossible to decide which language is really accepted by an ad hoc implementation. An implementation further down the stack might interpret an URL (radically) differently from one up the stack and you have a nice little exploit in the making.

If you're not convinced by my measly attempts at explaining this idea, please watch the talk "The Science of Insecurity". Meredith Patterson states the case much more eloquently than I ever could. This talk was an absolute eye-opener for me.

With this context, it baffled me to read the statement that "there are several large parts of the spec that cannot be captured by any kind of grammar". This is literally equivalent to saying "we can't know if an URL will be valid without evaluating the algorithm". This means you cheerfully drag the halting problem into what should be a simple, straightforward notation (come on, URLs aren't **that** ill-defined!). As far as I can tell, the RFC defines a regular grammar. The decision to go from a regular to an unrestricted grammar should not be taken lightly!
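As a small illustration of how far plain regular expressions go here, RFC 3986 itself gives a regular expression in its Appendix B for splitting a URI reference into components. The sketch below wraps it in Clojure, the language used for examples elsewhere on this page; split-uri and uri-regex are hypothetical names, and the expression only separates components, it does not validate them against the full ABNF.

```clojure
;; The regular expression from RFC 3986, Appendix B.
(def uri-regex
  #"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

(defn split-uri
  "Splits a URI reference into its five components.
  Groups 2, 4, 5, 7 and 9 are the ones Appendix B assigns meaning to."
  [s]
  (let [[_ _ scheme _ authority path _ query _ fragment]
        (re-matches uri-regex s)]
    {:scheme scheme :authority authority :path path
     :query query :fragment fragment}))

;; The example URI used in Appendix B.
(println (split-uri "http://www.ics.uci.edu/pub/ietf/uri/#Related"))
```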


We're getting close to a CHICKEN 5 release, so let's take a look at the cool new stuff!

Overhaul of built-in modules

The biggest change you'll notice when you fire up CHICKEN and start to use it is that the modules that come shipped with core are completely different from CHICKEN 4. The functionality is mostly the same, but we moved things around (a lot!) to make things more logical.

This is also the main reason we decided to bump the major version number: the modules have different names, procedures have been renamed, merged or dropped.

You can take a look at the complete list in the CHICKEN 5 manual. We've taken the module layout from R7RS small as inspiration, but since CHICKEN is still an R5RS Scheme first (with r7rs being an optional extension) we had to make some changes.

So, we define a scheme module which contains the entire R5RS language. For everything that is a CHICKEN-specific extension to standard R5RS Scheme, we put it under a (chicken ...) name, which tries to follow the R7RS naming conventions.

For example, R7RS defines a (scheme process-context) module with the following procedures:

  • command-line
  • exit
  • emergency-exit
  • get-environment-variable
  • get-environment-variables

Likewise, CHICKEN defines a (chicken process-context) module, which is a superset of the corresponding R7RS module. Take a look at its manual page; you can see that it defines many more procedures, but it includes all the standard ones too.

By using the R7RS names but with scheme replaced by chicken, the new modules should be easy to remember for anyone used to R7RS. Of course, you can still write portable standard R7RS programs via the r7rs egg, which defines a 100% compatible (scheme process-context) module with only the R7RS identifiers.

There is one important caveat: Because our scheme module exports everything from R5RS Scheme, we don't provide, say, a (chicken cxr) module for all the cadadr, caddar and so on, because those are all in scheme. This also means that the (chicken load) module does not export load; that's already in scheme. Instead, it defines various non-standard CHICKEN extensions like load-relative and such.

Saner module imports

Speaking of modules, we've improved the way modules are linked into user code. In CHICKEN 4, there's a very strict distinction between modules and (compilation) units. This was an endless source of confusion for beginners. For example, why did (import foo) give an error when you tried to actually refer to an identifier from the foo module? That's because import didn't actually load the code, just the import library. To actually load the code and import the library, you needed (use foo). You could also load the code without importing it via (require-library foo). This was supposed to help with cross-compilation. The idea was that you would only need to load the import library on the host, and have the library itself compiled for the target, but in practice you needed to compile the library twice anyway (once on the host, once for the target).

We got rid of this mess: now the canonical way to import the foo library is simply (import foo). For more info, see this post by Felix outlining how to improve imports.

Full numeric tower

Of course, support for the full numeric tower is a personal favorite of mine, having spent a lot of time to perfect this stuff!

Most importantly, this means you no longer need to worry about integer computations over- or underflowing into a flonum and all the weird floating-point problems that entails. Bignums are also a necessity when dealing with 64-bit numeric C types in the FFI. For example, we finally support the size_t type correctly. To me, complex numbers and exact fractions (aka rational numbers) are a nice added bonus, as you could already get them before with the numbers egg. However, by having these types built-in, they're more efficient and you don't have to worry about passing these numbers to code that can't handle them because support happened not to be compiled in.

Take some time to read my blog series about the numeric tower if you're interested in the details.

Declarative egg description language

The chicken-install program to install eggs was rewritten along with all the surrounding tools. The main reason to do this was to make the life of package maintainers easier.

The old version of chicken-install would download, build, install and (optionally) run the unit tests as part of one command. If any dependencies were missing, it would also recursively download, build, install and run tests for those as well. The new version cleanly separates these steps, by generating shell scripts (batch files on Windows) that can do the necessary actions to build and install.

To make this easier, we also had to re-think the egg "language". In CHICKEN 4, a .setup-file was simply a Scheme program in which a few helper procedures were available for calling the compiler. This means it's impossible to create a simple shell script that will separate the build and install steps. That's why we now have a separate, declarative file which describes the components of an egg. See the .egg file documentation for a concrete example.

The rewritten chicken-install will now also cache eggs to avoid re-downloading the same eggs again and again. By default the cache is stored in a dot-directory under the user's home directory. This can be overridden with the CHICKEN_EGG_CACHE environment variable, which might also help package maintainers take the distributed files from another location.

See these design notes for more information about the goals and motivations behind the rewrite.

Improved support for static compilation

In principle, CHICKEN 4 has good support for static compilation. In practice, egg authors would not include the necessary commands for building their libraries statically. Most people don't have a real need for static linking, which means they tend not to make an effort to support it just in case someone else might need it.

The upshot of this was that you could only really compile programs statically when they didn't use any eggs, or if you created a custom build script that would compile the eggs manually with the required -static option. With the new chicken-install, you get static compilation support automatically, for free.

Note that in CHICKEN 4, you could also build eggs and programs using the so-called deployment mode. This allowed shipping a program with all its libraries in one directory. This worked quite well if your target platform supported it, but not all platforms did. Static compilation covers all the use cases that deployment supported and works reliably on all platforms, so we decided to drop deployment mode with all the complexity it brings.

Other noteworthy things

But wait, there's more!

  • Code generation is now fully deterministic, making builds reproducible. This allows you to verify that any given file of generated C code corresponds to the Scheme source code by recompiling it with the same CHICKEN version, both for user code and for CHICKEN core itself. As an added bonus, because the generated C output is deterministic, ccache can be used to get much faster builds (before, it would invalidate the cache as each file would be different).
  • We've improved how symbols are garbage collected, which was optional and somewhat broken in CHICKEN 4. This will speed up code that generates many symbols, and stops symbol table stuffing attacks from being a threat.
  • We have removed quite a bit of bloat: The srfi-1, srfi-13, srfi-14, srfi-69 and srfi-18 libraries have been removed from core! Not to worry though; they are now available as eggs. This will both allow faster development and encourage innovation and competition from alternatives to these non-essential libraries (especially R7RS-large seems to be geared towards renewal of some of these). We've also moved several non-SRFI procedures from core: object-evict, compile-file, binary-search, procedures for dealing with queues, scan-input-lines and POSIX group-information have all been moved to eggs. Support for SWIG has been removed, as it was bit-rotting and nobody seemed to be using it anyway.
  • Ports can now be bi-directional, so there's no more unnecessary distinction between input-ports and output ports. This maps more cleanly to file descriptor semantics, which can also be opened for both reading and writing.
  • Random number generation has been completely replaced. Before, we used libc's rand(), which produces very low quality random numbers. CHICKEN 5 uses the WELL512 PRNG to generate random integers, and it provides access to the system entropy pool for generating cryptographically secure streams of random bytes (using /dev/urandom on *nix, and RtlGenRandom on Windows).

Conclusion

There's a lot to like about the new CHICKEN, so go ahead and give it a spin! Release candidate 1 was made available today for you to try. The full list of changes can of course be found in the NEWS file. If you're already a happy CHICKEN 4 user, we've created a porting guide for you, to make it easier to make the transition from 4 to 5. If you need more help, you can of course contact the always friendly CHICKEN community.


Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License. All code fragments on this site are hereby put in the public domain.