Clojure from a Schemer's perspective

Recently I joined bevuta IT, where I am now working on a big project written in Clojure. I'm very fortunate to be working in a Lisp for my day job!

As I've mostly worked with Scheme and have used other Lisps here and there, I would like to share my perspective on the language.

Overall design

From a first view, it is pretty clear that Clojure has been designed from scratch by (mostly) one person who is experienced with Lisps and as a language designer. It is quite clean and has a clear vision. Most of the standard library has a very consistent API. It's also nice that it's a Lisp-1, which obviously appeals to me as a Schemer.

My favourite aspect of the language is that everything is designed with a functional-first mindset. This means I can program in the same functional style as I tend to do in Scheme. Actually, it's even more functional, because for example its maps (what would be hash tables in Scheme) are much less clunky to deal with. In Scheme, SRFI-69 hash tables are quite imperative, with hash-table-set! and hash-table-update! being the ways to insert new entries, which of course mutate the existing object. Similarly, Clojure vectors can easily be extended (on either end!) functionally.

The underlying design of Clojure's data structures must be different. It needs to efficiently support functional updates; you don't want to fully copy a hash table or vector whenever you add a new entry. I am not sure how efficient everything is, because the system I'm working on isn't in production yet. A quick look at the code implies that various data structures are used under the hood for what looks like one data structure in the language. That's a lot of complexity! I'm not sure that's a tradeoff I'd be happy to make. It makes it harder to reason about performance. You might just be using a completely different underlying data structure than expected, depending on which operations you've performed.

(non) Lispiness

To a seasoned Lisp or Scheme programmer, Clojure can appear positively bizarre. For example, while there is a cons function, there are no cons cells, and car and cdr don't exist. Instead, it has first and rest, which are definitely saner names for a language designed from scratch. It has "persistent lists", which are immutable lists, but in most day to day programming you will not even be using lists, as weird as that sounds!

Symbols and keywords

One thing that really surprised me is that symbols are not interned. This means that two symbols which are constructed on the fly, or when read from the same REPL, are not identical (as in eq or eq?) to one another:

user> (= 'foo 'foo)
true
user> (identical? 'foo 'foo)
false

Keywords seem to fulfil most "symbolic programming" use cases. For example, they're almost always used as "keys" in maps or when specifying options for functions. Keywords are interned:

user> (= :foo :foo)
true
user> (identical? :foo :foo)
true

Code is still (mostly) expressed as lists of symbols, though. When you're writing macros you'll deal with them a lot. But in "regular" code you will deal more with keywords, maps and vectors than lists and symbols.

Numeric tower

A favorite gotcha of mine is that integers are not automatically promoted to bignums like in most Lisps that support bignums. If you need bignums, you have to use special-purpose operators like +' and -':

user> (* (bit-shift-left 1 62) 2)
Execution error (ArithmeticException) at user/eval51159 (REPL:263).
integer overflow
user> (*' (bit-shift-left 1 62) 2)
9223372036854775808N

user> (* (bit-shift-left 1 62) 2N) ; regular * supports BigInt inputs, though
9223372036854775808N
user> (* 1N 1) ; but small BigInts aren't normalized to Java Longs
1N

This could lead to better performance at the cost of more headaches when dealing with the accidental large numbers in code that was not prepared for them.

What about rationals, you ask? Well, those are just treated as "the unusual, slow case". So even though they do normalize to regular integers when simplifying, operations on those always return BigInts:

user> (+ 1/2 1/4)
3/4
user> (+ 1/2 1/2)
1N
user> (/ 1 2) ; division is the odd one out
1/2
user> (/ 4 2) ; it doesn't just punt and always produce bignums, either:
2

The sad part is, bitwise operators do not support bignums, at all:

user> (bit-shift-right 9223372036854775808N 62)
Execution error (IllegalArgumentException) at user/eval51167 (REPL:273).
bit operation not supported for: class clojure.lang.BigInt
user> (bit-shift-right' 9223372036854775808N 62) ; does not exist
Syntax error compiling at (*cider-repl test:localhost:46543(clj)*:276:7).
Unable to resolve symbol: bit-shift-right' in this context

There's one benefit to all of this: if you know the types of something going into numeric operators, you will typically know the type that comes out, because there is no automatic coercion. Like I mentioned, this may provide a performance benefit, but it also simplifies reasoning about types. Unfortunately, this does not work as well as you would hope because division may change the type, depending on whether the result divides cleanly or not.

Syntax

For many Lispers, this is the elephant in the room. Clojure certainly qualifies as a Lisp, but it is much heavier on syntax than most other Lisps. Let's look at a small contrived example:

(let [foo-value (+ 1 2)
      bar-value (* 3 4)]
  {:foo foo-value
   :bar bar-value})

This is a let just like in Common Lisp or Scheme. The bindings are put inside square brackets, which is literal syntax for vectors. Inside this vector, key-value pairs are interleaved, like in a Common Lisp property list.

The lack of extra sets of "grouping" parentheses is a bit jarring at first, but you get used to it rather quickly. I still mess up occasionally when I accidentally get an odd number of entries in a binding vector. Now, the {:foo foo-value :bar bar-value} syntax is a map, which acts like a hash table (more on that below).

There doesn't seem to be a good rationale about why vectors are used instead of regular lists, though. What I do really like is that all the binding forms (even function signatures!) support destructuring. The syntax for destructuring maps is a bit ugly, but having it available is super convenient.

What I regard as a design mistake is the fact that Clojure allows for optional commas in lists and function calls. Commas are just whitespace to the reader. For example:

(= [1, 2, 3, 4] [1 2 3 4]) => true
(= '(1, 2, 3, 4) '(1 2 3 4)) => true
(= {:foo 1, :bar 2, :qux 3} {:foo 1 :bar 2 :qux 3}) => true
(= (foo 1, 2, 3, 4) (foo 1 2 3 4)) => true
;; A bit silly:
(= [,,,,,,1,,,2,3,4,,,,,,] [1 2 3 4]) => true

Maybe this is to make up for removing the extra grouping parentheses in let, cond and map literal syntax? With commas you can add back some clarity about which items belong together. Rarely anybody uses commas in real code, though. And since it's optional it doesn't make much sense.

This has an annoying ripple effect on quasiquotation. Due to this decision, a different character has to be used for unquote, because the comma was already taken:

`(1 2 ~(+ 1 2)) => (1 2 3)
`(1 2 ~@(list 3 4)) => (1 2 3 4)

This might seem like a small issue, but it is an unnecessary and stupid distraction.

Minimalism

One of the main reasons I enjoy Scheme so much is its goal of minimalism. This is achieved through elegant building blocks. This is embodied by the Prime Clingerism:

  Programming languages should be designed not by piling feature on
  top of feature, but by removing the weaknesses and restrictions
  that make additional features appear necessary.

Let's check the size of the clojure.core library. It clocks in at 640 identifiers (v1.10.1), which is a lot more than R5RS Scheme's 218 identifiers. It's not an entirely fair comparison as Scheme without SRFI-1 or SRFI-43 or an FFI has much less functionality as well. Therefore, I think Clojure's core library is fairly small but not exactly an exercise in minimalism.

Clojure reduces its API size considerably by having a "sequence abstraction". This is similar to Common Lisp's sequences: you can call map, filter or length on any sequence-type object: lists, vectors, strings and even maps (which are treated as key/value pairs). However, it is less hacky than in Common Lisp because for example with map you don't need to specify which kind of sequence you want to get back. I get the impression that in Common Lisp this abstraction is not very prominent or used often but in Clojure everything uses sequences. What I also liked is that sequences can be lazy, which removes the need for special operators as well.

If you compare this to Scheme, you have special-purpose procedures for every concrete type: length, vector-length, string-length etc. And there's no vector-map in the standard, so you need vector-map from SRFI 43. Lazy lists are a separate type with its own set of specialized operators. And so on and so forth. Using concrete types everywhere provides for less abstract and confusing code and the performance characteristics of an algorithm tend to be clearer, but it also leads to a massive growth in library size.

After a while I really started noticing mistakes that make additional features appear necessary: for example, there's a special macro called loop to make tail recursive calls. This uses a keyword recur to call back into the loop. In Scheme, you would do that with a named let where you can choose your own identifier to recur. It's also not possible to nest such Clojure loops, because the identifier is hardcoded. So, this called for adding another feature, which is currently in proposal. Speaking of recur, it is also used for tail recursive self-calls. It relies on the programmer rather than the compiler to mark calls as tail recursive. I find this a bit of a cop-out, especially in a language that is so heavily functional. Especially since this doesn't work for mutually tail-recursive functions. The official way to do those is even more of a crutch.

I find the special syntax for one-off lambdas #(foo %) just as misguided as SRFI 26 (cut and cute). You often end up needing to tweak the code in such a way that you have to transform the lambda to a proper fn. And just like cut, it doesn't save that many characters anyway and makes the code less readable.

The -> macro is a clever hack which allows you to "thread" values through expressions. It implicitly adds the value as the first argument to the first forms, the result of that form as the first argument for the next, etc. Because the core library is quite well-designed, this works 90% of the time. Then the other 10% you need ->> which does the same but adds the implicit argument at the end of the forms. And that's not always enough either, so they decided to add a generic version called as-> which binds the value to a name so you can put it at any place in the forms. These macros also don't compose well. For example, sometimes you need a let in a -> chain to have a temporary binding. That doesn't work because you can't randomly insert forms into let, so you have to split things up again.

And as I note below, the minimalism is kind of "fake" because some essentials simply aren't provided; you have to rely on Java for that.

Java integration

Clojure was originally designed as a "hosted language", so it leverages the JVM. It does this admirably well; Java classes can be seamlessly invoked through Clojure, without any ceremony:

user> (java.util.UUID/randomUUID)
#uuid "bb788bae-5099-4a64-9c37-f6219d40a47f"

;; alternatively:
user> (import 'java.util.UUID)
java.util.UUID
user> (UUID/randomUUID)
#uuid "0bfd2092-14e1-4b88-a465-18698943ea4e"

The downside is that the above is the way to generate a random UUID. So even though uuids have literal syntax in Clojure (as #uuid "..."), there is no Lispy API for them in the Clojure standard library. This can be pretty frustrating, especially in the beginning. There's no clear indication where to look; sometimes you'll be poring over Java language docs for random stuff you thought would have a Clojure interface (like, say, creating temporary files or dealing with byte arrays). At those moments, you're basically programming Java with parentheses.

Having said that, there will often be community-provided nicer APIs for many of those things, but then you need to decide between adding an extra dependency just for a slightly nicer syntax.

Development style

REPL-driven development

Speaking of Java, one thing that constantly bothers me is the slow startup times of the REPL. In my current project, it takes almost 30 seconds to boot up a development REPL. Half a minute!

Luckily, there's great Slime-like Emacs integration with CIDER. Basically, the only sane way to do iterative development is by connecting to a REPL first thing you do and then sending your code to it all the time.

Now, this may sound weird from a Scheme programmer, but I never fully bought into the REPL style of developing. Sure, I experiment all the time in the REPL to try out a new API design or to quickly iterate on some function I'm writing. But my general development style tends more towards the "save and then run the test suite from an xterm". Relying solely on the REPL just "feels" jarring to me. I also constantly run into issues where re-evaluating a buffer doesn't get rid of global state that was built up on a previous run. When this happens, I'm testing an old version of some function without realising it. Keeping track of the "live" state versus the textual code I'm looking at is a total mind fuck for me. I don't understand how others can do this.

Another thing I seem to constantly do is write some code, have the tests go all green, only to see the CI crash on some cyclic dependency in my namespaces. The REPL does not always see those, because reloading a buffer with a namespace declaration works just fine when you loaded the imported namespaces before, even though they refer to the namespace being re-evaluated.

One thing I really find very nice when you're using CIDER is that everything (and I do mean everything) from Clojure is just a "jump to source" away. Most of the builtin functions seems to be written in Clojure itself. For example, if you want to know how map is implemented, you can just press M-. to see it.

Maps and keywords for everything

One thing you'll really notice is that in idiomatic Clojure code, maps are used for everything. A map is a functionally updateable hash table. It looks like this:

{:key-1 "value 1"
 :key-2 "value 2"}

This lends to a very dynamic style of programming, very much like you would in (dare I say it?) PHP. A bit of a strange comparison, but PHP also makes dealing with arrays (which double as maps in a weird way) extremely ergonomic. There, missing nested keys are automatically created on the fly and because of a strange quirk in its developmental history, arrays are the only objects which are passed by value. This means you can program in a referentially transparent way, while still mutating them inside functions at will. Not exactly the same mechanism, but the end effect on programming style feels very similar: you reach for them whenever you want to bunch some stuff together. It is the go-to data structure when you need flexibility.

In other Lisps you'd use alists (or plists, or SRFI-69 hash tables) for this, but they don't deal so well with nested maps and the library is not as convenient. For example, you can easily select, drop and rename keys in a map:

(-> {:key-1 "value 1" :key-2 "value 2"}
    (set/rename-keys {:key-1 :key})
    (dissoc :key-2)
    (assoc :foo "bar")) => {:key "value 1" :foo "bar"}

This -> notation took me a while to get used to by the way, and I'm still not entirely comfortable with it. I explained how it works above. It's a macro for "threading" expressions. In Scheme, you'd probably use a let* for this, or something. In Clojure that would look like this:

(let [map {:key-1 "value 1" :key-2 "value 2"}
      map (set/rename-keys map {:key-1 :key})
      map (dissoc map :key-2)
      map (assoc map :foo "bar")]
  map) => {:key "value 1" :foo "bar"}

As you can see, the version with -> is much more convenient and less repetitive. Unfortunately, it doesn't compose that well (duh, it's a macro), but because of the way the standard library is designed it is more useful than it would seem at first glance.

Anyway, the way maps are typically used everywhere in a project means that there's a lot less "structure" to your data structures. It is extremely convenient to use maps, even though there are also things like records and protocols. Because of their convenience, you'll end up using maps for everything. As I've noticed in my refactorings, when you change the structure of maps, a lot of code is going to break without a clear indication of where it went wrong.

This is made extra painful by "nil punning". For example, when you look up something in a map that doesn't exist, nil is returned. In Clojure, many operations (like first or rest) on nil just return nil instead of raising an error. So, when you think you are looking up something in a map, but the "map" is actually nil, it will not give an error, but it will return nil.

Now like I said, sometimes you may get an error on nil. It's a bit unclear which operations are nil-punning and which will give a proper error. So when you finally get a nil error, you will have a hell of a time trying to trace back where this nil got generated, as that may have been several function calls ago. This is an example where I really like the strictness of Scheme as compared to some other Lisps, as nil-punning is traditionally a dynamic Lisp thing; it's not unique to Clojure.

Multimethods with keywords

Initially, I was quite impressed by the way multimethods work; they're super simple and clean, yet powerful. First, you declare the multimethod and a "decision procedure", which returns a value that can be compared:

(defmulti say-hi :kind)

(defmethod say-hi :default [animal]
  (println (:name animal) "says hello"))

(defmethod say-hi :duck [animal]
  (println (:name animal) "says quack"))

(defmethod say-hi :dog [animal]
  (println (:name animal) "says woof"))

(say-hi {:name "Daffy" :kind :duck})  => "Daffy says quack"
(say-hi {:name "Pluto" :kind :dog})   => "Pluto says woof"
(say-hi {:name "Peter" :kind :human}) => "Peter says hello"

Using multimethods takes some care and taste, because it splits up your logic. So instead of having one place where you have decisions made with an if or cond tree, you have a function call and then depending on how the multimethod was defined, a different function will be called. This is basically what makes C++ so difficult to deal with in large projects: when people use function overloading, it can get really messy. You need to figure out which of the many things called "say-hi" is actually called in a situation, before you can dive into that implementation.

Compared to the insane amount of customizability that e.g. CLOS offers you, the design restraint shown in Clojure multimethods was nice to see, but then I realised this simplicity can be completely defeated by building hierarchies. That is, Clojure allows you to define a hierarchy on keywords. This was a huge wtf for me, because to me, keywords are just static entities that are unrelated to eachother.

When you realise how Clojure keywords can be namespaced, it makes slightly more sense: this gives them some separation.

A keyword can appear in "bare" form like :foo. This is a globally scoped keyword that belongs to no particular code. It's definitely not smart to hang a hierarchy onto such a keyword, and you're also better off not adding any "meta attributes" to them.

The other form is ::foo, which puts the keyword in the current namespace, which is shorthand for ::more-magic.net/foo if you are in the more-magic.net namespace.

Conclusion

All in all, Clojure is a well-designed language with neat features and it's certainly a lot better than most other JVM languages. There are things in it that I wish Scheme had, and it's certainly functional and modern. As a general programming language, I just can't get over the JVM and all its Java trappings, which is just not my cup of tea.

Apart from the JVM, there are some gratuitous departures from traditional Lisps, especially the "rich syntax" and the extreme reliance and overloading of keywords and maps.

As always, such things are a matter of taste, so take my opinion with a large grain of salt.

More magic

Cautionary tales from a programmer

About this blog

Clojure from a Schemer's perspective Posted on 2021-03-03