Designing Lispy DSLs, part 4: SSQL

Today we'll look at an old, experimental DSL of my own design. I've always referred to it as a failed experiment, but perhaps it's really a successful experiment, because it helped me figure out why this type of DSL doesn't work too well. Whatever the status, I'll use it as an example of what makes a bad DSL.

The DSL in question is SSQL, a way of embedding SQL as S-expressions into Scheme code. Interestingly, it seems I had a bad feeling about it from the start; the initial commit had the following message:

 Add another doomed project - ssql

It turned out not to be completely doomed, because Moritz Heidkamp has kindly taken over maintenance and has been improving and polishing the library. I might even use it again for my own projects if I ever get tired of working directly with SQL.

Scoped access

For my day job I used to write a lot of Rails code, and I got tired of the restrictions in ActiveRecord. I have to mention that this was in the days before Arel, which is a great improvement in the way you can use custom queries in Rails.

With ActiveRecord, you could write code that would automatically prevent users from accessing things they shouldn't be able to access with the scoped_access plugin. This allowed you to write things in your controller like the following:

scoped_access Customer

def method_scoping
  ScopedAccess::ClassScoping.new(Customer, :user => {:id => current_user.id})
end

I don't recall exactly how it worked, but when you had a complex query, this could cause clashes when the same table was joined in twice, especially if the condition was complex. In different situations, different queries could be generated. Back then, you also needed to know internally-generated join aliases in order to scope related tables. Remember, this was quite a while ago, and I was a bit of a newbie and had been programming Ruby and Rails for only a year or two. There may have been better ways to do this even then.

In any case, this scoping problem annoyed me no end and I knew there had to be a better way. It was obvious that if you represent the query in a more complex data structure than a simple string, you can easily fetch all the references to a particular table (even if it is aliased), and add some scoping to it. This could be done even if it required the addition of joins, and even if those tables were already joined under arbitrary names, as long as you would alpha-rename all aliases to avoid clashes with user-created aliases.

Here's an example of the SSQL syntax. This example is based on a toy data model for an IMDB-clone with films, actors and their roles in them:

'(select (columns (col actors id firstname lastname)
                  (col roles character movie_id))
         (from actors roles)
         (where (and (= (col actors firstname) "Bruce")
                     (= (col actors lastname) "Campbell")
		     (= (col actors id) (col roles actor_id)))))

The regular SQL equivalent of this:

 SELECT actors.id, actors.firstname, actors.lastname,
        roles.character, roles.movie_id
 FROM actors, roles
 WHERE actors.firstname = 'Bruce'
   AND actors.lastname = 'Campbell'
   AND actors.id = roles.actor_id;

The SSQL for column selection can be a little ugly or verbose, so it's also allowed to specify columns with a dot instead of the col form (probably a mistake, complicating the DSL design):

'(select (columns actors.id actors.firstname actors.lastname
                  roles.character roles.movie_id)
         (from actors roles)
         (where (and (= actors.firstname "Bruce")
                     (= actors.lastname "Campbell")
		     (= actors.id roles.actor_id))))

The columns "noise word" is still required, because that makes it easier to walk the expression and programmatically manipulate it. In any case, scoping a table is easy, even for arbitrarily complex cases:

(let ((query
        '(select (columns actors.firstname actors.lastname
                          roles.character movies.title)
                 (from (join left
                             (join left actors
                                   (join inner roles (as movies m2)
                                         (on (and (= m2.id roles.movie_id)
                                                  (> m2.year 2000))))
                                   (on (= roles.actor_id actors.id)))
                             movies
                             (on (= movies.id roles.movie_id)))))))
  (scope-table 'movies '(< (col movies year) 2005) query))

;; Results in the following:
(select (columns actors.firstname actors.lastname
                 roles.character movies.title)
        (from (join left
                    (join left actors
                          (join inner roles (as movies m2)
                                (on (and (= m2.id roles.movie_id)
                                         (> m2.year 2000))))
                          (on (= roles.actor_id actors.id)))
                    movies
                    (on (= movies.id roles.movie_id))))
        (where (and (< (col m2 year) 2005)
                    (< (col movies year) 2005))))

The initial query selects all the films in the database, including all actors with the roles they played in that film. However, the actors are only included for films that were released after the year 2000. Earlier films are returned without the actors.

Now, the magic happens in the call to scope-table, which returns the same query, but with all occurrences of the movies table scoped to include only films released before the year 2005. Note that this scopes both the main query and the joined table m2 even though it's aliased.

It's all about the syntax

Okay, so it turns out that this idea works beautifully. Let's look at why I think this DSL was a failure. One reason is the fact that SQL is a huge language, especially when you consider all the extensions provided by various implementations.

You could say "but you don't have to support the full language". That's true, but the problem with a language that maps directly to SQL is that users will expect being able to do all the things they can do in regular SQL. For example, when Common Table Expressions were first introduced into PostgreSQL, I started seeing many places in my code bases at work where those would be useful. The same was true for Window Functions. These are both extremely useful extensions, and I'm now making regular use of them. I wouldn't want to miss them, so any SQL DSL really needs to support them for me to take it seriously.

The thing both extensions have in common is that they introduce completely new syntax. That's because there are absolutely no common building blocks for language constructs; every feature is a set of arbitrarily-placed keywords to help a parser make sense of it (with many optional "noise" keywords to help a human make sense of it). This means each feature has to be taught separately to SSQL, resulting in a large set of rules on how to convert them to SQL.

The SQL grammar is so complicated that its sheer size has serious performance implications on a parser, as pointed out by this blog post. Because EXPLAIN is a PostgreSQL extension, they simply decided to change this command's syntax to make it faster to parse. The old syntax is still supported for backwards-compatibility, but this change is a great illustration of how much of a moving target the SQL syntax really is. Other SQL implementations don't generally move as fast as PostgreSQL in adding features, but as I indicated earlier, I really like these features and use them on a regular basis.

Database independence with SQL-based syntax?

Another complication is supporting multiple databases. SSQL supports ANSI SQL as a baseline, with optional extensions that are available if the back-end supports it. The nice thing is that this provides a degree of database independence. All back-ends can automatically quote strings and table names correctly depending on the database, making SQL injection bugs effectively impossible. For example,

'(select (columns (col actors firstname lastname birth-date))
         (from actors)
         (where (= actors.lastname "O'Neill")))

gets output as the following in PostgreSQL and SQLite:

 SELECT actors.firstname, actors.lastname, actors."birth-date"
 FROM actors
 WHERE actors.lastname = 'O''Neill';

The MySQL back-end outputs the following:

 SELECT actors.firstname, actors.lastname, actors.`birth-date`
 FROM actors
 WHERE actors.lastname = 'O\'Neill';

These differences are relatively small and don't affect the syntax of the S-expression version. However, there are other examples that do. For example, MySQL's INSERT statement allows syntax which mirrors the UPDATE statement, using SET:

 INSERT INTO movies SET title = 'Alien', year = 1979;

whereas PostgreSQL only allows the standard syntax (which MySQL also supports):

 INSERT INTO movies (title, year) VALUES ('Donnie Darko', 2001);

The question then becomes whether the (unnecessary) syntax with SET should be allowed, and, if so, whether this should be emulated in PostgreSQL by rewriting it to the standard syntax. There are tens of such silly examples (CONCAT versus || versus logical OR, case insensitive LIKE versus ILIKE, etc), but there are a lot of more fundamental differences, too. Finally, using ANSI as a baseline is nice, but many of ANSI's features aren't widely implemented. Common Table Expressions are a good example; they're standardized, but neither MySQL nor SQLite support them, and Postgres only started supporting them very recently. Oh, and fuck proprietary RDBMSes; Oracle long ignored ANSI and invented more nonstandard extensions than MySQL ever did, and as a result, their users are as clueless about ANSI SQL as the average MySQL user. Finally, there are many ANSI features that none of these databases support. This means you have to implement a feature in ANSI, then override it to produce an error message saying it's unsupported in this dialect for all implementations that don't support it. An alternative approach is to implement no base but make everything completely implementation-specific. However, this results in a bigger risk of producing DSL inconsistencies between dialects.

A better approach

Recently, the relational algebra has been gaining some more interest. For example, there's Alf for Ruby and the UNIX shell, and of course Arel, which I mentioned earlier.

I think this is a better approach; relational algebra has just a handful of concepts and there's no syntax associated with it, so you can invent your own syntax to best fit your DSL. It also prevents you from getting distracted by the differences in various SQL implementations. You can see this with Alf already; it has total abstraction over the DBMS. It can use flat files or SQL, or any other back-end you'd like, as long as it fits the relational model (to be fair, so can PostgreSQL with SQL/MED foreign data wrappers). The flip side of such a high level of abstraction is that it will be harder to make use of any killer features offered by your RDBMS; you get the lowest common denominator in features.

Optimizing queries also becomes hard. You can no longer hand-optimize them when writing them, and you'd probably end up with an optimizer in your library. This is pretty insane, since there's also an SQL optimizer and query planner inside your RDBMS, so you're doing twice the work, and there's twice the opportunity for getting it wrong.

Despite these disadvantages, the "relational algebra DSL" approach is more viable than the "SQL DSL" approach. ClojureQL also initially took the approach of providing a DSL closely modeled on SQL, but later completely revised the DSL to be more abstract and closer to relational algebra than to SQL.

I think it's interesting to see what other SQL-like DSL projects will do. For example, Clojure also has Korma, which is rather close to SQL and looks like it can currently only perform a limited subset of all possible queries. I wonder what they'll do when users start clamoring for richer back-end support? Racket used to have SchemeQL, but that project seems to have vanished from the web. The website of its parent project, Schematics, doesn't mention it at all anymore. The same seems to have happened to a Common Lisp interface called CL-RDBMS (at least the "homepage" link currently points to a broken web site).

There's a popular library for Common Lisp called CLSQL. It looks like an enormous amount of engineering went into it. If that's required to get a useful SQL DSL, it might not be worth it unless the advantages outweigh the effort required. Note that even after 10 years of development, CLSQL still has no outer join support. I think that's indicative of how hard it is to properly support SQL from a DSL.

Wrap-up

The lessons I learned from the SSQL experiment are in retrospect rather simple, and seem to echo earlier blog posts:

The language you're targeting should be small and have few core concepts.
The relevant standards should be fully implemented in all back-ends you want to support.
Back-ends shouldn't have any arbitrary extensions that you're expected to support.
Look for an underlying theory; this may be a better abstraction than the target language.
Try to find examples of similar libraries. Did others try, and fail or give up? If so, why? How complex are existing implementations? Are they complete?

This post will be the last post in this series, at least for a while. There aren't that many other interesting DSLs with which I'm familiar, and I've exhausted the list of novel design concepts that I'm able to distill from existing DSLs.

More magic

Cautionary tales from a programmer

About this blog

Designing Lispy DSLs, part 4: SSQL Posted on 2012-08-20

Scoped access

It's all about the syntax

Database independence with SQL-based syntax?

A better approach

Wrap-up