Designing Lispy DSLs, part 1: SCSS

Setting up this blog was a good excuse to try out SCSS, which I'd been meaning to do for quite a long time. Working with SCSS and exploring its limitations got me thinking about what makes a good Lispy DSL (domain-specific language). This post is the first of a series. Today we'll look at SCSS; in future installments we'll explore other examples of Lispy DSLs.

The idea behind SCSS isn't unique; by generating CSS from a more powerful language you get to use the abstraction systems provided by that language. Abstractions are sorely needed when writing advanced CSS; for example, you often need to use one color in many different situations. In plain CSS, you need to repeat this color value for every usage and, if a few instances need to change, you must go find and replace them. You can imagine it's easy to forget one, or to replace too many! Where HTML makes it easy to write semantically meaningful content (by assigning IDs and classes, for example), CSS doesn't have any way to indicate how style elements logically relate.

As an interesting side note, one of the creators of CSS, Bert Bos, thinks that "real-language" features are unnecessary in CSS. He goes as far as saying constants shouldn't even be added to CSS. His main argument basically boils down to "other people are stupid, so you don't get to use advanced features either". Luckily, many people disagree and have written their own server-side preprocessing languages.

Some of these projects (like Less and Sass) take the approach of adding their own syntax extensions to "plain" CSS, while others (like an older syntax of Sass) design their own custom language that's inspired by the concepts in CSS but quite different in syntax. All these projects are purely about generating CSS from another language. But we are smug Scheme weenies, and to us code and data are one and the same. A typical Schemer would prefer not just to generate CSS from SCSS, but to represent CSS in a first-class value, so that it can be manipulated at will. And that's exactly what SCSS offers... at first glance.

The devil is in the many, messy details

When you first look at CSS, it seems like a simple enough language. Indeed, the core syntax is rather simple. Each rule set has selectors separated by commas followed by declarations between curly braces, separated by semicolons:

#my-id, p.my-class, div {
  background-color: green;
  width: 10em;
  margin-left: 5px;
  border: 1px solid rgb(0, 128, 0);
}

There are three selectors here: the first one selects any element with the id attribute "my-id", the second one selects every p element (paragraph) which has "my-class" listed in its class attribute. The third one simply selects all div elements. The declarations are simple property/value pairs which determine how the selected elements will be displayed.

In Scheme, we can easily represent this as a list of rule sets, where each rule set is a list holding the selectors followed by property/value pairs, and that's exactly what SCSS does:

`(css+
   (((= id "my-id") (p (= class my-class)) div)
    (background-color "#008000") ; Should we use string values
    (color green)                ; for classes and colors, or symbols?
    (width "10em")
    (margin-left "5px")
    (border "1px solid rgb(0, 128, 0)")))

One neat feature that's added by most of these CSS preprocessors is that you can nest items. This places the full expression of their parent before the sub-item, which means that item will only match the selector within its parent:

`(css+
   (div
    (border "1px solid rgb(0, 128, 0)")
    (((// (= class "some-child")) (// (= id "some-other-child")))
     (color orange))))

This compiles to the following CSS:

div {
  border: 1px solid rgb(0, 128, 0);
}

div .some-child,
div #some-other-child {
  color: orange
}

When looking at the examples we should start to get a funny feeling. Aside from the fact that the selector syntax is rather heavy on parens which makes it hard to read even for a Schemer, there are a few problems. The first problem is the fact that we are representing the property values as flat strings (or symbols). This means you can't easily, say, find all the elements that have a particular color somewhere in their values without very heavy additional parsing (in CSS, green, rgb(0,128,0) and #008000 all mean exactly the same thing). You also can't easily compose declarations with variables without doing string manipulations, which mostly defeats the point of using a first-class representation:

(let ((company-color "#008000")
      (page-width 1000)
      (logo-size 20))
  `(css+
     ((= class "menu")
      (border-left ,(sprintf "1px solid ~A" company-color))
      (width ,(sprintf "~Apx" (- page-width logo-size))))
     ((= class "whatever")
      (background ,(sprintf "url(\"img/back.png\") no-repeat 10px 20px ~A"
                            company-color)))))

The second problem is that strings, being directly injected into the CSS, don't get "escaped". This means you can't take any user input (let's say a font name, or a color value) and use this in a declaration value; this can destroy your entire layout if it contains a semicolon or curly brace - at best an annoying bug, at worst, a security issue. You might as well just put everything in one string for all the difference it makes:

`(css+
   (.my-class
    (color "#222; list-style-type: circle; margin-left: 5px")))
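To make the escaping point a bit more concrete, here's a minimal sketch of what a renderer could do with values that really are strings (font names, for instance). The name css-quote-string is made up for this post; as far as I know SCSS offers nothing like it. It only uses standard Scheme procedures:

```scheme
;; Hypothetical helper, not part of SCSS: render a Scheme string as a
;; double-quoted CSS string, backslash-escaping quotes and backslashes.
;; Newlines are written as the CSS escape "\a " because raw newlines are
;; not allowed inside CSS strings.
(define (css-quote-string str)
  (let loop ((chars (string->list str))
             (acc '(#\")))                ; accumulated in reverse order
    (cond ((null? chars)
           (list->string (reverse (cons #\" acc))))
          ((memv (car chars) '(#\" #\\))
           (loop (cdr chars) (cons (car chars) (cons #\\ acc))))
          ((char=? (car chars) #\newline)
           (loop (cdr chars) (append (list #\space #\a #\\) acc)))
          (else
           (loop (cdr chars) (cons (car chars) acc))))))
```

With something like this, a user-supplied font name containing a semicolon or curly brace ends up inside one quoted string token, so it can no longer terminate the declaration. It only helps for values that are legitimately strings, though; colors and lengths need a structured representation, which is where the rest of this post is heading.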

The third "problem" points us in the right direction. The border property is actually a shorthand property. The border declaration from the first example breaks down into the following full declarations:

html {
  border-top: 1px solid rgb(0, 128, 0);
  border-right: 1px solid rgb(0, 128, 0);
  border-bottom: 1px solid rgb(0, 128, 0);
  border-left: 1px solid rgb(0, 128, 0);
}

Unfortunately, this decomposition is impossible to do in SCSS without parsing the property's string values. Besides, even if we were to do that, these properties themselves are shorthands, too! For example, the border-top declaration itself breaks down into these declarations:

html {
  border-top-width: 1px;
  border-top-style: solid;
  border-top-color: rgb(0, 128, 0);
}

This is similar to how in Scheme macros can rewrite convenient notation to a simpler core language. The better approach would be to compile down to the core CSS forms rather than trying to use these complex properties directly.
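As a rough sketch of what such a "macro expansion" could look like, assume the border value has already been split into a width, a style and a color (which is exactly the structure SCSS's flat strings don't give us). The procedure name expand-border and the tagged-list values are my own invention for illustration, and the sketch ignores the fact that real CSS lets you omit any of the three components:

```scheme
;; Sketch: expand a structured border declaration into its twelve
;; longhand declarations (four sides times width, style and color).
(define (expand-border width style color)
  ;; Build one declaration such as (border-top-width (px 1)).
  (define (declaration side part value)
    (list (string->symbol (string-append "border-" side "-" part))
          value))
  (apply append
         (map (lambda (side)
                (list (declaration side "width" width)
                      (declaration side "style" style)
                      (declaration side "color" color)))
              '("top" "right" "bottom" "left"))))

;; Example call:
;; (expand-border '(px 1) 'solid '(rgb 0 128 0))
```

The result is a plain list of declarations, starting with (border-top-width (px 1)), which other code can then search or rewrite without any string parsing.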

To get this far, we'd have to decompose everything to its simplest form and assemble more complex properties in terms of simpler ones. In CSS, each property basically has its own free-form "value" syntax which can get quite complex. Some examples:

html {
  /**
   * Images can be full URIs (dragging in another pretty large RFC), which can
   * *optionally* be quoted (why all this unnecessary optional stuff?)
   */
  background-image: url("path/to/image.png");

  /* You can use named "counters" (what, there are no variables in CSS?!) */
  content: "Chapter " counter(my-chapter-counter) ". ";
  counter-increment: my-chapter-counter;      /* Add 1 to chapter */

  /**
   * Lists of font names (strings), separated by commas and possibly quoted.
   * Also, a restricted set of specially-defined "generic font families"
   * like serif, fantasy (WTF) and monospace, and even specially-defined
   * "system fonts" like status-bar, small-caption, icon, and menu.
   */
  font-family: Helvetica, "Comic Sans MS", fantasy, small-caption;

  /**
   * Different size types: em, ex, px, pt, in, cm, mm, percentage, unit-less.
   * Margins and paddings take 1, 2, 3 or 4 values which expand into -top,
   * -right, -bottom and -left.
   */
  margin: 1px 2em 30% 0;
}

Seriously, who comes up with this stuff? I'm not saying any of these things are useless, but from a language design standpoint, this seems rather excessive. CSS 3 is even more extreme; there, "image" value-types get so complex that they need their own separate document to specify. The background shorthand property grew in complexity as well. Two examples from these drafts (quick, what visual effect do these have? No cheating):

html {
  list-style-image:
      radial-gradient(circle, #006, #00a 90%, #0000af 100%, white 100%);
  background: url("chess.png") 40% / 10em gray round fixed border-box;
}

Finally, the CSS3 animations draft spec adds a completely new syntax element for key frames. Apart from conditional blocks like @media, this is about the only place in plain CSS where curly brace sections are nested inside other curly brace sections.

This highly variable and ever-changing aspect of the syntax means that it's quite an open-ended language. This makes it quite hard to cover all future extensions. The one point that gives me hope is the fact that all this complexity is built up out of a set of core "atoms" like length units, URIs and colors. These atoms do not seem to change too much.

This observation shows us an opportunity for a better CSS DSL; we could try to map these atoms to suitable Scheme values, possibly ignoring the details of how complex values are composed out of these atoms. This is basically what the W3C did with their CSS DOM API. Taking a good look at this DOM API might help to get some inspiration, even if the API itself is unwieldy and un-Lispy (it's very OOP-ish).
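Here's a small sketch of that idea, under the assumption that atoms are represented as symbols, numbers and tagged lists like (px 1) or (rgb 0 128 0). The names render-atom and render-value are hypothetical, and strings and URIs are left out to keep it short:

```scheme
;; Sketch: render "atom" values to CSS text.
(define (render-atom atom)
  (cond ((symbol? atom) (symbol->string atom))   ; keywords such as solid
        ((number? atom) (number->string atom))   ; unit-less numbers
        ;; Lengths are tagged with their unit: (px 1), (em 2), ...
        ((and (pair? atom) (memq (car atom) '(px em ex pt in cm mm)))
         (string-append (number->string (cadr atom))
                        (symbol->string (car atom))))
        ;; Colors are tagged lists: (rgb 0 128 0)
        ((and (pair? atom) (eq? (car atom) 'rgb))
         (string-append "rgb("
                        (number->string (cadr atom)) ", "
                        (number->string (caddr atom)) ", "
                        (number->string (cadddr atom)) ")"))
        (else (error "unknown atom" atom))))

;; A composite value is treated as a non-empty sequence of atoms
;; separated by spaces, ignoring how each property composes them.
(define (render-value atoms)
  (if (null? (cdr atoms))
      (render-atom (car atoms))
      (string-append (render-atom (car atoms)) " "
                     (render-value (cdr atoms)))))

;; (render-value '((px 1) solid (rgb 0 128 0)))
;; => "1px solid rgb(0, 128, 0)"
```

The renderer knows nothing about individual properties; it only knows the atoms. That keeps it open to new properties, at the cost of not validating anything.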

In a language without a small set of well-defined atoms, you will need special parsers and generators for each separate type. This is very confusing to people. I know, because this is exactly the approach I took for representing HTTP headers in intarweb. I don't consider intarweb to be a true DSL since it doesn't really have "native" syntax for its header values. Everything passes through construction procedures which do accept "native" values. However, it does illustrate the point; I've had several requests for explanation of how to do common (what I thought were) simple things or "just give me a way to write out the raw header". That's a DSL failure; DSLs ought to be straightforward and easy to understand, yet powerful.

I like to think that intarweb isn't a complete failure, because when working with intarweb, once everything is parsed, it's often rather nice not to have to deal with parsing anymore. Things like cookies or authentication attributes are notoriously hard to parse correctly, and if everyone up the entire server-side HTTP stack needs to roll their own parser, that's a lot of wasted effort, and a lot of inconsistent implementations with their own bugs. Manipulating these values is also a breeze and never involves string manipulation.

What might a better SCSS look like?

From our new understanding of the nature of CSS, let's try improving SCSS iteratively. For starters, we would like to use parenthetical notation for everything. Plain strings should be disallowed except where they are appropriate, and where they are allowed they should always be quoted and escaped on output. Making this simple change gives us the following:

`(scss+
   (((= class "foo") (= class "bar"))
    (border-left-color (rgb 0 128 0))
    (border-left-width (em 1))
    ;; unsure whether we should allow this shorthand..
    (border-right (px 1) solid ,orange)
    (width (px ,(- page-width sidebar-width)))
    ((// p)
     (color green)
     (font-family #("Helvetica" "Comic Sans MS" sans-serif)))))

I've used vectors to describe sequences of things, whereas composite declarations like border-right are simply expressions with more than two subexpressions. Built-ins like sans-serif and green are symbols. As you can see, because there are no strings, lengths can be calculated without having to perform string manipulation. Another valid approach would be having a special "color" object type with associated procedures that operate on them. If we wanted to do this, SCSS could export variables with color definitions so that green is simply an alias for (rgb 0 128 0), and you could perform "color-algebraic" operations:

`(scss+
   (((= class "foo") (= class "bar"))
    (border-left-color ,(rgb 0 128 0))   ; "rgb" is a constructor procedure now
    (border-left-width ,(em 1))          ; So are "em"...
    (border-right ,(px 1) solid ,orange) ; .. and "px"
    ((// p)
     (color ,green)
     ;; A green background which is darker by 50%
     (background-color ,(darken green .5))
     (font-family #("Helvetica" "Comic Sans MS" sans-serif)))))

I can't think of any useful operations on font types, so I've kept sans-serif a symbol here. How far you want to go depends on your goals, and involves striking a balance between ease of use, safety, and power. For instance, you could define a separate type for everything, including fonts, but that would make it harder to use. It would also make it harder to introduce mistakes, especially if the CSS generator will validate while rendering. However, strict validation also means allowing extensions (like those from CSS3) becomes harder!
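To give an idea of how little is needed for simple "color algebra", here's a sketch of the rgb constructor and darken from the example above. The representation (a tagged list with integer channels) is just an assumption for illustration, and so is the way darken works: it scales each RGB channel towards black, which is simpler than the lightness-based approach other preprocessors take.

```scheme
;; Hypothetical color representation: a tagged list (rgb r g b) with
;; integer channels from 0 to 255.
(define (rgb r g b) (list 'rgb r g b))

(define green (rgb 0 128 0))

;; Darken a color by scaling every channel towards black.
;; The amount is a fraction between 0 (unchanged) and 1 (black).
(define (darken color amount)
  (define (scale channel)
    (inexact->exact (round (* channel (- 1 amount)))))
  (apply rgb (map scale (cdr color))))

;; (darken green .5) => (rgb 0 64 0)
```

Because colors are ordinary values, they can be passed around, compared with equal? and combined before the CSS is ever rendered.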

The selector syntax could use some love too, but I'm less critical of that. The basic idea is fine; it can extend to include arbitrary selectors. It currently supports the + sibling and > child selector as well as the class and id comparisons. Because these operators are in the operator position of a list, adding new ones is as simple as adding a new procedure in Scheme. A pseudo-selector like p:first-child for example could simply be translated to (: p first-child) without breaking anything else.

Right now selectors are simply grouped by adding an extra set of parens around them to put them in a list. Using a visual cue like and or or to indicate grouping might help for readability, as would getting rid of the // selector for hierarchical nesting. As long as we make sure all selectors are unused property names there's no ambiguity in simply nesting a new rule inside another one:

`(scss+
   ((= class "foo")
    (color ,orange)
   (div
    (margin-left ,(px 1))
    ((or (= class (or "foo" "bar"))
         (= id qux))
     (border-left-color ,(rgb 0 128 0))
     (font-family #("Helvetica" "Comic Sans MS" sans-serif))))))

Instead of repeating the class selection, we just put the (or ...) around the class, which is a nice abbreviation, but overall I'm not too happy about this version, so let's back up a step.

We can't guarantee that the selector symbols will remain unused as property names, because we don't know what property names the CSS spec might add in the future. We should strive to avoid potential clashes with future extensions. Also, dropping the // makes it harder to traverse an SCSS tree and perform manipulations since the traversal code would need a full list of all known selectors. So after all, it looks like it's better to keep the //. But we can drop some unnecessary parens by taking the previous example and just putting the // before the selector. Since it's been modified to be one s-expression, we can do that. We can also allow the = selector to accept any attribute (not just classes). While we're at it, this selector should also accept multiple values to avoid repetition:

`(scss+
   ((or (~= p class "foo")    ; Change to (has-word? p class "foo") ?
        (+ div (= p class "bar" my-attr "qux")))
    (border-left-color ,(rgb 0 128 0))
    (font-family #("Helvetica" "Comic Sans MS" sans-serif))

    (// (= * class (or "foo" "bar"))
        (color ,orange)))

   (div
      (display block)
      (// span
          (text-align left))))

The example above also shows the extensibility of operators by adding the ~= selector (a very unschemely name...). Let's see the CSS this would compile to:

p[class~="foo"],
div + p.bar[my-attr="qux"] {
  border-left-color: rgb(0, 128, 0);
  font-family: "Helvetica", "Comic Sans MS", sans-serif;
}

p[class~="foo"] *.foo,
p[class~="foo"] *.bar,
div + p.bar[my-attr="qux"] *.foo,
div + p.bar[my-attr="qux"] *.bar {
  color: orange;
}

div {
  display: block;
}

div span {
  text-align: left;
}

That's not too bad! There's a lot of redundancy in the resulting CSS that we abstracted away via the combination of shortened or-alternatives and hierarchical nesting. The original SCSS also had this hierarchical nesting, by the way, so this type of redundancy is already avoided even by using a slightly flawed DSL.
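The expansion of nested rules is essentially a cross product of the parent's selectors and the child's selectors. A minimal sketch, assuming the selectors have already been rendered to strings (the name nest-selectors is made up for this post):

```scheme
;; Sketch: combine already-rendered parent and child selectors the way
;; nesting with // does, joining every parent with every child using
;; the descendant combinator (a space).
(define (nest-selectors parents children)
  (apply append
         (map (lambda (parent)
                (map (lambda (child)
                       (string-append parent " " child))
                     children))
              parents)))

;; (nest-selectors '("p[class~=\"foo\"]" "div + p.bar[my-attr=\"qux\"]")
;;                 '("*.foo" "*.bar"))
```

For the two parents and two children from the example this gives the four selectors of the second rule set above, in the same order.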

In CSS, the #foo and .bar syntaxes are shorthands for selecting on IDs and classes, because these are so common. There is no technical need to support these shortcuts, so if this makes your design less clean, you can always drop them and opt to use the generic selectors everywhere. For IE6 and other crippled browsers, the renderer could detect class selection and rewrite it to the short syntax. You could always consider extending the Scheme reader to get the same brevity at a higher level, while keeping SCSS itself simple (not that I would recommend doing that, but the option exists).

Lessons learned

I will try to wrap up each blog post in this series by listing the general design rules that we can extract from the DSL under discussion. To wrap an existing language like CSS into a DSL, the following approach seems useful:

First, identify the atomic building blocks. If there are many, this may spell trouble.
Decide which building blocks are essential to be represented "first class" in a structured way, and which can be unstructured strings or symbols (Lisp's atoms).
Determine the combination rules of these atoms and how to translate this to s-expressions.
Think about whether you want to rely on the wrapped language and expose its shorthands and abstractions directly, or if you want to rely on Scheme's abstraction facilities.
If possible, look in what direction the language evolved, and how it has been extended in the past. Your design must be able to accommodate changes in these directions.
Finally, use parentheses and "noise symbols" sparingly, but effectively! Try striking a balance between notation and manipulation convenience.

I realize that some of the things I've said in this post might be contradictory. I might be too vague and hand-wavery in some places. Hell; many things are probably bloody obvious to some of you. But the main point is that it's important to remember that design is hard, and will always involve trade-offs.

I hope that you understand that when designing a DSL you'd better think about what use cases you want it to support before considering how to answer a particular design question. It's very easy to get carried away and overdesign a DSL, but another pitfall is to have too little design (like SCSS, in my opinion). Next time we'll look at a design that's pretty close to ideal, and show that even with that, there are some problems.


Posted on 2012-07-28
