S-expressions

Posted on by Chris Warburton

S-expressions are a serialisation format. Their simplicity makes it easy to write parsers, pretty-printers, translators, preprocessors, editor plugins, graphical editors, etc., so you don’t have to inspect the serialised form if you don’t want to. This is similar to how there are loads of ways to make a Web site which don’t involve hand-editing HTML: we can convert from something like markdown/asciidoc/mediawiki/bbcode/etc., we can generate pages programmatically via Haskell/PHP/Python/etc., we can use a WYSIWYG editor, and so on. Since s-expressions are much simpler than HTML, such abstractions are nowhere near as “leaky”: s-expressions just use ( and ) rather than arbitrary XML-style tags, there’s no tag/attribute redundancy, all text is double-quoted, and there are no abbreviations to expand (like namespaces).
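To give a feel for how little work such a tool can be, here’s a rough sketch of a ‘translator’ in Emacs Lisp, which turns prefix arithmetic into an infix string. The name my/sexp-to-infix is made up for this example, and it assumes the input is a proper list of the form (OPERATOR ARG...) containing only numbers and symbols; the parsing is done by Emacs’s built-in read function.

```elisp
;; Hypothetical helper, not from any library: translate a prefix
;; arithmetic s-expression into an infix string.
(defun my/sexp-to-infix (expr)
  "Return EXPR, a nested list such as (+ 1 (* 2 3)), as an infix string."
  (if (consp expr)
      ;; A list is (OPERATOR ARG...): translate each argument recursively,
      ;; then join the results with the operator.
      (concat "("
              (mapconcat #'my/sexp-to-infix
                         (cdr expr)
                         (format " %s " (car expr)))
              ")")
    ;; Anything else (a number or symbol) is printed as-is.
    (format "%s" expr)))

;; `read' is Emacs's built-in s-expression parser.
(my/sexp-to-infix (read "(+ 1 (* 2 3))"))
;; => "(1 + (2 * 3))"
```

The whole thing is one recursive function over lists, with no grammar or tokeniser of its own; that’s the sort of simplicity I mean.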

If you want to, it’s pretty easy to make your own alternative ‘interface’ to such data. There are already loads out there too, e.g.

Whilst the parenthesis-heavy format of s-expressions isn’t necessary, it’s what usually crops up in any discussion of Lisp and its derivatives, simply because it’s much more popular than these alternatives. To me, that mostly indicates that concerns about “too many parentheses” are really a non-issue, despite being raised by many who are new to the format.

My Approach

I’m a heavy Emacs user, so I use Emacs to edit everything. My ‘solution’ to the parentheses ‘problem’ is to give them a very low-contrast colour whenever the buffer contains an s-expression language; here’s the relevant Emacs config (written in s-expressions!):

;; Make parentheses dimmer when editing Lisp
(defface paren-face
  '((((class color) (background dark))
     (:foreground "grey30"))
    (((class color) (background light))
     (:foreground "grey30")))
  "Face used to dim parentheses.")

;; Add a font-lock rule to each Lisp-like mode, so that every "(" and ")"
;; is drawn using paren-face. We only want the side effect of add-hook,
;; so dolist is used rather than mapcar.
(dolist (hook '(emacs-lisp-mode-hook scheme-mode-hook racket-mode-hook))
  (add-hook hook
            (lambda ()
              (font-lock-add-keywords nil
                                      '(("(\\|)" . 'paren-face))))))

I use show-paren-mode to highlight matching parentheses which are next to the cursor, but otherwise just “tune out” the parentheses in favour of indentation (the way Emacs indents s-expressions is nice enough that I seldom fiddle with it):

Emacs screenshot
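For reference, turning on show-paren-mode takes one line of config. The sketch below is a minimal version; the two setq lines are optional tweaks of my own choosing rather than anything this setup depends on. They are set before the mode is enabled, since changing the delay while the mode is active doesn’t take effect until it’s toggled off and on again.

```elisp
;; Optional tweaks (assumptions for this example, not required):
(setq show-paren-delay 0)             ; highlight immediately, with no delay
(setq show-paren-style 'parenthesis)  ; highlight only the two parentheses

;; Highlight the parenthesis matching the one next to the cursor.
(show-paren-mode 1)
```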

For particular s-expression-based languages, like Racket, we can get syntax colouring for symbols, etc. by using the corresponding Emacs mode; this also lets us trigger flycheck syntax checking, and so on.

One of the payoffs of editing a serialised format like s-expressions is that we can use tools like paredit and smartparens to avoid having to care about the textual representation at all: they make navigating and manipulating the syntax tree structure and content relatively nice, they ensure that parentheses and quoted strings always remain balanced, they automatically escape characters when written inside strings, etc.
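As a rough sketch, paredit can be switched on for the same modes as the dimmed parentheses. This assumes the third-party paredit package is already installed (e.g. via the package manager); enable-paredit-mode is the function paredit itself provides for use in hooks.

```elisp
;; Assumes the paredit package is installed and its autoloads are set up.
(dolist (hook '(emacs-lisp-mode-hook scheme-mode-hook racket-mode-hook))
  ;; Turn on structural editing whenever one of these modes starts.
  (add-hook hook #'enable-paredit-mode))
```

With this in place, typing ( inserts a balanced () pair, and commands like paredit-forward-slurp-sexp move whole subtrees around rather than individual characters.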

Whilst we could, in theory, make similar tree navigators for languages with more complicated textual representations like, say, Haskell, in practice these aren’t as useful since these files are mostly in an unparseable state during editing; for example we might have written a let but not yet written an in, or a case without an of, or an = without a right-hand-side, etc.

In principle we can solve these in the same way as paredit: insert the whole language construct at once, and let the user fill in the gaps; yet this requires a whole raft of language-specific constructs, whilst paredit can get away with (/), [/], {/} and "/" for basically any language. It also requires custom keybindings to avoid ambiguity: whilst paredit can take over the ( key to insert a balanced () pair, there is no let key; the best we could do would be hooking into the spacebar and checking if we’ve just opened a let. In any case, our files would still be unparseable until the user’s filled in all of the gaps: for example let in is invalid; let x in is invalid; let x = in is invalid; let x = 42 in is invalid; only when we reach let x = 42 in y will the parser not choke. We could put in placeholders like let _ = _ in _, but then we’d need to decide whether the user wants to insert a character or overwrite a placeholder; and so on.
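By way of contrast, here’s a small sketch showing that the states a paredit-style editor passes through are all readable as s-expressions. The helper my/readable-p is made up for this example; it just asks Emacs’s built-in read whether a string contains a complete expression. Note that “readable” only means the text parses into a tree, not that the tree is a meaningful program yet.

```elisp
;; Hypothetical helper: does TEXT contain a complete s-expression?
(defun my/readable-p (text)
  "Return t if `read' can parse an s-expression from TEXT, else nil."
  (condition-case nil
      (progn (read text) t)
    ;; `read' signals end-of-file when the input stops mid-expression.
    (end-of-file nil)
    ;; It signals invalid-read-syntax for things like a stray ")".
    (invalid-read-syntax nil)))

;; Steps a structural editor might pass through while typing a let:
(mapcar #'my/readable-p
        '("()" "(let)" "(let ())" "(let ((x 42)))" "(let ((x 42)) y)"))
;; => (t t t t t)

;; Without balanced insertion, an intermediate state is incomplete:
(my/readable-p "(let ((x 42)")
;; => nil
```

Since every intermediate state is a tree, tools which work on that tree keep working throughout the edit.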

Far nicer to expose the tree structure separately, which can be managed without knowing anything about the syntax of our language.