Free Elves

Posted on by Chris Warburton

I’m a big fan of embedding arbitrary, possibly effectful, programs inside pure languages through the use of free (and “freer”) constructions, like free applicatives and free monads.

This raises an interesting question: if we can use an “embedded domain-specific language” to program in a “host” (e.g. Haskell) as if it were a “target” (e.g. SQL); can we take existing programs from the outside world, and embed them as values in the host language?

For example, say we want to integrate a Haskell program with a bash script. The most naïve thing to do would be to write the script to a file, then have our Haskell program call out to the bash program, passing the path to the script file as an argument.

A better approach would store the script in our Haskell code, call out to bash as before, and pipe the script into the subprocess’s standard input. This way, we move some of the responsibilities into our own program: for example, because there is no file, we don’t have to rely on the bash command to handle it correctly, or for the OS to give us read permission, etc.

An even better approach, at least until languages play nicely together, would keep the script as a separate file, but read it in at compile time. This way, the difficulties associated with locating and reading the file are only dealt with once, in a nicely self-contained way (i.e. if there’s a problem, we just abort the compilation; we’re not in the middle of some big job that needs tearing down correctly). Note that while Haskell doesn’t have a “compile-time phase” (unlike, say, Zig), we can use Template Haskell to mess around with the syntax tree during compilation, which seems to be enough for our purposes.

If reading in the script is preferable, why don’t we go one step further and read in the contents of the bash command too? We can treat the resulting value as a serialised form of some more-Haskell-friendly ELF datatype, which we can interpret using some form of free construction; with appropriate functions for running, providing an environment, etc. We can then ensure that the correct bash will be given the correct script, executed using the correct semantics, in an isolated/testable way, and so on, all without having to interact with the outside world at run-time.

This idea struck me as a little outlandish to begin with; but then I remembered that the Truffle implementation of Ruby does a similar thing, although they implement C and Ruby in the same VM, rather than implementing ELF (or similar) in Haskell. In fact, their code can actually be faster than just calling out to C via a foreign function interface, since their JIT compiler has full access to all of the Ruby and C, which allows optimisations to fuse things together.

A similar thing could be done with my hypothetical ELF embedding: we could treat the bash code as a “black box”; we could instead “deserialise” it into a datastructure with semantically-meaningful operations, which can be optimised in some way.

In the above example, the bash executable, the script and the environment (provided by our interpreter function) are all known at compile time, so we can partially-apply bash to the script, supercompiling all of the parsing functionality to produce a combined ‘bash+script’ which knows exactly what it needs to execute when invoked. More aggressive optimisations might use the known features of the pure environment to shortcut the highly dynamic nature of bash, since most of the features will be known to be unused.

A similar approach can be taken to embed with other languages, e.g. Python, Perl, or indeed Haskell.

Note that I’ve used Haskell here as an example; if I were really going to implement such a thing, I’d maybe target something more low-level (e.g. without garbage collection, RTS, etc.) so that I could tease out as much performance as possible. Languages like Zig, Terra, Nim or Rust. Another possibility would be to use the host language (e.g. Haskell) as a code generator, outputting an ELF executable. In the same way that combinators can splice together strings, or plumb together parsers, we could make combinators for fusing together executables.

This is pretty much an extension of my earlier thoughts on annotating values as pure to allow more aggressive optimisation.