Concatenative Languages

Posted on 2012-05-13 by Chris Warburton

I've been looking into the rather pompous-sounding genre of concatenative programming languages recently. They are called "concatenative" (essentially 'joining together') because for any 2 programs A and B, the result of run(AB) is the same as the result of run(run(A)run(B)). This also works in reverse, so we can split any program A at any point; if we call the portion on the left side B and the right side C, then run(run(B)run(C)) = run(A). One consequence of this property is that we can keep splitting up our programs until we have one part for every available processor (which could be a lot more than you may think!), run them all at the same time one-per-processor, then concatenate the results and run again (perhaps also in parallel). This is a lot nicer than current mainstream parallel programming, which I've had first-hand experience with!

This remarkable property of concatenative languages is due to them working in a subtley different way than more familiar languages. In most languages, we create self-contained bundles of usefulness called functions, and we apply these functions to things. For example, let's say we have a "plus" function and a "square" function, we can define "the square of the sum of 2 numbers" by making a function which takes 2 numbers as input, let's call them "x" and "y", and uses the other 2 functions to create the output. We write this as "λx.λy.square(sum(x)(y))". Because we apply functions to (explicit) inputs, we call these applicative functions.

In concatenative languages, we don't have any (explicit) input. Instead of shuffling around values and applying functions, instead we don't have any values and the entirety of our program is the creation of one big function. Rather than applying our functions to arguments, we compose them together to form aggregate functions. Thus we say that concatenative languages use compositional functions. In contrast to our applicative example, "the square of the sum of 2 numbers" is simply "sum square". It stands in contrast to the applicative case, since the actual numbers themselves don't appear in our definition, and rather than reducing down our hierarchy of functions through applying them, instead we're building up new, larger functions.

Just as a Universal computing language is possible with just two applicative functions, known as S and K, along with parentheses; there is likewise a Universal computing language with just two compositional functions, known as Cake and K, along with quotes. Brent Kerby explains this quite nicely.

Whilst we can create these functions in languages like Cat and Joy, based on the existing library of functions built in to those languages, to me that defeats the point. Why use a multitude of complex functions to build a couple of simple ones? The converse is, of course, the reductive ideal: make an entire language using just these 2 functions. That's what I've done. The language itself doesn't seem worthy of giving a name, so I'll just call it Stack.

Stack programs are strings of the symbols "k", "c" (the functions), "[" and "]" (the quotes). These get compiled into a Javascript function, which takes an array of functions as its input and produces an array of functions as output. These arrays are accessed in a "last in, first out" (LIFO) manner, which is why I've called it Stack.

Here's a Stack interpreter: type in a combination of ‘c’, ‘k’, ‘[’ and ’]’ to define a Stack program, and hit “Run” to have it evaluated on an empty stack. The result will be shown below, in the form of Javascript source code. Note that it's easy to get a stack underflow error, due to having too few functions on the stack. It's also possible to have too many levels of recursion, which causes Javascript to die. This second error is unfortunate, and may be worked around in future versions. Also note that there's no Currying, and thus the concatenative property doesn't hold in its entirety.