Blog Dump 8: Building a Language

Posted on by Chris Warburton

We want a high-level language. We want it to have objects. We want it to be dynamic (latent typing). We would like it to be functional. We want it to be pure; both in an object-oriented sense (everything is an object, there are no ‘built-in types’ or objects which differ in implementation from user-created objects) and in a functional sense (we don’t want implied state or mutable values, we want referential transparency, we want functions which take one object and return one object, we want currying).

To do this we should use an Id object system:

The master object is defined as a reference to another object (eg. a pointer) and some (unknown) contents. Thus we can say the type of an object is:

type(object) := (reference × contents)

This object can be used as a prototype by a function we’ll call clone:

clone :: object -> object
clone :: (reference × contents) -> (reference × contents)

The cloned object’s reference points to the cloned-from object, whilst the contents of the object should be independent of any other. We can’t initialise it with default values, since they’re to be immutable and thus we’d never get a non-default object! We must therefore forget about this object’s contents until they’re set or accessed, which requires us to be lazy in some respect. We don’t neccessarily need to use lazy computation, delaying all computation until we know everything we need to. We could deal with memory allocation and such if we knew the size of the object, or if we had functions in place to reallocate memory when the contents become more defined.

Every reference is an object, every reference points to an object and every object contains a reference. This gives a large number of objects, but we will not reduce this in the design phase.

To specify the behaviour of an object we look in its vtable, which is what the reference points to. A vtable is also an object, since we want a pure object system. Thus we have to make the root object somehow, ensuring that the mechanisms used to do so can be coerced into objects afterwards. This, once again, relies on a lazy approach to data:

# Get an ‘empty’ (ie. undefined) object
object = make_raw_object()
# Make another empty object
object_vtable = make_raw_object()
# Use this second object to describe the behaviour of ‘object’
give_reference(object, ref_to(object_vtable))
# Now make a third empty object
vtable_vtable = make_raw_object()
# This third object describes the behaviour of behaviour objects
give_reference(object_vtable, ref_to(vtable_vtable))
# Since the third object is a behaviour, it describes itself too
give_reference(vtable_vtable, ref_to(vtable_vtable))

Now we have the Id object model: everything is an object, our objects have a reference to some behaviour (vtable) along with some contents. Each object has a behaviour (object has object_vtable, object_vtable has vtable_vtable and vtable_vtable has vtable_vtable). We have used functions ‘make_raw_object’, ‘ref_to’ and ‘give_reference’. Not only are these are impure (they are not in our object model, and are probably not even objects if we’re using a low-level starting point), but they also break the object system by allowing new object hierarchies to be defined! Thus these should not exist in the final system, which is OK since they are not needed after we have bootstrapped the object system; thus they can live in a bootstrap compiler which can be thrown away once it’s compiled a native compiler.

Now we have a skeletal object hierarchy
object -> number -> integer -> natural
object_vtable -> num_vtable -> int_vtable -> nat_vtable

object -> vtable -> num_vtable -> int_vtable -> nat_vtable
object_vtable -> vtabl_vtabl -> vtabl_vtabl -> vtabl_vtabl -> vtabl_vtabl

object -> vtable -> vtable_vtable
object_vtable -> vtable_vtable -> vtable_vtable
Now we loop the hierarchy around so that the vtable’s reference points to vtable_vtable:

This is where the post trails off. From a quick skim I’m pretty certain I’ve got Id right so far, although I’ve stopped short of mentioning anything directly functional (eg. function objects, continuations, etc.). Maybe I’ll finish this off some time.