# SK logic in egglog: part 1, encoding and reduction

Posted on by Chris Warburton

Update: I’ve now published part 2

I’ve been really excited to try out egglog, since it seems like a great complement and application for the theory exploration systems I spent a few years playing with.

Egglog can be thought of as a powerful database, combining Datalog (a restricted form of Prolog, which is essentially first-order logic) and equality-saturation. The latter is most interesting to me, since it represents expressions via a graph of equivalence classes; very similar to the internal representation that QuickSpec v1 uses to search for conjectures (v2 switched to generating individual expressions on-the-fly, more like IsaCoSy; although much faster). I’ve played with equational systems like Maude before, but egglog seems like a sweet-spot in the design space.

You can try egglog using their online sandbox, or you can use the `egglog` attribute of Nixpkgs.

## A trivial example

Let’s define a very simple datatype called `Expr`, containing values of just two forms:

• A constant value `C`
• A function `Id` applied to some other `Expr` (like `(Id C)`, `(Id (Id C))`, etc.)

This is actually just a Peano/Grassman encoding of the natural numbers, but I’m using different names since we’re going to deviate from that definition in a moment. Instead, think of it as a very simple programming language, with two sorts of expression.

### A simple encoding

The most direct way to represent this in egglog is the following:

``````(datatype Expr
(Id Expr))
(declare C Expr)``````

Running this gives:

``````[INFO ] Declared sort Expr.
[INFO ] Declared function Id.
[INFO ] Declared function v0___.
``````

The `datatype` keyword is a shorthand for declaring a `sort` (i.e. a type), and for defining any number of constructors: `function`s which accept inputs and return a value of this `sort`. Here we can see that `Id` is such a `function`. Since `C` is a constant, rather than a `function`, we `declare` it separately; however, this results in an auto-generated nullary function with the unhelpful name `v0___`. That may be fine for many applications, but for these explorations I’d like something more readable.

### An encoding with names

In order to preserve our constant’s name, we’ll embed it as a `String` inside its value. To implement this we’ll use an extra constructor `E`, giving us:

``````(datatype Expr
(Id Expr)
(E  String))``````

Now our constant value can be represented using `(E "C")`, and we can define a shorthand for it using egglog’s `let`:

``(let C (E "C"))``

Now we can tell when a value is our `C` constant, since it will contain the string `"C"`. This can be seen if we ask egglog to draw its “database” of known values, using the `--to-svg` argument:

The inner box (labelled `E("C")`) is the constant value we defined using `let`. The outer box is an equivalence class, whose contents are known to be equal to each other. So far egglog only knows about our value `C`, so it’s all alone in the only equivalence class.

Note: Yes, it’s annoying that the graphical output writes parentheses like `E("C")`, whilst the actual egglog language uses prefix-style `(E "C")`. That seems to be a quirk of the rendering; all of the actual code will always use the latter style!

### How to run `Expr`essions

Next we’ll tell egglog how to ‘run’ our language, by making `Id` act like an identity function, i.e. returning the given `Expr` as its output. We can do that using a `rewrite` rule, referring to the argument `Expr` using a variable `x`:

``(rewrite (Id x) x)``

With this rule, wrapping an `Expr` value in any number of calls to `Id` results in an equivalent `Expr`. Let’s test this by defining a value that wraps `C` in a few calls to `Id`, and `check` that it’s still equal to `C`:

``````(let thrice
(Id (Id (Id C))))``````
``(check (= thrice C))``

Running the above through egglog gives the following:

``````[INFO ] Declared sort Expr.
[INFO ] Declared function Id.
[INFO ] Declared function E.
[INFO ] Declared rule (rewrite (Id x) x).
[ERROR] Check failed:
(= thrice C)
``````

Uh oh, what went wrong?

#### Inspecting egglog

Let’s show the value of `thrice`, using `(query-extract thrice)`:

``````[INFO ] Declared sort Expr.
[INFO ] Declared function Id.
[INFO ] Declared function E.
[INFO ] Declared rule (rewrite (Id x) x).
[INFO ] extracted with cost 5: (Id (Id (Id (E "C"))))
(Id (Id (Id (E "C"))))
``````

This shows why our `test` failed: `thrice` still has three `Id` wrappers, so egglog doesn’t know it’s equal to `C`. We’ll see why our `rewrite` rule hasn’t been applied in the next section; first we’ll take a look at egglog’s database:

There are now multiple values and multiple (equivalence) classes. Those values with arrows coming out are `function` calls (not the actual functions themselves), with the arrows pointing to the inputs used by that call. For example, the value pointing from `Id` to the class containing `E("C")` represents the subexpression `(Id C)` in our `thrice` value; whilst the value pointing to that (from another `Id` call) represents the subexpression `(Id (Id C))`, and so on.

Notice that arrows point to a class, rather than a specific value. The class containing `C` represents «anything equivalent to `C`», so the value representing the call `(Id C)` is actually more general, representing «`Id` applied to «anything equivalent to `C`»». That value lives in another equivalence class, for «anything equivalent to «`Id` applied to «anything equivalent to `C`»»». This builds up, so the value for `(Id (Id C))` represents the more general «`Id` applied to «anything equivalent to «`Id` applied to «anything equivalent to `C`»»»», and so on. The resulting “e-graph” (equation graph) is a very compact way to store complex inter-relationships between expressions.

#### Actually running `Expr`essions

The reason our `rewrite` rule didn’t change anything is that we didn’t tell egglog to run! We do this by calling `run`, but we have to decide how much to run: egglog rules can cause expressions to grow and grow, so we need to provide a cutoff. Due to egglog’s unification/canonicalisation mechanism, we’ll see that only a single application of our `rewrite` rule will be needed, which we can do with `(run 1)`:

``````[INFO ] Declared sort Expr.
[INFO ] Declared function Id.
[INFO ] Declared function E.
[INFO ] Declared rule (rewrite (Id x) x).
[INFO ] Ran schedule (repeat 1 (run)).
[INFO ] Report: Rule (rule ((= rewrite_va...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

``````

Now we can try our `check` and `query-extract` again:

``````[INFO ] Declared sort Expr.
[INFO ] Declared function Id.
[INFO ] Declared function E.
[INFO ] Declared rule (rewrite (Id x) x).
[INFO ] Ran schedule (repeat 1 (run)).
[INFO ] Report: Rule (rule ((= rewrite_va...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

[INFO ] Checked fact [ConstrainEq("thrice", "C")].
[INFO ] extracted with cost 2: (E "C")
(E "C")
``````

Success! Now let’s look at the database:

Now that egglog has run our rewrite rule, it’s discovered that all of the previous values are equivalent. Furthermore, the number of values has collapsed to just two: representing `C` and «`Id` applied to «anything equivalent to `C`»». The previous represention of `(Id (Id C))` has unified with that of `(Id C)`, since both of their inputs are now encompassed by the same class «anything equivalent to `C`»; and likewise for `(Id (Id (Id C)))`. In fact, that single `Id` call now represents any number of wrappers around `C`, showing them all to be equivalent!

## A larger example: SK logic

Now we’ve seen the basics of egglog, we can use it for a real programming language; specifically SK combinatory logic! This time we’ll use a binary constructor `App`, to represent the application of one combinator to another; and use the same `String` trick to represent two constants `S` and `K`:

``````(datatype Com
(App Com Com)
(C String))
(let S (C "S"))
(let K (C "K"))``````

SK expressions are evaluated according to the following rules, which reduce a `K` once it’s applied to two expressions; and an `S` once it’s applied to three:

``````(rewrite (App (App K x) y) x)
(rewrite (App (App (App S x) y) z) (App (App x z) (App y z)))``````

SK logic is a universal programming language, although it can be a bit unwieldy to read and write due to the complete absence of any naming mechanism. We’ll use our `String`-embedding trick to define useful constants, and use a `union` rule to unify their name with their SK definition.

For example, here is the identity function. Sometimes this is implemented using its own rewrite rule (giving SKI logic), but that’s redundant since the `S` and `K` rules are all we need:

``````(let I (C "I"))
(union I (App (App S K) K))``````

Here’s a Church-encoding of booleans and if/then/else:

``````(let TRUE  (C "TRUE"))
(let FALSE (C "FALSE"))
(let IF    (C "IF"))
(union TRUE  K)
(union FALSE (App S K))
(union IF    I)``````

We’ll test these using the following expression:

``````(let test
(App (App (App IF (App I FALSE)) S) (App K TRUE)))``````

Let’s walk through how we expect this to evaluate, assuming we’ve got the SK definitions of each correct:

• `(App I FALSE)` should equal `FALSE`, since `I` should act like the identity function.
• `(App IF FALSE)` should accept two arguments and return the second, like the “else branch” in other languages.
• In this case that “else branch” is `(App K TRUE)`
• `(App K TRUE)` won’t reduce any more (since it doesn’t have enough arguments), so it should be the final result (or equivalent).

Before running anything, let’s query the initial value of `test` and take a look at the database:

``````[INFO ] Declared sort Com.
[INFO ] Declared function App.
[INFO ] Declared function C.
[INFO ] Declared rule (rewrite (App (App K x) y) x).
[INFO ] Declared rule (rewrite (App (App (App S x) y) z) (App (App x z) (App y z))).
[INFO ] extracted with cost 17: (App (App (App (C "IF") (App (C "IF") (C "FALSE"))) (C "S")) (App (C "TRUE") (C "TRUE")))
(App (App (App (C "IF") (App (C "IF") (C "FALSE"))) (C "S")) (App (C "TRUE") (C "TRUE")))
``````

Note: Calls with multiple inputs are shown with multiple arrows emerging; the order of the arrows from left-to-right matches the order of the inputs.

So far the only equivalences egglog knows are those constants we explicitly defined with `union`. The value of `test` looks a little funny, since some of those definitions overlap and egglog may pick different representatives from the equivalence classes than the ones we originally wrote. The `rewrite` rules for `S` and `K` haven’t reduced anything yet. Let’s see what happens after using `(run 1)`:

``````[INFO ] Declared sort Com.
[INFO ] Declared function App.
[INFO ] Declared function C.
[INFO ] Declared rule (rewrite (App (App K x) y) x).
[INFO ] Declared rule (rewrite (App (App (App S x) y) z) (App (App x z) (App y z))).
[INFO ] Ran schedule (repeat 1 (run)).
[INFO ] Report: Rule (rule ((= v5___ (App...: search 0.000s, apply 0.000s
Rule (rule ((= v10___ (Ap...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

[INFO ] extracted with cost 17: (App (App (App (C "IF") (App (C "IF") (C "FALSE"))) (C "S")) (App (C "TRUE") (C "TRUE")))
(App (App (App (C "IF") (App (C "IF") (C "FALSE"))) (C "S")) (App (C "TRUE") (C "TRUE")))
``````

A little simplification has occurred, combining some of the `App` calls (e.g. look at those which apply `I`), but the changes are not significant enough to affect the value of `test`. Let’s try again with `(run 2)`:

``````[INFO ] Declared sort Com.
[INFO ] Declared function App.
[INFO ] Declared function C.
[INFO ] Declared rule (rewrite (App (App K x) y) x).
[INFO ] Declared rule (rewrite (App (App (App S x) y) z) (App (App x z) (App y z))).
[INFO ] Ran schedule (repeat 2 (run)).
[INFO ] Report: Rule (rule ((= v5___ (App...: search 0.000s, apply 0.000s
Rule (rule ((= v10___ (Ap...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

[INFO ] Rebuild before command:          0ms
[INFO ] extracted with cost 11: (App (App (C "FALSE") (C "S")) (App (C "TRUE") (C "TRUE")))
(App (App (C "FALSE") (C "S")) (App (C "TRUE") (C "TRUE")))
``````

Again, some more simplification has occurred: this time the `IF` expression has simplified, since egglog has discovered that `(App I FALSE)` and `(App IF (App I FALSE))` are both equivalent to `FALSE`.

(Notice that this is similar to our `Expr` example, since `IF` is defined as `I`, so `(App IF (App I FALSE))` is a bit like `(Id (Id C))`)

Using `(run 3)` leads to further changes, but no direct effect on the value of `test`:

``````[INFO ] Declared sort Com.
[INFO ] Declared function App.
[INFO ] Declared function C.
[INFO ] Declared rule (rewrite (App (App K x) y) x).
[INFO ] Declared rule (rewrite (App (App (App S x) y) z) (App (App x z) (App y z))).
[INFO ] Ran schedule (repeat 3 (run)).
[INFO ] Report: Rule (rule ((= v5___ (App...: search 0.000s, apply 0.000s
Rule (rule ((= v10___ (Ap...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

[INFO ] extracted with cost 11: (App (App (C "FALSE") (C "S")) (App (C "TRUE") (C "TRUE")))
(App (App (C "FALSE") (C "S")) (App (C "TRUE") (C "TRUE")))
``````

Finally `(run 4)` is able to fully reduce `test`. The database seems to be saturated now, since running for longer does not lead to any more changes:

``````[INFO ] Declared sort Com.
[INFO ] Declared function App.
[INFO ] Declared function C.
[INFO ] Declared rule (rewrite (App (App K x) y) x).
[INFO ] Declared rule (rewrite (App (App (App S x) y) z) (App (App x z) (App y z))).
[INFO ] Ran schedule (repeat 4 (run)).
[INFO ] Report: Rule (rule ((= v5___ (App...: search 0.000s, apply 0.000s
Rule (rule ((= v10___ (Ap...: search 0.000s, apply 0.000s
Ruleset : search 0.000s, apply 0.000s, rebuild 0.000s

[INFO ] extracted with cost 5: (App (C "TRUE") (C "TRUE"))
(App (C "TRUE") (C "TRUE"))
``````

### Beyond β-equivalence

Our rewrite rules for reducing `S` and `K` expressions correspond to the β-reduction rule of λ-calculus, so we can say that an expressions like `(App (App K (App S S)) S)` is β-equivalent to the expression `(App S S)` (and vice-versa). There are other forms of equivalence we might want to implement, although the α-equivalence and η-equivalence of λ-calculus aren’t directly applicable to SK logic (since it lacks names and abstractions).

I am particularly interested in implementing extensional equivalence, which holds for two expressions which return equal results when applied to any argument; e.g. if `(App f x)` is equivalent to `(App g x)` regardless of `x`, then `f` and `g` are extensionally equivalent. In the example above, the expressions `(App FALSE K)` and `(App FALSE S)` are extensionally-equivalent; in fact, all expressions of the form `(App FALSE something)` are equivalent, since `FALSE` (defined as `(App S K)`) will entirely ignore its first argument. Whilst `(App FALSE K)` appears in the same equivalence class as `I`, egglog hasn’t put `(App FALSE S)` in that class.

Unfortunately my attempts to capture this in egglog so far have ended up collapsing the entire SK language into one big equivalence class, which is not particularly useful. Hence I’ve decided to punt on that for now, and stick a “part 1” in the title. Part 2 begins my investigation of this problem, using the excellent testing ecosystem of Haskell!