# SK logic in egglog: part 2, property-checking extensionality

Posted on by Chris Warburton

Update: part 3 is here

This is a spiritual follow-on from part 1 of my experiments with SK logic in egglog: that post serves as the motivation for this one, but the content here is somewhat standalone. It’s the result of a rabbit hole I entered, after getting results that were so unexpected and confusing that it made me question some of the assumptions I’d been making in those experiments.

I was modelling a small programming language called SK (combinatory) logic; and whilst I’m no stranger to being wrong about my code, I had chosen SK since it’s very simple and very familiar. Implementing SK is like writing “hello world”: something I’ve done countless times in all sorts of languages and paradigms, hence why it was an obvious choice when I wanted to learn egglog. In fact, combinatory logic even appears in egglog’s `examples/` folder! However, an innocuous-looking one-line extension brought the whole thing crashing down: all structure vanished and every term collapsed into one big blob, where everything equalled everything else.

Note for non-logicians: That’s not good!

Despite several rewrites, trying various different encodings, and much scouring of egglog’s Rust source, I kept running into the same problem. This made me seriously doubt my understanding, particularly of free variables. If I could spend years immersed in the worlds of mathematics, computer science, post-graduate academia, and software development; yet fail to grasp such a fundamental aspect in all that time; then what else might I be wrong about? Am I a complete fraud?

Thankfully I also trained as a physicist, so I know what to do when theoretical understanding goes awry: experiment! Or, as software engineers call it: test! I wanted to thoroughly test the assumptions I was making in those egglog experiments, so I went crawling back to the comfort of Haskell where I could use my favourite hammer: property-based testing. This would also let me try out the new falsify package (which I’ve blogged about before): in essence, to try and “falsify myself”, to figure out what I’d misunderstood.

This page gives the background of what I was attempting; along with executable Haskell code, properties it should satisfy, and `falsify` checks of those properties. The latter are defined using active code.

## The setup

I’ll explain some of this jargon below, but in short: that problematic line of code which broke everything was my attempt to implement extensional equality; the idea that if two things always behave in the same way, then we can consider them equal. To specify what “always” means, we need some notion of universal quantification: that doesn’t quite fit into the formal system of egglog, so my idea was to approximate universally-quantified variables using symbolic execution. This should be a conservative approach: failing to spot some equalities, but never mistakenly claiming two things to be equal. Instead, it seemed to do the extreme opposite: always claiming that everything is equal!

Preamble boilerplate…

Here’s the initial boilerplate for the script; it just needs GHC with `falsify` in its package database. For more details on how it’s executed, see this page’s Markdown source.

``````module Main (main) where

import Control.Applicative ((<|>))
import Control.Exception (assert)
import Data.Maybe (isJust)
import GHC.Natural (Natural)
import System.Environment (getEnv)
import Test.Falsify.Generator (Gen, Tree(..))
import qualified Test.Falsify.Generator as Gen
import Test.Falsify.Predicate (satisfies)
import Test.Falsify.Property (testGen)
import Test.Falsify.Range (Range)
import qualified Test.Falsify.Range as Range
import Test.Tasty (defaultMain)
import Test.Tasty.Falsify (testProperty)

-- Useful helpers. These are available in libraries, but I want this code to be
-- self-contained (other than falsify)

-- | Like 'uncurry', but for three arguments
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (a, b, c) = f a b c

-- | Combine two results into a single tuple
tuple2 :: Applicative f => f a -> f b -> f (a, b)
tuple2 a b = (,) <\$> a <*> b

-- | Combine three results into a single tuple
tuple3 :: Applicative f => f a -> f b -> f c -> f (a, b, c)
tuple3 a b c = (,,) <\$> a <*> b <*> c

-- | Combine four results into a single tuple
tuple4 :: Applicative f => f a -> f b -> f c -> f d -> f (a, b, c, d)
tuple4 a b c d = (,,,) <\$> a <*> b <*> c <*> d

-- | Like a list, but never ends.
data Stream a = Cons { sHead :: !a, sTail :: Stream a } deriving (Functor)

instance Show a => Show (Stream a) where
show (Cons x _) = "[" ++ show x ++ ",...]"

sPrefix :: [a] -> Stream a -> Stream a
sPrefix []     ys = ys
sPrefix (x:xs) ys = Cons x (sPrefix xs ys)

-- | Drop elements from a 'Stream' which don't satisfy the given predicate.
sFilter :: (a -> Bool) -> Stream a -> Stream a
sFilter p (Cons x xs) = (if p x then Cons x else id) (sFilter p xs)

-- | Extract the nth element of a 'Stream'.
sAt :: Integral n => n -> Stream a -> a
sAt i (Cons x xs) = if i <= 0 then x else sAt (i - 1) xs

-- | All prefixes of the given Stream, e.g. [], [x], [x, y], [x, y, z], ...
sInits :: Stream a -> Stream [a]
sInits (Cons x xs) = Cons [] ((x:) <\$> sInits xs)

-- | All String values, in order of length
allStrings :: Stream String
allStrings = go 0
where strs 0 = [""]
strs n = [c:s | c <- [minBound..maxBound], s <- strs (n - 1)]
go   n = sPrefix (strs n) (go (n + 1))

-- | Like a list, but only stores data at the end. Useful for representing
-- | general recursion.
data Delay a = Now a | Later (Delay a) deriving (Eq, Functor, Ord, Show)

instance Applicative Delay where
(Now   f) <*> x =        f <\$> x
(Later f) <*> x = Later (f <*> x)
pure            = Now

(Now   x) >>= f = f x
(Later x) >>= f = Later (x >>= f)

-- | Unwrap (up to) a given number of 'Later' wrappers from a 'Delay' value.
runDelay :: Integral n => n -> Delay a -> Delay a
runDelay n x | n <= 0 = x
runDelay _ x@(Now _)  = x
runDelay n (Later x)  = runDelay (n - 1) x

-- | Try to extract a value from the given 'Delay', or use the given default
runDelayOr :: Integral n => a -> Delay a -> n -> a
runDelayOr def x n = case runDelay n x of
Now   x' -> x'
Later _  -> def``````

## SK (combinatory logic)

Combinatory logic, which I’ll shamelessly abbreviate as “SK”, is a particularly simple programming language. It can be written entirely with two symbols, traditionally called `S` and `K`, along with parentheses for grouping symbols together. We’ll represent SK expressions using the following Haskell datatype called `Com` (this representation also matches my definition in egglog):

``data Com = C String | App Com Com deriving (Eq, Ord)``

That `deriving` clause asks Haskell to automatically implement some useful interfaces:

• `Eq` provides the `==` function. Deriving this lets us check whether two `Com` values are identical (same structure and same `String` data).
• `Ord` provides comparisons like `<`. Deriving this gives us a lexicographic ordering, but the details aren’t important: as long as we have some implementation of `Ord`, we can use efficient `Set` datastructures.

We’ll implement the `Show` interface ourselves, to render expressions in traditional SK notation:

``````instance Show Com where
show (C c)             = c
show (App x (App y z)) = show x ++ "(" ++ show (App y z) ++ ")"
show (App x y)         = show x ++ show y``````

We represent the “primitives” `S` and `K` using the `C` constructor (annoyingly, value names in Haskell must begin with a lowercase letter; so we call these `s` and `k` instead):

``````-- | Primitive SK expressions
s, k :: Com
s = C "S"
k = C "K"``````

`App` groups expressions together two at a time, like a pair of parentheses. Traditional SK notation only shows parentheses “on the right”, e.g. `App s (App s k)` is written `S(SK)`, but `App (App s k) k` is simply `SKK`. The following functions extract the left and right children of an `App` value (returning `Maybe`, in case they’re given a primitive expression):

``````left, right :: Com -> Maybe Com
left  (App x _) = Just x
left  _         = Nothing
right (App _ y) = Just y
right _         = Nothing``````

### Anatomy of SK expressions

We can refer to various parts of an SK expression using the following terminology:

• The “root” is the top-most constructor.
• The “head” is the left-most leaf.
• The “spine” is the chain of left-nested `App` constructors between the head and the root. If the expression is a leaf, its spine is empty.
• The “arguments” (or “args”) of that head are the right-hand children hanging off the spine, ordered from the head to the root. If the spine is empty, there are no arguments.

Examples can be seen in the figure below. We will often describe an expression as “applying 〈its head〉 to 〈its arguments〉”.

### Running SK expressions

We can “run” SK expressions using two rewriting rules:

• An expression applying `K` to two args can be replaced by the first arg.
• An expression applying `S` to three args can be replaced by an expression which applies the first arg to the third arg and «the second applied to the third».

We can implement these as Haskell functions, returning `Nothing` if the rule didn’t match:

``````-- | Replaces Kxy with x, otherwise 'Nothing'
stepK :: Com -> Maybe Com
stepK (App (App (C "K") x) _) = Just x
stepK _                       = Nothing

-- | Replaces Sxyz with xz(yz), otherwise 'Nothing'
stepS :: Com -> Maybe Com
stepS (App (App (App (C "S") x) y) z) = Just (App (App x z) (App y z))
stepS _                               = Nothing``````

Notice that arguments can be any `Com` value, represented using the Haskell variables `x`, `y` and `z`; but only `App`, `s` and `k` have any effect on the output (this will be important for symbolic execution later!).

Remarkably, these two rules make SK a universal (Turing-complete) programming language! The following `step` function tries to apply both of these rules. It also applies the rules recursively to sub-expressions of `App`: note that we try stepping both children at once, to prevent any reducible parts of an expression getting “stuck” waiting for their sibling to finish. `Nothing` is returned when neither rule matched and no sub-expressions were rewritten; we say the expression is in normal form:

``````-- | Attempt to reduce a K or S combinator, or the children of an 'App'.
-- | 'Nothing' if the argument is in normal form.
step :: Com -> Maybe Com
step c = stepK c <|> stepS c <|> app l' r' <|> app l' r <|> app l r'
where app x y  = App <\$> x <*> y
(l , r ) = (left c    , right c   )
(l', r') = (l >>= step, r >>= step)``````

Next we’ll need to iterate the `step` function until its argument reaches normal form. We’ll distinguish normalised expressions using a separate type called `Normal`:

``````-- | Wraps Com values which are assumed to be in normal form
newtype Normal = N Com deriving (Eq, Ord)

instance Show Normal where
show = show . toCom

-- | Coerce a 'Normal' back to a 'Com', i.e. forget that it's in normal form.
toCom (N c) = c

-- | 'step' the given 'Com': if it worked, returns that in 'Left'; otherwise
-- | returns the original value (now 'Normal') in 'Right'.
toNormal :: Com -> Either Com Normal
toNormal c = case step c of
Just c' -> Left     c'
Nothing -> Right (N c)``````
Understanding `Normal`

The `Normal` type has the invariant that it only contains `Com` values which are in normal form. More specifically, we can say `step . toCom` is (extensionally) equal to `const Nothing`. Values of type `Normal` act as evidence that the `Com` they contain is in normal form (i.e. that it will return `Nothing` when given to `step`).

This invariant is not guaranteed to be the case; e.g. we can violate it by writing code like `N (App (App k s) k)` (since `KSK` reduces to `S` via the `stepK` rule). To prevent such violations, we avoid using the `N` constructor directly: instead always using the `toNormal` function, which only returns a `Normal` value when it is found to return `Nothing` from `step` (i.e. when our invariant holds).

`toNormal` is a smart constructor: a function which only returns values of the desired type when it’s checked they’re valid. Ideally we would enforce usage of `toNormal`, by encapsulating the `N` constructor, making the `Normal` type opaque. That’s usually achieved in Haskell using modules, but I want the code for this post to all live in one module (to make compilation and execution easier).

`toNormal` tells us when to stop iterating, but since SK is a universal programming language, there’s no way to know beforehand if it ever will. We’ll account for this by wrapping such undecidable computations in a `Delay` type:

``````-- | Step the given 'Com' until it reaches 'Normal' form.
reduce :: Com -> Delay Normal
reduce c = case toNormal c of
Left c' -> Later (reduce c')
Right n -> Now n``````

### Normal equivalence

The rules of SK are confluent, meaning that every expression has at most one `Normal` form, and applying the reduction rules in any order, to any parts of an expression, will not change its `Normal` form or the ability to reach it using those rules. As a consequence, all SK expressions with the same `Normal` form are equivalent and interchangable.

To see this, consider two expressions X and Y with the same `Normal` form N. Whenever X occurs inside a larger expression, confluence allows us to replace X with N without affecting the overall result. Now imagine we instead replace X with Y: whatever that gives us, it will be unaffected (thanks to confluence) if we reduce Y to its `Normal` form. Since the `Normal` form of Y is also N, and we already deduced that using N is equivalent to using X, the result of using Y must also be equivalent to using X. Hence any expressions with the same `Normal` form are equivalent, in the sense that they can be interchanged inside any expression without altering its `Normal` form.

We’ll call this relationship “`Normal` equivalence”. It’s undecidable in general, so it also gets wrapped in `Delay`:

``````-- | Result of comparing two values
data Compared a = Same a | Diff a a deriving (Show)

same, diff :: Compared a -> Bool
same (Same _)   = True
same _          = False
diff (Diff _ _) = True
diff _          = False

-- | Uses (==) to find identical values
comparer :: Eq a => a -> a -> Compared a
comparer x y = if x == y then Same x else Diff x y

-- | Compare the 'Normal' forms of the given 'Com' values.
normalEq :: Com -> Com -> Delay (Compared Normal)
normalEq x y = comparer <\$> reduce x <*> reduce y``````

## Property-based testing

Let’s test this `normalEq` function, to see whether it actually behaves in the way our theory of SK predicts it should. For example, every equality relation should have the “reflexive property”, meaning that values should be equal to themselves; so any call like `normalEq foo foo` should always result in `True`. Unfortunately, such results are buried in a `Delay` which prevents us testing them directly. `Delay` values could be a never-ending chain of `Later` wrappers, like the following:

``````loop :: Delay a
loop = Later loop``````

Hence any attempt to extract a result from them must eventually give up. We’ll work around this using a parameter that counts down until we give up; an approach known as fuel:

``````-- | Represents a parameter for "when to give up"
type Fuel = Word``````

Now we can write a predicate (a function returning a boolean) to check whether its argument is `normalEq` to itself, before its `Fuel` runs out:

``````normalEqToItself :: (Fuel, Com) -> Bool
normalEqToItself (n, x) = runDelayOr False (same <\$> normalEq x x) n``````

We can use this predicate to make statements, by quantifying its argument. Tests commonly use existential quantification, which asserts that some argument satisfies `normalEqToItself` (AKA “example-based” testing). For example:

``main = assert (normalEqToItself (0, k)) (putStrLn "PASS")``
``````PASS
``````

However, that isn’t really what we want to say: reflexivity doesn’t just apply to a few hand-picked examples, it says that every argument satisfies `normalEqToItself`. Talking about “every argument” is universal quantification. Universally-quantified predicates are called “properties”, so this approach is known as property-based testing.

To test a property, we give it to a “property checker” which searches for counterexamples: argument values which cause the assertion to fail. If no counterexample can be found, the test passes. Search techniques range from simple enumeration all the way up to sophisticated AI algorithms. Haskell has many property checkers, beginning with the wonderful QuickCheck package. We’ll use the state-of-the-art for 2024, which is falsify.

### Data generators

`falsify` searches through argument values at random, sampling them from a given “generator” with the Haskell type `Gen a` (for generating values of some type `a`). We can build up generators using familiar interfaces (`Applicative`, `Monad`, etc.); from primitives such as `Gen.inRange`, which samples fixnums from a `Range` and is perfect for generating `Fuel`:

``````-- | Generates a (relatively small) amount of Fuel
genFuel = genFuelN limit

-- | Generates up to a certain amount of Fuel
genFuelN :: Fuel -> Gen Fuel
genFuelN = Gen.inRange . to

-- | A reasonable default for procedures requiring Fuel. Small enough to keep
-- | exponentially-growing terms from blowing up.
limit :: Fuel
limit = 20

-- | We usually want Ranges from zero upwards
to :: Fuel -> Range Fuel
to = Range.between . (0,)``````
Generating recursive `Com` values…

Generating `Com` values is more tricky, since they are recursive. A naïve generator that simply calls itself recursively will make `Com` values of exponential size: either blowing up memory (if the recursive case `App` is likely to be chosen) or being limited to a handful of tiny values (if the recursive case is unlikely to be chosen). To generate a diverse spread of values, without risking out-of-memory conditions, we can use the same `Fuel` trick as above; this time dividing it up (unevenly) between recursive calls.

This is such a useful approach that I’ve implemented it for `QuickCheck`, `ScalaCheck`, `Hypothesis`, etc. over the years. In contrast, `falsify` provides it out-of-the-box! The `Gen.tree` function generates binary `Tree` values of a given size, so we just need to transform those into the `Com` values we need:

``````-- | Create a Com from a binary Tree (with unit values () at the nodes)
treeToCom :: Tree () -> Com
treeToCom t = case t of
-- The smallest Trees have one and two leaves. Turn them into K, since that
-- has the simplest step rule.
Leaf               -> k
Branch _ Leaf Leaf -> k

-- There are two Trees with three leaves. Turn those into S.
Branch _ Leaf (Branch _ Leaf Leaf) -> s
Branch _ (Branch _ Leaf Leaf) Leaf -> s

-- For larger Trees, we recurse on their children and combine using App
Branch _ l r -> App (treeToCom l) (treeToCom r)

-- | Generate a Com, with size bounded by the given Fuel
genComN :: Fuel -> Gen Com
genComN n = treeToCom <\$> Gen.tree (to n) (pure ())

-- | Generate (relatively small) Com values
genCom = genComN limit``````

`falsify` integrates with Haskell’s `tasty` test framework. Normally a project would declare a big test suite to run all at once, but for this literate/active style we’ll be testing things as we go, using the following function:

``````-- | Turn the given predicate into a falsify property, universally quantified
-- | over the given generator's outputs. Check it on (at least) 100 samples.
check :: Show a => (a -> Bool) -> Gen a -> IO ()
check pred gen = do
name <- getEnv "NAME" -- Avoids repetition
defaultMain (testProperty name (testGen (satisfies (name, pred)) gen))``````

### Included middles

The observant among you may have noticed that `normalEqToItself` does not hold for all argument values! `falsify` can generate a counterexample to show us why:

``main = check normalEqToItself (tuple2 genFuel genCom)``
``````normalEqToItself: FAIL (0.74s)
failed after 1 successful test and 13 shrinks
not (normalEqToItself generated)
generated: (0,KKK)

Logs for failed test run:

Use --falsify-replay=0173551014ce694c11cb95ebbef835b711 to replay.

1 out of 1 tests failed (0.79s)
``````

The precise counterexample `falsify` finds may vary depending on the random seed, but they’ll all have the following in common: the `Fuel` will be `0`, whilst the `Com` expression will not be in normal form. For example, it may produce `KKK` (the Haskell value `App (App k k) k`). `stepK` will reduce that to a single `K`, so whilst the value `KKK` is equal to itself (as we expect), `normalEq` wraps it in a `Later` constructor, which may cause `normalEqToItself` to give up for some `Fuel` parameters: in particular, when given `0` `Fuel`. This disproves our claim that the property holds for any amount of `Fuel`.

We could alter our claim to existentially quantify the amount of `Fuel`: that for any SK expression `x`, there is some value of `n` where `runDelayOr False (same <\$> normalEq x x) n` holds. However that is also false, since there are infinite loops which have no `Normal` form; hence never produce a `Now` value; and therefore cannot have a result extracted regardless of how much `Fuel` we use. Even when a `Normal` form does exist, there is no way to bound the amount of `Fuel` required to find it.

The correct way to fix our claim is to universally quantify `x` and `n` as before, but add two negations to our predicate: instead of claiming every expression is equal to itself (regardless of `Fuel`), we claim that every expression is not unequal to itself (regardless of `Fuel`):

``````notUnnormalEqToItself :: (Fuel, Com) -> Bool
notUnnormalEqToItself (n, x) = runDelayOr True (not . diff <\$> normalEq x x) n``````
``main = check notUnnormalEqToItself (tuple2 genFuel genCom)``
``````notUnnormalEqToItself: OK (0.33s)
100 successful tests

All 1 tests passed (0.33s)
``````

You may have learned in school that double-negatives are redundant: that anything “not false” must be “true” (the law of the excluded middle). However, we’re not dealing with the true/false ideals of classical logic, which ignore computation and its associated undecidability. All we have are the more pragmatic falsified/unfalsified results of a property checker, where there are non-excluded middles such as “don’t know”, “timed out” and “gave up”!

## Extensional equality

My problems arose when trying to extend the equality of SK expressions even further to include extensional equality. To explain how this works, we’ll need a bit more terminology…

### Inputs and agreement

To understand the behaviour of a particular SK expression, we can see what happens when it’s applied to other SK expressions. We’ll call the latter “input values”, to emphasise that they are not part of the original expression we’re focused on. For example, we can understand how the expression `KS` behaves by applying it to an input value like `S`: giving `KSS` (which reduces to `Normal` form `S`). Note that `KS` contains head `K` and argument `S`; whilst the resulting expression `KSS` also contains the input value `S` as an extra argument.

If we apply two expressions to the same input value, and the results reduce to the same `Normal` form, we say those two expressions “agree on” that input value. For example, `KK` and `SKK` agree on the input value `K`, since `KKK` reduces to `K`; and `SKKK` reduces to `KK(KK)` then to `K`. Note that they disagree on the input value `S`, since `KKS` reduces to `K`; whilst `SKKS` reduces to `KS(KS)` then to `S`. Expressions can also agree on a sequence of input values, where we apply them both to the first value, then apply those results to the second value, and so on. We’ll represent such sequences using a `Stream`, which is like a list except there is no “nil” case, so it goes on forever:

``````type InputValue  = Com
type InputValues = Stream InputValue

-- | Apply the given 'Com' expressions to more and more of the 'InputValues',
-- | checking whether they reach the same 'Normal' form.
agree :: Com -> Com -> InputValues -> Stream (Delay (Compared Normal))
agree f g (Cons x xs) = Cons (normalEq f g) (agree (App f x) (App g x) xs)``````

Everything that satisfies `normalEq` should also satisfy `agree`, which we can state with the following property:

``````normalEqImpliesAgree :: (Fuel, Com, Com, InputValues) -> Bool
normalEqImpliesAgree (n, f, g, xs) =
case runDelay n (tuple2 (normalEq f g) (sHead (agree f g xs))) of
Now (Same _, Diff _ _) -> False
_                      -> True

-- | Generate a 'Stream' of Com values, with element size bounded by 'Fuel'
genComsN :: Fuel -> Gen InputValues
genComsN fuel = Cons <\$> genComN fuel <*> genComsN fuel

-- | Generate (relatively small) lists of Com values
genComs :: Gen InputValues
genComs = genComsN limit``````
``main = check normalEqImpliesAgree (tuple4 genFuel genCom genCom genComs)``
``````normalEqImpliesAgree: OK (0.91s)
100 successful tests

All 1 tests passed (0.91s)
``````

We say expressions “agree on N inputs” when they agree on every sequence of N input values; i.e. the input values are universally-quantified, but the length of the sequence is not. For example, `SK` and `S(K(SK))(KK)` agree on two inputs; which we can test by asserting that they never disagree:

``````skNeverDisagreesWithSKSKKK :: (Fuel, InputValues) -> Bool
skNeverDisagreesWithSKSKKK (n, xs) =
runDelayOr True (not . diff <\$> sAt (2 + n) (agree f g xs)) n
where f = App s k
g = App (App s (App k (App s k))) (App k k)``````
``main = check skNeverDisagreesWithSKSKKK (tuple2 genFuel genComs)``
``````skNeverDisagreesWithSKSKKK: OK (10.23s)
100 successful tests

All 1 tests passed (10.24s)
``````

Note that any expressions which agree on N inputs also agree on N + 1 inputs (and so on), since the left child of each root has the same `Normal` form (by definition of agreement on N inputs).

``````agreementIsMonotonic :: ((Fuel, Fuel), (Com, Com), InputValues) -> Bool
agreementIsMonotonic ((n, m), (f, g), xs) =
case runDelay n (tuple2 (sAt  n      (agree f g xs))
(sAt (n + m) (agree f g xs))) of
Now (Same _, Diff _ _) -> False
_                      -> True``````
``````main = check agreementIsMonotonic (tuple3 (tuple2 genFuel genFuel)
(tuple2 genCom  genCom )
genComs)``````
``````agreementIsMonotonic: OK (7.92s)
100 successful tests

All 1 tests passed (7.92s)
``````

### Extensionality

Now we’ve defined inputs and agreement, extensional equality becomes quite simple: SK expressions are extensionally equal iff they agree on some number of inputs. We saw that `SK` and `S(K(SK))(KK)` agree on some number of inputs (namely: on two inputs), so they are extensionally equal. Expressions which are normally equivalent are also extensionally equal, since they agree on zero inputs.

Notice that the number of inputs is existentially-quantified, whilst the values of those inputs are universally-quantified. This makes extensional equality difficult to determine:

• If we compare two expressions on N input values and see that they agree, that doesn’t mean they’re extensionally equal; since there might be other input values of length N for which they disagree.
• If we compare two expressions on N input values and see that they disagree, that doesn’t mean they’re not extensionally equal; since they might only agree on inputs longer than N.

In order to make more definitive assertions about extensional equality, we need to borrow techniques from the world of formal methods!

## Symbolic execution

The reason I like property-checking is that it can provide a lot of confidence for relatively little effort, compared to e.g. example-based testing (less confidence for a similar effort), manual testing (less confidence for more effort) or formal verification (more confidence for more effort). However, there are occasions when the scales tip in favour of other approaches, and extensional equality checking turns out to be particularly suited to a formal method called symbolic execution.

In this approach, we avoid using real, “concrete” SK expressions for our input values; and instead use abstract symbols. These can be copied, discarded or rearranged during program execution; which is perfect for seeing where different inputs end up in the resulting `Normal` form. They are otherwise inert or “uninterpreted”: causing reduction to stop if it would depend on the particular value of an input.

Implementing symbolic execution can often be laborious, but our definition of `Com` actually makes it trivial to graft on top: we can represent abstract symbols using the `C` constructor applied to any `String` (except `"S"` or `"K"`, which we’re treating specially). Execution of the `stepK` and `stepS` functions will treat these as atomic symbols, to be copied, discarded and rearranged as needed.

Once we’ve normalised a `Com` value containing symbols, we need a semantics, or abstract interpretation, to understand what they mean. Since we’re using each symbol to represent a distinct input, the way they appear in `Normal` forms tells us something about the overall behaviour of the expression and, crucially, how it depends on the particular concrete `Inputs` it may be applied to. It’s undecidable whether part of an SK expression will influence its reduction (or just get discarded by the `stepK` rule), so we’ll limit our ambitions to understanding a few simple situations.

### Symbolic agreement proves extensional equality

Expressions which agree on symbolic inputs will agree on all inputs, since symbols do not reduce, and hence those inputs must have been irrelevant for the reductions which lead to the agreement. Unlike property-checking, where we rely on double-negatives to “fail to disprove agreement”, this is real positive proof that expressions will always agree, and hence we can claim definitively that they are extensionally equal.

We don’t need to implement this check specially, since we can reuse `agree`, just using symbolic values as our inputs instead of concrete SK expressions:

``````-- | Synonym to indicate when we're dealing with uninterpreted symbolic values
type Symbol = String

-- | All Symbols except "S", "K" and "" (which doesn't show well)
symbols :: Stream Symbol
symbols = sFilter keep allStrings
where keep s = s /= "" && s /= "S" && s /= "K"``````

The following `agreeSym` function applies two expressions to more and more symbolic inputs, to see if they reduce to the same `Normal` form:

``````-- | Checks whether two Com values agree on more and more symbolic input values.
agreeSym :: Com -> Com -> Stream (Delay (Compared Normal))
agreeSym f g = agree f g (C <\$> symbols)``````

The return type of `agreeSym` is a little awkward, since it’s “two dimensional”: travelling further along the `Stream` applies more and more symbolic inputs; travelling further along any of those `Delay` values applies more and more steps to that expression. To get at the results inside (assuming any of those expressions has a `Normal` form) we need a linear path which traverses this structure. We can’t use breadth-first search since the `Stream` never ends; we also can’t use depth-first search, since a `Delay` might never end. The following `race` function is a bit smarter: it acts like a round-robin scheduler, each iteration running more expressions until one of them finishes (if ever):

``````-- | Runs every 'Delay' element in an interleaved fashion until one of them
-- | produces a 'Now'. Returns that result, its index in the 'Stream', and a
-- | 'Stream' of the remaining 'Delay' elements.
race :: Stream (Delay a) -> Delay (a, Natural, Stream (Delay a))
race s = go [] [] s
where go ran run next = case (run, next) of
(        [], Cons z zs) ->        go []      (reverse (z:ran)) zs
(Later y:ys,        zs) -> Later (go (y:ran) ys                zs)
(  Now y:ys,        zs) ->   Now ( y
, fromIntegral (length ran)
, sPrefix (reverse ran ++ ys) zs
)``````
Checking `race`

The `race` algorithm is a bit tricky, so we’ll double-check it with a few property tests. First some helper functions:

``````-- | Wrap the given number of 'Later' constructors around '()'
wrapLater n | n <= 0 = Now ()
wrapLater n          = Later (wrapLater (n - 1))

-- | Count the number of 'Later' wrappers around a 'Now' value. Gives up if the
-- | given 'Fuel' runs out.
countLaters :: Integral n => Fuel -> Delay a -> Maybe n
countLaters = go 0
where go !n _ (  Now _) = Just n
go  _ 0        _  = Nothing
go !n m (Later x) = go (n + 1) (m - 1) x

countMatchesWrap (n, m) = countLaters (n + m) (wrapLater n) == Just n``````
``main = check countMatchesWrap (tuple2 genFuel genFuel)``
``````countMatchesWrap: OK (0.10s)
100 successful tests

All 1 tests passed (0.11s)
``````
``````runUndoesWrap n = case runDelay n (wrapLater n) of
Now _ -> True
_     -> False``````
``main = check runUndoesWrap genFuel``
``````runUndoesWrap: OK (0.15s)
100 successful tests

All 1 tests passed (0.16s)
``````
``runStops (n, m) = countLaters m (runDelay n (wrapLater (n + m)) ) == Just m``
``main = check runStops (tuple2 genFuel genFuel)``
``````runStops: OK (0.41s)
100 successful tests

All 1 tests passed (0.42s)
``````

Now on to `race` itself: it searches for a `Now` value using a triangular “flood fill” approach, illustrated below:

The numbers in the first column (i.e. which steps `race` allocates to running the first element of the `Stream`) are precisely the triangular numbers. We can check this, by counting the number of `Later` wrappers `race` produces for a `Stream` whose first element has a known number of `Later` wrappers, and whose other elements are never-ending loops (and hence don’t contribute any `Now` values of their own):

``````-- | Calculates the nth triangular number https://oeis.org/A000217
triangle :: (Integral n, Integral m) => n -> m
triangle = fromIntegral . go . fromIntegral . max 0
where go (n :: Natural) = (n * (n + 1)) `div` 2

loops :: Stream (Delay a)
loops = Cons loop loops

raceFirstUsesTriangularFuel :: Fuel -> Bool
raceFirstUsesTriangularFuel n =
countLaters tn (race (Cons (wrapLater n) loops)) == Just tn
where tn = triangle n``````
``main = check raceFirstUsesTriangularFuel genFuel``
``````raceFirstUsesTriangularFuel: OK (0.51s)
100 successful tests

All 1 tests passed (0.51s)
``````

At the other extreme, `race` will run the first step of element `n` just before it goes back to the first element; i.e. at step `triangle (n + 1) - 1` (this is OEIS A000096):

``````raceHitsNthElementBeforeNextTriangularNumber n =
countLaters x (race (prog n)) == Just x
where x      = triangle (n + 1) - 1
prog 0 = Cons (Now ()) loops
prog m = Cons loop     (prog (m - 1))``````
``main = check raceHitsNthElementBeforeNextTriangularNumber genFuel``
``````raceHitsNthElementBeforeNextTriangularNumber: OK (0.40s)
100 successful tests

All 1 tests passed (0.40s)
``````

We can use `race` to check whether two expressions ever agree on symbolic inputs, and therefore whether they’re extensionally equal:

``````-- | 'True' iff the given expressions ever agree for any number of inputs.
everAgree :: Com -> Com -> Delay Bool
everAgree x y = race (agreeSym x y) >>= go
where go (Same _  , _, _) = Now True
go (Diff _ _, _, s) = race s >>= go``````

Note that `everAgree` is more general than `normalEq`, since the latter only checks for agreement on 0 inputs; although it requires more `Fuel` to account for running it interleaved with 1 input, 2 inputs, etc. (see the fold-out section above for more details):

``````normalEqImpliesEverAgree :: (Fuel, Com, Com) -> Bool
normalEqImpliesEverAgree (n, x, y) =
if      go (same <\$> normalEq x y)  n
then go (        everAgree x y) (triangle n)
else True
where go = runDelayOr False``````
``main = check normalEqImpliesEverAgree (tuple3 genFuel genCom genCom)``
``````normalEqImpliesEverAgree: OK (0.45s)
100 successful tests

All 1 tests passed (0.46s)
``````

However, `everAgree` is not yet a predicate for checking extensional equality, since the `Compared` values returned by `agreeSym` don’t represent “equal/unequal”; only “equal/unsure”. Hence the `everAgree` function claims to return `Delay Bool`, which we can think of as “at most one boolean”. Yet its result is not really a boolean, since it can never be `False`! It would be more accurate to return a unit value `Now ()`, of type `Delay ()`. That type is isomorphic to the natural numbers, counting how long it took to prove equality. That can’t be a predicate, since it begs the question!

A useful predicate for extensional equality not only needs to return `True` for some inputs it’s sure are equal; but also return `False` for some inputs it’s sure are unequal. When it’s unsure (which is unavoidable due to undecidability), it can use a never-ending chain of `Later` wrappers to avoid ever returning at all!

### Different symbolic heads prove disagreement

We can easily show that two expressions disagree for N inputs, by checking if the Nth element of `agreeSym` results in `Diff`. To be sure they are not extensionally equal, we also need to prove they will disagree for all inputs longer than N.

A simple way to prove this is when the head of each expression is a different symbolic input. In that case, their `Normal` forms depend entirely on those inputs, so disagreement can be “forced” by choosing input values which reduce to different results. For example, say `Xab` is some concrete SK expression applied to two symbolic inputs, which happens to reduce to `aS`; hence its result depends on the first input, and is independent of the second. If some other expression `Yab` reduces those inputs to `bS`, it depends on the second input, and is independent of the first. They can be forced to disagree by choosing concrete input values in place of `a` and `b`, which reduce to values that are known to be unequal. In this example we could use input values `KS` and `KK`, which `X` reduces to `S` but `Y` reduces to `K`. This works even if both inputs appear in a result, e.g. `ab` can still be forced independently of `b`, by choosing a concrete expression for `a` which discards its argument.

This ability to “force” the result of expressions which use an input as their head can be used to extend disagreements from N inputs to all inputs M > N. To see this, note that the result of applying an expression to M > N inputs is equivalent to first applying it to N inputs (which we can force to result in any value we like), then applying that result to the remaining M − N inputs. We can choose those intermediate results to be expressions that consume the remaining M − N inputs (e.g. through repeated use of `K`) and then disagree with each other. ∎

The following function will spot when two expressions have different symbols in head position, and are hence provably unable to agree, even with more inputs:

``````-- | Return the head (left-most leaf) of the given Com

-- | Whether a String represents an uninterpreted Symbol (i.e. not S or K)
isSym :: String -> Bool
isSym s = s /= "S" && s /= "K"

-- | Whether the given Com values have distinct symbolic values in head position
distinctSymbolicHeads :: Com -> Com -> Bool
distinctSymbolicHeads x y = isSym hX && isSym hY && hX /= hY

We can perform a few sanity checks, to confirm that our argument above holds:

``````-- | Generate String values to represent uninterpreted symbolic values
genSymN :: Fuel -> Gen Symbol
genSymN n = (`sAt` symbols) <\$> genFuelN n

-- | Generate (relatively small) symbolic values
genSym :: Gen Symbol
genSym = genSymN limit

-- | Generate Com values which may also contain symbols
genSymComN :: Fuel -> Gen Com
genSymComN n = toSymCom <\$> Gen.tree (to n) (genSymN n)
where toSymCom t = case t of
Leaf                               -> k
Branch _ Leaf Leaf                 -> k
Branch _ Leaf (Branch _ Leaf Leaf) -> s
Branch s (Branch _ Leaf Leaf) Leaf -> C s
Branch _ l r                       -> App (toSymCom l) (toSymCom r)

-- | Generate (relatively small) Com values which may contain symbols
genSymCom = genSymComN limit

-- | Shouldn't matter which order we compare two Com values
distinctSymbolicHeadsCommutes :: (Com, Com) -> Bool
``main = check distinctSymbolicHeadsCommutes (tuple2 genSymCom genSymCom)``
``````distinctSymbolicHeadsCommutes: OK (1.11s)
100 successful tests

All 1 tests passed (1.11s)
``````

### Different numbers of arguments prove disagreement

Expressions whose heads are the same symbolic input, but applied to a different number of arguments, can also be forced to disagree. For example, if `Xab` reduces to `aSb`, and `Yab` reduces to `aS`, then `distinctSymbolicHeads` can’t be sure if they’re different. Yet we can still force them to disagree, by giving them three inputs (`a`, `b` and `c`):

• We choose a value for `a` which consumes two arguments and reduces to the second, such as `SK`.
• We give `b` the value `Kd` (where `d` can represent any expression).
• We leave `c` unconstrained.

`X` and `Y` will reduce these inputs as follows:

``````X(SK)(Kd)       c    |    Y(SK)(Kd)c        | Xabc and Yabc
SKS(Kd)       c    |      SKS    c        | Reduced per example
K (Kd)(S(Kd))c    |       K     c(Sc)    | Applied S rule to each
Kd        c    |             c        | Applied K rule to each
d             |             c        | Applied K rule again to first``````

Since `X` and `Y` reduce to different symbols, we can use this scheme to force their results to any value, including unequal `Normal` forms or (as above) expressions which consume any number of subsequent inputs then become unequal `Normal` forms. ∎

The following `unequalArgCount` function will spot when two expressions apply a symbol to an unequal number of arguments, and are hence sure to be extensionally unequal:

``````-- | Split a Com value into its head and any arguments that's applied to
headAndArgs :: Com -> (String, [Com])
headAndArgs (C x) = (x, [])
(h, args) -> (h, args ++ [r])

-- | True iff the given expressions have the same Symbol in their head, but
-- | applied to a different number of arguments.
unequalArgCount :: Com -> Com -> Bool
unequalArgCount x y = isSym headX
&& length argsX /= length argsY

Again, we can sanity-check this:

``````-- | Order of Com values shouldn't affect result
unequalArgCountCommutes :: (Com, Com) -> Bool
unequalArgCountCommutes (x, y) = unequalArgCount x y
== unequalArgCount y x``````
``main = check distinctSymbolicHeadsCommutes (tuple2 genSymCom genSymCom)``
``````distinctSymbolicHeadsCommutes: OK (1.06s)
100 successful tests

All 1 tests passed (1.06s)
``````

### Disagreeing arguments prove disagreement

Finally, we can force expressions to disagree by forcing a disagreement inside them, then propagating it up to the overall result. This is useful, since it can be achieved when each expression has the same `Symbol` at their heads (avoiding `distinctSymbolicHeads`), and they apply it to the same number of arguments (avoiding `unequalArgCount`); as long as we can force the values of those arguments to disagree.

For example, let’s say `Xabc` reduces to `ab` and `Yabc` reduces to `ac`: these have the same symbol `a` as their heads, and both apply it to a single argument. However, the values of those arguments (`b` and `c`) are provably unequal (via `distinctSymbolicHeads`, in this case). We can hence use the first input `a` to propagate its unequal argument, e.g. using the input value `SKK`. Since `X(SKK)bc` reduces to `b`, and `Y(SKK)bc` reduces to `c`, we can force a disagreement by choosing unequal values for `b` and `c`; and this extends to more inputs by having them consume the remainder, as before. ∎

The following function checks for the situation described above. It checks whether argument values are unequal using a given `unEq` parameter, which is more general and useful than if we hard-coded some specific check:

``````-- | Whether the given Com values have matching symbols in their heads, but
-- | applied to unequal values (determined by the given unEq function)
symbolGivenUnequalArgs :: (Com -> Com -> Bool) -> Com -> Com -> Bool
symbolGivenUnequalArgs unEq x y = isSym headX
&& any id (zipWith unEq' argsX argsY)
unEq' a b      = unEq a b || unEq b a  -- Compare both ways round

-- | The order of arguments shouldn't alter the result
symbolGivenUnequalArgsCommutes :: (Com -> Com -> Bool) -> Com -> Com -> Bool
symbolGivenUnequalArgsCommutes f x y = symbolGivenUnequalArgs f x y
== symbolGivenUnequalArgs f y x``````
Generating `unEq` functions to test with…

The `symbolGivenUnequalArgs` function doesn’t care how arguments are deemed to be unequal, so it lets the caller decide by passing in a function. The choice of function shouldn’t affect commutativity, so the property `symbolGivenUnequalArgsCommutes` is universally quantified over that choice; allowing a counterexample search to try many different functions.

This may seem surprising if you’ve not encountered it before, but we can generate a function value just as easily as a “data-like” value. For example, we can generate functions which look up their arguments in a (generated) lookup table:

``````genViaTable :: Eq a => Range Fuel -> Gen a -> Gen b -> Gen (a -> b)
genViaTable range genA genB = do
def   <- genB
pairs <- Gen.list range (tuple2 genA genB)
pure (maybe def id . (`lookup` pairs))``````

However, such ordinary function values are opaque “black boxes”, which makes them less useful for property-checking: firstly, it’s hard to `show` them (e.g. if they form part of a counterexample); and secondly it’s hard to shrink them (to look for “simpler” alternatives). Property-checkers avoid these problems by replacing ordinary functions `a -> b` with their own alternative, whose constructors ‘remember’ how to `show` and shrink themselves. In `falsify` this alternative is `Gen.Fun a b`, so we’ll need to convert the `Com -> Com -> Bool` argument of `symbolGivenUnequalArgsCommutes` into either `Fun Com (Fun Com Bool)` (curried form) or, equivalently, `Fun (Com, Com) Bool` (uncurried form). We’ll use the uncurried form, since the tuples will be handled automatically by type class instance resolution. The following helpers let `symbolGivenUnequalArgsCommutes` take a `Fun` as argument:

``````-- | Transform a higher-order function to take a Fun instead
liftFun :: ((a -> b) -> c) -> Gen.Fun a b -> c
liftFun f = f . Gen.applyFun

-- | Lift a function with binary argument to take an uncurried Fun instead
liftFun2 :: ((a -> b -> c) -> d) -> Gen.Fun (a, b) c -> d
liftFun2 f = liftFun (f . curry)``````

To check (the lifted version of) `symbolGivenUnequalArgsCommutes`, we need a generator of type `Gen (Fun (Com, Com) Bool)`. We can get one from `Gen.fun`, but that requires an instance of `Gen.Function (Com, Com)`. There’s an existing instance for tuples, but we still need to implement `Gen.Function Com` ourselves:

``````instance Gen.Function Com where
function gen = Gen.functionMap comToPre preToCom <\$> Gen.function gen``````

This is piggybacking on an existing instance (the `Gen.function gen` call at the end) by converting our `Com` values back-and-forth to another type that already implements `Gen.Function`. The type I’ve chosen is `[Maybe (Either Bool Symbol)]`, and the conversions are implemented by `comToPre` & `preToCom`:

``````-- | Prefix notation for expressions. `Nothing` represents an 'apply' operation.
type Prefix = [Maybe (Either Bool Symbol)]

-- | Converts expressions from "applicative style" Com to "prefix style" Prefix.
comToPre :: Com -> Prefix
comToPre x = case x of
-- Treat K and S specially, so they're more likely to be generated
_ | x == k -> [Just (Left False)]
_ | x == s -> [Just (Left True )]
-- Any other symbol
C c        -> [Just (Right c)]
-- Nothing acts like an apply operator for two expressions that follow it
App l r    -> [Nothing] ++ comToPre l ++ comToPre r

-- | Parses "prefix style" Prefix back into an "applicative style" Com.
preToCom :: Prefix -> Com
preToCom = fst . go          -- Discard any remaining suffix
where go []     = (k, [])  -- Makes preToCom total, but overlaps 'Left False'
go (x:xs) = case x of
Just (Left False) -> (k  , xs)
Just (Left True ) -> (s  , xs)
Just (Right c   ) -> (C c, xs)
Nothing           -> let (l, ys) = go xs
(r, zs) = go ys
in (App l r, zs)

comToPreRoundtrips :: Com -> Bool
comToPreRoundtrips c = preToCom (comToPre c) == c``````
``main = check comToPreRoundtrips genCom``
``````comToPreRoundtrips: OK (1.01s)
100 successful tests

All 1 tests passed (1.01s)
``````
``````preToComAlmostRoundtrips :: Prefix -> Bool
preToComAlmostRoundtrips p = comToPre (preToCom p') == p'
where p' = comToPre (preToCom p)  -- Extra roundtrip to make p "correct"

genPre :: Gen Prefix
genPre = Gen.list (to limit) genEntry
where genEntry = Gen.choose (Just <\$> genLeaf) (pure Nothing)
genLeaf  = Gen.choose (Left <\$> Gen.bool False) (Right <\$> genSym)``````
``main = check preToComAlmostRoundtrips genPre``
``````preToComAlmostRoundtrips: OK (0.11s)
100 successful tests

All 1 tests passed (0.11s)
``````

The `Prefix` type is a prefix form for expressions, as opposed to the “applicative form” of `Com`. They’re equivalent, but the structure of an expression is less obvious in prefix form, which is why it only appears in this hidden section. This is the encoding used in binary combinatory logic, although we’re allowing arbitrary `Symbol` values rather than just `S` and `K`.

The functions `preToCom` and `comToPre` don’t quite form an isomorphism. This is essentially due to `Prefix` being a prefix(-free) code:

• The empty list `[]` does not correspond to a particular `Com` value. It a valid prefix, since appending it with anything will either form a complete expression or another valid prefix; but that’s not so useful for `preToCom`, which might have only been given a finite list and needs to return a `Com`. In this case, it just returns `K` (arbitrarily).
• `Prefix` values may contain “too many” `Nothing` values. These act like `App`, indicating that the following expression is applied to the one after. That “following expression” may itself use `Nothing` to consume subsequent list elements, and so on; corresponding to the arbitrary nesting which `App` allows. However, the given `Prefix` may “run out” before specifying what those following elements should be! Again, that’s a perfectly good prefix, but not a complete value. When this happens, we will hit the same `[]` case as above, and simply return `K` (potentially many times, as the applications get “popped off the stack”).
• Expressions encoded as `Prefix` values are self-delimiting, meaning that the encoding itself tells us when to finish parsing: if the value we parsed was `Just _`, we have finished that expression; if the value is `Nothing` we can finish after parsing exactly two more expressions. Hence any elements appearing after a correctly-encoded expression will be completely ignored, resulting in the same `Com`.
• Our hacky use of `K` as a fallback for empty `Prefix` values introduces some ambiguity, since an expression like `KK` could be encoded as `[Nothing, Just (Left False), Just (Left False)]` (the “correct” way); or as `[Nothing, Just (Left False)]` (relying on the fallback behaviour to introduce another `K`); or indeed as `[Nothing]` (relying on the fallback for both `K` expressions). Hence the fallback behaviour should not be relied upon; also, it breaks when composing expressions together (since missing `K` values will only be introduced at the end of a list).
• There’s also some redundancy between `Left False`/`Left True` and `Right "K"`/`Right "S"`. This was a deliberate choice, to make `falsify` generate more `S` and `K` expressions.

These aren’t a problem for our use of `Prefix` in generating values; but you may need to keep them in mind when using this encoding for other purposes (in which case I’d recommend embracing its nature as a prefix(-free) code, since it has all sorts of cool applications!)

``````main = check (uncurry3 (liftFun2 symbolGivenUnequalArgsCommutes))
(tuple3 (Gen.fun (Gen.bool False)) genSymCom genSymCom)``````
``````symbolGivenUnequalArgsCommutes: OK (1.40s)
100 successful tests

All 1 tests passed (1.40s)
``````

### Combining disagreement provers

We can never spot all disagreements, but the simple checks above can be combined into a single, reasonably-useful function. Not only is this easier to use than the individual checks, but we can also use it as the `unEq` argument of the `symbolGivenUnequalArgs` function, which lets us recurse through the given expressions looking for disagreement at any depth!

``````provablyDisagree :: Com -> Com -> Bool
provablyDisagree x y = distinctSymbolicHeads                   x y
|| unequalArgCount                         x y
|| symbolGivenUnequalArgs provablyDisagree x y

provablyDisagreeCommutes :: (Com, Com) -> Bool
provablyDisagreeCommutes (x, y) = provablyDisagree x y == provablyDisagree y x``````
``main = check provablyDisagreeCommutes (tuple2 genSymCom genSymCom)``
``````provablyDisagreeCommutes: OK (0.94s)
100 successful tests

All 1 tests passed (0.94s)
``````

## An extensional equality test

Now we have ways to spot both extensional equality and inequality (in certain situations), we can fold them over the result of `agreeSym`. We’ll begin by extracting the numerical index provided by `race`, which should tell us how many inputs it takes for the given expressions to start agreeing:

``````-- | 'Just n' if the given expressions agree on 'n' inputs (they might also
-- | agree on fewer). 'Nothing' if it is determined that they won't agree; if
-- | that cannot be determined, the 'Delay' will never end.
extensionalInputs :: Com -> Com -> Delay (Maybe Natural)
extensionalInputs x y = race (agreeSym x y) >>= go 0
where go !n (Same _  , m, _) = Now (Just (n+m))
go !n (Diff a b, _, s) = if provablyDisagree (toCom a) (toCom b)
then Now Nothing
else race s >>= go (n+1)

agreeOnExtensionalInputs :: (Fuel, Com, Com, InputValues) -> Bool
agreeOnExtensionalInputs (n, x, y, inputs) =
runDelayOr True (extensionalInputs x y >>= check) n
where check Nothing  = Now True
check (Just i) = same <\$> sAt i (agree x y inputs)``````
``main = check agreeOnExtensionalInputs (tuple4 genFuel genCom genCom genComs)``
``````agreeOnExtensionalInputs: OK (0.66s)
100 successful tests

All 1 tests passed (0.67s)
``````

This is easy to transform into a legitimate predicate for testing extensional equality, that’s able to answer both `True` and `False` (at least, some of the time):

``````-- | Whether the given expressions are extensionally equal, i.e. cannot be
-- | distinguished by an SK expression.
extEq :: Com -> Com -> Delay Bool
extEq x y = isJust <\$> extensionalInputs x y``````

`extEq` is a generalisation of our previous checks:

``````extEqGeneralisesEqAndNormalEqAndEverAgree :: (Fuel, Com, Com) -> Bool
extEqGeneralisesEqAndNormalEqAndEverAgree (n, x, y) =
case ( go (            extEq x y)
, go (        everAgree x y)
, go (same <\$> normalEq x y)
,                  (==) x y) of
(Now False, Now True, _       , _   ) -> False
(Now False, _       , Now True, _   ) -> False
(Now False, _       , _       , True) -> False
_                                     -> True
where go = runDelay n``````
``````main = check extEqGeneralisesEqAndNormalEqAndEverAgree
(tuple3 genFuel genCom genCom)``````
``````extEqGeneralisesEqAndNormalEqAndEverAgree: OK (0.67s)
100 successful tests

All 1 tests passed (0.67s)
``````

## Next time

We’ve seen how extensional equality works in theory, part 3 explores these definitions in more depth, to find out where I was going wrong in my egglog.