SK logic in egglog: part 2, property-checking extensionality

Posted on by Chris Warburton

Update: part 3 is here

This is a spiritual follow-on from part 1 of my experiments with SK logic in egglog: that post serves as the motivation for this one, but the content here is somewhat standalone. It’s the result of a rabbit hole I entered, after getting results that were so unexpected and confusing that it made me question some of the assumptions I’d been making in those experiments.

I was modelling a small programming language called SK (combinatory) logic; and whilst I’m no stranger to being wrong about my code, I had chosen SK since it’s very simple and very familiar. Implementing SK is like writing “hello world”: something I’ve done countless times in all sorts of languages and paradigms, hence why it was an obvious choice when I wanted to learn egglog. In fact, combinatory logic even appears in egglog’s examples/ folder! However, an innocuous-looking one-line extension brought the whole thing crashing down: all structure vanished and every term collapsed into one big blob, where everything equalled everything else.

Note for non-logicians: That’s not good!

Despite several rewrites, trying various different encodings, and much scouring of egglog’s Rust source, I kept running into the same problem. This made me seriously doubt my understanding, particularly of free variables. If I could spend years immersed in the worlds of mathematics, computer science, post-graduate academia, and software development; yet fail to grasp such a fundamental aspect in all that time; then what else might I be wrong about? Am I a complete fraud?

Thankfully I also trained as a physicist, so I know what to do when theoretical understanding goes awry: experiment! Or, as software engineers call it: test! I wanted to thoroughly test the assumptions I was making in those egglog experiments, so I went crawling back to the comfort of Haskell where I could use my favourite hammer: property-based testing. This would also let me try out the new falsify package (which I’ve blogged about before): in essence, to try and “falsify myself”, to figure out what I’d misunderstood.

This page gives the background of what I was attempting; along with executable Haskell code, properties it should satisfy, and falsify checks of those properties. The latter are defined using active code.

The setup

I’ll explain some of this jargon below, but in short: that problematic line of code which broke everything was my attempt to implement extensional equality; the idea that if two things always behave in the same way, then we can consider them equal. To specify what “always” means, we need some notion of universal quantification: that doesn’t quite fit into the formal system of egglog, so my idea was to approximate universally-quantified variables using symbolic execution. This should be a conservative approach: failing to spot some equalities, but never mistakenly claiming two things to be equal. Instead, it seemed to do the extreme opposite: always claiming that everything is equal!

Preamble boilerplate…

Here’s the initial boilerplate for the script; it just needs GHC with falsify in its package database. For more details on how it’s executed, see this page’s Markdown source.

module Main (main) where

import Control.Applicative ((<|>))
import Control.Exception (assert)
import Data.Maybe (isJust)
import GHC.Natural (Natural)
import System.Environment (getEnv)
import Test.Falsify.Generator (Gen, Tree(..))
import qualified Test.Falsify.Generator as Gen
import Test.Falsify.Predicate (satisfies)
import Test.Falsify.Property (testGen)
import Test.Falsify.Range (Range)
import qualified Test.Falsify.Range as Range
import Test.Tasty (defaultMain)
import Test.Tasty.Falsify (testProperty)

-- Useful helpers. These are available in libraries, but I want this code to be
-- self-contained (other than falsify)

-- | Like 'uncurry', but for three arguments
uncurry3 :: (a -> b -> c -> d) -> (a, b, c) -> d
uncurry3 f (a, b, c) = f a b c

-- | Combine two results into a single tuple
tuple2 :: Applicative f => f a -> f b -> f (a, b)
tuple2 a b = (,) <$> a <*> b

-- | Combine three results into a single tuple
tuple3 :: Applicative f => f a -> f b -> f c -> f (a, b, c)
tuple3 a b c = (,,) <$> a <*> b <*> c

-- | Combine four results into a single tuple
tuple4 :: Applicative f => f a -> f b -> f c -> f d -> f (a, b, c, d)
tuple4 a b c d = (,,,) <$> a <*> b <*> c <*> d

-- | Like a list, but never ends.
data Stream a = Cons { sHead :: !a, sTail :: Stream a } deriving (Functor)

instance Show a => Show (Stream a) where
  show (Cons x _) = "[" ++ show x ++ ",...]"

sPrefix :: [a] -> Stream a -> Stream a
sPrefix []     ys = ys
sPrefix (x:xs) ys = Cons x (sPrefix xs ys)

-- | Drop elements from a 'Stream' which don't satisfy the given predicate.
sFilter :: (a -> Bool) -> Stream a -> Stream a
sFilter p (Cons x xs) = (if p x then Cons x else id) (sFilter p xs)

-- | Extract the nth element of a 'Stream'.
sAt :: Integral n => n -> Stream a -> a
sAt i (Cons x xs) = if i <= 0 then x else sAt (i - 1) xs

-- | All prefixes of the given Stream, e.g. [], [x], [x, y], [x, y, z], ...
sInits :: Stream a -> Stream [a]
sInits (Cons x xs) = Cons [] ((x:) <$> sInits xs)

-- | All String values, in order of length
allStrings :: Stream String
allStrings = go 0
  where strs 0 = [""]
        strs n = [c:s | c <- [minBound..maxBound], s <- strs (n - 1)]
        go   n = sPrefix (strs n) (go (n + 1))

-- | Like a list, but only stores data at the end. Useful for representing
-- | general recursion.
data Delay a = Now a | Later (Delay a) deriving (Eq, Functor, Ord, Show)

instance Applicative Delay where
  (Now   f) <*> x =        f <$> x
  (Later f) <*> x = Later (f <*> x)
  pure            = Now

instance Monad Delay where
  (Now   x) >>= f = f x
  (Later x) >>= f = Later (x >>= f)

-- | Unwrap (up to) a given number of 'Later' wrappers from a 'Delay' value.
runDelay :: Integral n => n -> Delay a -> Delay a
runDelay n x | n <= 0 = x
runDelay _ x@(Now _)  = x
runDelay n (Later x)  = runDelay (n - 1) x

-- | Try to extract a value from the given 'Delay', or use the given default
runDelayOr :: Integral n => a -> Delay a -> n -> a
runDelayOr def x n = case runDelay n x of
  Now   x' -> x'
  Later _  -> def

SK (combinatory logic)

Combinatory logic, which I’ll shamelessly abbreviate as “SK”, is a particularly simple programming language. It can be written entirely with two symbols, traditionally called S and K, along with parentheses for grouping symbols together. We’ll represent SK expressions using the following Haskell datatype called Com (this representation also matches my definition in egglog):

data Com = C String | App Com Com deriving (Eq, Ord)

That deriving clause asks Haskell to automatically implement some useful interfaces:

We’ll implement the Show interface ourselves, to render expressions in traditional SK notation:

instance Show Com where
  show (C c)             = c
  show (App x (App y z)) = show x ++ "(" ++ show (App y z) ++ ")"
  show (App x y)         = show x ++ show y

We represent the “primitives” S and K using the C constructor (annoyingly, value names in Haskell must begin with a lowercase letter; so we call these s and k instead):

-- | Primitive SK expressions
s, k :: Com
s = C "S"
k = C "K"

App groups expressions together two at a time, like a pair of parentheses. Traditional SK notation only shows parentheses “on the right”, e.g. App s (App s k) is written S(SK), but App (App s k) k is simply SKK. The following functions extract the left and right children of an App value (returning Maybe, in case they’re given a primitive expression):

left, right :: Com -> Maybe Com
left  (App x _) = Just x
left  _         = Nothing
right (App _ y) = Just y
right _         = Nothing

Anatomy of SK expressions

We can refer to various parts of an SK expression using the following terminology:

Examples can be seen in the figure below. We will often describe an expression as “applying 〈its head〉 to 〈its arguments〉”.

          𝔸𝕡𝕡
         ╔═╝─┐
        App  S
     ╔═══╝───┐
    App     App
   ╔═╝─┐   ┌─┴─┐
   𝑺   K  App  S
         ┌─┴─┐
         K   K
    𝔸𝕡𝕡
   ╔═╝─┐
  App  S
 ╔═╝─┐
 𝑲   K
Structure of SK expressions: the head is shown in 𝙞𝙩𝙖𝙡𝙞𝙘, the root is 𝕕𝕠𝕦𝕓𝕝𝕖-𝕤𝕥𝕣𝕦𝕔𝕜 and the spine is shown with a double═line. Top: Structure of the expression SK(KKS)S, represented by the Haskell value App (App (App s k) (App (App k k) s)) s. The head is 𝑺 and arguments are K, KKS and S. Bottom: Structure of that second argument, KKS; with head 𝑲 and arguments K and S. Leaf expressions (K and S) are their own root and head, with no spine or arguments.

Running SK expressions

We can “run” SK expressions using two rewriting rules:

We can implement these as Haskell functions, returning Nothing if the rule didn’t match:

-- | Replaces Kxy with x, otherwise 'Nothing'
stepK :: Com -> Maybe Com
stepK (App (App (C "K") x) _) = Just x
stepK _                       = Nothing

-- | Replaces Sxyz with xz(yz), otherwise 'Nothing'
stepS :: Com -> Maybe Com
stepS (App (App (App (C "S") x) y) z) = Just (App (App x z) (App y z))
stepS _                               = Nothing

Notice that arguments can be any Com value, represented using the Haskell variables x, y and z; but only App, s and k have any effect on the output (this will be important for symbolic execution later!).

Remarkably, these two rules make SK a universal (Turing-complete) programming language! The following step function tries to apply both of these rules. It also applies the rules recursively to sub-expressions of App: note that we try stepping both children at once, to prevent any reducible parts of an expression getting “stuck” waiting for their sibling to finish. Nothing is returned when neither rule matched and no sub-expressions were rewritten; we say the expression is in normal form:

-- | Attempt to reduce a K or S combinator, or the children of an 'App'.
-- | 'Nothing' if the argument is in normal form.
step :: Com -> Maybe Com
step c = stepK c <|> stepS c <|> app l' r' <|> app l' r <|> app l r'
  where app x y  = App <$> x <*> y
        (l , r ) = (left c    , right c   )
        (l', r') = (l >>= step, r >>= step)

Next we’ll need to iterate the step function until its argument reaches normal form. We’ll distinguish normalised expressions using a separate type called Normal:

-- | Wraps Com values which are assumed to be in normal form
newtype Normal = N Com deriving (Eq, Ord)

instance Show Normal where
  show = show . toCom

-- | Coerce a 'Normal' back to a 'Com', i.e. forget that it's in normal form.
toCom (N c) = c

-- | 'step' the given 'Com': if it worked, returns that in 'Left'; otherwise
-- | returns the original value (now 'Normal') in 'Right'.
toNormal :: Com -> Either Com Normal
toNormal c = case step c of
  Just c' -> Left     c'
  Nothing -> Right (N c)
Understanding Normal

The Normal type has the invariant that it only contains Com values which are in normal form. More specifically, we can say step . toCom is (extensionally) equal to const Nothing. Values of type Normal act as evidence that the Com they contain is in normal form (i.e. that it will return Nothing when given to step).

This invariant is not guaranteed to be the case; e.g. we can violate it by writing code like N (App (App k s) k) (since KSK reduces to S via the stepK rule). To prevent such violations, we avoid using the N constructor directly: instead always using the toNormal function, which only returns a Normal value when it is found to return Nothing from step (i.e. when our invariant holds).

toNormal is a smart constructor: a function which only returns values of the desired type when it’s checked they’re valid. Ideally we would enforce usage of toNormal, by encapsulating the N constructor, making the Normal type opaque. That’s usually achieved in Haskell using modules, but I want the code for this post to all live in one module (to make compilation and execution easier).

toNormal tells us when to stop iterating, but since SK is a universal programming language, there’s no way to know beforehand if it ever will. We’ll account for this by wrapping such undecidable computations in a Delay type:

-- | Step the given 'Com' until it reaches 'Normal' form.
reduce :: Com -> Delay Normal
reduce c = case toNormal c of
  Left c' -> Later (reduce c')
  Right n -> Now n

Normal equivalence

The rules of SK are confluent, meaning that every expression has at most one Normal form, and applying the reduction rules in any order, to any parts of an expression, will not change its Normal form or the ability to reach it using those rules. As a consequence, all SK expressions with the same Normal form are equivalent and interchangable.

To see this, consider two expressions X and Y with the same Normal form N. Whenever X occurs inside a larger expression, confluence allows us to replace X with N without affecting the overall result. Now imagine we instead replace X with Y: whatever that gives us, it will be unaffected (thanks to confluence) if we reduce Y to its Normal form. Since the Normal form of Y is also N, and we already deduced that using N is equivalent to using X, the result of using Y must also be equivalent to using X. Hence any expressions with the same Normal form are equivalent, in the sense that they can be interchanged inside any expression without altering its Normal form.

We’ll call this relationship “Normal equivalence”. It’s undecidable in general, so it also gets wrapped in Delay:

-- | Result of comparing two values
data Compared a = Same a | Diff a a deriving (Show)

same, diff :: Compared a -> Bool
same (Same _)   = True
same _          = False
diff (Diff _ _) = True
diff _          = False

-- | Uses (==) to find identical values
comparer :: Eq a => a -> a -> Compared a
comparer x y = if x == y then Same x else Diff x y

-- | Compare the 'Normal' forms of the given 'Com' values.
normalEq :: Com -> Com -> Delay (Compared Normal)
normalEq x y = comparer <$> reduce x <*> reduce y

Property-based testing

Let’s test this normalEq function, to see whether it actually behaves in the way our theory of SK predicts it should. For example, every equality relation should have the “reflexive property”, meaning that values should be equal to themselves; so any call like normalEq foo foo should always result in True. Unfortunately, such results are buried in a Delay which prevents us testing them directly. Delay values could be a never-ending chain of Later wrappers, like the following:

loop :: Delay a
loop = Later loop

Hence any attempt to extract a result from them must eventually give up. We’ll work around this using a parameter that counts down until we give up; an approach known as fuel:

-- | Represents a parameter for "when to give up"
type Fuel = Word

Now we can write a predicate (a function returning a boolean) to check whether its argument is normalEq to itself, before its Fuel runs out:

normalEqToItself :: (Fuel, Com) -> Bool
normalEqToItself (n, x) = runDelayOr False (same <$> normalEq x x) n

We can use this predicate to make statements, by quantifying its argument. Tests commonly use existential quantification, which asserts that some argument satisfies normalEqToItself (AKA “example-based” testing). For example:

main = assert (normalEqToItself (0, k)) (putStrLn "PASS")
PASS

However, that isn’t really what we want to say: reflexivity doesn’t just apply to a few hand-picked examples, it says that every argument satisfies normalEqToItself. Talking about “every argument” is universal quantification. Universally-quantified predicates are called “properties”, so this approach is known as property-based testing.

To test a property, we give it to a “property checker” which searches for counterexamples: argument values which cause the assertion to fail. If no counterexample can be found, the test passes. Search techniques range from simple enumeration all the way up to sophisticated AI algorithms. Haskell has many property checkers, beginning with the wonderful QuickCheck package. We’ll use the state-of-the-art for 2024, which is falsify.

Data generators

falsify searches through argument values at random, sampling them from a given “generator” with the Haskell type Gen a (for generating values of some type a). We can build up generators using familiar interfaces (Applicative, Monad, etc.); from primitives such as Gen.inRange, which samples fixnums from a Range and is perfect for generating Fuel:

-- | Generates a (relatively small) amount of Fuel
genFuel = genFuelN limit

-- | Generates up to a certain amount of Fuel
genFuelN :: Fuel -> Gen Fuel
genFuelN = Gen.inRange . to

-- | A reasonable default for procedures requiring Fuel. Small enough to keep
-- | exponentially-growing terms from blowing up.
limit :: Fuel
limit = 20

-- | We usually want Ranges from zero upwards
to :: Fuel -> Range Fuel
to = Range.between . (0,)
Generating recursive Com values…

Generating Com values is more tricky, since they are recursive. A naïve generator that simply calls itself recursively will make Com values of exponential size: either blowing up memory (if the recursive case App is likely to be chosen) or being limited to a handful of tiny values (if the recursive case is unlikely to be chosen). To generate a diverse spread of values, without risking out-of-memory conditions, we can use the same Fuel trick as above; this time dividing it up (unevenly) between recursive calls.

This is such a useful approach that I’ve implemented it for QuickCheck, ScalaCheck, Hypothesis, etc. over the years. In contrast, falsify provides it out-of-the-box! The Gen.tree function generates binary Tree values of a given size, so we just need to transform those into the Com values we need:

-- | Create a Com from a binary Tree (with unit values () at the nodes)
treeToCom :: Tree () -> Com
treeToCom t = case t of
  -- The smallest Trees have one and two leaves. Turn them into K, since that
  -- has the simplest step rule.
  Leaf               -> k
  Branch _ Leaf Leaf -> k

  -- There are two Trees with three leaves. Turn those into S.
  Branch _ Leaf (Branch _ Leaf Leaf) -> s
  Branch _ (Branch _ Leaf Leaf) Leaf -> s

  -- For larger Trees, we recurse on their children and combine using App
  Branch _ l r -> App (treeToCom l) (treeToCom r)

-- | Generate a Com, with size bounded by the given Fuel
genComN :: Fuel -> Gen Com
genComN n = treeToCom <$> Gen.tree (to n) (pure ())

-- | Generate (relatively small) Com values
genCom = genComN limit

falsify integrates with Haskell’s tasty test framework. Normally a project would declare a big test suite to run all at once, but for this literate/active style we’ll be testing things as we go, using the following function:

-- | Turn the given predicate into a falsify property, universally quantified
-- | over the given generator's outputs. Check it on (at least) 100 samples.
check :: Show a => (a -> Bool) -> Gen a -> IO ()
check pred gen = do
  name <- getEnv "NAME" -- Avoids repetition
  defaultMain (testProperty name (testGen (satisfies (name, pred)) gen))

Included middles

The observant among you may have noticed that normalEqToItself does not hold for all argument values! falsify can generate a counterexample to show us why:

main = check normalEqToItself (tuple2 genFuel genCom)
normalEqToItself: FAIL (0.74s)
  failed after 1 successful test and 13 shrinks
  not (normalEqToItself generated)
  generated: (0,KKK)
  
  Logs for failed test run:
  
  Use --falsify-replay=0173551014ce694c11cb95ebbef835b711 to replay.

1 out of 1 tests failed (0.79s)

The precise counterexample falsify finds may vary depending on the random seed, but they’ll all have the following in common: the Fuel will be 0, whilst the Com expression will not be in normal form. For example, it may produce KKK (the Haskell value App (App k k) k). stepK will reduce that to a single K, so whilst the value KKK is equal to itself (as we expect), normalEq wraps it in a Later constructor, which may cause normalEqToItself to give up for some Fuel parameters: in particular, when given 0 Fuel. This disproves our claim that the property holds for any amount of Fuel.

We could alter our claim to existentially quantify the amount of Fuel: that for any SK expression x, there is some value of n where runDelayOr False (same <$> normalEq x x) n holds. However that is also false, since there are infinite loops which have no Normal form; hence never produce a Now value; and therefore cannot have a result extracted regardless of how much Fuel we use. Even when a Normal form does exist, there is no way to bound the amount of Fuel required to find it.

The correct way to fix our claim is to universally quantify x and n as before, but add two negations to our predicate: instead of claiming every expression is equal to itself (regardless of Fuel), we claim that every expression is not unequal to itself (regardless of Fuel):

notUnnormalEqToItself :: (Fuel, Com) -> Bool
notUnnormalEqToItself (n, x) = runDelayOr True (not . diff <$> normalEq x x) n
main = check notUnnormalEqToItself (tuple2 genFuel genCom)
notUnnormalEqToItself: OK (0.33s)
  100 successful tests

All 1 tests passed (0.33s)

You may have learned in school that double-negatives are redundant: that anything “not false” must be “true” (the law of the excluded middle). However, we’re not dealing with the true/false ideals of classical logic, which ignore computation and its associated undecidability. All we have are the more pragmatic falsified/unfalsified results of a property checker, where there are non-excluded middles such as “don’t know”, “timed out” and “gave up”!

Extensional equality

My problems arose when trying to extend the equality of SK expressions even further to include extensional equality. To explain how this works, we’ll need a bit more terminology…

Inputs and agreement

To understand the behaviour of a particular SK expression, we can see what happens when it’s applied to other SK expressions. We’ll call the latter “input values”, to emphasise that they are not part of the original expression we’re focused on. For example, we can understand how the expression KS behaves by applying it to an input value like S: giving KSS (which reduces to Normal form S). Note that KS contains head K and argument S; whilst the resulting expression KSS also contains the input value S as an extra argument.

If we apply two expressions to the same input value, and the results reduce to the same Normal form, we say those two expressions “agree on” that input value. For example, KK and SKK agree on the input value K, since KKK reduces to K; and SKKK reduces to KK(KK) then to K. Note that they disagree on the input value S, since KKS reduces to K; whilst SKKS reduces to KS(KS) then to S. Expressions can also agree on a sequence of input values, where we apply them both to the first value, then apply those results to the second value, and so on. We’ll represent such sequences using a Stream, which is like a list except there is no “nil” case, so it goes on forever:

type InputValue  = Com
type InputValues = Stream InputValue

-- | Apply the given 'Com' expressions to more and more of the 'InputValues',
-- | checking whether they reach the same 'Normal' form.
agree :: Com -> Com -> InputValues -> Stream (Delay (Compared Normal))
agree f g (Cons x xs) = Cons (normalEq f g) (agree (App f x) (App g x) xs)

Everything that satisfies normalEq should also satisfy agree, which we can state with the following property:

normalEqImpliesAgree :: (Fuel, Com, Com, InputValues) -> Bool
normalEqImpliesAgree (n, f, g, xs) =
  case runDelay n (tuple2 (normalEq f g) (sHead (agree f g xs))) of
    Now (Same _, Diff _ _) -> False
    _                      -> True

-- | Generate a 'Stream' of Com values, with element size bounded by 'Fuel'
genComsN :: Fuel -> Gen InputValues
genComsN fuel = Cons <$> genComN fuel <*> genComsN fuel

-- | Generate (relatively small) lists of Com values
genComs :: Gen InputValues
genComs = genComsN limit
main = check normalEqImpliesAgree (tuple4 genFuel genCom genCom genComs)
normalEqImpliesAgree: OK (0.91s)
  100 successful tests

All 1 tests passed (0.91s)

We say expressions “agree on N inputs” when they agree on every sequence of N input values; i.e. the input values are universally-quantified, but the length of the sequence is not. For example, SK and S(K(SK))(KK) agree on two inputs; which we can test by asserting that they never disagree:

skNeverDisagreesWithSKSKKK :: (Fuel, InputValues) -> Bool
skNeverDisagreesWithSKSKKK (n, xs) =
    runDelayOr True (not . diff <$> sAt (2 + n) (agree f g xs)) n
  where f = App s k
        g = App (App s (App k (App s k))) (App k k)
main = check skNeverDisagreesWithSKSKKK (tuple2 genFuel genComs)
skNeverDisagreesWithSKSKKK: OK (10.23s)
  100 successful tests

All 1 tests passed (10.24s)

Note that any expressions which agree on N inputs also agree on N + 1 inputs (and so on), since the left child of each root has the same Normal form (by definition of agreement on N inputs).

agreementIsMonotonic :: ((Fuel, Fuel), (Com, Com), InputValues) -> Bool
agreementIsMonotonic ((n, m), (f, g), xs) =
  case runDelay n (tuple2 (sAt  n      (agree f g xs))
                          (sAt (n + m) (agree f g xs))) of
    Now (Same _, Diff _ _) -> False
    _                      -> True
main = check agreementIsMonotonic (tuple3 (tuple2 genFuel genFuel)
                                          (tuple2 genCom  genCom )
                                          genComs)
agreementIsMonotonic: OK (7.92s)
  100 successful tests

All 1 tests passed (7.92s)

Extensionality

Now we’ve defined inputs and agreement, extensional equality becomes quite simple: SK expressions are extensionally equal iff they agree on some number of inputs. We saw that SK and S(K(SK))(KK) agree on some number of inputs (namely: on two inputs), so they are extensionally equal. Expressions which are normally equivalent are also extensionally equal, since they agree on zero inputs.

Notice that the number of inputs is existentially-quantified, whilst the values of those inputs are universally-quantified. This makes extensional equality difficult to determine:

In order to make more definitive assertions about extensional equality, we need to borrow techniques from the world of formal methods!

Symbolic execution

The reason I like property-checking is that it can provide a lot of confidence for relatively little effort, compared to e.g. example-based testing (less confidence for a similar effort), manual testing (less confidence for more effort) or formal verification (more confidence for more effort). However, there are occasions when the scales tip in favour of other approaches, and extensional equality checking turns out to be particularly suited to a formal method called symbolic execution.

In this approach, we avoid using real, “concrete” SK expressions for our input values; and instead use abstract symbols. These can be copied, discarded or rearranged during program execution; which is perfect for seeing where different inputs end up in the resulting Normal form. They are otherwise inert or “uninterpreted”: causing reduction to stop if it would depend on the particular value of an input.

Implementing symbolic execution can often be laborious, but our definition of Com actually makes it trivial to graft on top: we can represent abstract symbols using the C constructor applied to any String (except "S" or "K", which we’re treating specially). Execution of the stepK and stepS functions will treat these as atomic symbols, to be copied, discarded and rearranged as needed.

Once we’ve normalised a Com value containing symbols, we need a semantics, or abstract interpretation, to understand what they mean. Since we’re using each symbol to represent a distinct input, the way they appear in Normal forms tells us something about the overall behaviour of the expression and, crucially, how it depends on the particular concrete Inputs it may be applied to. It’s undecidable whether part of an SK expression will influence its reduction (or just get discarded by the stepK rule), so we’ll limit our ambitions to understanding a few simple situations.

Symbolic agreement proves extensional equality

Expressions which agree on symbolic inputs will agree on all inputs, since symbols do not reduce, and hence those inputs must have been irrelevant for the reductions which lead to the agreement. Unlike property-checking, where we rely on double-negatives to “fail to disprove agreement”, this is real positive proof that expressions will always agree, and hence we can claim definitively that they are extensionally equal.

We don’t need to implement this check specially, since we can reuse agree, just using symbolic values as our inputs instead of concrete SK expressions:

-- | Synonym to indicate when we're dealing with uninterpreted symbolic values
type Symbol = String

-- | All Symbols except "S", "K" and "" (which doesn't show well)
symbols :: Stream Symbol
symbols = sFilter keep allStrings
  where keep s = s /= "" && s /= "S" && s /= "K"

The following agreeSym function applies two expressions to more and more symbolic inputs, to see if they reduce to the same Normal form:

-- | Checks whether two Com values agree on more and more symbolic input values.
agreeSym :: Com -> Com -> Stream (Delay (Compared Normal))
agreeSym f g = agree f g (C <$> symbols)

The return type of agreeSym is a little awkward, since it’s “two dimensional”: travelling further along the Stream applies more and more symbolic inputs; travelling further along any of those Delay values applies more and more steps to that expression. To get at the results inside (assuming any of those expressions has a Normal form) we need a linear path which traverses this structure. We can’t use breadth-first search since the Stream never ends; we also can’t use depth-first search, since a Delay might never end. The following race function is a bit smarter: it acts like a round-robin scheduler, each iteration running more expressions until one of them finishes (if ever):

-- | Runs every 'Delay' element in an interleaved fashion until one of them
-- | produces a 'Now'. Returns that result, its index in the 'Stream', and a
-- | 'Stream' of the remaining 'Delay' elements.
race :: Stream (Delay a) -> Delay (a, Natural, Stream (Delay a))
race s = go [] [] s
  where go ran run next = case (run, next) of
          (        [], Cons z zs) ->        go []      (reverse (z:ran)) zs
          (Later y:ys,        zs) -> Later (go (y:ran) ys                zs)
          (  Now y:ys,        zs) ->   Now ( y
                                           , fromIntegral (length ran)
                                           , sPrefix (reverse ran ++ ys) zs
                                           )
Checking race

The race algorithm is a bit tricky, so we’ll double-check it with a few property tests. First some helper functions:

-- | Wrap the given number of 'Later' constructors around '()'
wrapLater n | n <= 0 = Now ()
wrapLater n          = Later (wrapLater (n - 1))

-- | Count the number of 'Later' wrappers around a 'Now' value. Gives up if the
-- | given 'Fuel' runs out.
countLaters :: Integral n => Fuel -> Delay a -> Maybe n
countLaters = go 0
  where go !n _ (  Now _) = Just n
        go  _ 0        _  = Nothing
        go !n m (Later x) = go (n + 1) (m - 1) x

countMatchesWrap (n, m) = countLaters (n + m) (wrapLater n) == Just n
main = check countMatchesWrap (tuple2 genFuel genFuel)
countMatchesWrap: OK (0.10s)
  100 successful tests

All 1 tests passed (0.11s)
runUndoesWrap n = case runDelay n (wrapLater n) of
  Now _ -> True
  _     -> False
main = check runUndoesWrap genFuel
runUndoesWrap: OK (0.15s)
  100 successful tests

All 1 tests passed (0.16s)
runStops (n, m) = countLaters m (runDelay n (wrapLater (n + m)) ) == Just m
main = check runStops (tuple2 genFuel genFuel)
runStops: OK (0.41s)
  100 successful tests

All 1 tests passed (0.42s)

Now on to race itself: it searches for a Now value using a triangular “flood fill” approach, illustrated below:

     ║Cons│Cons│Cons│Cons│Cons│
═════╬════╪════╪════╪════╪════╪┄
Later║  0 │  2 │  5 │  9 │ 14 │
─────╫────┼────┼────┼────┼────┼┄
Later║  1 │  4 │  8 │ 13 │ 19 │
─────╫────┼────┼────┼────┼────┼┄
Later║  3 │  7 │ 12 │ 18 │ 25 │
─────╫────┼────┼────┼────┼────┼┄
Later║  6 │ 11 │ 17 │ 24 │ 32 │
─────╫────┼────┼────┼────┼────┼┄
Later║ 10 │ 16 │ 23 │ 31 │ 40 │
─────╫────┼────┼────┼────┼────┼┄
     ┆    ┆    ┆    ┆    ┆    ┆
The order that race searches a given Stream of Delay values, looking for a Now. Columns are elements of the Stream, starting on the left; rows are steps into those Delay values, starting at the top. The search begins at the top left (the outer constructor of the first element) and proceeds in diagonal stripes: after the top row is reached, the next stripe begins in the left column.

The numbers in the first column (i.e. which steps race allocates to running the first element of the Stream) are precisely the triangular numbers. We can check this, by counting the number of Later wrappers race produces for a Stream whose first element has a known number of Later wrappers, and whose other elements are never-ending loops (and hence don’t contribute any Now values of their own):

-- | Calculates the nth triangular number https://oeis.org/A000217
triangle :: (Integral n, Integral m) => n -> m
triangle = fromIntegral . go . fromIntegral . max 0
  where go (n :: Natural) = (n * (n + 1)) `div` 2

loops :: Stream (Delay a)
loops = Cons loop loops

raceFirstUsesTriangularFuel :: Fuel -> Bool
raceFirstUsesTriangularFuel n =
    countLaters tn (race (Cons (wrapLater n) loops)) == Just tn
  where tn = triangle n
main = check raceFirstUsesTriangularFuel genFuel
raceFirstUsesTriangularFuel: OK (0.51s)
  100 successful tests

All 1 tests passed (0.51s)

At the other extreme, race will run the first step of element n just before it goes back to the first element; i.e. at step triangle (n + 1) - 1 (this is OEIS A000096):

raceHitsNthElementBeforeNextTriangularNumber n =
    countLaters x (race (prog n)) == Just x
  where x      = triangle (n + 1) - 1
        prog 0 = Cons (Now ()) loops
        prog m = Cons loop     (prog (m - 1))
main = check raceHitsNthElementBeforeNextTriangularNumber genFuel
raceHitsNthElementBeforeNextTriangularNumber: OK (0.40s)
  100 successful tests

All 1 tests passed (0.40s)

We can use race to check whether two expressions ever agree on symbolic inputs, and therefore whether they’re extensionally equal:

-- | 'True' iff the given expressions ever agree for any number of inputs.
everAgree :: Com -> Com -> Delay Bool
everAgree x y = race (agreeSym x y) >>= go
  where go (Same _  , _, _) = Now True
        go (Diff _ _, _, s) = race s >>= go

Note that everAgree is more general than normalEq, since the latter only checks for agreement on 0 inputs; although it requires more Fuel to account for running it interleaved with 1 input, 2 inputs, etc. (see the fold-out section above for more details):

normalEqImpliesEverAgree :: (Fuel, Com, Com) -> Bool
normalEqImpliesEverAgree (n, x, y) =
    if      go (same <$> normalEq x y)  n
       then go (        everAgree x y) (triangle n)
       else True
  where go = runDelayOr False
main = check normalEqImpliesEverAgree (tuple3 genFuel genCom genCom)
normalEqImpliesEverAgree: OK (0.45s)
  100 successful tests

All 1 tests passed (0.46s)

However, everAgree is not yet a predicate for checking extensional equality, since the Compared values returned by agreeSym don’t represent “equal/unequal”; only “equal/unsure”. Hence the everAgree function claims to return Delay Bool, which we can think of as “at most one boolean”. Yet its result is not really a boolean, since it can never be False! It would be more accurate to return a unit value Now (), of type Delay (). That type is isomorphic to the natural numbers, counting how long it took to prove equality. That can’t be a predicate, since it begs the question!

A useful predicate for extensional equality not only needs to return True for some inputs it’s sure are equal; but also return False for some inputs it’s sure are unequal. When it’s unsure (which is unavoidable due to undecidability), it can use a never-ending chain of Later wrappers to avoid ever returning at all!

Different symbolic heads prove disagreement

We can easily show that two expressions disagree for N inputs, by checking if the Nth element of agreeSym results in Diff. To be sure they are not extensionally equal, we also need to prove they will disagree for all inputs longer than N.

A simple way to prove this is when the head of each expression is a different symbolic input. In that case, their Normal forms depend entirely on those inputs, so disagreement can be “forced” by choosing input values which reduce to different results. For example, say Xab is some concrete SK expression applied to two symbolic inputs, which happens to reduce to aS; hence its result depends on the first input, and is independent of the second. If some other expression Yab reduces those inputs to bS, it depends on the second input, and is independent of the first. They can be forced to disagree by choosing concrete input values in place of a and b, which reduce to values that are known to be unequal. In this example we could use input values KS and KK, which X reduces to S but Y reduces to K. This works even if both inputs appear in a result, e.g. ab can still be forced independently of b, by choosing a concrete expression for a which discards its argument.

This ability to “force” the result of expressions which use an input as their head can be used to extend disagreements from N inputs to all inputs M > N. To see this, note that the result of applying an expression to M > N inputs is equivalent to first applying it to N inputs (which we can force to result in any value we like), then applying that result to the remaining M − N inputs. We can choose those intermediate results to be expressions that consume the remaining M − N inputs (e.g. through repeated use of K) and then disagree with each other. ∎

The following function will spot when two expressions have different symbols in head position, and are hence provably unable to agree, even with more inputs:

-- | Return the head (left-most leaf) of the given Com
headPos :: Com -> Symbol
headPos (C c)     = c
headPos (App l r) = headPos l

-- | Whether a String represents an uninterpreted Symbol (i.e. not S or K)
isSym :: String -> Bool
isSym s = s /= "S" && s /= "K"

-- | Whether the given Com values have distinct symbolic values in head position
distinctSymbolicHeads :: Com -> Com -> Bool
distinctSymbolicHeads x y = isSym hX && isSym hY && hX /= hY
  where (hX, hY) = (headPos x, headPos y)

We can perform a few sanity checks, to confirm that our argument above holds:

-- | Generate String values to represent uninterpreted symbolic values
genSymN :: Fuel -> Gen Symbol
genSymN n = (`sAt` symbols) <$> genFuelN n

-- | Generate (relatively small) symbolic values
genSym :: Gen Symbol
genSym = genSymN limit

-- | Generate Com values which may also contain symbols
genSymComN :: Fuel -> Gen Com
genSymComN n = toSymCom <$> Gen.tree (to n) (genSymN n)
  where toSymCom t = case t of
          Leaf                               -> k
          Branch _ Leaf Leaf                 -> k
          Branch _ Leaf (Branch _ Leaf Leaf) -> s
          Branch s (Branch _ Leaf Leaf) Leaf -> C s
          Branch _ l r                       -> App (toSymCom l) (toSymCom r)

-- | Generate (relatively small) Com values which may contain symbols
genSymCom = genSymComN limit

-- | Shouldn't matter which order we compare two Com values
distinctSymbolicHeadsCommutes :: (Com, Com) -> Bool
distinctSymbolicHeadsCommutes (x, y) = distinctSymbolicHeads x y
                                    == distinctSymbolicHeads y x
main = check distinctSymbolicHeadsCommutes (tuple2 genSymCom genSymCom)
distinctSymbolicHeadsCommutes: OK (1.11s)
  100 successful tests

All 1 tests passed (1.11s)

Different numbers of arguments prove disagreement

Expressions whose heads are the same symbolic input, but applied to a different number of arguments, can also be forced to disagree. For example, if Xab reduces to aSb, and Yab reduces to aS, then distinctSymbolicHeads can’t be sure if they’re different. Yet we can still force them to disagree, by giving them three inputs (a, b and c):

X and Y will reduce these inputs as follows:

X(SK)(Kd)       c    |    Y(SK)(Kd)c        | Xabc and Yabc
  SKS(Kd)       c    |      SKS    c        | Reduced per example
   K (Kd)(S(Kd))c    |       K     c(Sc)    | Applied S rule to each
      Kd        c    |             c        | Applied K rule to each
       d             |             c        | Applied K rule again to first

Since X and Y reduce to different symbols, we can use this scheme to force their results to any value, including unequal Normal forms or (as above) expressions which consume any number of subsequent inputs then become unequal Normal forms. ∎

The following unequalArgCount function will spot when two expressions apply a symbol to an unequal number of arguments, and are hence sure to be extensionally unequal:

-- | Split a Com value into its head and any arguments that's applied to
headAndArgs :: Com -> (String, [Com])
headAndArgs (C x) = (x, [])
headAndArgs (App l r) = case headAndArgs l of
  (h, args) -> (h, args ++ [r])

-- | True iff the given expressions have the same Symbol in their head, but
-- | applied to a different number of arguments.
unequalArgCount :: Com -> Com -> Bool
unequalArgCount x y = isSym headX
                   && headX == headY
                   && length argsX /= length argsY
  where (headX, argsX) = headAndArgs x
        (headY, argsY) = headAndArgs y

Again, we can sanity-check this:

-- | Order of Com values shouldn't affect result
unequalArgCountCommutes :: (Com, Com) -> Bool
unequalArgCountCommutes (x, y) = unequalArgCount x y
                              == unequalArgCount y x
main = check distinctSymbolicHeadsCommutes (tuple2 genSymCom genSymCom)
distinctSymbolicHeadsCommutes: OK (1.06s)
  100 successful tests

All 1 tests passed (1.06s)

Disagreeing arguments prove disagreement

Finally, we can force expressions to disagree by forcing a disagreement inside them, then propagating it up to the overall result. This is useful, since it can be achieved when each expression has the same Symbol at their heads (avoiding distinctSymbolicHeads), and they apply it to the same number of arguments (avoiding unequalArgCount); as long as we can force the values of those arguments to disagree.

For example, let’s say Xabc reduces to ab and Yabc reduces to ac: these have the same symbol a as their heads, and both apply it to a single argument. However, the values of those arguments (b and c) are provably unequal (via distinctSymbolicHeads, in this case). We can hence use the first input a to propagate its unequal argument, e.g. using the input value SKK. Since X(SKK)bc reduces to b, and Y(SKK)bc reduces to c, we can force a disagreement by choosing unequal values for b and c; and this extends to more inputs by having them consume the remainder, as before. ∎

The following function checks for the situation described above. It checks whether argument values are unequal using a given unEq parameter, which is more general and useful than if we hard-coded some specific check:

-- | Whether the given Com values have matching symbols in their heads, but
-- | applied to unequal values (determined by the given unEq function)
symbolGivenUnequalArgs :: (Com -> Com -> Bool) -> Com -> Com -> Bool
symbolGivenUnequalArgs unEq x y = isSym headX
                               && headX == headY
                               && any id (zipWith unEq' argsX argsY)
  where (headX, argsX) = headAndArgs x
        (headY, argsY) = headAndArgs y
        unEq' a b      = unEq a b || unEq b a  -- Compare both ways round

-- | The order of arguments shouldn't alter the result
symbolGivenUnequalArgsCommutes :: (Com -> Com -> Bool) -> Com -> Com -> Bool
symbolGivenUnequalArgsCommutes f x y = symbolGivenUnequalArgs f x y
                                    == symbolGivenUnequalArgs f y x
Generating unEq functions to test with…

The symbolGivenUnequalArgs function doesn’t care how arguments are deemed to be unequal, so it lets the caller decide by passing in a function. The choice of function shouldn’t affect commutativity, so the property symbolGivenUnequalArgsCommutes is universally quantified over that choice; allowing a counterexample search to try many different functions.

This may seem surprising if you’ve not encountered it before, but we can generate a function value just as easily as a “data-like” value. For example, we can generate functions which look up their arguments in a (generated) lookup table:

genViaTable :: Eq a => Range Fuel -> Gen a -> Gen b -> Gen (a -> b)
genViaTable range genA genB = do
  def   <- genB
  pairs <- Gen.list range (tuple2 genA genB)
  pure (maybe def id . (`lookup` pairs))

However, such ordinary function values are opaque “black boxes”, which makes them less useful for property-checking: firstly, it’s hard to show them (e.g. if they form part of a counterexample); and secondly it’s hard to shrink them (to look for “simpler” alternatives). Property-checkers avoid these problems by replacing ordinary functions a -> b with their own alternative, whose constructors ‘remember’ how to show and shrink themselves. In falsify this alternative is Gen.Fun a b, so we’ll need to convert the Com -> Com -> Bool argument of symbolGivenUnequalArgsCommutes into either Fun Com (Fun Com Bool) (curried form) or, equivalently, Fun (Com, Com) Bool (uncurried form). We’ll use the uncurried form, since the tuples will be handled automatically by type class instance resolution. The following helpers let symbolGivenUnequalArgsCommutes take a Fun as argument:

-- | Transform a higher-order function to take a Fun instead
liftFun :: ((a -> b) -> c) -> Gen.Fun a b -> c
liftFun f = f . Gen.applyFun

-- | Lift a function with binary argument to take an uncurried Fun instead
liftFun2 :: ((a -> b -> c) -> d) -> Gen.Fun (a, b) c -> d
liftFun2 f = liftFun (f . curry)

To check (the lifted version of) symbolGivenUnequalArgsCommutes, we need a generator of type Gen (Fun (Com, Com) Bool). We can get one from Gen.fun, but that requires an instance of Gen.Function (Com, Com). There’s an existing instance for tuples, but we still need to implement Gen.Function Com ourselves:

instance Gen.Function Com where
  function gen = Gen.functionMap comToPre preToCom <$> Gen.function gen

This is piggybacking on an existing instance (the Gen.function gen call at the end) by converting our Com values back-and-forth to another type that already implements Gen.Function. The type I’ve chosen is [Maybe (Either Bool Symbol)], and the conversions are implemented by comToPre & preToCom:

-- | Prefix notation for expressions. `Nothing` represents an 'apply' operation.
type Prefix = [Maybe (Either Bool Symbol)]

-- | Converts expressions from "applicative style" Com to "prefix style" Prefix.
comToPre :: Com -> Prefix
comToPre x = case x of
  -- Treat K and S specially, so they're more likely to be generated
  _ | x == k -> [Just (Left False)]
  _ | x == s -> [Just (Left True )]
  -- Any other symbol
  C c        -> [Just (Right c)]
  -- Nothing acts like an apply operator for two expressions that follow it
  App l r    -> [Nothing] ++ comToPre l ++ comToPre r

-- | Parses "prefix style" Prefix back into an "applicative style" Com.
preToCom :: Prefix -> Com
preToCom = fst . go          -- Discard any remaining suffix
  where go []     = (k, [])  -- Makes preToCom total, but overlaps 'Left False'
        go (x:xs) = case x of
          Just (Left False) -> (k  , xs)
          Just (Left True ) -> (s  , xs)
          Just (Right c   ) -> (C c, xs)
          Nothing           -> let (l, ys) = go xs
                                   (r, zs) = go ys
                                in (App l r, zs)

comToPreRoundtrips :: Com -> Bool
comToPreRoundtrips c = preToCom (comToPre c) == c
main = check comToPreRoundtrips genCom
comToPreRoundtrips: OK (1.01s)
  100 successful tests

All 1 tests passed (1.01s)
preToComAlmostRoundtrips :: Prefix -> Bool
preToComAlmostRoundtrips p = comToPre (preToCom p') == p'
  where p' = comToPre (preToCom p)  -- Extra roundtrip to make p "correct"

genPre :: Gen Prefix
genPre = Gen.list (to limit) genEntry
  where genEntry = Gen.choose (Just <$> genLeaf) (pure Nothing)
        genLeaf  = Gen.choose (Left <$> Gen.bool False) (Right <$> genSym)
main = check preToComAlmostRoundtrips genPre
preToComAlmostRoundtrips: OK (0.11s)
  100 successful tests

All 1 tests passed (0.11s)

The Prefix type is a prefix form for expressions, as opposed to the “applicative form” of Com. They’re equivalent, but the structure of an expression is less obvious in prefix form, which is why it only appears in this hidden section. This is the encoding used in binary combinatory logic, although we’re allowing arbitrary Symbol values rather than just S and K.

The functions preToCom and comToPre don’t quite form an isomorphism. This is essentially due to Prefix being a prefix(-free) code:

  • The empty list [] does not correspond to a particular Com value. It a valid prefix, since appending it with anything will either form a complete expression or another valid prefix; but that’s not so useful for preToCom, which might have only been given a finite list and needs to return a Com. In this case, it just returns K (arbitrarily).
  • Prefix values may contain “too many” Nothing values. These act like App, indicating that the following expression is applied to the one after. That “following expression” may itself use Nothing to consume subsequent list elements, and so on; corresponding to the arbitrary nesting which App allows. However, the given Prefix may “run out” before specifying what those following elements should be! Again, that’s a perfectly good prefix, but not a complete value. When this happens, we will hit the same [] case as above, and simply return K (potentially many times, as the applications get “popped off the stack”).
  • Expressions encoded as Prefix values are self-delimiting, meaning that the encoding itself tells us when to finish parsing: if the value we parsed was Just _, we have finished that expression; if the value is Nothing we can finish after parsing exactly two more expressions. Hence any elements appearing after a correctly-encoded expression will be completely ignored, resulting in the same Com.
  • Our hacky use of K as a fallback for empty Prefix values introduces some ambiguity, since an expression like KK could be encoded as [Nothing, Just (Left False), Just (Left False)] (the “correct” way); or as [Nothing, Just (Left False)] (relying on the fallback behaviour to introduce another K); or indeed as [Nothing] (relying on the fallback for both K expressions). Hence the fallback behaviour should not be relied upon; also, it breaks when composing expressions together (since missing K values will only be introduced at the end of a list).
  • There’s also some redundancy between Left False/Left True and Right "K"/Right "S". This was a deliberate choice, to make falsify generate more S and K expressions.

These aren’t a problem for our use of Prefix in generating values; but you may need to keep them in mind when using this encoding for other purposes (in which case I’d recommend embracing its nature as a prefix(-free) code, since it has all sorts of cool applications!)

main = check (uncurry3 (liftFun2 symbolGivenUnequalArgsCommutes))
             (tuple3 (Gen.fun (Gen.bool False)) genSymCom genSymCom)
symbolGivenUnequalArgsCommutes: OK (1.40s)
  100 successful tests

All 1 tests passed (1.40s)

Combining disagreement provers

We can never spot all disagreements, but the simple checks above can be combined into a single, reasonably-useful function. Not only is this easier to use than the individual checks, but we can also use it as the unEq argument of the symbolGivenUnequalArgs function, which lets us recurse through the given expressions looking for disagreement at any depth!

provablyDisagree :: Com -> Com -> Bool
provablyDisagree x y = distinctSymbolicHeads                   x y
                    || unequalArgCount                         x y
                    || symbolGivenUnequalArgs provablyDisagree x y

provablyDisagreeCommutes :: (Com, Com) -> Bool
provablyDisagreeCommutes (x, y) = provablyDisagree x y == provablyDisagree y x
main = check provablyDisagreeCommutes (tuple2 genSymCom genSymCom)
provablyDisagreeCommutes: OK (0.94s)
  100 successful tests

All 1 tests passed (0.94s)

An extensional equality test

Now we have ways to spot both extensional equality and inequality (in certain situations), we can fold them over the result of agreeSym. We’ll begin by extracting the numerical index provided by race, which should tell us how many inputs it takes for the given expressions to start agreeing:

-- | 'Just n' if the given expressions agree on 'n' inputs (they might also
-- | agree on fewer). 'Nothing' if it is determined that they won't agree; if
-- | that cannot be determined, the 'Delay' will never end.
extensionalInputs :: Com -> Com -> Delay (Maybe Natural)
extensionalInputs x y = race (agreeSym x y) >>= go 0
  where go !n (Same _  , m, _) = Now (Just (n+m))
        go !n (Diff a b, _, s) = if provablyDisagree (toCom a) (toCom b)
                                    then Now Nothing
                                    else race s >>= go (n+1)

agreeOnExtensionalInputs :: (Fuel, Com, Com, InputValues) -> Bool
agreeOnExtensionalInputs (n, x, y, inputs) =
    runDelayOr True (extensionalInputs x y >>= check) n
  where check Nothing  = Now True
        check (Just i) = same <$> sAt i (agree x y inputs)
main = check agreeOnExtensionalInputs (tuple4 genFuel genCom genCom genComs)
agreeOnExtensionalInputs: OK (0.66s)
  100 successful tests

All 1 tests passed (0.67s)

This is easy to transform into a legitimate predicate for testing extensional equality, that’s able to answer both True and False (at least, some of the time):

-- | Whether the given expressions are extensionally equal, i.e. cannot be
-- | distinguished by an SK expression.
extEq :: Com -> Com -> Delay Bool
extEq x y = isJust <$> extensionalInputs x y

extEq is a generalisation of our previous checks:

extEqGeneralisesEqAndNormalEqAndEverAgree :: (Fuel, Com, Com) -> Bool
extEqGeneralisesEqAndNormalEqAndEverAgree (n, x, y) =
    case ( go (            extEq x y)
         , go (        everAgree x y)
         , go (same <$> normalEq x y)
         ,                  (==) x y) of
      (Now False, Now True, _       , _   ) -> False
      (Now False, _       , Now True, _   ) -> False
      (Now False, _       , _       , True) -> False
      _                                     -> True
  where go = runDelay n
main = check extEqGeneralisesEqAndNormalEqAndEverAgree
             (tuple3 genFuel genCom genCom)
extEqGeneralisesEqAndNormalEqAndEverAgree: OK (0.67s)
  100 successful tests

All 1 tests passed (0.67s)

Next time

We’ve seen how extensional equality works in theory, part 3 explores these definitions in more depth, to find out where I was going wrong in my egglog.