haskell-te: 2bfa101c0a881277e204df5c91e581fc6ad1e87e
1: # Haskell Theory Exploration #
2:
3: Contact: chriswarbo@gmail.com
4:
5: Homepage: http://chriswarbo.net/git/haskell-te
6:
7: Mirrors:
8:
9: - /ipns/QmWv958VzBJQchGjYKiSaxLC9ugrjvXkqMpVrmjp9AonXq
10: - https://github.com/Warbo/haskell-te
11:
12: This program is free software: you can redistribute it and/or modify it under
13: the terms of the GNU Affero General Public License as published by the Free
14: Software Foundation, either version 3 of the License, or (at your option) any
15: later version.
16:
17: This program is distributed in the hope that it will be useful, but WITHOUT ANY
18: WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
19: PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
20:
21: You should have received a copy of the GNU Affero General Public License along
22: with this program. If not, see <http://www.gnu.org/licenses/>.
23:
24: ## Introduction: A Wrapper Around QuickSpec ##
25:
26: This repository provides commands for performing "theory exploration", mostly in
27: the Haskell programming language. Theory exploration describes the task of
28: taking in some function definitions and outputting conjectures about those
29: functions. For example, given the list functions `append` and `nil`, we might
30: get the following conjectures:
31:
32: append v0 nil = v0
33: append nil v0 = v0
34: append v0 (append v1 v2) = append (append v0 v1) v2
35:
36: In which case, we'd find that `append` and `nil` form a monoid for lists.
37:
38: Currently, we're limited to finding equations. We rely on the excellent
39: QuickSpec library to do most of the work, but we also use and provide many
40: complementary tools. We use Nix to tame the menagerie of languages and systems
41: 'under the hood'.
42:
43: ## Overview ##
44:
45: We can think of these tools as "dynamic analysers" (as opposed to static
46: analysers), since we discover properties of the given code by *running it*, over
47: and over, in many combinations. Haskell is well suited to this, since its
48: "purity" prevents functions relying on any ambient 'context': everything a
49: Haskell function needs should be taken in as an argument. The type system tells
50: us which functions we can combine together, and what arguments they should be
51: given.
52:
53: QuickSpec uses this knowledge, along with data generators for the popular test
54: framework QuickCheck, to find combinations of the given functions which seem to
55: behave the same way on many inputs. We conjecture that such expressions are
56: (beta) equivalent.
57:
58: Whilst QuickSpec is very useful, it has a few weaknesses which we seek to
59: address:
60:
61: - We provide a wrapper script around the QuickSpec library, allowing it to be
62: run as a standalone program.
63: - We provide a tool to extract all functions from a Haskell package, and
64: generate a suitable signature for QuickSpec, so the user doesn't have to.
65: - To generate our signatures, we automatically pick suitable names, types,
66: arities, data generators, comparison functions and free variables for all
67: relevant functions.
68: - We call out to Nix to ensure that all necessary packages (QuickSpec and
69: whatever is to be explored) are available.
70: - We replace QuickSpec's custom pretty-printer with a JSON format which is more
71: amenable to subsequent processing.
72: - We split apart QuickSpec's exploration phase from its 'filter out redundant
73: conjectures' phase, in case users want to explore without filtering, or wants
74: to filter some separate data.
75:
76: ## Commands ##
77:
78: Here, we use "Haskell package" to mean a directory containing a Haskell project,
79: with a `.cabal` file, etc.
80:
81: The Nix package defined in `default.nix` provides the following commands:
82:
83: - `quickspec` takes the path to a Haskell package as an argument, explores the
84: code it finds inside, and writes the conjectures it discovers to stdout.
85: - `renderEqs` reads conjectures from stdin and pretty-prints a more
86: human-friendly syntax to stdout.
87: - `reduce-equations` reads conjectures from stdin and writes to stdout only
88: those which aren't "redundant" (i.e. it removes anything which is just a
89: special case of some other conjecture).
90: - `haskellPkgToAsts` takes the path to a Haskell package as an argument and
91: outputs a JSON array detailing the function definitions it finds inside.
92: - `quickspecAsts` will explore the equations described by a JSON array given on
93: stdin, in the same format outputted by `haskellPkgToAsts`. The relevant
94: Haskell packages should also be given as an argument, so GHC knows where to
95: find them.
96: - `concurrentQuickspec` takes a "2D array" of JSON on stdin, where each element
97: of the outer array is a JSON array as per `quickspecAsts`. A QuickSpec
98: process is launched for each, and the conjectures from all of them are
99: pooled.
100:
101: The intermediate JSON data is useful if we want to remove particular functions
102: from being explored, or if we want to explore functions from multiple packages
103: together. Tools like `jq` are handy for manipulating this JSON.
104:
105: ## Examples ##
106:
107: We can start a shell with these commands in scope as follows:
108:
109: $ nix-shell --show-trace -p 'import ./. {}'
110:
111: After much building and testing, this should drop us to a "nix shell" with an
112: augmented `PATH`:
113:
114: [nix-shell]$ command -v quickspecAsts
115: /nix/store/...-haskell-theory-exploration/bin/quickspecAsts
116:
117: Now we need some code to explore; you'll probably be using your own projects,
118: but here we'll fetch a package from Hackage:
119:
120: [nix-shell]$ cabal get dlist
121: Downloading dlist-0.8.0.2...
122: Unpacking to dlist-0.8.0.2/
123:
124: First let's try exploring it, by passing the directory path to `quickspec`:
125:
126: [nix-shell]$ quickspec dlist-0.8.0.2 | tee dlist.eqs
127: ...the usual messages from nix, haddock, and cabal...
128: Checking module availability
129: Found modules
130: Building type-extraction command
131: Extracting types
132: Building scope-checking command
133: Checking scope
134: Outputting JSON
135: Finished output
136: Set DEBUG=1 if you want to see gory GHCi output.
137:
138: This stage is tricky. Set DEBUG=1 to see the debug info.
139: Getting ASTs
140: Getting types
141: Getting scope
142: Tagging
143: Tagged
144: ...more messages from nix, haddock and cabal...
145: [
146: {
147: "relation": "~=",
148: ...
149: },
150: ...
151: ]
152:
153: This can take a while, and you might want to keep an eye on GHC's memory usage.
154: The messages in the middle are from our AST extraction system as it finds
155: functions definitions in the `dlist` packages, along with their types, arities,
156: etc.
157:
158: The result is a bunch of conjectured equations involving functions from the
159: `dlist` package. These are in our JSON format, suitable for piping into other
160: tools. Since we only want to read them, we'll pipe them into `renderEqs`:
161:
162: [nix-shell]$ renderEqs < dlist.eqs
163: ((foldr v0) v1) empty ~= v1
164: (cons v0) empty ~= singleton v0
165: (map v0) empty ~= empty
166: (snoc empty) v0 ~= singleton v0
167: (apply empty) v0 ~= v0
168: head empty ~= undefined
169: tail empty ~= undefined
170: (append empty) empty ~= empty
171: (snoc ((replicate v0) v1)) v1 ~= (cons v1) ((replicate v0) v1)
172: (snoc (singleton v0)) v1 ~= (cons v0) (singleton v1)
173: head (singleton v0) ~= v0
174: tail (singleton v0) ~= empty
175: toList (fromList v0) ~= v0
176: (append ((unfoldr v0) v1)) (singleton v2) ~= (snoc ((unfoldr v0) v1)) v2
177: (append ((replicate v0) v1)) (singleton v2) ~= (snoc ((replicate v0) v1)) v2
178: (append ((replicate v0) v1)) ((replicate v2) v1) ~= (append ((replicate v2) v1)) ((replicate v0) v1)
179: (append (singleton v0)) ((unfoldr v1) v2) ~= (cons v0) ((unfoldr v1) v2)
180: (append (singleton v0)) ((replicate v1) v2) ~= (cons v0) ((replicate v1) v2)
181: (append empty) ((unfoldr v0) v1) ~= (unfoldr v0) v1
182: (append empty) ((replicate v0) v1) ~= (replicate v0) v1
183: (append ((unfoldr v0) v1)) empty ~= (unfoldr v0) v1
184: (append ((replicate v0) v1)) empty ~= (replicate v0) v1
185: (append (singleton v0)) (singleton v1) ~= (cons v0) (singleton v1)
186: (append (singleton v0)) (fromList v1) ~= (cons v0) (fromList v1)
187: (append (fromList v0)) (singleton v1) ~= (snoc (fromList v0)) v1
188: (append empty) (singleton v0) ~= singleton v0
189: (append empty) (fromList v0) ~= fromList v0
190: (append (singleton v0)) empty ~= singleton v0
191: (append (fromList v0)) empty ~= fromList v0
192: fromList (toList empty) ~= empty
193: (apply ((unfoldr v0) v1)) (toList empty) ~= toList ((unfoldr v0) v1)
194: (apply ((replicate v0) v1)) (toList empty) ~= toList ((replicate v0) v1)
195: (apply (singleton v0)) (toList empty) ~= toList (singleton v0)
196: (apply (fromList v0)) (toList empty) ~= v0
197:
198: Here we've found that `toList` is the inverse of `fromList`, that `head` is the
199: inverse of `singleton`, etc.
200:
201: We use the symbol `~=` to indicate that these are not definitions or proven
202: theorems, they're just conjectures based on many empirical tests. The terms
203: `v0`, `v1`, etc. are free variables, which are numbered from left to right per
204: equation; the prefix contains enough `v`s to prevent clashing with any function
205: name (e.g. if we had a function called `verbose`, or even `v0`, the variables
206: would render as `vv0`, `vv1`, etc.)
207:
208: Notice that there are several equations of the form `(append empty) x = x`, but
209: that more general statement itself wasn't conjectured. The reason is that `x`
210: would be a variable of type `DList a` (in fact `DList Integer`, since we
211: monomorphise), but the `dlist` package doesn't expose a `DList a` instance of
212: QuickCheck's `Arbitrary` type class. Without this, we can't test equations
213: involving such variables, so we don't include them and hence the general
214: equation wasn't found.
215:
216: If we want more control over what gets explored, we can use `haskellPkgToAsts`
217: to dump out a package's function definitions as JSON, without exploring them
218: immediately:
219:
220: [nix-shell]$ haskellPkgToAsts dlist-0.8.0.2 | tee dlist.asts
221: ...output from nix, haddock, cabal, ...
222: [{"name":"replicate","version":"0.8.0.2",...},...]
223:
224: Tools like `jq` can manipulate this JSON. Here we'll take functions from the
225: `LazyLength` module of the `list-extras` package, and functions from `dlist`
226: which have `List` in their name, and combine them into one array:
227:
228: [nix-shell]$ cabal get list-extras
229: [nix-shell]$ haskellPkgToAsts list-extras-0.4.1.4 > list-extras.asts
230: [nix-shell]$ jq -n --argfile le list-extras.asts \
231: --argfile dl dlist.asts \
232: '($dl | map(select(.name | contains("List")))) +
233: ($le | map(select(.module | contains("LazyLength"))))' |
234: tee combined.asts
235:
236: We can explore these together by using `quickspecAsts`. Note that we provide
237: *both* package directories as arguments, so GHC can find all of the necessary
238: code:
239:
240: [nix-shell]$ quickspecAsts dlist-0.8.0.2 \
241: list-extras-0.4.1.4 < combined.asts | renderEqs
242: (lengthCompare v0) v0 ~= (lengthCompare v1) v1
243: v0 ~= toList (fromList v0)
244:
245: In this case we didn't find any equations involving *both* `lengthCompare` and
246: `toList`/`fromList`, so there's no need to explore them together. Instead, we
247: could have explored them in separate processes using `concurrentQuickspec`.
248:
249: We need a different `jq` command, since the one above uses `(...) + (...)` to
250: append two arrays together, resulting in:
251:
252: [{"name":"lengthCompare",...},{"name":"toList",...},{"name":"fromList",...}]
253:
254: This time we want multiple arrays, so we use `[...] + [...]` instead to get:
255:
256: [[{"name":"lengthCompare",...}],
257: [{"name":"toList",...},{"name":"fromList",...}]]
258:
259: Here's the call we use:
260:
261: [nix-shell]$ jq -n --argfile le list-extras.asts \
262: --argfile dl dlist.asts \
263: '[$dl | map(select(.name | contains("List")))] +
264: [$le | map(select(.module | contains("LazyLength")))]' |
265: tee separate.asts
266:
267: We send these nested arrays to `concurrentQuickspec` instead of `quickspecAsts`,
268: remembering to provide both directory paths as before. We also send the
269: resulting conjectures through `reduce-equations` to remove duplicates and
270: redundancies (`quickspec` and `quickspecAsts` currently do this automatically):
271:
272: [nix-shell]$ concurrentQuickspec dlist-0.8.0.2 \
273: list-extras-0.4.1.4 < separate.asts |
274: reduce-equations | renderEqs
275: (lengthCompare v0) v0 ~= (lengthCompare v1) v1
276: v0 ~= toList (fromList v0)
277:
278: The idea of `concurrentQuickspec` is to work around QuickSpec's exponential
279: worst-case complexity: turning O(2^N) into O(k*2^(N/k)). How we should divide up
280: one big array into multiple smaller ones, without losing good conjectures in the
281: process, is currently an open question!
282:
283: ## Design Goals ##
284:
285: The design of this codebase is influenced by the following considerations:
286:
287: *Exploring arbitrary (Haskell) code*
288:
289: To be as useful as possible, we want to accept as much "normal" Haskell code as
290: we can, rather than placing too many requirements on users. Since the vast
291: majority of Haskell code only works with GHC, due to language extensions,
292: dependencies, etc. we're pretty much forced to use it.
293:
294: Likewise, most code won't build without taking specific instructions into
295: account, like ensuring dependencies are available, preprocessors are executed,
296: commandline flags are provided, etc. This forces us to use a build system, so
297: we've opted for Cabal since it's been around for long enough to "just work" in
298: most situations, especially with Nix.
299:
300: We're also forced to use an "eval" mechanism, since we need to run code supplied
301: by the user at runtime, but GHC is an ahead-of-time compiler. We've opted for
302: `nix-eval`, which spawns `runghc` in a subprocess and pipes in the given code.
303: As the name suggests, `nix-eval` runs this subprocess in an environment provided
304: by Nix, which lets us fetch, build and install any packages needed by the user's
305: code.
306:
307: *Measuring performance*
308:
309: We want a way to reliably measure and compare different exploration algorithms
310: and approaches. This requires the code to be:
311:
312: - Highly automated, so many repetitions can be executed, against many different
313: inputs, with as little human intervention as possible.
314: - Modular, so that algorithms can be swapped out and separate components can be
315: run independently: in particular so we can time the exploration without
316: including things like function extraction.
317: - Deterministic and reproducible, as far as possible, so experiments can be
318: replicated and results can be compared.
319: - Fast at running a "happy path", allowing us to perform any expensive setup
320: (like compiling dependencies and creating environments) beforehand, so they
321: don't contribute to the measured runtime. In particular, we need a way to
322: prevent `nix-eval` from compile dependencies during an `eval` call.
323:
324: ## Caveats ##
325:
326: We abstract over a lot of implementation details. These abstractions are leaky,
327: so you may encounter problems.
328:
329: Barring dependency clashes, the packages most likely to work are "helper
330: libraries" for built-in types, like `list-extras`, since they can use the
331: `Arbitrary` instances bundled with QuickCheck (`Bool`, `[a]`, `Maybe a`, etc.).
332:
333: Packages which define combinator libraries and simple 'sums of products'
334: datatypes should usually work, but you may need to define/expose some
335: `Arbitrary` instances, and maybe be selective about which functions get explored
336: together to avoid running out of memory.
337:
338: Libraries making heavy use of type classes and records may struggle, as will
339: those whose "data" can't be serialised or compared (e.g. functions).
340:
341: It goes without saying, but anything which invokes `IO` may cause damage to your
342: system. This should only be possible with `unsafePerformIO` and friends, but
343: don't blindly trust third-party code. A welcome feature would be to enforce
344: GHC's "safe haskell" mode when exploring!
345:
346: Here are some known difficulties and workarounds:
347:
348: *Function extraction uses a GHC compiler plugin*
349:
350: This means we must be able to compile your code with GHC. We do this by invoking
351: Cabal, with specially crafted arguments in a specially crafted environment. If
352: your code doesn't have an accompanying `.cabal` file we *cannot* extract
353: function information from it. We have no plans to support alternative build
354: tools at this time.
355:
356: Note that it is still be possible to *explore* non-Cabal packages (e.g. with
357: `quickspecAsts` or `concurrentQuickspec`), but `haskellPkgToAsts` cannot be used
358: to extract the function information; you'll have to provide it some other way.
359:
360: *Packages must be installable by Nix*
361:
362: We use Nix to set up our working environments, e.g. for compiling, for querying
363: GHCi, for exploring, etc. If your package can't be built by Nix then *it will
364: not work*. Unlike the Cabal requirement, this cannot be worked around.
365:
366: There are two ways to make a package work with Nix. The easiest is to have a
367: working `.cabal` file, in which case we will automatically run `cabal2nix` to
368: convert it to Nix. If this doesn't work, or you don't have a `.cabal` file, then
369: you can provide your own definition in a `default.nix` file instead. This should
370: follow the same convention as those produced by `cabal2nix`.
371:
372: *Dependency hell*
373:
374: We use a few Haskell packages internally. Some, like `reduce-equations`, provide
375: standalone commands which don't interfere with the packages being explored.
376: Others, like `QuickSpec` and `AstPlugin`, must be installed into the same
377: `ghc-pkg` database as the packages being explored, so there is potential for
378: dependencies to clash.
379:
380: Clashes with `AstPlugin` can be worked around by avoiding `quickspec` and
381: `haskellPkgToAsts`, and instead providing your own JSON to `quickspecAsts` or
382: `concurrentQuickspec`.
383:
384: Clashes with packages used during exploration (e.g. `QuickSpec`) cannot be
385: worked around, due to the nature of the algorithm: functions from one must be
386: invoked by the other, so their dependencies must be compatible.
387:
388: *Overrides*
389:
390: We prefer to hard-code our dependency versions, to make results more
391: deterministic and reproducible, but this may prevent some user packages from
392: building, so we provide the following override mechanisms:
393:
394: - Our package accepts overridable parameters, including `stable` which defaults
395: to `true`. If set to `false`, we will check out the latest revision of each
396: git repository we access, rather than the known-good revision we've
397: hard-coded. For example: `nix-shell -p 'import ./. { stable = false; }'`
398: - Some dependencies can be given in the `NIX_PATH` environment variable, e.g.
399: most of the custom Haskell packages in `nix-support/hsOverride.nix`.
400: - If the `HASKELL_PACKAGES` environment variable is set to a `.nix` file, that
401: will be used as the Haskell package set when exploring. Note that this will
402: not affect things *other* than exploration, like function extraction or setup
403: code.
404: - Note that we use the `nix-eval` package, but we *don't* expose its
405: `NIX_EVAL_HASKELL_PKGS` override. That's because we need to use it ourselves,
406: but the `HASKELL_PACKAGES` we provide acts in a similar way.
407:
408: *Extraction caveats*
409:
410: `haskellPkgToAsts` looks at all of the functions defined in the given package.
411: In particular, it *does not* extract anything which is re-exported from a
412: *different* package, and it does not include constuctors or type class methods.
413:
414: If a package defines things in a "private" (non-exposed) module, those functions
415: may be deemed unexplorable, since we can't import the module; *even if* an
416: exposed module happens to re-export them elsewhere.
417:
418: We can work around these problems by defining wrapper functions (e.g. in a
419: "helper" package). For example:
420:
421: -- Wrap constructors in functions
422: nil = []
423: cons = (:)
424:
425: -- Wrap methods in functions
426: listEq :: Eq a => [a] -> [a] -> Bool
427: listEq = (==)
428:
429: -- Wrap unexposed definitions in functions
430: publicFoo = privateFoo
431:
432: -- Wrap re-exported names in functions
433: definedInPublic = reexportedPrivateName
434:
435: There is also an edge case if we try to explore a function whose module doesn't
436: expose the relevant types. We won't be able to add variables for these types,
437: since the type names won't be in scope. Unfortunately this will die ungracefully
438: with a GHC error.
439:
440: We can work around this by re-exporting the desired types from a module which we
441: know will get imported. For example, we could wrap the relevant functions, and
442: also expose the types:
443:
444: module ExplorationHelper (MyType1, MyType2, myFunc) where
445:
446: import MyPkg.Types (MyType1, MyType2)
447: import qualified MyPkg.Funcs (myFunc)
448:
449: myFunc = MyPkg.Funcs.myFunc
450:
451: *Polymorphism, higher kinds, Arbitrary, Ord, etc.*
452:
453: QuickSpec's exploration algorithm relies on the use of free variables, and
454: testing by plugging many different values into these variables to see what
455: happens.
456:
457: To do this, we need instances of QuickCheck's `Arbitrary` typeclass for each
458: function's inputs, so we can generate random values. We also need the outputs to
459: be comparable, either by having an instance of `Ord` (we could use `Eq`, but it
460: would require O(n^2) comparisons) or an instance of `Serialize`.
461:
462: We use the latter if possible, since serialised data can be easily hashed to a
463: fixed-size value, which may save time and memory as our function outputs grow.
464:
465: Another consideration is that we need code to be monomorphic: we can't generate
466: random inputs if we don't know which particular type to generate values of. For
467: example, to explore `id :: (Arbitrary a) => a -> a`, how do we choose which `a`
468: to use?
469:
470: We solve this by 'monomorphising': replacing type variables with a particular
471: concrete type. Currently, we replace variables of kind `*` with `Integer` and
472: those of kind `* -> *` with `[]` (i.e. lists). If the variable has constraints,
473: and these aren't satisfied by `Integer` or `[]`, then we use Template Haskell to
474: look up a valid instance in the module's scope; if there are multiple
475: possibilities, one is chosen non-deterministically.
476:
477: These instance-finding mechanisms are rather basic and could certainly be
478: improved, but more sophisticated approaches (e.g. logic programming) would be
479: better off being developed as a separate project.
480:
481: It's easy to work around most of these mechanisms' failures, by simply writing
482: down monomorphic versions of your functions, e.g.
483:
484: myCrazyFunc :: Heavy Wizardry..
485: myCrazyFunc = ...
486:
487: myMonoFunc :: Bool -> String
488: myMonoFunc = myCrazyFunc
489:
490: One annoyance is that many projects use QuickCheck "internally", and hence their
491: test suites define `Arbitrary` instances, but these aren't provided by their
492: library interface, so we can't use them.
493:
494: *Resource usage*
495:
496: We use QuickSpec version 1, which has a rather slow algorithm. In particular,
497: memory usage may explode when exploring some functions and not for others.
498: Characterising this behaviour is itself an interesting question.
499:
500: To prevent GHC eating all of your resources, it can be killed after a time limit
501: by setting the `MAX_SECS` env var, or memory limit by setting `MAX_KB`.
502:
503: ## Other Details ##
504:
505: - Tested on NixOS Linux, but other Linux systems should also work
506: - The main "user facing" functionality is the package provided by `default.nix`
507: - The main "researcher facing" functionality is the benchmark suite, which is
508: explained in more detail by `benchmarks/README`
509: - We track issues using [Artemis](http://www.mrzv.org/software/artemis/). Issue
510: data is stored in `.issues` and managed by Git.
511: - Bug reports, fixes and enhancements are welcome. These can be emailed to the
512: author for wider dissemination. Note that Git can be used to email changes,
513: if you like.
Generated by git2html.