# Haskell Theory Exploration Benchmarks #

This directory contains benchmarks for automated theory exploration tools. There
are two sorts of benchmarks:

### Standalone Theories ###

These are the files ending in `.smt2`, which are written in the TIP format:

- `benchmarks/nat-simple.smt2` is a simple theory of natural numbers, with
  addition and multiplication, comparable to that used in [1] and [2]
- `benchmarks/nat-full.smt2` is similar to `nat-simple.smt2`, but also contains
  an exponentiation function, comparable to that used in [3]
- `benchmarks/list-full.smt2` is a theory of lists, comparable to that used
  in [2]

Each standalone benchmark has a corresponding file in `ground-truth/`,
containing the statements considered "interesting" for that theory (these are
taken from [1]).
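
To give a flavour of the format, a minimal TIP theory might look something like
the following (an illustrative sketch only, not the contents of any of the
files above; TIP is an extension of SMT-LIB):

```smt2
; Natural numbers as a datatype: zero and successor.
(declare-datatypes () ((Nat (Z) (S (p Nat)))))

; Recursive definition of addition by pattern-matching on the first argument.
(define-fun-rec plus ((x Nat) (y Nat)) Nat
  (match x
    (case Z y)
    (case (S n) (S (plus n y)))))

; A conjecture for the tools to discover/prove: Z is a right identity of plus.
(assert-not (forall ((x Nat)) (= (plus x Z) x)))
(check-sat)
```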

### Theory Exploration Benchmark ###

We use the Theory Exploration Benchmark project, which includes a corpus of
definitions and statements. Subsets of these definitions are sampled
(deterministically), and the applicable statements are used as the ground truth.

## Running Benchmarks ##

We use `asv` to run the benchmarks and manage the results. A suitable
environment can be entered by running `nix-shell benchmarkEnv.nix` from the root
directory of this repository (i.e. the directory above this `benchmarks/` one).

The usual `asv` commands can be used: `asv run`, `asv publish`, etc.
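
For example, a typical session might look like the following sketch (the `asv`
sub-commands are standard, but the exact flags and output depend on your `asv`
version):

```shell
# Enter the benchmarking environment from the repository root
# (the directory above benchmarks/).
nix-shell benchmarkEnv.nix

# Inside that shell:
asv machine --yes   # record this machine's details non-interactively
asv run             # run the benchmark suite
asv publish         # generate the HTML report (derived data; not committed)
asv preview         # serve the generated report locally
```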

Note that benchmarking can take a while. In particular, we do all of the
exploration in the 'setup' phase rather than in the benchmarks themselves; this
makes the setup phase slow, but the benchmarks which follow are almost instant.

Our policy is to commit benchmark results (which include the raw input/output
data and specs of the machine) to git, to ensure reproducibility. We do two
things to save resources:

- We don't commit any "derived" data. In particular, we don't include any HTML
  reports, since they can be regenerated automatically.
- When we want to store a benchmark run, we first compress it with lzip. This
  *drastically* reduces the file size, and doesn't negatively affect git usage,
  since these results will never change (that would be tampering!).

To store a result, commit any `benchmarks.json` and `machine.json` files as-is,
and lzip the benchmark output using a command like:

    lzip < .asv/results/<machine-name>/<commit-id>-<args>.json \
         > benchmarks/results/<machine-name>/<commit-id>-<args>.json.lz

Commit the resulting `.json.lz` file, but not the original `.json` file. When
committing new results, keep in mind that the raw data can get quite large, and
these files will hang around in git forever. Hence, only include results which
are reliable (e.g. don't run benchmarks at the same time as other
resource-intensive programs).

To use this lzipped data with `asv`, it can simply be unzipped into place. The
benchmarking environment provides an `unzipBenchmarks` command which will do
this for you.
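
If you need to do this by hand (e.g. outside the benchmarking environment), the
following sketch is the reverse of the compression step above; the placeholders
are the same as before:

```shell
# Decompress a stored result back into asv's results directory
# (a manual alternative to the environment's unzipBenchmarks command).
mkdir -p .asv/results/<machine-name>
lzip -d < benchmarks/results/<machine-name>/<commit-id>-<args>.json.lz \
        > .asv/results/<machine-name>/<commit-id>-<args>.json
```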

## References ##

[1]: Automated discovery of inductive lemmas, Moa Johansson, 2009

[2]: Automating inductive proofs using theory exploration, Koen Claessen, Moa
Johansson, Dan Rosén and Nicholas Smallbone, 2013

[3]: Scheme-based theorem discovery and concept invention, Omar Montano-Rivas,
Roy McCasland, Lucas Dixon and Alan Bundy, 2012