# Haskell Theory Exploration Benchmarks #

This directory contains benchmarks for automated theory exploration tools. There
are two sorts of benchmarks:

### Standalone Theories ###

These are the files ending in `.smt2`, written in the TIP format:

 - `benchmarks/nat-simple.smt2` is a simple theory of Natural numbers, with
   addition and multiplication, comparable to that used in [1] and [2]
 - `benchmarks/nat-full.smt2` is similar to `nat-simple.smt2` but also contains
   an exponentiation function, comparable to that used in [3]
 - `benchmarks/list-full.smt2` is a theory of lists, comparable to that used
   in [2]

Each standalone benchmark has a corresponding file in `ground-truth/`
containing the statements considered "interesting" for that theory (these are
taken from [1]).

### Theory Exploration Benchmark ###

We use the Theory Exploration Benchmark project, which includes a corpus of
definitions and statements. Subsets of these definitions are sampled
(deterministically), and the applicable statements are used as the ground truth.

## Running Benchmarks ##

We use `asv` to run the benchmarks and manage the results. A suitable
environment can be entered by running `nix-shell benchmarkEnv.nix` from the root
directory of this repository (i.e. the directory above this `benchmarks/` one).

The usual `asv` commands can be used: `asv run`, `asv publish`, etc.
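
For example, a typical session from the repository root might look something
like the following sketch (not a prescribed workflow; `asv preview` is asv's
standard command for viewing the generated report locally):

    nix-shell benchmarkEnv.nix   # enter the benchmarking environment
    asv run                      # run the benchmark suite
    asv publish                  # regenerate the HTML report
    asv preview                  # serve the report locally (optional)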

Note that benchmarking can take a while. In particular, we do all of the
exploration in the 'setup' phase rather than in the benchmarks themselves; this
makes the setup phase slow, but the benchmarks which follow are almost instant.

Our policy is to commit benchmark results (which include the raw input/output
data and specs of the machine) to git to ensure reproducibility. We do two
things to save resources:

 - We don't commit any "derived" data. In particular, we don't include any HTML
   reports, since they can be regenerated automatically.
 - When we want to store a benchmark run, we first compress it with lzip. This
   *drastically* reduces the file size, and doesn't negatively affect git usage
   since these results will never change (that would be tampering!).

To store a result, commit any `benchmarks.json` and `machine.json` files as-is,
and lzip the benchmark output using a command like:

    lzip < .asv/results/<machine-name>/<commit-id>-<args>.json \
         > benchmarks/results/<machine-name>/<commit-id>-<args>.json.lz

Commit the resulting `.json.lz` file, but not the original `.json` file. When
committing new results, keep in mind that the raw data can get quite large, and
it will hang around in git history forever. Hence, only include results which
are reliable (e.g. don't run benchmarks at the same time as other
resource-intensive programs).
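
Putting this together, the final commit step might look something like the
following sketch (the paths are illustrative, following the pattern above;
stage whichever metadata files are new or have changed):

    # Stage the compressed results plus any new/changed metadata, then commit.
    git add benchmarks/results/<machine-name>/<commit-id>-<args>.json.lz
    git add benchmarks/results/<machine-name>/machine.json
    git commit -m "Add benchmark results for <machine-name>"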

To use this lzipped data with `asv`, simply decompress it back into place. The
benchmarking environment provides an `unzipBenchmarks` command which does this
for you.
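
If you prefer to do it by hand, reversing the compression step above should
have the same effect, e.g. (assuming the same file layout as the `lzip`
command earlier):

    # Decompress a stored run back into asv's results directory.
    lzip -d < benchmarks/results/<machine-name>/<commit-id>-<args>.json.lz \
            > .asv/results/<machine-name>/<commit-id>-<args>.json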

## References ##

[1]: Automated discovery of inductive lemmas, Moa Johansson, 2009

[2]: Automating inductive proofs using theory exploration, Koen Claessen, Moa
     Johansson, Dan Rosén and Nicholas Smallbone, 2013

[3]: Scheme-based theorem discovery and concept invention, Omar Montano-Rivas,
     Roy McCasland, Lucas Dixon and Alan Bundy, 2012
