chriswarbo-net: 2308cd45e87508db71e797c1f96fd48b4b24d6ac

---
title: Querying Impurities
---

If a computation is "pure", it has no side-effects: the only thing it does is to
create its output, and that output depends only on the computation's input. Pure
computations can't, for example, read data from a network, format a hard drive
or alter their behaviour based on some internal state.

One reason pure computations are nice is that they're easy to reason about, both
for humans and for algorithms. The prevalence of pure computations in Haskell is
a major reason why the [Glasgow Haskell Compiler](https://www.haskell.org/ghc/)
is able to perform many heavy-duty optimisations: re-ordering, combining and
even throwing away large amounts of code, whilst preserving the original's
behaviour.

Many languages aren't so pure; any expression can cause important,
globally-visible, behaviour-determining effects at any time. It is important
that implementations of such languages don't interfere with the order of these
effects, since in general effects don't commute (i.e. "X then Y" isn't always
equivalent to "Y then X"). Hence most languages can't make the kind of sweeping
optimisations that GHC is able to.

Such implementations are *conservative*: since two expressions (e.g. procedure
calls) *could* be performing arbitrary effects which *could* be non-commuting,
changes like re-ordering them aren't attempted. By avoiding an optimisation
completely, we *definitely* avoid all problematic cases; but we unfortunately
avoid all non-problematic cases too.

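To make "effects don't commute" concrete, here's a minimal Python sketch. The
names `counter`, `increment`, `double` and `run` are made up for illustration;
the two procedures share a global variable, so swapping their order changes the
final state.

```python
# Shared, globally-visible state (hypothetical example).
counter = 0

def increment():
    """Effect: add one to the shared counter."""
    global counter
    counter += 1

def double():
    """Effect: double the shared counter."""
    global counter
    counter *= 2

def run(first, second, start=1):
    """Reset the counter, perform both effects in the given order and
    return the final state."""
    global counter
    counter = start
    first()
    second()
    return counter

print(run(increment, double))  # (1 + 1) * 2 = 4
print(run(double, increment))  # (1 * 2) + 1 = 3
```
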
It would be interesting if we could *query* such language implementations to ask
what the potential impurities are in some piece of code. For example, we might
ask a Python implementation what the impurities are in `x() + y()`, and it would
tell us that looking up `x` and `y` in the environment may not be pure; that
(attempting to) call their values as procedures may not be pure (e.g. if they're
not callable, an exception will be thrown, and the handler may be impure); that
if they are procedures, those procedures may not be pure; that looking up the
`__add__` method on the first result may not be pure and that calling such an
`__add__` method may not be pure. Whew!

This looks intimidating, but is mostly a matter of syntax: knowing which
language constructs can lead to effectful code being executed. In the above
example, we know that `+` is syntactic sugar for calling an `__add__` method,
and we know that looking up methods can be effectful. There's actually no need
to look through abstractions: if an expression contains a procedure call, we can
assume it's effectful without looking at the procedure.

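As a rough sketch of how syntactic such a query could be, the following walks
an expression using Python's standard `ast` module. The `impurities` function
and the `BINOP_METHODS` table are hypothetical names, not part of any real
tool, and only names, calls and binary operators are handled; `ast.unparse`
requires Python 3.9 or later. Procedure calls are reported without looking
inside the procedure.

```python
import ast

# Hypothetical (and incomplete) table of the methods which operators are
# sugar for. Simplification: real Python may also try reflected methods
# such as `__radd__`.
BINOP_METHODS = {ast.Add: "__add__", ast.Sub: "__sub__", ast.Mult: "__mul__"}

def impurities(source):
    """Describe the constructs in the expression `source` which might cause
    effectful code to run. Purely syntactic: nothing is evaluated."""
    found = []
    for node in ast.walk(ast.parse(source, mode="eval")):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            found.append(f"lookup of `{node.id}` in the environment")
        elif isinstance(node, ast.Call):
            found.append(f"call of `{ast.unparse(node.func)}`")
        elif isinstance(node, ast.BinOp):
            method = BINOP_METHODS.get(type(node.op), "an operator method")
            found.append(f"lookup and call of `{method}`")
    return found

# `ast.walk` makes no promise about ordering, so sort before printing.
for description in sorted(impurities("x() + y()")):
    print(description)
```
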
What would such an ability buy us? For a typical codebase, probably not much.
However, we could combine this with *purity annotations*: if we annotate a
construct as being pure, it won't appear in our impurity queries. Pure
constructs might include e.g. simple arithmetic, but may also include code with
*unobservable* effects; for example, a procedure which includes a memo table to
avoid recalculating outputs. We can't mark the *pieces* as pure, since they
involve mutating persistent state, but the construct *as a whole* is pure.

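Here's a sketch of the memo table case in Python. The `pure` decorator and the
`PURE` registry are invented for illustration: nothing checks the claim, it is
only recorded somewhere an impurity query could consult in order to skip calls
to the annotated procedure.

```python
# Hypothetical registry of procedures declared pure "from the outside".
PURE = set()

def pure(func):
    """Hypothetical annotation: record the developer's claim that callers
    can treat `func` as pure. The claim is trusted, not verified."""
    PURE.add(func.__name__)
    return func

@pure
def fib(n, _memo={0: 0, 1: 1}):
    # The memo table persists between calls and is mutated here, so this
    # piece, taken on its own, is impure...
    if n not in _memo:
        _memo[n] = fib(n - 1) + fib(n - 2)
    # ...but the result depends only on `n`, so callers can't observe it.
    return _memo[n]

print(fib(30))        # 832040
print("fib" in PURE)  # True
```
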
*That* would be an interesting ability to have, since it lets us take
information about code out of a developer's head and put it into the machine.
One immediate benefit would be providing extra knowledge to those reading the
code at a later date: a reassurance that no matter what crazy, dynamic stuff is
going on inside a piece of code, it shouldn't leak that nastiness to the
outside. Another might be to test, prove, disprove or infer such annotations,
to aid the developer in understanding their code.

Going back to the idea of optimisation, it would give those same reassurances to
the *machine* as well, allowing it to reason more deeply about the code,
replacing its "just in case" wariness with the confidence to make more invasive
changes, such as rearranging, supercompiling, etc.

One place that might benefit from such "impurity queries" and "pure from the
outside" annotations is build/packaging systems like Nix. Many Nix packages use
shell scripts extensively, although those scripts are run in an isolated
environment, their filesystem references are hard-coded and their results are
put in a read-only filesystem. For example, a script may look up various
binaries in its `PATH`, which is impure and subject to change; however, Nix may
hard-code the `PATH` during installation, and that path itself may be read-only
and derived from a content-based hash. Such measures turn the lookup into a pure
construct, since it will always find the same file. An optimiser could utilise
this, e.g. to inline the file, or to collapse pipelines based on their
producer/consumer behaviour.

As it stands, shells are far too dynamic for such optimisations to be sound; for
example, all sorts of dynamic hacks can be used to execute arbitrary code at
various points in a script's execution. If we could query for which ones affect
our code, we could take measures to disable them, annotate the result as pure,
and build a new optimised package out of the old one.