chriswarbo-net: cb95b74710cf0d4c845d71a7c7dbb508d1d6b79b
1: Originally from https://stackoverflow.com/questions/43830085/what-it-means-lambda-calculus-is-equivalent-to-turing-machine?rq=1
2:
3: > What it actually means that lambda calculus is equivalent to turing machine,
4: > and where it actually manifest itself?
5:
6: It means that:
7:
8: 1) We can construct a Turing machine which can reduce any lambda calculus
9: expression (encoded on the tape in some way)
10:
11: 2) We can construct a lambda calculus expression which can simulate any Turing
12: machine (encoded as an expression in some way)
13:
14: For (1), we need to pick some way to encode lambda calculus expressions as
15: symbols on a Turing machine tape. We can write the symbols `λ`, `(`, `)`, `.`
16: and ` ` (space) directly; the only tricky part is variable names, since there
17: are infinitely many of them (`x`, `y`, ...). My favourite way to encode
18: variables is using [de Bruijn
19: indexes](https://en.wikipedia.org/wiki/De_Bruijn_index), where we don't write
20: any argument names, and we use numbers instead of variable names: a variable
21: number N refers to the argument of the Nth lambda we're "wrapped in". For
22: example `λx. x` becomes `λ. 0`, whilst `λx. (λy. (x (x y)))` becomes `λ. (λ. (1
23: (1 0)))`, etc. This encoding is tricky to understand, but the point is that we
24: can write any lambda calculus term with a small, fixed set of symbols: the
25: "syntax" symbols `λ() .` plus the digits `0123456789` (or `01` if you prefer
26: binary). We can write these symbols one after another on a Turing machine's tape
27: (perhaps followed by another symbol to indicate "the end" (AKA `EOF` or
28: end-of-file)).
29:
30: The equivalence tells us that there is a Turing machine (a table of
31: state-transition rules) which, when run on such a tape, will either:
32:
33: - Halt, with the tape containing the normal form of the given lambda
34: expression, iff the given expression has a normal form.
35:
36: - Loop forever, iff the given expression has no normal form.
37:
38: The transition rules for such a machine would be very complicated, so I don't
39: dare try to write them down here! The point is that Turing machines can compute
40: everything that lambda calculus can compute, since Turing machines can evaluate
41: every lambda calculus expression; i.e. if I can compute something using lambda
42: calculus, then I can write that same lambda expression on to a Turing machine
43: tape and run the machine to compute the same result. Hence Turing machines are
44: *at least* as powerful as lambda calculus, in terms of what they can compute.
45:
46: For (2), we need to pick some way to encode Turing machines as lambda calculus
47: expressions. There are systematic ways to encode data as lambda expressions,
48: e.g. [Church encoding](https://en.wikipedia.org/wiki/Church_encoding) and
49: [Morgensen-Scott
50: encoding](https://en.wikipedia.org/wiki/Mogensen%E2%80%93Scott_encoding). We can
51: tackle this in a few steps.
52:
53: First we need a way to represent any Turing machine as a piece of data. We can
54: do this by storing a [tuple](https://en.wikipedia.org/wiki/Tuple) containing the
55: table of state-transition rules, the current state, the tape and the read/write
56: head position. Symbols and states can both be represented as [natural
57: ("counting") numbers](https://en.wikipedia.org/wiki/Natural_number) and the
58: direction (left or right) is a
59: [boolean](https://en.wikipedia.org/wiki/Boolean_data_type).
60:
61: We can actually represent the tape and read/write head position together, in an
62: elegant data structure called a ["list with a
63: zipper"](https://en.wikipedia.org/wiki/Zipper_(data_structure)), which is just a
64: tuple containing a single symbol and two [(singly-linked) lists of
65: symbols](https://en.wikipedia.org/wiki/Linked_list#Singly_linked_list). Let's
66: say we have a Turing machine tape with contents like `...abchxyz...`, which
67: extends forever in both directions, where the read/write head is currently at
68: the symbol 'h'. We can imagine the tape being draped over the read/write head,
69: like a chain over a pulley (here I've drawn the head/pulley using ●):
70:
71: ```
72: h
73: c●x
74: b y
75: a z
76: . .
77: . .
78: . .
79: ```
80:
81: This is where the idea for our zipper comes from: the single symbol is
82: whatever's over the read/write head, `h` in this case; whilst the two lists are
83: the two 'strands' dangling from the read/write head, `[c, b, a, ...]` and `[x,
84: y, z, ...]` respectively. Notice that the left 'strand' is in reverse order, due
85: to the way it hangs off the read/write head. Hence this zipper would be a tuple
86: like `(h, [c, b, a, ...], [x, y, z, ...])`.
87:
88: The reason this "zipper" is so elegant is that we can 'move' the whole
89: (infinite!) tape left and right by doing a tiny amount of computation: adjusting
90: only three values (`c`, `h` and `x` in this example. Note that, due to the way
91: singly-linked lists are constructed, the sections `[b, a, ...]` and `[y, z,
92: ...]` are unchanged, so we can re-use the old values without having to perform
93: any computation, even if they're "infinite")
94:
95: ```
96: Moved left Original Moved right
97: c h x
98: b●h c●x h●y
99: a x b y c z
100: . y a z b .
101: . z . . a .
102: . . . . . .
103: . . . . . .
104: ```
105:
106: We can treat the state-transition table as another singly-linked list, where
107: each element is a rule. A rule can be represented as a tuple containing the
108: state and symbol which trigger the rule, the symbol to write when triggered, the
109: state to transition to when triggered, and the direction to move the read/write
110: head when triggered. That's just four numbers and a boolean.
111:
112: Hence to represent a whole Turing machine in lambda calculus, we need some way
113: to represent booleans, natural numbers, tuples and singly-linked lists.
114:
115: Booleans are the simplest of these. We need one lambda term to represent "move
116: left" and one to represent "move right" (remember, these are just entries in our
117: rule table; they don't need to actually perform the tape-moving work, just
118: represent which direction to go!). Both lambda terms need the same
119: [interface](https://en.wikipedia.org/wiki/Application_programming_interface),
120: since whatever code ends up handling the rules will need to work for both 'left'
121: and 'right' terms. The simplest lambda terms which can distinguish between two
122: possibilities are the [Church
123: booleans](https://en.wikipedia.org/wiki/Church_encoding#Church_Booleans): these
124: both take two arguments, but each return a different one of them. Hence we can
125: encode "move left" as `λx. λy. x` and "move right" as `λx. λy. y`. If we're
126: given one of these, we can apply it to two arguments, and we'll get back the
127: first if we're meant to move left, and the second if we're meant to move
128: right. For example, let's say we have:
129:
130: - A lambda calculus expression `L`, which moves a given tape to the left (by
131: manipulating the first few elements of the zipper, as visualised above)
132: - Another expression `R`, which moves a given tape to the right (ditto)
133: - An expression `T`, representing a tape (see below for what this might look
134: like)
135: - A Church boolean `D`, representing which direction to move
136:
137: Now consider the expression `(((D L) R) T)`:
138:
139: - If `D` is the "move left" expression, this will be `((((λx. λy. x) L) R) T)`,
140: which reduces to `(((λy. L) R) T)`, then to `(L T)`, which (according to our
141: bullet point assumptions above) will produce a new tape, which is like `T`
142: but moved to the left.
143: - If `D` is the "move right" expression, this will instead be `((((λx. λy. y)
144: L) R) T)`, which reduces to `(((λy. y) R) T)`, then to `(R T)`, which is like
145: `T` but moved to the right.
146:
147: So that's how we encode booleans. What about (natural) numbers (which we're
148: using for states and symbols)? These are more complicated, since there are
149: infinitely many of them, but it's not too hard. The usual way we represent
150: numbers in lambda calculus is using ["Church
151: numerals"](https://en.wikipedia.org/wiki/Church_encoding#Church_numerals). These
152: come from the idea that natural numbers are either "zero" or "one more than some
153: other natural number". We can represent this formally by using the symbol `Z` to
154: represent the number zero, and stating that the expression `(S n)` is a number
155: iff `n` is a number (the choice of symbols is arbitrary, but we traditionally
156: use `Z` for zero and `S` for
157: ["successor"](https://en.wikipedia.org/wiki/Successor_function)). In this way we
158: can write the number 0 as `Z`, the number 1 as `(S Z)`, the number 2 as `(S (S
159: Z))`, and so on. These are called ["Peano
160: numbers"](https://wiki.haskell.org/Peano_numbers) (based on their use in [Peano
161: arithmetic](https://en.wikipedia.org/wiki/Peano_axioms)), or
162: ["unary"](https://en.wikipedia.org/wiki/Unary_numeral_system) or ["tally
163: marks"](https://en.wikipedia.org/wiki/Tally_marks).
164:
165: Church numerals represent these Peano numbers as lambda terms. We will represent
166: the number 0 (i.e. `Z`) as the term `λx. λy. y`; the number 1 (i.e. `(S Z)`) as
167: the term `λx. λy. (x y)`; the number 2 (i.e. `(S (S Z))`) as the term
168: `λx. λy. (x (x y))`; and so on. In every case we take two arguments (`x` and
169: `y`), and we apply the first argument (`x`) to the second argument (`y`) N times
170: to represent the number N.
171:
172: Remember that our choice of variable name doesn't matter in lambda calculus:
173: `λx. x` and `λy. y` are "the same" (technically they're
174: ["α-equivalent"](https://en.wikipedia.org/wiki/Lambda_calculus#Alpha_equivalence). Side
175: note: α-equivalent terms are identical when written using de Bruijn
176: indexes!). Hence we can change the traditional `x` and `y` argument names into
177: `s` and `z` and it means the same thing, but the connection to Peano numerals
178: should hopefully be clearer:
179:
180: ```
181: ╭────────╥──────────────┬───────────────────╮
182: │ Number ║ Peano number │ Church numeral │
183: ╞════════╬══════════════╪═══════════════════╡
184: │ 0 ║ Z │ λs. λz. z │
185: │ 1 ║ (S Z) │ λs. λz. (s z) │
186: │ 2 ║ (S (S Z)) │ λs. λz. (s (s z)) │
187: │ ... ║ ... │ ... │
188: └────────╨──────────────┴───────────────────┘
189: ```
190:
191: If we have some Church numeral `x`, we can get an equivalent expression via
192: [η-expansion](https://en.wikipedia.org/wiki/Lambda_calculus#%CE%B7-reduction):
193: `λs. λz. ((x s) z)`. This expression just accepts the two arguments `s` and `z`,
194: and passes them on to `x`, so has no observable difference from `x`. If we
195: instead wrap the result of `x` in an extra application of `s`, we get a Church
196: numeral which is identical to 'the successor of `x`' (AKA 'one more than `x`'):
197: `λs. λz. (s ((x s) z))`. If we accept `x` as an argument, we get an encoding of
198: the `S` from our Peano numbers:
199:
200: ```
201: λx. λs. λz. (s ((x s) z))
202: ```
203:
204: We can apply `s` as many times as we like to get larger and larger
205: increments. (In contrast, [getting the predecessor is more
206: difficult!](https://en.wikipedia.org/wiki/Lambda_calculus#Arithmetic_in_lambda_calculus))
207:
208: The way I like to think about Church numerals is that they're like Peano
209: numbers, but where `Z` and `S` are abstracted out using [dependency
210: injection](https://en.wikipedia.org/wiki/Dependency_injection). To "use" a
211: Church numeral, we just apply it to two expressions: one for `z` and one for
212: `s`. We can think of it like a loop, where the value we give for `z` is used as
213: the "initial value", and the value we give for `s` performs one "step" of the
214: loop. In fact, Church numerals are very similar to the [`times` method of Ruby
215: integers](https://ruby-doc.org/core-2.5.0/Integer.html#method-i-times).
216:
217: As a simple example of how we might "use" such numbers, consider a simple
218: function EVEN? which calculates whether a given number is even (divisible by
219: 2). We can write this as a loop: zero is even, so our "initial value" is 'true';
220: then for each "step" of the loop, we negate the previous value (i.e. boolean
221: NOT). We can already encode booleans as lambda terms (using Church booleans),
222: but how might we define a NOT function? Since Church booleans just pick between
223: their arguments, we can get the opposite result by swapping those arguments!
224:
225: ```
226: λb. λx. λy. ((b y) x)
227: ```
228:
229: Once it's been applied to a Church boolean (`b`), this function will take two
230: arguments (`x` and `y`) and return one of them; i.e. it will act like a Church
231: boolean. It decides which argument to return by passing them both into `b`, but
232: in the opposite order to how they were received, so this will behave in the
233: opposite way to `b`. Hence it's a NOT function for Church booleans.
234:
235: We can use this NOT function to define our EVEN? function for Church numerals:
236: the looping is provided by the Church numeral itself, so we just need to supply
237: the initial value 'true' and the 'step' function NOT:
238:
239: ```
240: λn. ((n NOT) TRUE)
241:
242: AKA
243:
244: λn. ((n (λb. λx. λy. ((b y) x))) TRUE)
245:
246: AKA
247:
248: λn. ((n (λb. λx. λy. ((b y) x))) (λx. λy. x))
249: ```
250:
251: Now we know how to encode booleans and numbers as lambda terms; that only leaves
252: the 'container' types of tuples and lists (remember: zippers are just tuples
253: containing two lists and a value).
254:
255: Tuples are pretty easy: given a pair of terms `A` and `B`, we can encode the
256: pair as `λp. ((p A) B)`. This takes a single argument, which I've called `p`,
257: and applies it to both terms `A` and `B`. For example, to encode a pair of
258: values 0 and 'false', we first encode each separately to get `λs. λz. z` and
259: `λx. λy. y`, then we put those terms into our pair "template" above, to get:
260:
261: ```
262: λp. ((p (λs. λz. z)) (λx. λy. y))
263: ```
264:
265: Side note: our encoding of 0 is
266: [equivalent](https://en.wikipedia.org/wiki/Lambda_calculus#Alpha_equivalence) to
267: our encoding of false!
268:
269: Hopefully it's clear how to "use" such a pair: we apply it to an expression, and
270: that expression will be applied to both values (similar to ["argument unpacking"
271: in Python](https://www.python.org/dev/peps/pep-0448)).
272:
273: This just leaves us with lists. Lists are actually [a lot like (natural)
274: numbers](/blog/2014-12-04-Nat_like_types.html), so we can encode them in a
275: similar way: as a "loop". The only difference is that each "step" of the loop
276: will also be given an "element" of the list, as well as the loop state (this
277: sort of loop is usually called [a "fold" or
278: "reduce"](https://en.wikipedia.org/wiki/Fold_(higher-order_function))). We set
279: this up in a similar way to Peano numbers by saying that every list is either
280: empty, represented symbolically as `nil` (like the `Z` of Peano numbers), or
281: consists of an element `x` prepended to another list `xs`, represented
282: symbolically as `(cons x xs)` (like the `(S n)` of Peano numbers). Hence a list
283: like `[1, 2, 3]` will be represented as `(cons 1 (cons 2 (cons 3 nil)))`.
284:
285: Side note: this presentation of lists is at the foundation of the [Lisp
286: programming
287: language](https://en.wikipedia.org/wiki/Lisp_(programming_language)), whose name
288: comes from "LISt Processing", and was directly inspired by lambda calculus. Lisp
289: originated in the 1950s, although back then it [implemented variables
290: incorrectly](https://en.wikipedia.org/wiki/Scope_(computer_science)#Dynamic_scoping);
291: this was
292: [fixed](https://en.wikipedia.org/wiki/Scope_(computer_science)#Lexical_scoping)
293: in the [Scheme
294: language](https://en.wikipedia.org/wiki/Scheme_(programming_language)) in the
295: 1970s.
296:
297: Encoding `nil` in lambda calculus is identical to `Z` and 'false', except I'll
298: call the arguments `c` and `n` to represent "dependency injection" of `cons` and
299: `nil, respectively: `λc. λn. n`
300:
301: We encode `(cons x xs)` in a similar way to `(S n)`, but we use an extra
302: application to provide `c` with our `x` element. Here are some simple lists and
303: their lambda calculus encoding:
304:
305: ```
306: ╭───────────╥──────────────────────────┬───────────────────────────────────────────────────────────────╮
307: │ List ║ Lisp notation │ Lambda encoding │
308: ╞═══════════╬══════════════════════════╪═══════════════════════════════════════════════════════════════╡
309: │ [] ║ nil │ λc. λn. n │
310: │ [0] ║ (cons 0 nil) │ λc. λn. ((c (λs. λz. z)) n) │
311: │ [S, true] ║ (cons S (cons true nil)) │ λc. λn. ((c (λx. λs. λz. (s ((x s) z)))) ((c (λx. λy. x)) n)) │
312: │ ... ║ ... │ ... │
313: └───────────╨──────────────────────────┴───────────────────────────────────────────────────────────────┘
314: ```
315:
316: Lists encoded in this way act like ["fold" or "reduce"
317: functions](https://en.wikipedia.org/wiki/Fold_(higher-order_function)), [which
318: are widely used in many
319: languages](https://en.wikipedia.org/wiki/Fold_(higher-order_function)#In_various_languages),
320: so there's no shortage of examples of their usefulness.
321:
322: Finally, we might wonder how a Turing machine's infinite tape can be
323: represented. This isn't actually a problem in lambda calculus, since our
324: encoding for lists will work perfectly well for infinite lists; we just write an
325: expression which will keep calling the `c` argument forever. For example, here's
326: an infinite list of zeros:
327:
328: ```
329: λc. λn. ((λx. ((c (λs. λz. z)) (x x))) (λx. ((c (λs. λz. z)) (x x))))
330: ```
331:
332: This is just the [Y
333: combinator](https://en.wikipedia.org/wiki/Fixed-point_combinator#Fixed-point_combinators_in_lambda_calculus)
334: applied to the expression `(c (λs. λz. z))` (i.e. `(c 0)`, or `Cons` applied to
335: the number zero). Since the Y combinator keeps applying an expression to itself
336: forever, the result is `((c 0) ((c 0) ((c 0) (...))))`, i.e. equivalent to
337: `(Cons 0 (Cons 0 (Cons 0 (...))))` forever. Note that we can actually simplify
338: this by getting rid of the `n` argument, since it will never appear in an
339: infinite list. The resulting datatype is known as a
340: ["stream"](https://wiki.c2.com/?CoinductiveDataType), and it's slightly simpler
341: (but maybe less intuitive) to use a "stream with a zipper" to represent our
342: tape, instead of a "list with a zipper".
343:
344: We've just seen that any Turing machine can be represent as a lambda calculus
345: expression. The equivalence between Turing machines and lambda calculus tells us
346: that there is a lambda calculus expression which, when applied to such an
347: encoding of a Turing machine (plus tape), will either:
348:
349: - Reduce to a normal form, iff that Turing machine halts; and the resulting
350: expression will be an encoding of the Turing machine's final configuration
351: (including state, head position and tape contents).
352: - Keep reducing forever, without a normal form; iff that Turing machine doesn't
353: halt.
354:
355: The existence of such an "intepreter" expression shows that lambda calculus can
356: compute everything that Turing machines can compute, since lambda calculus can
357: simulate every Turing machine; i.e. if I can compute something using a Turing
358: machine, then I can encode that same Turing machine as a lambda expression and
359: apply the interpreter to it to compute the same result. Hence lambda calculus is
360: *at least* as powerful as Turing machines, in terms of what they can compute.
361:
362: Since each of these (Turing machines and lambda calculus) is at least as
363: powerful as the other, it must be the case that neither is more powerful than
364: the other; and hence they're equally powerful in terms of what they can compute.
365:
366: With this central issue made concrete, we can approach some of your other
367: questions (note that I've already touched on Lisp and its relation to lambda
368: calculus a little):
369:
370: > I don't understand how lambda calculus could supersede turing machine as a
371: > theoretical model of computation.
372:
373: Neither supercedes the other, but likewise neither is more fundamental. The only
374: reason to prefer one to the other is whether it makes some particular task
375: easier. For example, I find it much easier to read, write and think about
376: programs in lambda calculus (just look at how much I did it above; whilst I
377: couldn't bring myself to write a single transition rule for a Turing
378: machine!). On the other hand, it's much easier to keep track of a Turing
379: machine's time complexity, so they're useful for situations where that's more
380: important than understanding what the program is doing (e.g. in [Levin
381: search](http://www.scholarpedia.org/article/Universal_search)).
382:
383: > [Lambda calculus] is more abstract, like a programming language of it's own,
384: > not the model of how to practically compute something, make things happen.
385:
386: It depends! Both are just mathematical models, but you're asking about their
387: semantics/interpretation, but there are many ways to interpret them (AKA "assign
388: a semantics"). It's true that the most "obvious" interpretation of a Turing
389: machine is a collection of nuts and bolts, transistors and capacitors, reading
390: and writing a (magnetic?) tape; and the most "obvious" interpretation of lambda
391: calculus is via an interpreter program running on some electronic computer (my
392: hand-waved encoding of lambda calculus to a Turing machine tape is one of these;
393: there are loads of "real", runnable interpreters out there, e.g. [here's my
394: first Google hit](https://jacksongl.github.io/files/demo/lambda)). Both of these
395: are known as ["operational
396: semantics"](https://en.wikipedia.org/wiki/Operational_semantics), since they
397: ascribe "meaning" by asking "how does it make this machine behave?". Yet those
398: aren't their only interpretations!
399:
400: In particular, my translation of Turing machines into lambda calculus
401: expressions gives a *different* semantics to Turing machines, namely: Turing
402: machines are lambda expressions, which can be reduced if a suitable
403: "interpreter" expression is applied to them. From this perspective, it's Turing
404: machines that are more abstract, since they're being "compiled down to" lambda
405: calculus!
406:
407: Lambda calculus actually has *many* interpretations, and some of them are very
408: useful (not just theoretical toys, like the encodings between Turing machines
409: and lambda calculus I gave above). For example, let's say that we built a
410: physical Turing machine; we would presumably implement the state-machine as a
411: *digital circuit* (using transistors, capacitors, etc.). Well, one way of
412: interpreting what lambda calculus expressions are is [*digital
413: circuits*](http://conal.net/papers/compiling-to-categories) (to be clear: that's
414: not "circuits which evaluate lambda expressions", that's "circuits which *are*
415: lambda expressions", similar to hardware description languages like
416: [Verilog](https://en.wikipedia.org/wiki/Verilog)). In other words, even if we
417: made a physical Turing machine, we'd still be building it out of (physical)
418: lambda expressions! Note that this semantics isn't just a coincidence, it was
419: discovered in a systematic way (via category theory), so any alternative method
420: of controlling our Turing machine (e.g. [billiard
421: balls](https://en.wikipedia.org/wiki/Billiard-ball_computer), [gas
422: dynamics](http://lambda-the-ultimate.org/node/4120) or
423: [crabs](https://www.complex-systems.com/abstracts/v20_i02_a02)) would presumably
424: also turn out to be an implementation of lambda calculus.
425:
426: On a purely personal, subjective note: since lambda calculus "turns up" in more
427: situations than Turing machines (e.g. in [cartesian closed
428: categories](https://en.wikipedia.org/wiki/Cartesian_closed_category) and
429: [natural
430: deduction](https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence#Natural_deduction_and_lambda_calculus)),
431: I consider it to be more important and fundamental, mathematically and
432: physically; whilst Turing machines are more of a human invention, whose major
433: use is philosophical (e.g. as Turing originally used them, to argue that
434: universal computation encompasses all "effective methods" of
435: mathematics/logic). Whilst it's true that the resource usage of a Turing machine
436: is easier to track than a lambda calculus reduction, that doesn't mean Turing
437: machines are the best model we have. RAM machines are more representative of
438: "traditional" computers, whilst everyday consumer hardware is becoming
439: multi-processor and massively-parallel (e.g. with GPUs, and maybe soon TPUs);
440: someone's always trying to sell FPGA-enabled machines; plus resources like power
441: consumption are becoming more important, that aren't modelled too well by Turing
442: machines. On the "language" side we have the growing field of [cost
443: semantics](http://lambda-the-ultimate.org/node/5021) to figure our
Generated by git2html.