Debugging random Haskell
I was asked to look at some Haskell code the other day, from some kind of online roleplaying game. The code is given, along with the briefing that it should find a valid password amongst a file of about 35,000. Its creator supposedly fired, and managed to mangle it before he was kicked out. Our task is to find the correct password.
Rather than look at the file, let’s dive straight into the code. I found this particularly enjoyable, since it’s essentially an exercise in debugging and refactoring, which is how I tend to write code. I’ll write a naïve version which is generally very inefficient, over-complicated, repetitive, hard-coded and fragile. I’ll then repeatedly refactor it until it becomes efficient, simple, terse, generic and robust (I hope!). I assume no knowledge of Haskell, but knowing a language with first-class functions like Javascript may be useful.
import ParseLibImport an unspecified parsing library; this isn’t too significant, but some Googling will find it if you really care.
runner :: [Char] -> [a]This is a type annotation. It’s only a sanity check, since Haskell infers types. Thus there’s no point debugging it, so we remove it. I’ll ignore type annotations from now on, as a mangled annotation will just confuse us.
runner a | (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False =
mzero
| otherwise =
result [papply defB a]This is an invalid function definition. Let’s clean up the line breaks:
runner a | (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False = mzero
| otherwise = result [papply defB a]Each line is a different ‘clause’ of the function; they have the
following form (| means
‘where’):
myFunction myArgument | myCondition = iWillRunIfMyConditionIsTrue
| otherwise = iWillRunIfMyConditionIsFalseThis syntax can simplify complicated function definitions, but in our case I think it’s clearer to use an equivalent if condition:
runner a = if (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) = False
then mzero
else result [papply defB a]There’s a rookie error at the end of the if-condition, common from
many languages: it uses = (definition)
in the if condition, instead of ==
(comparison). Let’s swap it:
runner a = if (length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])) == False
then mzero
else result [papply defB a]Notice that we’re comparing something to False; we can
get rid of the == False
if we swap the then and else branches
around:
runner a = if length (test (a ++ "z")) == (sum [product (3..5), product (6..9), 1])
then result [papply defB a]
else mzeroThe right-hand-side of the condition doesn’t contain a, so it must be constant (due to
‘referential transparency’: the output of a function is completely
determined by its input, and here there are no inputs). Let’s calculate
it:
sum [product (3..5), product (6..9), 1]There’s no such thing as (3..5)
or (6..9).
It could mean 3.5 and 6.9, but that
wouldn’t type check, since product wants
a list of numbers. This makes it more likely to be [3..5]
and [6..9]
which is shorthand for enumFromTo 3 5
and enumFromTo 6 9,
which are, respectively, [3, 4, 5]
and [6, 7, 8, 9]:
sum [product [3, 4, 5], product [6, 7, 8, 9], 1]The products are easy to calculate:
sum [60, 3024, 1]As is the sum:
3084If we put this simplified condition back into our original function we get:
runner a = if length (test (a ++ "z")) == 3084
then result [papply defB a]
else mzero++ is
a built-in function, defined like this (where [] is an empty list and x : y is a
singly-linked list starting with element x and followed by list y):
[] ++ list2 = list2 -- Concatenating an empty list does nothing
(elem : list1) ++ list2 = elem : (list1 ++ list2) -- RecurseWhat about test?
test x = init xThe function test, of
argument x, is defined as init applied
to x. Clearly:
test = initThat transformation is known as eta-reduction. The init function
is built in to Haskell’s standard library too. It looks like the
following:
init (elem : []) = [] -- init of a one-element list is an empty list
init (elem : xs) = elem : init xs -- Recurseinit
chops an element off the end of a list. Since we’re concatenating with a
non-empty list, init will
recurse over a and end up at
"z".
Let’s simplify:
length (init (a ++ "z"))
length (a ++ init "z")
length (a ++ init ('z' : [])) -- "z" is a one-element list of Chars
length (a ++ [])
length aThis gives us a much simpler version of the original function:
runner a = if length a == 3084
then result [papply defB a]
else mzeroNow what’s result?
result [] = "Nope"
result ((x,xs):ys) = x(foo, bar) is a pair, so
result is extracting the first
element of a pair, which itself is the first element of a list. We know
that this element will be a String, since
it has to be of the same type as "Nope".
We can split this into two separate element extractions, using the
built-in functions fst (get the
first element of a pair) and head (get the
first element of a list):
result [] = "Nope"
result xs = fst (head xs)Notice that runner calls
result on a one-element list
[papply defB x], therefore:
result [papply defB x]
result ((papply defB x) : []) -- [foo] is sugar for foo:[]
fst (head ((papply defB x) : [])) -- Bring 'result' in-line
fst (papply defB x) -- 'head (x:y)' is just 'x'This gives us a simplified runner:
runner x = if length x == 3084
then fst (papply defB x)
else mzeromzero is funny; it’s a
‘method’ of a ‘type class’ called MonadPlus. A
type class is basically an interface, and types can be instances of type
classes in the same way that Object Oriented classes can implement OO
interfaces (confusing re-use of existing terminology, I know!).
To know which implementation of mzero we’re dealing with, we need to
know it’s type. This must be the same as the then branch,
which we just discovered must be a String
(otherwise we couldn’t have put "Nope"
as a possible return value). As we saw above, String is just
a synonym for a list of Chars (ie.
[Char]),
and indeed lists are an instance of the type class MonadPlus.
Their implementation of mzero is
the empty list [], which (since
we’re dealing with [Char] AKA
String)
we can also write as "":
runner x = if length x == 3084
then fst (papply defB x)
else ""That’s as simple as we can get with runner. Although we can rearrange it
back to the two-clause version:
runner x | length x == 3084 = fst (papply defB x)
| otherwise = ""Next we look at the parsers:
defB = do char 'X'
baz <- digit
char 'X'
char 'W'
foo <- defC
bar <- defD
char 'a'
fooo <- many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ bar)This is ‘do notation’,
which is nice syntactic sugar for Haskell’s famously scary feature,
Monads. A Monad is just
a way to chain functions together. Here we’re chaining together simple
parsers into more complex ones. We just write do followed by
a bunch of parsers and the Monad will do
all of the plumbing, composition, backtracking, shortcutting, etc. for
us.
A few points to mention:
- Haskell’s layout rule lets us do without braces and semicolons in favour of indenting and dedenting; the indentation above needs straightening out.
- By default the intermediate return-values are ignored. To keep them
we use an arrow, eg.
foo <- barwill store the result ofbarinfoo. - Many of the results above are ignored, so there’s not point using
the arrows (eg.
bazandfoooare ignored, butfooandbarare used at the end). - A
doblock is just a regular function, hence the nested argument tomany - The return value of a
doblock is the return value of the last function in the chain; to override this, we can use thereturnfunction which doesn’t perform any action (in our case any parsing) but does provide a return value.
defB = do char 'X'
digit
char 'X'
char 'W'
foo <- defC
bar <- defD
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ bar)It turns out that defD is
only used once in the whole program, so we may as well insert its
definition straight in here (suitably indented):
defB = do char 'X'
digit
char 'X'
char 'W'
foo <- defC
bar <- do foo <- defC
do baar <- defC
digit
fooo <- many (do digit)
q <- defE
return (q : "d")
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ bar)Note that nested chains can be flattened; do A; (do B; C)
is the same as do A; B; C,
as long as we keep the correct return values assigned to any
intermediate results. Likewise do digit is
the same as digit, since we’re
not chaining anything on to it:
defB = do char 'X'
digit
char 'X'
char 'W'
foo <- defC
defC
defC
digit
many digit
q <- defE
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ q ++ "d")Likewise, defE is never used
anywhere else. We may as well inline it too:
defB = do char 'X'
digit
char 'X'
char 'W'
foo <- defC
defC
defC
digit
many digit
q <- do foo <- defC
do fooo <- defC
do foooo <- defC
return "i"
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ q ++ "d")Simplifying out (we now know the value of q is "i"):
defB = do char 'X'
digit
char 'X'
char 'W'
foo <- defC
defC
defC
digit
many digit
defC
defC
defC
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return (foo ++ "id")Now let’s look at defC. We
keep this separate since it’s used many times:
defC = do char '!'
baz <- digit
char 's'
do s <- many (do char 's')
return "Val"
+++ do char '@'
baz <- char 's'
digit
do s <- many (do digit)
return 'Val'This is actually in two parts, combined using +++. I’ll
split them into separate functions to make it clearer:
defCA = do char '!'
baz <- digit
char 's'
do s <- many (do char 's')
return "Val"
defCB = do char '@'
baz <- char 's'
digit
do s <- many (do digit)
return "Val"
defC = defCA +++ defCBAccording to ParseLib, +++ is a
choice, so defC will try to
match defCA first. If it fails,
the input will be wound back and defCB will be tried. defC will only fail to match if both
defCA and defCB fail to match. Let’s clean it
up:
defCA = do char '!'
digit
char 's'
many (char 's')
return "Val"
defCB = do char '@'
char 's'
digit
many digit
return "Val"
defC = defCA +++ defCBWe can see that both branches will return "Val"
if successful, so we no longer need to use the foo variable in defB, we can just use "Val"
directly:
defB = do char 'X'
digit
char 'X'
char 'W'
defC
defC
defC
digit
many digit
defC
defC
defC
char 'a'
many (do char 'f'
digit)
char 'Y'
char 'Z'
return "Valid"
defCA = do char '!'
digit
char 's'
many (char 's')
defCB = do char '@'
char 's'
digit
many digitThat’s about all the ‘simplifying’ we can do, but with a little more work we can make it more succinct, if less simple.
First we can collect together individual characters into strings:
string (c:cs) = do char c
string csThis reduces defB and defCB to:
defB = do char 'X'
digit
string "XW"
defC
defC
defC
digit
many digit
defC
defC
defC
char 'a'
many (do char 'f'
digit)
string "YZ"
return "Valid"
defCB = do string "@s"
digit
many digitWe can also special-case the foo; many foo pattern (+ in
regular-expression syntax). This works by recursing on the left until
the pattern p fails to match,
backtracking once, then matching p without recursing:
plus p = (do p
plus p) +++ pThis lets us write defB,
3A
and 3B
like this:
defB = do char 'X'
digit
string "XW"
defC
defC
defC
plus digit
defC
defC
defC
char 'a'
many (do char 'f'
digit)
string "YZ"
return "Valid"
defCA = do char '!'
digit
plus (char 's')
defCB = do string "@s"
plus digitNotice that defC is always
used three times in a row? We can hard-code that, to get:
defC3 = do defC
defC
defC
defB = do char 'X'
digit
string "XW"
defC3
plus digit
defC3
char 'a'
many (do char 'f'
digit)
string "YZ"
return "Valid"We can make this a bit more compact if we use explicit bind functions
>>
rather than do notation in a few places (do a; b is
sugar for a >> b),
and sprinkle some standard Haskell functions around to reduce
redundancy. This gives the final, complete program as:
runner x | length x == 3084 = ""
| otherwise = fst (papply defB x)
defB = do string "X"
digit
string "XW"
defC3
plus digit
defC3
string "a"
many (string "f" >> digit)
string "YZ"
return "Valid"
string = join . (map char)
plus p = (p >> plus p) +++ p
defC3 = defC >> defC >> defC
defC = (string "!" >> digit >> plus (string "s")) +++
(string "@s" >> plus digit)It’s straightforward enough to see what this is doing even without running it. If you do want to run it, say as a shell command, you can add this:
main = do stdInput <- getContents
let inputLines = lines stdInput
successes = filter (("Valid" ==) . runner) inputLines
formatted = unlines successes
putStr formattedPipe passwords in to stdin and you’ll get valid ones out of stdout. Note that Haskell is non-strict, meaning your program will handle pipes properly, and not sit there buffering.