Haskell’s evaluation isn’t magic

One common complaint with lazy functional programming is that reasoning about the performance of your code is harder than in a strict language. Will an expression be repeatedly evaluated or a closure repeatedly accessed? Will code use up more or less heap space? For that matter, many people have no clue as to how lazy evaluation works in the first place.

This blog post shows that execution in a lazy language is both understandable and predictable.

Haskell is in essence a heavily sugared and extended version of a typed λ-calculus (i.e. λ-calculus with a type system attached). If you don’t understand the simple λ-calculus, you should read a tutorial, such as the first two chapters of Introduction to Lambda Calculus [pdf]. We won’t be doing anything too complicated with it, but you should understand the basic syntax and ideas behind it. If we want to understand Haskell’s execution, we should first understand call-by-need evaluation of the λ-calculus.

In 1993 Launchbury wrote A Natural Semantics for Lazy Evaluation [pdf], which provided a simple operational semantics (I’ll explain what that is in a second) for understanding the lazy evaluation of an extended λ-calculus. The most important extension is let statements: “lets are irreplaceable in lazy functional programming in that they may create cyclic structures in the heap. This cannot be achieved without them (or without something essentially equivalent), and cyclic structures have a huge implication for the amount of computation that may be performed”. He also includes constructors, case statements, and primitives in an extension to the base semantics. Operational semantics is very appropriately named: it provides semantics, or a meaning, to the code in terms of its operation on some abstract machine. What precisely is an abstract machine? Basically, it’s a mathematical model of a machine. Rather than create a physical machine with moving or electronic parts, you create a transition function. This function takes the current state (e.g. the current heap, stack, and the expression you want to reduce) and produces the next state (say, the heap stayed the same, but you pushed something onto the stack and you’re now evaluating some subexpression). Launchbury’s semantics are reasonably high-level and simple, but still capture sharing and heap usage.
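
To make the idea of a transition function concrete, here is a rough sketch of what such a machine’s state could look like in Haskell. This is purely illustrative and not the semantics from the paper (Launchbury presents big-step “natural” rules rather than a small-step machine): Config, Frame and step are names I’ve made up for the sketch, and Expr, Name and Apply refer to the representation of terms defined later in this post.

-- Illustrative only, not the rules from the paper. A machine state bundles a
-- heap, a stack of pending work, and the expression currently being reduced;
-- `step` maps each state to the next one.
data Frame = AppliedTo Name      -- an argument waiting for the function being evaluated
           | UpdateThunk Name    -- a heap slot to overwrite once we reach a value

data Config = Config
  { cHeap  :: [(Name, Expr)]     -- bindings from names to (possibly unevaluated) expressions
  , cStack :: [Frame]            -- what to do with the value we are currently computing
  , cExpr  :: Expr               -- the expression being reduced right now
  }

step :: Config → Maybe Config
-- to reduce an application, remember the argument and go evaluate the function first
step (Config heap stack (Apply e x)) = Just (Config heap (AppliedTo x : stack) e)
step _                               = Nothing    -- remaining rules elided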

Before we continue, let’s remind ourselves how the λ-calculus is evaluated. There are several strategies for reducing a λ-term, each with different pros and cons. In normal order reduction, the outermost leftmost redex (reducible expression, i.e. something of the form (λx.y) z) is reduced first. This means that you don’t evaluate the arguments to a function until after they’re plugged into the function. Call-by-name is similar to normal order, except that you don’t reduce anything inside a λ-abstraction. Call-by-need is very similar to call-by-name, but with one key difference: call-by-need implements sharing (I’ll explain what that is momentarily). Call-by-name and call-by-need stand in contrast with call-by-value reduction, where you reduce the outermost redex only after you reduce its right-hand side (i.e. the arguments to the function get evaluated before the function is applied).

What does sharing mean? It means that arguments to a function are only evaluated once; the calculated values are “shared” between their uses. For example, consider the expression (λx. mult x x) (ackermann 4 1). Under call-by-value, you first reduce this to (λx. mult x x) 65533, and then to mult 65533 65533, then to 4,294,574,089. In call-by-name, you first reduce it to mult (ackermann 4 1) (ackermann 4 1). You then evaluate (ackermann 4 1) twice, because it appears twice. Call-by-need gets rid of this repeated evaluation: the first time you evaluate (ackermann 4 1), you replace both occurrences of it with 65533. Obviously, this is a very important optimization. It’s important to point out here that sharing doesn’t mean Common Subexpression Elimination; it only means that arguments are evaluated only once.
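
You can actually watch sharing happen in GHC with a crude probe from Debug.Trace. This is only an illustration (square is a name made up for the example, and trace is a debugging hack, not something to build on): the argument is a single shared thunk, so the message is printed once even though x is used twice.

import Debug.Trace (trace)

square :: Int → Int
square x = x * x

-- Under call-by-need the argument thunk is forced once and its value shared,
-- so "argument evaluated" should appear only once in the output.
main :: IO ()
main = print (square (trace "argument evaluated" (3 + 2)))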

Before you can apply the reduction rules, Launchbury’s semantics require a term to be normalized. A Launchbury-normalized term is one in which every name is unique (so scope isn’t important, because something in an inner scope cannot shadow something in an outer scope), and in which the argument of every application is a variable. Normalization is simple: just replace every bound variable with a fresh variable (so everything becomes distinctly named), and replace constructs of the form “e1 e2” with “let x = e2 in e1 x”. So, in our previous example, (λx. mult x x) (ackermann 4 1) is normalized to let z = ackermann 4 1 in (λx. mult x x) z. Since we will store our let-bound variables on the heap, this means that implementing sharing is trivial: we update the reference on the heap the first time we evaluate it, and just use that value every other time.

In order to help illustrate the semantics, I’ve written up a small interpreter which takes in a normalized expression and reduces it, returning both the reduced expression and a log of how it got there. I didn’t implement a normalizer; instead expressions need to be normalized by hand.
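
If you wanted to automate it, the application-flattening half of normalization might look something like the sketch below. None of this is in the package: RawExpr, freshName and normalizeApps are hypothetical names, the sketch assumes the Expr and Name types defined in the next section, and it skips the renaming pass that makes every binder unique.

import Control.Monad.State (State, evalState, get, put)

-- Hypothetical raw syntax: applications may have arbitrary arguments.
data RawExpr = RVar Name
             | RLambda Name RawExpr
             | RApply RawExpr RawExpr
             | RLet [(Name, RawExpr)] RawExpr

-- Mint fresh names like $0, $1, ... from an Int counter.
freshName :: State Int Name
freshName = do { n ← get; put (n + 1); return ('$' : show n) }

-- Rewrite every application "e1 e2" whose argument is not already a
-- variable into "let x = e2 in e1 x", so the result fits the Expr type.
normalizeApps :: RawExpr → State Int Expr
normalizeApps raw = case raw of
  RVar x            → return (Var x)
  RLambda x e       → Lambda x <$> normalizeApps e
  RApply e (RVar x) → (`Apply` x) <$> normalizeApps e
  RApply e arg      → do x  ← freshName
                         e' ← normalizeApps e
                         a' ← normalizeApps arg
                         return (Let [(x, a')] (Apply e' x))
  RLet bs e         → Let <$> mapM (\(x, b) → (,) x <$> normalizeApps b) bs
                          <*> normalizeApps e

-- Usage: evalState (normalizeApps someRawTerm) 0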

Let’s go over the interpreter. I’ll leave out constructors, primitives, and case statements, although they’re included in the code on Hackage. I’ll also leave out the implementation of the various helper functions; their details aren’t important to understanding the semantics.

To start out with, we should define some types to represent an expression:

type Name = String 
 
data Expr =  Lambda Name Expr
           | Apply Expr Name
           | Var Name
           | Let Bindings Expr
           deriving (Eq, Show)
 
type Binding = (Name,Expr)
type Bindings = [Binding]
 
type Heap = [(Name, Expr)]
 
type ReduceM a =  ...
 
-- record the current state
data ReduceState = ...

Expr is fairly simple: it represents the syntax tree of lambda expressions extended with let. We’ll represent variables as strings, because it’s convenient. A Heap, with respect to the paper, is “a partial function from variables to expressions”; however, you can also think of it as “an unordered set of variable/expression pairs, binding distinct variable names to expressions”. The latter interpretation is easier to work with, so we’ll be using it. I’m leaving out the hand-written implementation of Show (the snippet above just derives it), but the Hackage code includes a Show instance that outputs reasonable-looking code. ReduceM keeps track of the state of the heap, the fresh variables, and the log, as well as allowing reduce to fail.
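
As an aside, here is one plausible way to fill in the elided definitions; the actual package may be organised differently, but a failure layer over a state layer lines up with the rmRun signature listed below.

import Control.Monad.Except (ExceptT, runExceptT)
import Control.Monad.State (State, runState)

-- A possible concretion (not necessarily the package's): the state carries
-- the heap, a fresh-name counter, and the log; failure is an Either String.
data ReduceState = ReduceState
  { rsHeap  :: Heap        -- current heap bindings
  , rsFresh :: Int         -- counter used to mint fresh names like $23
  , rsLog   :: [String]    -- reduction log, most recent entry first
  }

type ReduceM a = ExceptT String (State ReduceState) a

rmRun :: ReduceM a → ReduceState → (Either String a, ReduceState)
rmRun m s = runState (runExceptT m) s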

Here are the functions to make working with ReduceM reasonable:

 
 
rmRun :: ReduceM a → ReduceState → (Either String a, ReduceState)
 
-- hides the implementation detail of failing; makes it easier to
-- swap out the underlying monad.
rmErr :: String → ReduceM Expr
 
-- |Like sub, but for a list of things to substitute;
-- useful for implementing recursive lets (i.e. letrec)
subs :: [(Name,Name)] → (Expr → Expr)
 
-- |sub y x e is e[x/y] in Launchbury's notation,
--  i.e. [y ↦ x]e in Pierce's notation in TaPL
--  recursively descend expression tree to substitute a free variable
sub ::  Name → Name → (Expr → Expr)
 
-- |freshen takes an expression, and returns the same expression with every
-- bound variable replaced by a fresh variable.
freshen :: Expr → ReduceM Expr
 
type ErrorOr a = Either String a
type Log = String
 
-- look up a binding on the heap
heapLookup :: Name → ReduceM Expr
 
-- remove a binding from the heap
heapRemove :: Name → ReduceM ()
 
-- create a new binding on the heap
heapAdd :: Name → Expr → ReduceM ()
 
--  get the next fresh variable.  Fresh variables are of the form $23, for random
-- values of 23
getFreshVar :: ReduceM Name
 
-- reduces an expression.
reduceM :: Expr → ReduceM Expr
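
Most of these helpers are straightforward recursion over the syntax tree. Purely as an illustration (the version on Hackage may well differ), here is a sketch of what sub could look like; the shadowing checks are redundant on normalized terms, where every name is distinct, but they make the intent clear.

-- Sketch only: replace every free occurrence of `old` with `new`.
-- sub old new e corresponds to e[new/old] in Launchbury's notation.
sub :: Name → Name → (Expr → Expr)
sub old new = go
  where
    go (Var v)      | v == old              = Var new
                    | otherwise             = Var v
    go (Lambda x e) | x == old              = Lambda x e      -- old is shadowed here
                    | otherwise             = Lambda x (go e)
    go (Apply e v)  | v == old              = Apply (go e) new
                    | otherwise             = Apply (go e) v
    go (Let bs e)   | old `elem` map fst bs = Let bs e        -- old is shadowed by the let
                    | otherwise             = Let [ (x, go b) | (x, b) ← bs ] (go e)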

Freshen and sub are important, but the real work of the program is done in reduceM; that’s where the reduction actually happens. So what does the code for it look like?

reduceM :: Expr → ReduceM Expr
reduceM e =
    case e of
        Lambda x e' → return (Lambda x e')
        Apply e' x  →  do Lambda y' e'' ← reduceM e'
                          reduceM (sub y' x e'')
        Var x →       do e' ← heapLookup x
                         heapRemove x
                         z ← reduceM e'
                         heapAdd x z
                         freshen z
        Let bs e' →  do mapM_ (uncurry heapAdd) bs
                        reduceM e'

Let’s go through each of these cases one by one.

First: Lambda. The only sensible thing to do here is to just return the λ-abstraction, because we’re not allowed to reduce inside of a λ-abstraction.

Second: Apply. An application applies a subexpression to a variable, and to carry that out we need the subexpression to be a function. However, the subexpression might not be a λ-abstraction yet. Instead, it could be something that reduces to a λ-abstraction – a variable, a let, or even another apply! The only sensible thing to do here is to reduce the subexpression first, and then perform the application by substituting the argument variable into the resulting λ-abstraction’s body.

Third: Variable. In order to reduce a variable x, we must first know what it is bound to on the heap. So, we look up the expression e that it is bound to. The next step is less obvious: we reduce e with x’s binding removed from the heap, rebind x to the result, and then return a freshened version of that result, to avoid unwanted variable capture. Is this correct?

Obviously, if the expression isn’t recursive, we’re fine – it’s never going to look at its own binding. What happens, then, if x is recursive? Well, there are two possibilities – either the sub-expression depends on itself before it reduces, or it depends on itself after it reduces. If it depends on itself after it reduces, then we’re obviously fine – by that point the binding is already back on the heap. What about the other case? We’re actually still fine: if an expression depends on itself before it reduces to a value, then it must be an infinite loop. In our interpreter this shows up as failing outright, because we try to look up a binding which no longer exists. Failing outright and looping forever are the same thing, semantically speaking, so we’re still fine.

Here are some examples of recursion, both good and bad. Assume the appropriate functions have been defined for the third example; it also hasn’t been properly normalized, but ignore that as well. Can you figure out what these will do? The answers are at the bottom of the post.

let x = x in x 
let f = λx.f x in f 2
let len = λlist. if (empty? list) 0 (1 + (len (cdr list))) in len some-list

Fourth: Let. Since everything is distinctly named, we don’t need to worry about scope; names will never clash, because each name is only bound once. Therefore, it’s obviously correct to just add the bindings to the heap.

Now that we’ve explained this, let’s look at the output of some runthroughs:

Reducing let: let u = 3 + 2, v = u + 1 in v + v : {}

How should we interpret this? Well, it’s fairly simple. In the beginning, we start to reduce the expression ‘let u = 3 + 2, v = u + 1 in v + v’, using an empty heap (denoted ‘{}’).

|Reducing primitive: v + v : {v → u + 1, u → 3 + 2}

We then add the bindings to the heap and take a look at the addition ‘v + v’.

||Reducing variable: v : {v → u + 1, u → 3 + 2}

In order to reduce ‘v + v’, you first need to evaluate v.

|||Reducing primitive: u + 1 : {u → 3 + 2}
||||Reducing variable: u : {u → 3 + 2}

To calculate v, we need to evaluate u.

|||||Reducing primitive: 3 + 2 : {}
||||||Returning constructor: 3 : {}
||||||Returning constructor: 2 : {}
|||||Primitive evaluated to 5
||||Rebinding var u to 5
||||Returning constructor: 1 : {u → 5}
|||Primitive evaluated to 6
||Rebinding var v to 6

So we evaluate u, then evaluate v with the result of u.

||Reducing variable: v : {v → 6, u → 5}

We’re re-grabbing v here for the second usage of v in v + v.

|||Returning constructor: 6 : {u → 5}
||Rebinding var v to 6
|Primitive evaluated to 12
Ans: 12

Note that we access v twice in this trace (once for each use in v + v), but u only a single time: the second time we look up v, it is already bound to 6, so u never needs to be touched again.

For another two test cases, let’s take a look at two examples from Launchbury’s paper:

First: let u = 2 + 3, f = \x.let v = u + 1 in v + x, a = 2, b = 3 in f a + f b

and

Second: let u = 2 + 3, f = let v = u + 1 in \x.v + x, a = 2, b = 3 in f a + f b

Note that the expressions are the same except for f’s definition; in the first, f is “\x.let v = u + 1 in v + x,” and in the second f is “let v = u + 1 in \x.v + x”. Obviously, these two are going to evaluate to the same thing. However, what’s going to be the difference in terms of how they reduce? Does one share sub-computations more effectively than the other to reduce the amount of computation needed? Let’s take a look at the log produced by running them to figure that out.

Reducing let: let u = 2 + 3, f = \x.let v = u + 1 in v + x, a = 2, b = 3 in f a + f b : {}
|Reducing primitive: f a + f b : {b → 3, a → 2, f → \x.let v = u + 1 in v + x, u → 2 + 3}
||Reducing apply: f a : {b → 3, a → 2, f → \x.let v = u + 1 in v + x, u → 2 + 3}
|||Reducing variable: f : {b → 3, a → 2, f → \x.let v = u + 1 in v + x, u → 2 + 3}
||||Returning lambda: \x.let v = u + 1 in v + x : {b → 3, a → 2, u → 2 + 3}
|||Rebinding var f to \x.let v = u + 1 in v + x
|||Reducing let: let $2 = u + 1 in $2 + a : {f → \x.let v = u + 1 in v + x, b → 3, a → 2, u → 2 + 3}
||||Reducing primitive: $2 + a : {$2 → u + 1, f → \x.let v = u + 1 in v + x, b → 3, a → 2, u → 2 + 3}
|||||Reducing variable: $2 : {$2 → u + 1, f → \x.let v = u + 1 in v + x, b → 3, a → 2, u → 2 + 3}
||||||Reducing primitive: u + 1 : {f → \x.let v = u + 1 in v + x, b → 3, a → 2, u → 2 + 3}
|||||||Reducing variable: u : {f → \x.let v = u + 1 in v + x, b → 3, a → 2, u → 2 + 3}
||||||||Reducing primitive: 2 + 3 : {f → \x.let v = u + 1 in v + x, b → 3, a → 2}
|||||||||Returning constructor: 2 : {f → \x.let v = u + 1 in v + x, b → 3, a → 2}
|||||||||Returning constructor: 3 : {f → \x.let v = u + 1 in v + x, b → 3, a → 2}
||||||||Primitive evaluated to 5
|||||||Rebinding var u to 5
|||||||Returning constructor: 1 : {u → 5, f → \x.let v = u + 1 in v + x, b → 3, a → 2}
||||||Primitive evaluated to 6
|||||Rebinding var $2 to 6
|||||Reducing variable: a : {$2 → 6, u → 5, f → \x.let v = u + 1 in v + x, b → 3, a → 2}
||||||Returning constructor: 2 : {$2 → 6, u → 5, f → \x.let v = u + 1 in v + x, b → 3}
|||||Rebinding var a to 2
||||Primitive evaluated to 8
||Reducing apply: f b : {a → 2, $2 → 6, u → 5, f → \x.let v = u + 1 in v + x, b → 3}
|||Reducing variable: f : {a → 2, $2 → 6, u → 5, f → \x.let v = u + 1 in v + x, b → 3}
||||Returning lambda: \x.let v = u + 1 in v + x : {a → 2, $2 → 6, u → 5, b → 3}
|||Rebinding var f to \x.let v = u + 1 in v + x
|||Reducing let: let $4 = u + 1 in $4 + b : {f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, u → 5, b → 3}
||||Reducing primitive: $4 + b : {$4 → u + 1, f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, u → 5, b → 3}
|||||Reducing variable: $4 : {$4 → u + 1, f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, u → 5, b → 3}
||||||Reducing primitive: u + 1 : {f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, u → 5, b → 3}
|||||||Reducing variable: u : {f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, u → 5, b → 3}
||||||||Returning constructor: 5 : {f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, b → 3}
|||||||Rebinding var u to 5
|||||||Returning constructor: 1 : {u → 5, f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, b → 3}
||||||Primitive evaluated to 6
|||||Rebinding var $4 to 6
|||||Reducing variable: b : {$4 → 6, u → 5, f → \x.let v = u + 1 in v + x, a → 2, $2 → 6, b → 3}
||||||Returning constructor: 3 : {$4 → 6, u → 5, f → \x.let v = u + 1 in v + x, a → 2, $2 → 6}
|||||Rebinding var b to 3
||||Primitive evaluated to 9
|Primitive evaluated to 17
Ans: 17
 
Reducing let: let u = 2 + 3, f = let v = u + 1 in \x.v + x, a = 2, b = 3 in f a + f b : {}
|Reducing primitive: f a + f b : {b → 3, a → 2, f → let v = u + 1 in \x.v + x, u → 2 + 3}
||Reducing apply: f a : {b → 3, a → 2, f → let v = u + 1 in \x.v + x, u → 2 + 3}
|||Reducing variable: f : {b → 3, a → 2, f → let v = u + 1 in \x.v + x, u → 2 + 3}
||||Reducing let: let v = u + 1 in \x.v + x : {b → 3, a → 2, u → 2 + 3}
|||||Returning lambda: \x.v + x : {v → u + 1, b → 3, a → 2, u → 2 + 3}
|||Rebinding var f to \x.v + x
|||Reducing primitive: v + a : {f → \x.v + x, v → u + 1, b → 3, a → 2, u → 2 + 3}
||||Reducing variable: v : {f → \x.v + x, v → u + 1, b → 3, a → 2, u → 2 + 3}
|||||Reducing primitive: u + 1 : {f → \x.v + x, b → 3, a → 2, u → 2 + 3}
||||||Reducing variable: u : {f → \x.v + x, b → 3, a → 2, u → 2 + 3}
|||||||Reducing primitive: 2 + 3 : {f → \x.v + x, b → 3, a → 2}
||||||||Returning constructor: 2 : {f → \x.v + x, b → 3, a → 2}
||||||||Returning constructor: 3 : {f → \x.v + x, b → 3, a → 2}
|||||||Primitive evaluated to 5
||||||Rebinding var u to 5
||||||Returning constructor: 1 : {u → 5, f → \x.v + x, b → 3, a → 2}
|||||Primitive evaluated to 6
||||Rebinding var v to 6
||||Reducing variable: a : {v → 6, u → 5, f → \x.v + x, b → 3, a → 2}
|||||Returning constructor: 2 : {v → 6, u → 5, f → \x.v + x, b → 3}
||||Rebinding var a to 2
|||Primitive evaluated to 8
||Reducing apply: f b : {a → 2, v → 6, u → 5, f → \x.v + x, b → 3}
|||Reducing variable: f : {a → 2, v → 6, u → 5, f → \x.v + x, b → 3}
||||Returning lambda: \x.v + x : {a → 2, v → 6, u → 5, b → 3}
|||Rebinding var f to \x.v + x
|||Reducing primitive: v + b : {f → \x.v + x, a → 2, v → 6, u → 5, b → 3}
||||Reducing variable: v : {f → \x.v + x, a → 2, v → 6, u → 5, b → 3}
|||||Returning constructor: 6 : {f → \x.v + x, a → 2, u → 5, b → 3}
||||Rebinding var v to 6
||||Reducing variable: b : {v → 6, f → \x.v + x, a → 2, u → 5, b → 3}
|||||Returning constructor: 3 : {v → 6, f → \x.v + x, a → 2, u → 5}
||||Rebinding var b to 3
|||Primitive evaluated to 9
|Primitive evaluated to 17
Ans: 17

As you can immediately see, the first one ends up with a larger heap than the second one does. The reason for that becomes clear when you look at it more closely: in the second one, when we reduce f, we add the let binding to the heap once, because the let is outside the enclosing lambda. In the first one, on the other hand, f doesn’t reduce at all, because the lambda is the outermost thing. On account of that, we end up adding “v = u + 1” to the heap twice in the first case, which leads to performing that calculation twice. In the second expression, on the other hand, v is added to the heap the first time we call f, so both invocations share the same heap space for v, which means that u is only accessed a single time.

Based off of those logs, it is reasonable to expect that the second expression will run faster. So, let’s test that. Here are those two expressions, translated straightforwardly into Haskell. We’ll run them in Hugs, since Hugs will print the number of heap cells used and reductions made. Smaller numbers for both are better.

slowExprHaskell = let u = 3+2
                      f = \x → let v = u+1 in v + x
                  in f 2 + f 3
 
fastExprHaskell = let u = 3+2
                      f = let v = u+1 in \x → v + x
                  in f 2 + f 3

Results from Hugs:
Main$ slowExprHaskell
17
(47 reductions, 86 cells)
Main$ fastExprHaskell
17
(43 reductions, 78 cells)

This is what we expected.

What key lessons can you take from this to use in your everyday Haskell coding? First, always prefer to create a let outside of a lambda rather than inside of it, to take greater advantage of sharing. Second, lazy evaluation is not black magic: it doesn’t evaluate things in an order based off of the phase of the moon; it is deterministic and follows fairly simple semantics. While tracing impure lazy computations is very difficult, it is not too hard to get a reasonable idea of how a pure lazy function will be evaluated.
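
Here is a hypothetical everyday example of the first lesson (note that with optimisation turned on, GHC’s full-laziness pass can float the inner binding out for you, so the difference is most visible in unoptimised code):

-- squares is bound outside the lambda, so it is built once and shared by
-- every call to sharedTable.
sharedTable :: Int → Bool
sharedTable = \x → x `elem` squares
  where squares = map (^ 2) [1 .. 10000 :: Int]

-- Here the binding sits under the function's argument, so in unoptimised
-- code each call to rebuiltTable constructs the list again.
rebuiltTable :: Int → Bool
rebuiltTable x = x `elem` squares
  where squares = map (^ 2) [1 .. 10000 :: Int]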

(The answers to the questions posed earlier: the first expression fails outright – our interpreter reports a missing binding – because it uses itself before it has been rebound on the heap; the second expression is an infinite loop; and the third one is a normal recursive function that will reduce to the correct answer.)
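
Incidentally, GHC’s runtime treats the first example in much the same spirit as our interpreter: when a thunk demands itself before it has been updated, the runtime typically aborts with a “<<loop>>” (NonTermination) exception rather than spinning forever, though exactly when this is detected depends on the runtime and optimisation settings. A tiny illustration:

-- Hypothetical example: forcing this value in GHC usually prints "<<loop>>"
-- instead of hanging, mirroring the interpreter's outright failure.
selfDependent :: Int
selfDependent = let x = x in x

main :: IO ()
main = print selfDependent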

Comments (10)

Anonymous – June 20th, 2011 at 3:39 pm

Lazy evaluation is understandable and predictable, but only when considering the program as a whole. While that might work fine for your toy example, doing it for something non-trivial is hard and doing it for a complete program is insane.

“The next Haskell will be strict” Simon Peyton Jones

pipoca – June 20th, 2011 at 4:19 pm

The performance characteristics of a sub-expression will be the same (modulo variables already having been evaluated on the heap) if you evaluate it in the context of the larger expression or if you just evaluate the sub-expression itself.

Programs are built out of small blocks, and if you understand the behavior of the blocks themselves, you can predict some of the characteristics of the structures you make out of them. Sure, you’ll get some interesting emergent behavior out of them in the larger structure, but knowing the basic behavior of the blocks will get you reasonably far.

ceii – June 20th, 2011 at 5:53 pm

Anonymous: I’m curious about the origin of that quote. As far as I’m aware, SPJ has never done something so rash as to take a definite position on this question.

The only consensus about strict by default vs. lazy by default seems to be that you lose important and desirable properties no matter which side of the fence your language stands on.

Eric – June 20th, 2011 at 7:41 pm

@Anonymous.

“The next Haskell will be strict” Simon Peyton Jones

Do you have the reference for that quote? Last time I heard SPJ in a presentation, he said that he’d certainly like sometimes Haskell to be strict, but he’s pretty sure that, by then, he would regret that and would like to come back again to “by-need”, so he will never be able to definitely set-up his mind on this question.

Eric.

C. McCann – June 20th, 2011 at 10:21 pm

That quote is from “Wearing the hair shirt: a retrospective on Haskell”. See here: http://research.microsoft.com/en-us/um/people/simonpj/papers/haskell-retrospective/

I doubt he meant it in a completely literal sense, since it came after a section talking about the benefits vs. drawbacks to laziness without clearly favoring either, but it’s hard to tell what a single line from a powerpoint slide means without the context of the full presentation.

v – June 21st, 2011 at 12:59 am

http://www.cs.nott.ac.uk/~gmh/appsem-slides/peytonjones.ppt

There’s the quote.

Alessandro Stamatto – June 21st, 2011 at 11:39 am

In the presentation I remember him saying something along the lines “laziness is not essential, and all the exciting features offered by it can be done with strictness then the next will be strict haskell”

Alessandro Stamatto – June 21st, 2011 at 11:42 am

In the presentation I remember him saying something along the lines “laziness is not essential, and all the exciting features offered by it can be done with strictness, as such the next haskell will be strict ”

*Corrected a typo

Alexander Batischev – July 28th, 2011 at 12:20 pm

Based off of those logs, it is reasonable to expect that the first expression will run faster.

You probably meant “the second expression”.

Thank you for a great article!

Mike B – February 24th, 2012 at 11:55 am

Note that the “next Haskell will be strict” comment was made 9 years ago, at a talk in 2003 and Haskell is now 50% older, with no strict variation/derivative/counterpoint to Haskell in sight (or is there)?

https://research.microsoft.com/en-us/um/people/simonpj/papers/haskell-retrospective/
