Analysis of lazy-stream programs.

What’s the difference between the following two functions, a and b:

import Control.Monad.Instances
import Control.Arrow
 
a, b :: (Int->Int) -> (Int,Int) -> (Int,Int)
a = fmap
b = second

Can you figure it out? If not, here’s a hint

f ::  ((Int -> Int) -> Int -> (Int, Int)) -> Int -> Int
f sec a = case sec (+1) a of (x,y) -> 6

The two functions, a and b, are identical for every single input, except one. Did you figure it out?

*Main> f a undefined
*** Exception: Prelude.undefined
*Main> f b undefined
6

Undefined, also called bottom or \bot, is a subtle value in lazy languages like haskell and for certain types of programs, this can make a big difference. The functions a and b apply to \bot as follows:

a f \bot \Rightarrow \bot
b f \bot \Rightarrow (\bot,\bot)

We say a function f is strict if f \bot \Rightarrow \bot and is non-strict otherwise. Looking at the definition of fmap for tuples, we can see why it is strict in its second argument:

instance Functor ((,) a) where
	fmap f (x,y) = (x, f y)

Before evaluation even gets to the right hand side of the equal sign, it will return \perp because it cannot pattern match (x,y) on undefined. We can make this non-strict in a couple ways, the first uses functions as an alternative to pattern matching:

instance Functor ((,) a) where
	fmap f t = (fst t, f (snd t))

The second uses the special tilde (~) symbol to delay the pattern matching until the result is actually used.

instance Functor ((,) a) where
	fmap f ~(x,y) = (x, f y)

And we see, the non-strict second function, is implemented as follows, using the tilde approach.

instance Arrow (->) where
-- ...
	first f = f *** id
	second f = id *** f
	(f *** g) ~(x,y) = (f x, g y)

Why non-strict?

At first glance, it might seem silly to implement functions non-strictly. After all, the earlier these functions fail out with bottom, the more efficient the program is in these cases. However, one particular point of interest is lazy streams.

Recall that the getContents :: IO [Char] function returns a lazy list of input characters. Lets create a couple functions that motivate extra laziness.

import IO
 
withNext :: [Char] -> [(Char,Char)]
withNext (a:a':as) = (a,a'):withNext (a':as)
 
main = do
  hSetBuffering stdout NoBuffering
  hSetBuffering stdin NoBuffering
  fmap withNext getContents >>= putStr . map fst

When I run this program (sorry windows users, this won’t work for you), and type “Hello World!”, here is what I get.

runhaskell test HeHlellol oW oWrolrdl!d

The underlined letters are the ones I typed. How did this happen? Note the following about withNext:

\mbox{\bf withNext} (x:\bot) \Rightarrow \perp

It is strict in the second half of its argument. Lets modify our function to be more lazy in this respect.

withNext' :: [Char] -> [(Char,Char)]
withNext' (a:as) = (a, head as):withNext as
-- Or, the tilde version
-- withNext' (a:(~(a':as))) = (a,a'):withNext (a':as)

As was the case with withNext, withNext' is strict in its argument.

\mbox{\bf withNext' } \bot \Rightarrow \bot

Using this, we can see that withNext' is not strict in the second half of its argument.

\mbox{\bf withNext' } (x:\bot) \Rightarrow (x,\bot):\bot

When we run our more lazy version of this program, we get what we’d expect:

runhaskell test HHeelllloo WWoorrlldd!!

Connecting Strictness to Streams

As can be seen above, strictness and streams are tightly coupled. Here we present a theoretical evaluation (\leadsto) of a function at a given point in time. Here we introduce a new symbol we’ll call partial, \partial, that represents a value that we have no information about at the given moment. \hat{x} represents a symbol where everything is known and all other variables could mean either. So, for our withNext' function above, we may write the following equations:

\mbox{\bf withNext' } \partial \leadsto \partial
\mbox{\bf withNext' } (x:\partial) \leadsto (x,\partial):\partial

Note that our \leadsto equations are the same as our strictness equations. Thus we may use these functions to prove invariants on our stream-based code.

Example from Reactive

Take the following mappend code from reactive:

instance Ord t => Monoid (FutureG t a) where
  -- ...
  -- Pick the earlier future.
  Future (s,a) `mappend` Future (t,b) =
    Future (s `min` t, if s <= t then a else b)

Using strictness analysis, we can come up with the following \leadsto equations.

\mbox{\bf mappend } (\mbox{\bf Future } \partial) (\mbox{\bf Future } x) \leadsto \partial
\mbox{\bf mappend } (\mbox{\bf Future }x) (\mbox{\bf Future }\partial) \leadsto \partial

This tells us that if we do not know anything about the tuples in the Future arguments to mappend, then we cannot know anything about the result. So a developer might want to maintain an invariant that \mbox{\bf Future } x implies that x\leadsto (y,z).

Continuing on…

\mbox{\bf mappend } (\mbox{\bf Future } (\partial, \partial)) (\mbox{\bf Future } (\hat{x},\hat{y})) \leadsto
\mbox{\bf Future }(\mbox{\bf min }\partial\;\hat{x}, \mbox{\bf if } \partial <= \hat{x}\; \mbox{\bf then } \partial \;\mbox{\bf else } \hat{y} )

If we add some more invariants, we can go even further. Lets say that if we have the invariant that \mbox{\bf Future t} implies that min (\partial :: t) (x :: t) \leadsto x and (<=) (\partial :: t) (x :: t) \leadsto False. Then we get a Future that allows us to mappend now with a past future and a future future.

\mbox{\bf mappend } (\mbox{\bf Future } (\partial, \partial)) (\mbox{\bf Future } (\hat{x},\hat{y})) \leadsto\mbox{\bf Future }(\hat{x}, \hat{y} )

Verifying the the code is sufficiently lazy is now just a matter of seeing that these invariants are met.

Comments (5)

noneNovember 9th, 2008 at 12:04 am

But, there is a spreading sentiment that lazy i/o is a gross hack, relying at its root on unsafeInterleaveIO. Oleg has a big rant about this in his recent Iteratee paper.

adminNovember 10th, 2008 at 10:02 am

I’ve found that working with lazy IO can be difficult due to the subtlety of laziness issues like the one mentioned in the above post. Fortunately, projects like Reactive are doing much to remove the difficulty and enable programmers to develop highly interactive applications in a DSEL that actually models the problem domain directly instead exposing the machine’s low-level imperative operations.

Peter VerswyvelenNovember 10th, 2008 at 1:21 pm

Hi David,

Yes this lazy pattern matching stuff is really tricky.

When I first read about it in the Haskell School of Expression, I said wtf… here I am trying to escape from the imperative and object oriented world, and now I have to deal with this! :) But it did make a lot of sense - to match against some thing one must observe that thing.

Unfortunately this took away a bit of fun when programming Haskell; writing pattern matches on anything that has a “spine” felt like carefully walking through a minefield… albeit a pure minefield ;-)

The same tricky lazy pattern matching is also described in detail in the Programming with Arrows paper from John Hughes, at http://www.cs.chalmers.se/~rjmh/afp-arrows.pdf

Keep up the great blog!

[...] instance of pairing, we can also use fmap in place second. (The two, however, are not quite the same.) An advantage of names like result, element and second is that they’re more specifically [...]

[...] Netsuperbrain.com has a very good article on lazy pattern matching. [...]

Leave a comment

Your comment