Applied Functional Programming: Part 1

This post begins a two part series designed to present functional programming concepts as they apply to real world programming. Using functional techniques, we’ll solve a problem that came up in a commercial project I’m working on.

I’m a seasoned freelance software developer. There’s no PhD after my name and I’ve written heaps of C++ and python code for commercial projects. For me, concepts need to be both practical and time-saving before I integrate them into my development work. On the other hand, I enjoy thinking about and studying concepts that are thought provoking and clever, like first class patterns in pure pattern calculus for example.

My focus, for these posts, is where these two categories merge. Maybe we can call this fuzzy area applied theory. All too often, applied theory isn’t taught or written about and doesn’t reach Joe the programmer who can actually use it on his project.

After reading interesting research papers, I often ask myself one or both of two questions: “What problems, that I’ve encountered or might encounter, would I want to solve with this?”, and “What are the details required to put this to use?”. Working out the answers to these questions is a real joy for me. I don’t always figure out how to make clever ideas useful or practical, but when I do, and subsequently use it in a commercial project, I get a lot of satisfaction.

Lets see what techniques we can apply to a giant, ugly heap of poo code that I’ve been working with. You’ll recognize it as the code base that combines an intense security requirement with occasional embarrassing vulnerabilities. OpenSSL is the name.

Now, I want to make it clear that I’m not dissing the OpenSSL developers (except maybe Eric Young for including a self-advertising clause in the license that still sticks to this day). I’ve written my share of what I call “superman code”.

How does superman code arise? You get a call from a frantic customer: “Hey, we need to write X software and we need it in 2 days”. You put on your cape and fly out there immediately. With plenty of caffeine and snacks, you hack together something with super-human speed that solves their imminent problem. You fly back home tired and they throw a party while crediting you (or themselves occasionally) for being a genius. 6 months later you revisit the superman code and scratch your head wondering just what the heck did the undocumented X509_REVOKED_add1_ext_i2d function do again? You end up rewriting half of it.

Our example is an OpenSSL function called SSL_CTX_set_verify. Give that manpage a quick read to get a idea what that function does. My task was to make a preexisting OpenSSL client verify that the server’s certificate was valid upon connection. The code below shows the lines of code I added:

asio::ssl::context context( io_service
                          , asio::ssl::context::tlsv1_client
                          );
//2 New Lines
context.set_verify_mode(asio::ssl::context::verify_peer);
context.load_verify_file("ca.pem");
//...

Note that I’m using the boost.asio library which includes a thin wrapper on OpenSSL. Unfortunately, my small change didn’t work when I ran it. Can you spot the bug?

I hope you didn’t spend as much time as I did looking for a bug in either the boost.asio or OpenSSL code bases. My bug was in the “//…” part. Lets expand a little bit:

asio::ssl::context context( io_service
                          , asio::ssl::context::tlsv1_client
                          );
//2 New Lines
context.set_verify_mode(asio::ssl::context::verify_peer);
context.load_verify_file("ca.pem");
//...
context.set_verify_mode(asio::ssl::context::verify_none);
//...

Oops! I changed the verify mode twice. What a bummer! This is where a functional programming principle, called purity, comes into play.

“set_verify_mode” is really a misnomer because what we are actually doing is changing the verify mode “state” to something else. The type of that function, in Haskell notation, would be something like:

set_verify_mode :: Context -> Int -> Context

We can do better. What we really want to do is define a context with a specified verify mode [declarative style], not create an arbitrary default context and change it [imperative style].

Consider the following code:

data ContextParams = CP { verifyMode :: Int
                        -- ...
                        }
context :: ContextParams -> Context

What we’re doing is moving all the imperative set_ functions (that really modify a context) into the “constructor” of Context. This being the case, we’re disallowing, at compile time, the changing of the context verify mode. The bug I introduced would never have made it to run time if Context was designed in this way.

Moving on, lets see if we can stop passing Ints around. I want to disallow the following type of code from compiling:

let c = context $ CP { verifyMode = 3 + 22
                     , --...
                     }

Setting a verifyMode to a sum of numbers really doesn’t make sense, does it? The essence of the verify mode is not an Integer, but something else. Looking at the manual page, we see that there are different sets of verify modes that make sense depending whether or not the context is a client or server. That indicates to me two distinct data types.

data ClientVerifyMode = -- ...
data ServerVerifyMode = -- ...

Lets set up a simple client mode definition:

data ClientVerifyMode = ClientVerifyPeer
                      | ClientVerifyNone

But going further, we know that if we’re verifying the peer we’ll be needing certificate authority keys to verify with (this is set with the SSL_CTX_use_certificate_file function). Also required would be the maximum certificate depth (SSL_CTX_set_verify_depth). On the other hand the SSL_CTX_set_verify function allows for a function to be passed to override the other parameters. Further “out there”, one could bypass all this stuff by writing his own verification function with SSL_CTX_set_cert_verify_callback.

Confusing? I found it confusing anyway. How can we simplify this? What is the essence of what we are trying to do here? The answer can be found by acting like a buddhist monk. Go take an unstructured walk. Exhale your desires and aversions. Inhale nature. Feel the oneness of all things. Notice the feeling of being touched by your clothing or the taste in your mouth. This is living. This is the field of inspiration. This is where the answer lies if it falls upon the path of your destiny.

Welcome back. What our function and its relatives boils down to is verification of the peer’s credentials, which is a certificate chain. Either the credentials are acceptable or they are not by whatever means you check them. So, the essence we’ve been looking for is:

data ClientPeerVerifier = -- Credentials -> Bool

That little comment is there to document our intended semantics or meaning. We derived this while doing hippie things. I plan on talking more about meanings and the subject of denotational semantics in a later post. Now we can give appropriate names to the specializations available in OpenSSL

data ClientPeerVerifier = -- Credentials -> Bool
                        | F (Credentials->Bool)

The F constructor, standing for function, represents the most general SSL_CTX_set_cert_verify_callback style verifier. Now some more

data ClientPeerVerifier = -- Credentials -> Bool
                        | F (Credentials->Bool)
                        | ConstTrue
                        | SimpleVerify TrustedCertificates Int

ConstTrue will represent the SSL_VERIFY_NONE parameter. Notice how our semantics imply a nice name for this (”const True” is a function that returns True for every input). SimpleVerify merges SSL_CTX_use_certificate_file, SSL_CTX_set_verify_depth, and SSL_VERIFY_PEER.

There is something missing still, what about that “further verification process” controlled by the verify_callback parameter of the SSL_CTX_set_verify function? If we tried to implement this in our semantics, and call it z, we’d have:

z :: TrustedCertificates -> Int -> (Credentials -> Bool -> Bool) -> (Credentials -> Bool)
z ts depth g c = g (simpleVerify ts depth c) c

The first two parameters are the same as SimpleVerify above. The third parameter is a function taking in credentials and the result of the simple verify procedure.

Lacking a good name for Z, lets call it verifyWithSimpleVerifyResult. SimpleVerify is really a specialization of this, but we’ll leave it in case there is some optimization for a NULL verify_func argument to SSL_CTX_set_verify.

data ClientPeerVerifier = -- Credentials -> Bool
                        | F (Credentials->Bool)
                        | ConstTrue
                        | SimpleVerify TrustedCertificates Int
                        | VerifyWithSimpleVerifyResult TrustedCertificates Int (Credentials -> Bool -> Bool)

Now we have all our cases covered. The only thing I might add to the code would be a comment explaining the meaning of VerifyWithSimpleVerifyResult, for example the definition of z above.

We could, if we wanted, write functions that have a user description (as opposed to semantic name) to help a user document his own code:

alwaysAccept :: ClientPeerVerifier
alwaysAccept = ConstTrue

For the ServerPeerVerifier, we’ll need a different meaning, or essence, as a client doesn’t always supply a certificate.

data ServerPeerVerifier = -- Maybe Credentials -> Bool

Concluding Thoughts

Using semantics, in many ways, obviates the need for a large portion of documentation. For example, knowing the semantics of ClientPeerVerifier allows me to understand completely ConstTrue by only reading its name. Compare this approach to the wordy, and incomplete, manual pages of OpenSSL. Thinking in semantics has a whole lot of other advantages as well which I’ll cover in another post.

Judiciously using purity and specialized data types (like we did when removing the use of Int for verifyMode) assists a developer in writing bug-free programs by disallowing a slew of bugs to get past the compilation stage.

For the actual project, I would likely continue using C++ but make the data types match what they would be if I were to transfer the Haskell code above. It turns out this isn’t as hard as one might think. The real use of Haskell on this problem was to construct a framework to understand the code that was there and design a new meaningful wrapper.

Excercises

Feel free to post your answers as comments.

(*) Figure out what to call the SSL_VERIFY_FAIL_IF_NO_PEER_CERT style ServerPeerVerifier.

(*) How would the semantics (meanings) need to change if we were handling the SSL_VERIFY_CLIENT_ONCE parameter?

(**) The SSL_CTX_set_cert_verify_callback function actually expects the callback to modify one of its arguments. If we wanted to include this functionality, how would we change the return value our ClientPeerVerifier semantics?

(**) Using the meaning style presented above or another functional style, write an OpenSSL wrapper in Haskell and submit it to Hackage. (Thanks!)

Comments (4)

Maciej PiechotkaSeptember 14th, 2009 at 6:16 pm

I guess it is better: F (Credentials-> IO Bool) As it may involve some action (checking files, asking user etc.)

Linktipps #11 :: BlackflashSeptember 15th, 2009 at 2:44 am

<

p>[...] State-Monade mit einigen hilfreichen

Quentin MoserSeptember 15th, 2009 at 12:51 pm

The meaning of the VerifyWithSimpleVerifResult case in the semantic domain is actually a monomorphic restriction of the monadic bind for (ReaderT Credentials IO):

combine :: (Credentials -> IO Bool) -> (Bool -> Credentials -> IO Bool) -> Credentials -> IO Bool

or equivalently

combine :: ReaderT Credentials IO Bool -> (Bool -> ReaderT Credentials IO Bool) -> ReaderT Credentials IO Bool

, only limiting its first argument to the functions that can be expressed with SimpleVerify.

The interface could be made both simpler and more powerful by finding a way to lift that limitation and offer fully compositional semantics, either as a full blown Monad instance or a simpler monomorphic combinator as above. I don’t know if OpenSSL’s API is rich enough to allow this, but I suspect it isn’t, which would be a striking example of bad API design.

David SankelSeptember 15th, 2009 at 1:59 pm

@Maciej: Thanks for pointing that out. While our semantics can use lazy io for file access and some basic forms of user interaction, it feels as though we’re missing something. The approach you pointed out, using the IO monad aka. sin bin, gives us the opportunity to use imperative interaction techniques.

While that will certainly do the trick, we’ve lost the functional nature of our library. An alternative might be found in FRP which seems like a good match for a library like OpenSSL.

@Quentin: Cool observation.

If we use the ‘F’ case of ClientPeerVerifier and ignore the other built in options, we can use the semantics as our implementation and get the composability you mentioned. Of course, we’ll need to rewrite or heavily modify much of OpenSSL.

One scary point is that other TLS libraries, like GNU TLS include a compatibility layer for previously OpenSSL applications, keeping that API alive.

Hopefully as semantics becomes more of a common term among software developers we can expect better APIs for the future.

Leave a comment

Your comment