An introduction to fuzzing OCaml with AFL, Crowbar and Bun

by Nathan Rebours on Sep 4th, 2019

American Fuzzy Lop, or AFL, is a fuzzer: a program that tries to find bugs in other programs by sending them various auto-generated inputs. This article covers the basics of AFL and shows an example of fuzzing a parser written in OCaml. It also introduces two extensions: the Crowbar library, which can be used to fuzz any kind of OCaml program or function, and the Bun tool for integrating fuzzing into your CI.

All of the examples given in this article are available on GitHub at ocaml-afl-examples. The README contains all the information you need to understand, build and fuzz them yourself.

What is AFL?

AFL actually isn't just a fuzzer but a set of tools. What makes it so good is that it doesn't just blindly send random input to your program hoping for it to crash; it inspects the execution paths of the program and uses that information to figure out which mutations to apply to the previous inputs to trigger new execution paths. This approach allows for much more efficient and reliable fuzzing (as it will try to maximize coverage) but requires the binaries to be instrumented so the execution can be monitored.
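To make the idea of coverage-guided mutation more concrete, here is a toy sketch in OCaml. It is not how AFL is implemented: the real tool instruments the binary and applies many mutation strategies. The target function, the integer it returns as a stand-in for "coverage" and the mutation strategy are all made up for illustration. The only point is to show why keeping the inputs that reach new paths lets a fuzzer build a multi-byte input step by step.

```ocaml
exception Crash

(* Toy target (hypothetical): its "coverage" is modelled as the number of
   leading characters of "abc" matched by the input. Any input starting
   with "abc" crashes. *)
let target (s : string) : int =
  let secret = "abc" in
  let rec matched i =
    if i < String.length secret && i < String.length s && s.[i] = secret.[i]
    then matched (i + 1)
    else i
  in
  let depth = matched 0 in
  if depth = String.length secret then raise Crash else depth

let alphabet = List.init 26 (fun i -> Char.chr (Char.code 'a' + i))

(* Deterministic mutations: replace each character by each letter,
   then append each letter. *)
let mutations (s : string) : string list =
  let replace i c = String.mapi (fun j x -> if j = i then c else x) s in
  let positions = List.init (String.length s) (fun i -> i) in
  let replaced =
    List.concat (List.map (fun i -> List.map (replace i) alphabet) positions)
  in
  let appended = List.map (fun c -> s ^ String.make 1 c) alphabet in
  replaced @ appended

(* Keep only the inputs that reach a new "coverage" value and mutate those
   further. Returns the first crashing input, if any. *)
let fuzz (seeds : string list) : string option =
  let seen = Hashtbl.create 16 in
  let queue = Queue.create () in
  (* True when [s] crashes the target; otherwise record new coverage. *)
  let try_input s =
    match target s with
    | depth ->
        if not (Hashtbl.mem seen depth) then begin
          Hashtbl.add seen depth ();
          Queue.add s queue
        end;
        false
    | exception Crash -> true
  in
  let rec first_crash = function
    | [] -> None
    | s :: rest -> if try_input s then Some s else first_crash rest
  in
  let rec loop () =
    if Queue.is_empty queue then None
    else
      match first_crash (mutations (Queue.pop queue)) with
      | Some s -> Some s
      | None -> loop ()
  in
  match first_crash seeds with
  | Some s -> Some s
  | None -> loop ()
```

Starting from the seed "x", this loop first keeps "a", then "ab", and only then reaches a crashing input. Each step builds on the previous discovery instead of guessing three characters at once.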

AFL provides wrappers for the common C compilers that you can use to produce the instrumented binaries along with the CLI fuzzing client: afl-fuzz.

afl-fuzz is straightforward to use. It takes an input directory containing a few initial valid inputs to your program, an output directory and the instrumented binary. It will then repeatedly mutate the inputs and feed them to the program, registering the ones that lead to crashes or hangs in the output directory.

Because it works on input files, it is very easy to use it to fuzz a parser.

To fuzz a parse.exe binary that takes a file as its first command-line argument and parses it, you can invoke afl-fuzz in the following way:

$ afl-fuzz -i inputs/ -o findings/ /path/to/parse.exe @@

The findings/ directory is where afl-fuzz will write the crashes it finds. It will create that directory for you if it doesn't exist. The inputs/ directory contains one or more valid input files for your program. By valid we mean "that don't crash your program". Finally, the @@ part tells afl-fuzz where on the command line the input file should be passed to your program: in our case, as the first argument.

Note that it is possible to supply afl-fuzz with more detail about how to invoke your program. If you need to pass it command-line options for instance, you can run it as:

$ afl-fuzz -i inputs/ -o findings/ -- /path/to/parse.exe --option=value @@

If you wish to fuzz a program that takes its input from standard input, you can also do that by removing the @@ from the afl-fuzz invocation.
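For an OCaml program, a stdin-based harness could look like the sketch below. parse is a hypothetical stand-in for the function under test, and read_all is a small helper written with the standard library only, so it should also build on older compilers.

```ocaml
(* Read everything available on a channel into a string. *)
let read_all (ic : in_channel) : string =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Hypothetical function under test: replace it with your own parser. *)
let parse (s : string) : int option = int_of_string_opt (String.trim s)

(* Entry point of the harness: feed everything read from stdin to [parse]. *)
let run () = ignore (parse (read_all stdin))
```

In the harness executable you would end the file with let () = run () and invoke afl-fuzz without the @@.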

Once afl-fuzz starts, it will draw a fancy-looking table on the standard output to keep you updated about its progress. What you'll mostly be interested in is the top right corner, which contains the number of crashes and hangs it has found so far:

Example output from afl-fuzz

You might need to change some of your CPU settings to achieve better performance while fuzzing. afl-fuzz's output will tell you if that's the case and guide you through the steps required to make that happen.

Using AFL to fuzz an OCaml parser

First of all, if you want to fuzz an OCaml program with AFL you'll need to produce an instrumented binary. afl-fuzz has an option to work with regular binaries but you'd lose a lot of what makes it efficient. To instrument your binary you can simply install a +afl opam switch and build your executable from there. AFL compiler variants are available from OCaml 4.05.0 onwards. To install such a switch you can run:

$ opam switch create fuzzing-switch 4.07.1+afl

If your program already parses the standard input or a file given to it via the command line, you can simply build the executable from your +afl switch and adapt the above examples. If it doesn't, it's still easy to fuzz any parsing function.

Imagine we have a simple-parser library which exposes the following parse_int function:

val parse_int: string -> (int, [> `Msg of string]) result
(** Parse the given string as an int or return [Error (`Msg _)].
    Does not raise, usually... *)

We want to use AFL to make sure our function is robust and won't crash when receiving unexpected inputs. As you can see the function returns a result and isn't supposed to raise exceptions. We want to make sure that's true.

To find crashes, AFL traps the signals sent by your program. That means that it will consider uncaught OCaml exceptions as crashes. That's good because it makes it really simple to write an executable that fits what afl-fuzz expects:

let () =
  let file = Sys.argv.(1) in
  let ic = open_in file in
  let length = in_channel_length ic in
  let content = really_input_string ic length in
  close_in ic;
  ignore (Simple_parser.parse_int content)

We have to provide example inputs to AFL, so we can write a valid file containing 123 and an invalid file containing not an int to the inputs/ directory. Both should parse without crashing and make good starting points for AFL as they should trigger different execution paths.
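If you would rather create the seed files from OCaml than by hand, a small helper like the one below does the job. write_seed, write_seeds and the file names are hypothetical, and the target directory is assumed to exist already. One detail worth noting: int_of_string_opt rejects a trailing newline, so a seed created with echo 123 would end up on the error path just like the invalid one. Writing the exact bytes avoids that.

```ocaml
(* Write [content] verbatim (no trailing newline added) to [dir]/[name].
   Hypothetical helper; [dir] must already exist. *)
let write_seed ~dir ~name content =
  let oc = open_out_bin (Filename.concat dir name) in
  output_string oc content;
  close_out oc

(* The two seeds described above: one valid int, one invalid input. *)
let write_seeds dir =
  write_seed ~dir ~name:"valid" "123";
  write_seed ~dir ~name:"invalid" "not an int"
```

Calling write_seeds "inputs" from a small script, after creating the directory, gives afl-fuzz its two starting points.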

Because we want to make sure AFL does find crashes we can try to hide a bug in our function:

let parse_int s =
  match List.init (String.length s) (String.get s) with
  | ['a'; 'b'; 'c'] -> failwith "secret crash"
  | _ -> (
      match int_of_string_opt s with
      | None -> Error (`Msg (Printf.sprintf "Not an int: %S" s))
      | Some i -> Ok i)

Now we just have to build our native binary from the right switch and let afl-fuzz do the rest:

$ afl-fuzz -i inputs/ -o findings/ ./fuzz_me.exe @@

It should find that the abc input leads to a crash rather quickly. Once it does, you'll see it in the top right corner of its output as shown in the picture from the previous section.

At this point you can interrupt afl-fuzz and have a look at the contents of the findings/crashes directory:

$ ls findings/crashes/
id:000000,sig:06,src:000111,op:havoc,rep:16  README.txt

As you can see, it contains a README.txt, which gives you some details about the afl-fuzz invocation used to find the crashes and how to reproduce them, along with one file of the form id:...,sig:...,src:...,op:...,rep:... per crash it found. Here there's just one:

$ cat findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16
abc

As expected it contains our special input that triggers our secret crash. We can rerun the program with that input ourselves to make sure it does trigger it:

$ ./fuzz_me.exe findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16
Fatal error: exception Failure("secret crash")

No surprise here, it does trigger our uncaught exception and crashes shamefully.
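Once the bug is fixed, it can be handy to replay every saved crash file and check that none of them still raises. Below is a minimal sketch of such a check using only the standard library. read_file and still_crashing are hypothetical helpers, parse_int is a simplified stand-in for Simple_parser.parse_int with the hidden bug, and the directory is assumed to have the flat layout shown above (crash files plus a README.txt).

```ocaml
(* Stand-in for Simple_parser.parse_int, including the hidden bug. *)
let parse_int s =
  match s with
  | "abc" -> failwith "secret crash"
  | _ -> (
      match int_of_string_opt s with
      | None -> Error (`Msg (Printf.sprintf "Not an int: %S" s))
      | Some i -> Ok i)

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  let content = really_input_string ic (in_channel_length ic) in
  close_in ic;
  content

(* Run [f] on the content of every file in [dir] except README.txt and
   return the sorted names of the files that still make [f] raise. *)
let still_crashing ~dir f =
  Sys.readdir dir
  |> Array.to_list
  |> List.filter (fun name -> name <> "README.txt")
  |> List.sort compare
  |> List.filter (fun name ->
         match f (read_file (Filename.concat dir name)) with
         | _ -> false
         | exception _ -> true)
```

An empty result from still_crashing ~dir:"findings/crashes" parse_int would mean that none of the recorded inputs crashes the function anymore.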

Using Crowbar and AFL for property-based testing

This works well but only being able to fuzz parsers is quite a limitation. That's where Crowbar comes into play.

Crowbar is a property-based testing framework. It's much like Haskell's QuickCheck. To test a given function, you define how its arguments are shaped and a set of properties the result should satisfy, and Crowbar makes sure those properties hold for any combination of randomly generated arguments. Let's clarify that with an example.

I wrote a library called Awesome_list and I want to test its sort function:

val sort: int list -> int list
(** Sorts the given list of integers. Result list is sorted in increasing
    order, most of the time... *)

I want to make sure it really works so I'm going to use Crowbar to generate a whole lot of lists of integers and verify that when I sort them with Awesome_list.sort the result is, well... sorted.

We'll write our tests in a file. First we need to tell Crowbar how to generate arguments for our function. It exposes some combinators to help you do that:

let int_list = Crowbar.(list (range 10))

Here we're telling Crowbar to generate lists of any size, containing integers ranging from 0 (inclusive) to 10 (exclusive). Crowbar also exposes more advanced combinators and lets you write custom generators, so don't worry: you can use it to build more complex arguments.

Now we need to define our property. Once again it's pretty simple, we just want the output to be sorted:

let is_sorted l =
  let rec is_sorted = function
    | [] | [_] -> true
    | hd::(hd'::_ as tl) -> hd <= hd' && is_sorted tl
  in
  Crowbar.check (is_sorted l)

All that's left to do now is to register our test:

let () =
  Crowbar.add_test ~name:"Awesome_list.sort" [int_list]
      (fun l -> is_sorted (Awesome_list.sort l))

and to compile that file to a binary. Crowbar will take care of the magic.
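Note that a property checking only that the output is sorted is fairly weak: a sort function that always returns the empty list would satisfy it. A stricter property, sketched below with the standard library only, also requires the output to contain the same elements as the input. same_elements and sort_is_correct are hypothetical names, and List.sort is used as a reference to compare the two lists.

```ocaml
(* True when the list is in non-decreasing order. *)
let rec is_sorted = function
  | [] | [ _ ] -> true
  | x :: (y :: _ as tl) -> x <= y && is_sorted tl

(* True when both lists hold the same elements with the same multiplicity:
   sorting both with the stdlib must give equal lists. *)
let same_elements l l' = List.sort compare l = List.sort compare l'

(* The combined property for a candidate [sort] function on input [l]. *)
let sort_is_correct sort l =
  let sorted = sort l in
  is_sorted sorted && same_elements l sorted
```

In the Crowbar test, the boolean returned by sort_is_correct Awesome_list.sort l could be passed to Crowbar.check in place of the simpler check.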

We can run that binary in "Quickcheck" mode, where it will either try a certain number of random inputs or keep trying until one of the properties breaks, depending on the command-line options we pass it. What we're interested in here is its less common "AFL" mode. Crowbar made it so our executable can be used with afl-fuzz just like that:

$ afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@

What will happen then is that our fuzz_me.exe binary will read the inputs provided by afl-fuzz and use them to determine which test to run and how to generate the arguments to pass to our function. If the properties are satisfied, the binary will exit normally; if they aren't, it will make sure that afl-fuzz interprets that as a crash by raising an exception.
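To give an intuition of how raw bytes can drive the generation of structured arguments, here is a toy decoder. It is emphatically not Crowbar's actual encoding, which I'm not describing here: decode_int_list and its format (first byte for the length, one byte per element) are invented for this illustration.

```ocaml
(* Toy decoder, NOT Crowbar's real format: the first byte picks the list
   length (0 to 3) and each following byte becomes an element between
   0 and 9. Missing bytes are treated as 0. *)
let decode_int_list (input : string) : int list =
  let byte i = if i < String.length input then Char.code input.[i] else 0 in
  let len = byte 0 mod 4 in
  List.init len (fun i -> byte (i + 1) mod 10)
```

With such a mapping, every mutation afl-fuzz applies to the file translates into a different list, which is why the fuzzer can explore the argument space of a function that never reads a file itself.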

A nice side-effect of Crowbar's approach is that afl-fuzz will still be able to pick up crashes. For instance, if we implement Awesome_list.sort as:

let sort = function
  | [1; 2; 3] -> failwith "secret crash"
  | [4; 5; 6] -> [6; 5; 4]
  | l -> List.sort compare l

and use AFL and Crowbar to fuzz-test our function, it will find two crashes: one for the input [1; 2; 3] which triggers a crash and one for [4; 5; 6] for which the is_sorted property won't hold.
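The two kinds of failure can be told apart without any fuzzing machinery. The sketch below uses the standard library only; the outcome type and the classify function are hypothetical and simply mirror the distinction between a raised exception and a broken property.

```ocaml
(* Possible results of running the property on one input. *)
type outcome = Pass | Property_violated | Raised of exn

(* The buggy implementation from above. *)
let sort = function
  | [ 1; 2; 3 ] -> failwith "secret crash"
  | [ 4; 5; 6 ] -> [ 6; 5; 4 ]
  | l -> List.sort compare l

(* True when the list is in non-decreasing order. *)
let rec is_sorted = function
  | [] | [ _ ] -> true
  | x :: (y :: _ as tl) -> x <= y && is_sorted tl

(* Run [sort] on [l] and report what went wrong, if anything. *)
let classify l =
  match sort l with
  | sorted -> if is_sorted sorted then Pass else Property_violated
  | exception e -> Raised e
```

Here classify [1; 2; 3] reports the exception, classify [4; 5; 6] reports a violated property, and any other list passes.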

The content of the input files found by afl-fuzz itself won't be of much help as it needs to be interpreted by Crowbar to build the arguments that were passed to the function to trigger the bug. We can invoke the fuzz_me.exe binary ourselves on one of the files in findings/crashes and the Crowbar binary will replay the test and give us some more helpful information about what exactly is going on:

$ ./fuzz_me.exe findings/crashes/id\:000000\,sig\:06\,src\:000011\,op\:flip1\,pos\:5 
Awesome_list.sort: ....
Awesome_list.sort: FAIL

When given the input:

    [1; 2; 3]
the test threw an exception:

    Failure("secret crash")
    Raised at file "", line 33, characters 17-33
    Called from file "awesome-list/fuzz/", line 11, characters 78-99
    Called from file "src/", line 264, characters 16-19
Fatal error: exception Crowbar.TestFailure
$ ./fuzz_me.exe findings/crashes/id\:000001\,sig\:06\,src\:000027\,op\:arith16\,pos\:5\,val\:+7 
Awesome_list.sort: ....
Awesome_list.sort: FAIL

When given the input:

    [4; 5; 6]
the test failed:

    check false
Fatal error: exception Crowbar.TestFailure

We can see the actual inputs as well as distinguish the one that broke the invariant from the one that triggered a crash.

Using bun to run fuzz testing in CI

While AFL and Crowbar provide no guarantees, they can give you confidence that your implementation is not broken. Now that you know how to use them, a natural follow-up is to want to run fuzz tests in your CI to enforce that level of confidence.

The problem is that AFL isn't very CI-friendly. First, it has that self-refreshing output that isn't going to look great in your Travis build logs. Second, it doesn't tell you much beyond whether or not it found crashes or invariant infringements.

Fortunately, like most problems, this one has a solution: bun. bun is a CLI wrapper around afl-fuzz, written in OCaml, that helps you get the best out of AFL effortlessly. It mostly does two things:

The first is that it will run several afl-fuzz processes in parallel (one per core by default). afl-fuzz starts with a series of deterministic steps. In my experience, running parallel processes during that phase is rarely very useful, as they tend to find the same bugs or slight variations of those bugs. Parallelism only reaches its full potential in the second phase of fuzzing.

The second thing, which is the one we're most interested in, is that bun provides a useful and CI-friendly summary of what's going on with all the fuzzing processes so far. When one of them finds a crash, it will stop all processes and pretty-print all of the bug-triggering inputs to help you reproduce and debug them locally. Here is an example of bun's output after crashes were found:

Crashes found! Take a look; copy/paste to save for reproduction:
echo JXJpaWl0IA== | base64 -d > crash_0.$(date -u +%s)
echo NXJhkV8QAA== | base64 -d > crash_1.$(date -u +%s)
echo J3Jh//9qdGFiYmkg | base64 -d > crash_2.$(date -u +%s)
09:35.32:[ERROR]All fuzzers finished, but some crashes were found!

Using bun is very similar to using afl-fuzz. Going back to our first parser example, we can fuzz it with bun like this:

$ bun --input inputs/ --output findings/ /path/to/parse.exe

You'll note that you don't need to provide the @@ anymore. bun assumes that it should pass the input as the first argument of your to-be-fuzzed binary.

bun also comes with an alternative no-kill mode which lets all the fuzzers run indefinitely instead of terminating them whenever a crash is discovered. It will regularly keep you updated on the number of crashes discovered so far and when terminated will pretty-print each of them just like it does in regular mode.

This mode can be convenient if you suspect your implementation may contain a lot of bugs and you don't want to go through the whole process of fuzz testing it to only find a single bug.

You can use it in CI by running bun --no-kill via timeout. For instance:

$ timeout --preserve-status 60m bun --no-kill --input inputs --output findings ./fuzz_me.exe

will fuzz fuzz_me.exe for an hour no matter what happens. When timeout terminates bun, it will provide you with a handful of bugs to fix!

Final words

I really want to encourage you to use these tools and fuzzing in general. Crowbar and bun are fairly new, so you will probably encounter bugs or find that they lack a feature you want, but combined with AFL they make for very nice tools to effectively test critical components of your OCaml code base or infrastructure and detect newly introduced bugs. They are already used across the MirageOS ecosystem, where they have been used to fuzz the TCP/IP stack mirage-tcpip and the DHCP implementation charrua thanks to somerandompacket. You can consult Crowbar's hall of fame to find out about bugs uncovered by this approach.

I also encourage anyone interested to join us in using this promising toolchain, report those bugs, contribute those extra features and help the community build more robust software.

Finally, if you wish to learn more about how to efficiently use fuzzing for testing, I recommend the excellent Write Fuzzable Code article by John Regehr.