Fuzzing and Proofs

Hillel Wayne has been, with the help of the larger programming community, exploring formal proofs of code, and his recent post loosely coincided with the Go team’s release of fuzzing support. It got me thinking about the direction Go is going.

As Hillel suggests, most developers write tests rather than proving their code, because the former is orders of magnitude easier than the latter. Well, let’s be honest: most developers don’t even write unit tests, but of those that try to make sure their code is correct mostly just write tests.

A quick diversion to explain some things for the non-CIS folks:

Code is instructions telling a computer to calculate something
Functional tests generally test that a system behaves how you would expect. It’s necessary, but very coarse grained, like making sure your house has a roof.
Unit tests run specific functions (that, together with a bunch of other functions, make up the system) and check that – given some input – they produce expected output. This is akin to turning all the light switches on and off in your house to make sure they work.
Formal proofs use (usually) inductive logic to prove that functions will work the way they are expected to. They don’t actually run the functions; they prove the functions are correct. There’s no real analogy to real world objects, because proofs exist the same way we can prove qualities of addition across integers.

If you didn’t take a science track in college, the fourth item may be a bit abstract. As children, we’re often taught addition through the third point; at some point, someone gave you an apple, asked you how many apples you had, and then asked how many apples you’d have if they gave you one more. You didn’t know, so they gave you a second apple and had you count them, and you had two, and you (eventually) learned that 1 + 1 = 2. You did this for a while until it got boring, and you learned to extrapolate: if someone asked how many apples you’d have if you had 1,923 apples and they gave you one more, you knew the answer. But you never actually had 1,923 apples and were given one more, and you never counted out 1,924 apples.

There are a number of proofs around things that we learned through rote. We know that 1,923 + 1 = 1,924, not because someone counted something out, but because there are definitions and proofs that can be checked and verified. Proofs are important because you usually can’t test every possible combination of inputs and outputs.

Very often, in complex systems, functions have dozens and even hundreds of lines of code, and 10 or more input variables. The number of different code paths – the complexity – of algorithms, and the number of permutations of inputs, means you’re often dealing with billions of possible inputs and outputs. Or, if you’re taking input from a user, essentially infinite.

All of this is to say that it is very rare for a unit test to be able to test all combinations. So, usually, developers hand-pick a few test cases and call it good. Experienced developers will pick cases that they know are bad input, and check that their functions fail properly. But they’re still not checking all the cases, which leaves the possibility that there are cases that developers believe will succeed when they will actually fail. This is where bugs come from, and your own experience should have proven by now that almost no code is bug-free.

Formal proofs obviate the need to check every possible value. They assert that the code is correct, by virtue of proof. And some languages (and tools) can actually verify formal proofs, and confirm that – yes – a function complies with a formal proof of correctness. This is what Hillel is talking about. If you’re making sure the football field doesn’t have gopher holes, then unit testing sections off from the 10-yard line to the end zone and looks for gopher holes, and extrapolates that since there are none, the field is free. Fuzzing sections off from the center field to an end-zone and checks that: it’s a much larger area, but it’s still only part of the field. A formal proof doesn’t check for holes: instead, it confirms that no gophers exist in the football park.

If you check, you’ll see that there’s no entry in Hillel’s challenge for Go. I’ve never come across a formal proof system for Go, and there’s certainly nothing built in to the language or core tooling. Instead, what the Go team is doing is doubling-down on the “test every input” approach; it’s called “fuzzing,” and it was recently added as a core feature of the Go test library.

Fuzzing is an expansion of unit testing. It takes a unit test and runs it for a wide variety of random input, checking for failures. It’s unit testing on steroids, and tests how experienced developers often test: trying to break things, rather than just verifying what they already expect.

The limitation of fuzzing is a well known problem: P/NP. There are problems which (non-quantum) computers can’t solve with brute force, not within the lifetime of the universe. We rely on these for the cryptography that protects our bank’s web interactions. We rely on the fact that fuzzing can’t fully test these functions, and the functions don’t even have to be cryptographically complex; often, even small functions are complex enough that it would take days or weeks of fuzzing to try every possible input.

It’s a curious thing, Go’s fuzzing. Everything about it emphasizes the weakness of brute-force testing. If you don’t explicitly limit it, a fuzzing test will run essentially forever – which it would have to do to fully test a function that performs simple addition across 64-bit integers:

A modern CPU can calculate around 150x10E9 FPO/s
A 64-bit int can have 2^64, or 368x10E17 ops
369x10E17 / 150x10E9 = 246x10E6 seconds, or nearly 8 years

And that’s for fuzzing to test f(x,y) = x+y. That’s not a very complex function.

Fuzzing isn’t bad; it’s quite useful, insofar as it makes testing for failures easier. But it’s not as useful as a formal proof system, and where other languages put effort into making proving easier, there’s a curious lack of work in this area. Pragmatically, adding fuzzing is not a surprising addition to Go. It’s an incremental improvement, and incremental improvements are Go’s bread and butter. On the other hand, Go’s vetting tooling is incredibly sophisticated, comparable even to Java’s vastly older tooling. It’s a little perplexing that more – or any – work has been made in formal proof systems.

I will be surprised if I find, after the initial novelty wears off, much in-the-field use of Go’s fuzzing. And I can’t help but wish that all the effort put into it – all the tremendous skill and experience – had gone instead into tools that help developers actually prove correctness, rather than inferring it.