regalloc-fuzzing · cranelift · Zulip Chat Archive

Benjamin Bouvier (Feb 21 2020 at 15:50):

@Nick Fitzgerald I've set up fuzzing with libFuzzer and Arbitrary on our WIP replacement for regalloc, and I wanted to start a conversation about strategies for doing this efficiently.

bnjbvr/regalloc.rs

Modular register allocator algorithms.

Benjamin Bouvier (Feb 21 2020 at 15:56):

Right now, I've implemented a validator function that checks that the given input (constructed from random data bytes) is correct, and I run it before passing the generated input to the actually-useful test oracles. With coverage-guided fuzzing, libFuzzer seems to have found a way to create some valid inputs by luck, but they remain mostly identical: a function with only one block and a few instructions in it.

Benjamin Bouvier (Feb 21 2020 at 15:57):

So my question is really about the best strategy to use here: is libFuzzer likely to find the other valid inputs (e.g. functions with several blocks), or should I start writing my own generator so that I only generate valid test cases?

bjorn3 (Feb 21 2020 at 16:23):

For the Arbitrary impls of, for example, Label and Block, you may want to write one by hand that creates a valid String even when the input bytes are not UTF-8, for example by reading 32 bits and converting them to a base64 string. This way the fuzzer doesn't have to "learn" what UTF-8 is.
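
A minimal std-only sketch of this idea, under stated assumptions: it does not use the arbitrary crate's API, and the helper name `name_from_fuzz_bytes` is hypothetical. It consumes up to 4 input bytes as a u32 and writes that value as six base-64 digits (a digit encoding using the RFC 4648 alphabet, not byte-oriented base64 with padding), so every possible input becomes a valid ASCII name. A hand-written `impl Arbitrary` would read the u32 from its `Unstructured` input instead.

```rust
/// RFC 4648 base64 alphabet, used here as a set of 64 printable ASCII digits.
const B64_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical helper: consume up to 4 bytes of fuzzer input as a `u32` and
/// spell it in base 64. Every input yields valid ASCII (hence UTF-8), so the
/// fuzzer never has to discover what UTF-8 is.
fn name_from_fuzz_bytes(data: &mut &[u8]) -> String {
    // Take up to 4 bytes, zero-padding if the input is exhausted.
    let n = data.len().min(4);
    let (head, tail) = data.split_at(n);
    let mut buf = [0u8; 4];
    buf[..n].copy_from_slice(head);
    *data = tail;

    let value = u32::from_le_bytes(buf);
    // 32 bits need 6 base-64 digits (6 * 6 = 36 >= 32); least significant first.
    (0..6)
        .map(|i| B64_ALPHABET[((value >> (6 * i)) & 0x3f) as usize] as char)
        .collect()
}
```

Because the name is derived from a fixed-size integer, the fuzzer can only vary a handful of bits per name, which keeps it from spending effort on string contents that are irrelevant to register allocation.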

bjorn3 (Feb 21 2020 at 16:24):

Another idea would be to store the name Strings in a side table and use indexes everywhere instead. If you allow the side tables to be empty, you can skip them in the Arbitrary impl.
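
A sketch of the side-table layout, assuming a hypothetical `Func` whose blocks are addressed by index and whose names live in a separate `block_names` vector that may be empty. Lookups fall back to a synthesized name, so a fuzzer-built function can leave the table empty and still print sensibly.

```rust
/// Hypothetical block: instructions only, no name stored inline.
#[derive(Debug, Clone)]
struct Block {
    insts: Vec<u32>, // placeholder instruction encoding
}

/// Hypothetical function: blocks by index, names in an optional side table.
#[derive(Debug)]
struct Func {
    blocks: Vec<Block>,
    // May be empty: an Arbitrary impl can skip names entirely.
    block_names: Vec<String>,
}

impl Func {
    /// Name used for printing; synthesizes "b<ix>" when the side table
    /// has no entry for this index.
    fn block_name(&self, ix: usize) -> String {
        self.block_names
            .get(ix)
            .cloned()
            .unwrap_or_else(|| format!("b{}", ix))
    }
}
```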

Benjamin Bouvier (Feb 21 2020 at 16:27):

Ah, good point, this was one of the things I wanted to ask about as well: is there a way for the Arbitrary derive to ignore some fields and not generate them, with the user providing a default value instead? All the string fields in the regalloc crate definitely belong to this category.

One way to emulate this would be a FuzzFunc data structure that derives Arbitrary and contains only the fields we actually want to fuzz. It would then implement Into<Func> and fill in all the default values, e.g. for strings and the like.
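
A std-only sketch of the FuzzFunc approach, with hypothetical stand-in types (the real regalloc.rs types are not shown in the thread). In an actual harness, `FuzzFunc` is the type that would carry `#[derive(Arbitrary)]`; the derive is omitted here so the sketch has no dependencies.

```rust
/// Hypothetical "real" IR type, including fields not worth fuzzing.
#[derive(Debug)]
struct Func {
    name: String,
    block_names: Vec<String>,
    blocks: Vec<Vec<u32>>, // per-block instruction encodings (placeholder)
}

/// Hypothetical fuzzing-only mirror that keeps only the fuzzed fields.
#[derive(Debug)]
struct FuzzFunc {
    blocks: Vec<Vec<u32>>,
}

impl From<FuzzFunc> for Func {
    fn from(f: FuzzFunc) -> Func {
        // Fill every non-fuzzed field with a fixed, deterministic default.
        let block_names = (0..f.blocks.len()).map(|i| format!("b{}", i)).collect();
        Func {
            name: "fuzz_func".to_string(),
            block_names,
            blocks: f.blocks,
        }
    }
}
```

Implementing `From<FuzzFunc> for Func` rather than `Into<Func>` directly is the idiomatic choice in Rust: the standard library's blanket impl then provides `Into<Func>` for free, so `let func: Func = fuzz_func.into();` works.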

fitzgen (he/him) (Feb 21 2020 at 17:50):

edit: from the mentions view, I didn't get to see the whole thread, deleting this comment and then reading backlog >.<

fitzgen (he/him) (Feb 21 2020 at 17:57):

@Benjamin Bouvier you don't need to jump all the way to "custom generators" from here; you can explore writing targeted impl Arbitrary for X blocks by hand, for example to avoid generating irrelevant strings

fitzgen (he/him) (Feb 21 2020 at 17:59):

is the validation pass there to make sure that (for example) we only use things that have already been defined? (I'm looking at things like BlockIx, which at first blush looks like an index that needs to be valid)
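
One way to keep indices like BlockIx valid by construction, rather than rejecting inputs afterwards, is to reduce each fuzzer-chosen index modulo the number of blocks that exist. The sketch below uses a hypothetical toy IR (`Inst`, `gen_func`, `next_byte` are illustrative names, not regalloc.rs APIs) and also guarantees that every block is non-empty and ends in exactly one terminator.

```rust
/// Hypothetical toy instruction set, just enough to show the technique.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Nop,
    Jump(u32), // index of the target block
    Ret,
}

/// Consume one input byte, or return 0 once the input is exhausted.
fn next_byte(data: &mut &[u8]) -> u8 {
    match data.split_first() {
        Some((&b, rest)) => {
            *data = rest;
            b
        }
        None => 0,
    }
}

/// Build a function that passes structural checks by construction:
/// 1..=8 blocks, each with 0..=3 `Nop`s followed by exactly one terminator,
/// and every jump target reduced modulo the block count so it is in range.
fn gen_func(mut data: &[u8]) -> Vec<Vec<Inst>> {
    let num_blocks = u32::from(next_byte(&mut data) % 8) + 1;
    (0..num_blocks)
        .map(|_| {
            let body_len = usize::from(next_byte(&mut data) % 4);
            let mut insts = vec![Inst::Nop; body_len];
            // Even byte: return; odd byte: jump to one of the blocks.
            let term = if next_byte(&mut data) % 2 == 0 {
                Inst::Ret
            } else {
                Inst::Jump(u32::from(next_byte(&mut data)) % num_blocks)
            };
            insts.push(term);
            insts
        })
        .collect()
}
```

The modulo introduces a small bias toward low indices when the block count does not divide 256; for fuzzing that is usually an acceptable trade for never wasting an input on a dangling reference.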

fitzgen (he/him) (Feb 21 2020 at 18:01):

bytecodealliance/wasmtime

Standalone JIT-style runtime for WebAssembly, using Cranelift

fitzgen (he/him) (Feb 21 2020 at 18:03):

regarding ignoring some fields: no, we can't currently (and we could never fully ignore them; the derive would need to fill them in using either a Default::default implementation or some other function)

I'll file an issue for this tho because I've also wanted it in the past and think it would be generally valuable

Benjamin Bouvier (Feb 21 2020 at 18:04):

re: validation, yes, it's the same idea as Cranelift's verifier (it checks that the IR is sane: blocks are not empty, must end with a control-flow instruction, etc.)
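
A minimal validator sketch for the rules mentioned here, using a hypothetical toy IR rather than the real regalloc.rs types. The "at least one block" rule and the jump-target range check (which addresses the BlockIx concern raised above) are assumptions added for illustration.

```rust
/// Hypothetical toy instruction set.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Nop,
    Jump(u32), // index of the target block
    Ret,
}

impl Inst {
    fn is_terminator(&self) -> bool {
        matches!(self, Inst::Jump(_) | Inst::Ret)
    }
}

/// Reject functions with no blocks, empty blocks, blocks that do not end
/// with a control-flow instruction, or jumps to blocks that do not exist.
fn validate(blocks: &[Vec<Inst>]) -> Result<(), String> {
    if blocks.is_empty() {
        return Err("function has no blocks".to_string());
    }
    for (ix, insts) in blocks.iter().enumerate() {
        let last = insts
            .last()
            .ok_or_else(|| format!("block {} is empty", ix))?;
        if !last.is_terminator() {
            return Err(format!(
                "block {} does not end with a control-flow instruction",
                ix
            ));
        }
        if let Inst::Jump(target) = last {
            if *target as usize >= blocks.len() {
                return Err(format!("block {} jumps to undefined block {}", ix, target));
            }
        }
    }
    Ok(())
}
```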

Benjamin Bouvier (Feb 21 2020 at 18:05):

fitzgen (he/him) (Feb 21 2020 at 18:05):

fitzgen (he/him) (Feb 21 2020 at 18:06):

hm... is there something more abstract than literal clif that the allocator can work on? like can it take in a set of constraints? might be easier to generate the constraints than actually valid clif

fitzgen (he/him) (Feb 21 2020 at 18:07):

if doing full clif, it may prove more fruitful to have the fuzz target take in a String and seed the corpus with a bunch of valid clif files, fwiw

fitzgen (he/him) (Feb 21 2020 at 18:07):

fitzgen (he/him) (Feb 21 2020 at 19:40):

Allow custom arbitrary methods for fields in the custom derive · Issue #33 · rust-fuzz/arbitrary

Sometimes a field of a struct doesn't implement arbitrary and it is either impossible to do (because it is from another crate, for example) or undesired. We should support some kind of attribut...

Benjamin Bouvier (Feb 24 2020 at 18:09):

@Nick Fitzgerald is there a way to have libFuzzer or the harness record statistics on my behalf? Say I wanted to count the number of test cases that were valid vs. invalid (i.e. didn't trigger a panic, but resulted in an error during interpretation, for instance) to get an idea of how effective my fuzzing is.

fitzgen (he/him) (Feb 24 2020 at 18:12):

cargo fuzz and libFuzzer should stop once they discover a panic, so I think the answer is that none of the test cases run so far triggered a panic

are you looking for "what % of test cases reached code location X?" where X is the branch for valid test cases?

Benjamin Bouvier (Feb 24 2020 at 18:16):

The context is that I run the (now always valid) generated func in an IR interpreter, and interpreting can return errors, e.g. division by zero, infinite loops, etc. In that case, the generated func is structurally valid but not runnable. I'd like a rough estimate of the proportion of such test cases versus test cases that can actually be interpreted and thus go through register allocation.

fitzgen (he/him) (Feb 24 2020 at 18:32):

the official libFuzzer docs recommend using Clang's code coverage visualization to get an idea of "how good the fuzzer is", but rustc doesn't support that right now :(

as a hacky workaround, you could try adding a panic!() at the start of the register allocation testing code path, or even just a println!("got to reg alloc") and count those lines with a CLI script
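
A slightly less manual variant of the println! workaround, sketched under assumptions: the `Outcome` enum and `record` function are hypothetical, and counters live in process-wide atomics. Every `every` iterations the running split is printed to stderr, where it appears alongside libFuzzer's own output. Each libFuzzer job or worker is a separate process, so each keeps its own counters.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical outcome of one fuzz iteration.
enum Outcome {
    Interpreted, // ran in the interpreter and went on to register allocation
    InterpError, // structurally valid but not runnable (e.g. division by zero)
}

static INTERPRETED: AtomicUsize = AtomicUsize::new(0);
static INTERP_ERRORS: AtomicUsize = AtomicUsize::new(0);

/// Record one outcome; every `every` iterations (0 disables printing),
/// print the running split to stderr. Returns (interpreted, errors).
fn record(outcome: Outcome, every: usize) -> (usize, usize) {
    match outcome {
        Outcome::Interpreted => INTERPRETED.fetch_add(1, Ordering::Relaxed),
        Outcome::InterpError => INTERP_ERRORS.fetch_add(1, Ordering::Relaxed),
    };
    let ok = INTERPRETED.load(Ordering::Relaxed);
    let err = INTERP_ERRORS.load(Ordering::Relaxed);
    if every > 0 && (ok + err) % every == 0 {
        eprintln!(
            "stats: {} interpreted, {} interpreter errors ({:.1}% runnable)",
            ok,
            err,
            100.0 * ok as f64 / (ok + err) as f64
        );
    }
    (ok, err)
}
```

In the fuzz target, the interpreter's result would be mapped to an `Outcome` and passed to `record` before deciding whether to run register allocation.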

Benjamin Bouvier (Feb 24 2020 at 18:47):

OK, thanks! I had another idea: if I can eliminate most real OOMs, I can trigger fake OOMs by allocating very large vectors on paths where the test case is valid but can't be interpreted, so they get displayed in libFuzzer's output (when using multiple jobs). Quite hacky :slight_smile:

Benjamin Bouvier (Feb 25 2020 at 18:28):

@Nick Fitzgerald Hey, I'm debug-printing the Unstructured instance's length, and I see it's around 3 bytes in most of my test runs, which is not enough bytes to build interesting programs. Do you know how its size is computed, and whether there are ways I can increase it?

Benjamin Bouvier (Feb 25 2020 at 18:29):

Generating mostly-correct output is time-consuming, and I only get around 10 execs/s, so there might be some bad interactions there...

fitzgen (he/him) (Feb 25 2020 at 18:31):

Benjamin Bouvier (Feb 25 2020 at 18:34):

Benjamin Bouvier (Feb 25 2020 at 18:39):

Since Arbitrary::arbitrary returns a Result, it would be pretty nice if cargo fuzz could show me the proportion of Err among generated inputs, to make it discoverable that something is wrong with the size of the raw data bytes. Would that be feasible?

fitzgen (he/him) (Feb 25 2020 at 18:52):

perhaps... cargo fuzz mostly just wraps libfuzzer and provides the logic to build sanitizers and link libfuzzer

but we could probably use Arbitrary::size_hint to automatically add seed files to the corpus, or to pass the -max_len and -len_control flags to libFuzzer. Not sure exactly how this would work; there's some design work to be done
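
A rough sketch of the "seed files" half of this idea, under assumptions: `write_seed_corpus` is a hypothetical helper, and `len` would come from something like the lower bound returned by `Arbitrary::size_hint`. It writes `count` files of pseudo-random bytes so libFuzzer starts from inputs long enough for the Arbitrary impl instead of growing them from a few bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `count` seed inputs of `len` pseudo-random
/// bytes each into `dir` (created if missing).
fn write_seed_corpus(dir: &Path, len: usize, count: usize) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    // xorshift64: deterministic and dependency-free; quality barely matters
    // here because the fuzzer mutates these seeds anyway.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for i in 0..count {
        let bytes: Vec<u8> = (0..len)
            .map(|_| {
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                state as u8 // keep the low byte
            })
            .collect();
        fs::write(dir.join(format!("seed-{}", i)), bytes)?;
    }
    Ok(())
}
```

Pointing `dir` at the fuzz target's corpus directory is the intended use; tuning -max_len is a separate knob that caps how large libFuzzer lets inputs grow.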

fitzgen (he/him) (Feb 25 2020 at 20:41):

Somehow leverage `Arbitrary::size_hint` to create initial seeds for corpus or control `-max_len` and `-len_control`? · Issue #218 · rust-fuzz/cargo-fuzz

@bnjbvr was reporting that starting fuzzing from scratch with a fuzz target that takes an Arbitrary impl was spending a lot of time on three bytes long inputs, where the Arbitrary implementation re...
