@Nick Fitzgerald I've set up fuzzing with libfuzzer and Arbitrary on our WIP replacement for regalloc, and I wanted to start a conversation about strategies to do this efficiently.
Right now, I've implemented a validator function that checks that the given input (constructed from random data bytes) is correct, and i run this before passing the generated input to the actually-useful test oracles. With code-coverage guided fuzzing, it seems libfuzzer found the way to create some valid inputs by luck, but it seems they remain mostly identical: a function with one block only, a few instructions in this block.
So my question is really about the best strategy i could use there: is it likely that libfuzzer may find the different allowed inputs (several blocks), or should i start to make my own generator so that i only generate valid test cases?
For the Arbitrary impl on, for example, Label and Block, you may want to manually implement one that creates valid Strings even when the input is not utf-8. For example, by reading 32 bits and converting that to a base64 string. This way the fuzzer doesn't have to "learn" what utf-8 is.
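A rough sketch of that name idea, using only the standard library: take up to four raw bytes and map them through a base64-style alphabet, so the result is always valid ASCII whatever the fuzzer supplies. The function name `name_from_bytes` and the URL-safe alphabet are assumptions for illustration; in a hand-written Arbitrary impl the bytes would come from the Unstructured input instead of a slice.

```rust
/// URL-safe base64 alphabet (an arbitrary choice for this sketch).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Hypothetical helper: turn up to 4 raw fuzzer bytes into a 6-character
/// ASCII name. Missing bytes are treated as zero, so this never fails.
pub fn name_from_bytes(raw: &[u8]) -> String {
    let mut buf = [0u8; 4];
    let n = raw.len().min(4);
    buf[..n].copy_from_slice(&raw[..n]);

    // 32 bits are consumed 6 bits at a time; the last group is zero-padded.
    let mut bits = u32::from_le_bytes(buf);
    let mut name = String::with_capacity(6);
    for _ in 0..6 {
        name.push(ALPHABET[(bits & 0x3f) as usize] as char);
        bits >>= 6;
    }
    name
}
```

Every 4-byte input maps to some valid name, so no fuzzer input is wasted on invalid utf-8.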
Another idea would be to store the name String in a side table, and use indexes everywhere instead. If you allow the side tables to be empty, you could skip them in the Arbitrary impl.
Ah, good point, this was one of the things i wanted to ask about, as well: is there a way the Arbitrary derive trait can ignore some fields and not generate these, by having the user provide a default value instead? All the string fields in the regalloc crate definitely belong to this category.
One way to emulate this would be to have a FuzzFunc data structure, which derives Arbitrary, and it would only contain the fields we actually want to fuzz. Then it would also implement Into<Func> and fill in all the default values, e.g. for strings and all of this.
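A minimal sketch of that FuzzFunc idea, with made-up fields: the real Func in the regalloc crate looks different, and `name`, `block_names`, and `insts_per_block` are assumptions for illustration only. In the actual fuzz target, FuzzFunc would also carry `#[derive(Arbitrary)]`, which is left out here so the snippet has no dependencies.

```rust
/// Stand-in for the real function type (hypothetical fields).
#[derive(Debug, PartialEq)]
pub struct Func {
    pub name: String,
    pub block_names: Vec<String>,
    pub insts_per_block: Vec<u8>,
}

/// Only the fields worth fuzzing. In the fuzz target this struct would
/// additionally derive Arbitrary.
#[derive(Debug)]
pub struct FuzzFunc {
    pub insts_per_block: Vec<u8>,
}

impl From<FuzzFunc> for Func {
    fn from(f: FuzzFunc) -> Func {
        // Strings are filled in deterministically instead of being fuzzed.
        let block_names = (0..f.insts_per_block.len())
            .map(|i| format!("b{}", i))
            .collect();
        Func {
            name: "fuzz".to_string(),
            block_names,
            insts_per_block: f.insts_per_block,
        }
    }
}
```

Implementing From gives the Into<Func> conversion for free, so the fuzz target can call `.into()` on the generated value.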
edit: from the mentions view, I didn't get to see the whole thread, deleting this comment and then reading backlog >.<
@Benjamin Bouvier so you don't need to jump all the way to "custom generators" from here, you can explore doing targeted impl Arbitrary for X blocks by hand, for example to avoid generating irrelevant strings
is the validation pass to make sure that (for example) we only use things that have been already defined? (I'm looking at things like BlockIx which at first blush looks like an index that needs to be valid)
if so, then it makes sense to only have an Arbitrary implementation "one level up" at the thing that contains/defines all the valid entities/indices. we do this kind of thing in wasmtime here: https://github.com/bytecodealliance/wasmtime/blob/master/crates/fuzzing/src/generators/api.rs
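To illustrate the "one level up" point with a dependency-free sketch: the generator first decides how many blocks exist, then derives every block index modulo that count, so no index can be out of range. `Bytes` is a hypothetical stand-in for Unstructured, and `Block`, `gen_func`, and the size limits are made up for this example (only the name BlockIx comes from the discussion).

```rust
/// Hypothetical stand-in for arbitrary's Unstructured: hands out bytes
/// and yields 0 once the input is exhausted.
pub struct Bytes<'a> {
    data: &'a [u8],
}

impl<'a> Bytes<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Bytes { data }
    }

    pub fn byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((&b, rest)) => {
                self.data = rest;
                b
            }
            None => 0,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BlockIx(pub u32);

#[derive(Debug)]
pub struct Block {
    pub num_insts: u8,
    pub target: BlockIx,
}

/// Generate the whole function at once, so block indices are valid by
/// construction rather than checked afterwards.
pub fn gen_func(bytes: &mut Bytes) -> Vec<Block> {
    let num_blocks = 1 + (bytes.byte() % 8) as u32; // 1..=8 blocks
    (0..num_blocks)
        .map(|_| {
            let num_insts = 1 + bytes.byte() % 4; // 1..=4 instructions
            let target = BlockIx(bytes.byte() as u32 % num_blocks);
            Block { num_insts, target }
        })
        .collect()
}
```

Nothing here implements Arbitrary for BlockIx on its own; the index type is only ever produced by the container that knows how many blocks exist.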
regarding ignoring some fields: no we can't currently (and we can't ever fully ignore, we would need to either use a Default::default implementation or some other function)
I'll file an issue for this tho because I've also wanted it in the past and think it would be generally valuable
re: validation, yes, it's the same as Cranelift's verifier (checks that the IR is sane: blocks are not empty and must end with a control flow instruction, etc)
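For readers who haven't seen this kind of verifier, a toy version of the checks mentioned (blocks are non-empty and end with control flow), plus a jump-target range check, might look like the following. The `Inst` and `Block` types are invented for the sketch and are not the regalloc or Cranelift ones.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Inst {
    Add,
    Jump(usize), // index of the target block
    Ret,
}

impl Inst {
    fn is_terminator(self) -> bool {
        matches!(self, Inst::Jump(_) | Inst::Ret)
    }
}

pub struct Block {
    pub insts: Vec<Inst>,
}

/// Returns Ok(()) if the function is structurally sane, or a message
/// describing the first problem found.
pub fn validate(blocks: &[Block]) -> Result<(), String> {
    if blocks.is_empty() {
        return Err("function has no blocks".to_string());
    }
    for (i, block) in blocks.iter().enumerate() {
        let last = match block.insts.last() {
            Some(inst) => *inst,
            None => return Err(format!("block {} is empty", i)),
        };
        if !last.is_terminator() {
            return Err(format!("block {} does not end with control flow", i));
        }
        if let Inst::Jump(target) = last {
            if target >= blocks.len() {
                return Err(format!("block {} jumps to undefined block {}", i, target));
            }
        }
    }
    Ok(())
}
```

Running a check like this before the test oracles is what causes most random inputs to be rejected, which is why generating valid structures directly came up above.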
thanks for the pointers, i'll look into this early next week!
no problem! happy to help, keep me in the loop :)
hm... is there something more abstract than literal clif that the allocator can work on? like can it take in a set of constraints? might be easier to generate the constraints than actually valid clif
if doing full clif, it may prove more fruitful to have the fuzz target take in String and seed the corpus with a bunch of valid clif files, fwiw
worth trying at least
Here is the issue I filed for "skipping" fields in #[derive(Arbitrary)]: https://github.com/rust-fuzz/arbitrary/issues/33
@Nick Fitzgerald is there a way to have libfuzzer/the harness record statistics on my behalf? say, if i wanted to count the number of test cases that were valid vs invalid (i.e. didn't trigger a panic, but resulted in an error during interpretation, for instance), and get an idea of how effective my fuzzing is.
cargo fuzz and libFuzzer should stop once they discover a panic, so I think the answer is that all of the test cases run didn't trigger a panic
maybe I'm not following exactly what you're asking for though...
are you looking for "what % of test cases reached code location X?" where X is the branch for valid test cases?
Context is I run the (now always valid) generated func in an IR interpreter, and interpreting can return errors, e.g. division by zero, infinite loops, etc. So in this case, the generated func is structurally valid, but not runnable. I'd like to get a rough estimate of the proportion of such test cases, vs test cases that can actually be interpreted, and thus can go through register allocation.
I don't think there's a nice way to get this info out, unfortunately
the official libfuzzer docs recommend using clang code coverage visualization to get an idea of "how good the fuzzer is" but rustc doesn't support that right now :(
as a hacky workaround, you could try adding a panic!() to the start of the register allocation testing code path. or even just a println!("got to reg alloc") and count them with a CLI script
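A variation on the println idea, sketched with only the standard library: keep two process-wide counters inside the fuzz target and print a summary line every so often, which can then be grepped from the log. The names `record` and `counts` and the 1000-case interval are assumptions; neither libFuzzer nor cargo fuzz provides this.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide tallies; each fuzzing process keeps its own copy.
static INTERPRETED: AtomicUsize = AtomicUsize::new(0);
static REJECTED: AtomicUsize = AtomicUsize::new(0);

/// Current (interpreted, rejected) totals.
pub fn counts() -> (usize, usize) {
    (
        INTERPRETED.load(Ordering::Relaxed),
        REJECTED.load(Ordering::Relaxed),
    )
}

/// Call once per test case, after running the interpreter.
pub fn record(interpreted_ok: bool) {
    if interpreted_ok {
        INTERPRETED.fetch_add(1, Ordering::Relaxed);
    } else {
        REJECTED.fetch_add(1, Ordering::Relaxed);
    }
    let (ok, rejected) = counts();
    // Emit a summary every 1000 cases so the ratio shows up in the log.
    if (ok + rejected) % 1000 == 0 {
        eprintln!("stats: {} interpreted, {} rejected", ok, rejected);
    }
}
```

When running with multiple jobs, each job is a separate process, so the printed totals are per job and would need to be summed from the individual logs.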
ok, thanks! had another idea: if i can eliminate most real OOMs, i can trigger fake OOMs by allocating very large vectors on paths where the test case is valid but not interpreted correctly, so they get displayed in the output of libfuzzer (when using multiple jobs). Quite hacky :)
@Nick Fitzgerald Hey, i'm debug-printing the Unstructured instance's length, and i see it's around 3 bytes in most of my test runs, which is not enough bytes to run interesting programs. Do you know how its size is computed, and if there are ways i can increase it?
Generating mostly correct output is time consuming, and i get to around 10 execs/run only, so there might be some bad interactions there...
@Benjamin Bouvier couple things:
you can manually do head /dev/random -c 1000 > fuzz/corpus/my-fuzz-target/my-random-seed to get a bigger sized seed file in the corpus for libfuzzer to mutate
cargo fuzz run my-fuzz-target -- -help=1 should also show a bunch of libfuzzer flags you can pass in that control input size. anything after the -- is passed directly to libfuzzer
great, -max_len and -len_control are what i want. Thanks!
Since arbitrary returns a Result, it would be pretty nice if cargo fuzz could show me a relative proportion of Err among generated inputs, to make it discoverable that there's something wrong with the size of the raw data bytes. Would it be feasible?
perhaps... cargo fuzz mostly just wraps libfuzzer and provides the logic to build sanitizers and link libfuzzer
but we could probably use the Arbitrary::size_hint to auto add seed files to the corpus or to pass -max_len and -len_control flags to libfuzzer. not sure exactly how this would work, there's some design work to be done
Filed https://github.com/rust-fuzz/cargo-fuzz/issues/218
Last updated: Dec 23 2024 at 12:05 UTC