Hyperion101010 commented on Issue #611:
@fitzgen sir, this was a GSoC 2020 project idea; I worked on it during the application period and submitted a proposal. Given the time I had at hand, I wasn't able to get a complete picture of the different vulnerability areas like ABI abstractions and heap and stack safety. I want to contribute to the idea voluntarily, but I can't do so before clearing up some doubts.
I would like to start understanding the fuzzing process more closely and perhaps contribute by writing fuzzers. During the application process I sent emails asking about the project details, but I never got any reply, which is completely fine given the situation we are in now.
Is there any way we can have a conversation about the doubts I have? I see that there used to be an IRC channel for wasmtime a year ago, but it has since migrated to Matrix, which unfortunately doesn't have any such channel. If you are available on any channel of Mozilla (or another open source org), please let me know.
Good day!
bjorn3 commented on Issue #611:
https://bytecodealliance.zulipchat.com/ is the primary discussion channel.
bjorn3 commented on Issue #611:
I think this can be closed.
cfallin closed Issue #611:
I plan on laying out some foundational fuzzing infrastructure for Wasmtime in the next few weeks. I'd like to use this issue as a kind of meta issue to keep track of this work. I'd also appreciate feedback on the plan from anyone with experience fuzzing or domain knowledge of a particular thing we plan on fuzzing.
Goals
- Find bugs!
  - Bugs that we wouldn't otherwise find until our users hit them.
  - Bugs that are hard to manually write test cases for, or that you wouldn't even think of testing for.
- Make bugs (fuzzer-found or otherwise) easier to debug via automatic test case reduction.
Strategy
Breadth not Depth
At least initially, let's build out a few different fuzzing approaches enough that they start identifying bugs, but not spend a ton of time building bespoke tools tailored for exactly the problems we have at hand.
My assumptions are that
- we have low-hanging fruit available, since we haven't done a ton of fuzzing for a bunch of corners yet, and
- different fuzzing approaches tend to uncover different sets of bugs.
Therefore, by making a bunch of different just-good-enough fuzzers, we will repeatedly discover new, unique low-hanging fruit bugs.
Additionally, this gives us a nice foundation that we can springboard off of in the future when we decide to go deeper in any particular direction.
Decouple Generators and Oracles
A generator creates test cases (usually given an RNG or a random byte stream input). An oracle determines if executing a test case uncovered a bug. In general, it is good software engineering to separate concerns, but separating these two parts specifically allows us to:
- reuse oracles during automatic test case reduction (a la `creduce`), and
- swap out existing, off-the-shelf generators with more intelligent, custom generators in the future.
Implementation
In general, I recommend that we use `libFuzzer` to drive our fuzzing. It is coverage-guided, which means it can find interesting code paths more quickly than testing purely random inputs will. It also has a nice Rust interface in the form of `cargo-fuzz`.
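For reference, a `cargo-fuzz` fuzz target is just a small Rust file under `fuzz/fuzz_targets/`; here is a minimal sketch (the body is a placeholder, not one of the fuzzers proposed in this issue):

```rust
// fuzz/fuzz_targets/example.rs
#![no_main]
use libfuzzer_sys::fuzz_target;

// libFuzzer repeatedly invokes this closure with coverage-guided input bytes.
fuzz_target!(|data: &[u8]| {
    // Placeholder: hand `data` to a generator/oracle pair under test.
    let _ = data;
});
```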
Any custom generators we create should take `libFuzzer`-provided input bytes and re-interpret them as a sequence of random values to drive choices inside the generator. This lets us combine the benefits of smart, structure-aware generators with those of coverage-guided fuzzing. We can implement this by implementing our custom generators in terms of the `arbitrary` crate's `Arbitrary` trait; a minimal sketch of that pattern follows.
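As a rough illustration (all type and field names here are hypothetical, and deriving `Arbitrary` assumes the crate's `derive` feature):

```rust
use arbitrary::Arbitrary;

// Hypothetical knobs for a structure-aware Wasm generator. Flipping these on
// and off per fuzz run is also the basis of the swarm testing described below.
#[derive(Arbitrary, Debug)]
struct WasmGenConfig {
    allow_call_indirect: bool,
    max_funcs: u8,
}

// Hypothetical test case: everything the generator needs to deterministically
// build one Wasm module from libFuzzer's raw input bytes.
#[derive(Arbitrary, Debug)]
struct WasmTestCase {
    config: WasmGenConfig,
    // Remaining bytes, reinterpreted as the generator's random choices.
    choices: Vec<u8>,
}
```

With something like this in place, fuzz targets can be written as `fuzz_target!(|case: WasmTestCase| { ... })`, and libFuzzer's byte stream gets decoded into a `WasmTestCase` automatically.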
As far as test case reduction goes, when a generator is creating Wasm files, it should be relatively easy to use binaryen's `wasm-reduce` on the Wasm file, or use `creduce` on the WAT disassembly. We can, however, do some small things to make the process turnkey:
- [ ] Write glue scripts for running `wasm-reduce` and/or `creduce` on a Wasm test case with any of our various oracles (see the sketch below).
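The core of such a glue script is a predicate whose exit status tells the reduction tool whether a candidate is still "interesting"; a minimal sketch (the oracle call is a stand-in for any of the oracles described in this issue):

```rust
use std::process::exit;

// Stand-in for a real oracle: returns true when the original bug still
// reproduces on `wasm` (e.g. Wasmtime panics, or differential results differ).
fn oracle_still_fails(wasm: &[u8]) -> bool {
    // ... run the candidate through the oracle here ...
    !wasm.is_empty() // placeholder logic only
}

fn main() {
    let path = std::env::args().nth(1).expect("usage: predicate <file.wasm>");
    let wasm = std::fs::read(&path).expect("failed to read candidate file");
    if oracle_still_fails(&wasm) {
        exit(0); // still interesting: the reducer keeps this candidate
    }
    exit(1); // bug gone: the reducer discards this candidate
}
```

`creduce`, for example, treats exit status 0 of its interestingness test as "keep this candidate"; `wasm-reduce`'s command hook has its own conventions, so the wrapper would need light adaptation per tool.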
For generators that are creating custom in-memory data structures by implementing the `Arbitrary` trait, test case reduction requires that we implement some custom logic. The `Arbitrary` trait supports defining a custom `shrink` method that takes `&self` and returns an iterator of smaller instances of `Self`. We can use this to create custom test case reduction for each of our custom test case generators; a sketch follows.
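For example (a sketch only: the `Ops` type is hypothetical, and the signatures shown match the pre-1.0 `arbitrary` releases that still had `shrink`; later releases removed it):

```rust
use arbitrary::{Arbitrary, Result, Unstructured};

// Hypothetical generator output: a sequence of abstract operations.
#[derive(Clone, Debug)]
struct Ops(Vec<u8>);

impl Arbitrary for Ops {
    fn arbitrary(u: &mut Unstructured<'_>) -> Result<Self> {
        Ok(Ops(Arbitrary::arbitrary(u)?))
    }

    // Custom reduction: yield progressively shorter prefixes of the op
    // sequence, so reduction converges on a minimal failing sequence.
    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        let ops = self.0.clone();
        Box::new((0..ops.len()).rev().map(move |n| Ops(ops[..n].to_vec())))
    }
}
```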
Finally, any custom generator we create (and any generator we wrap that supports turning the generation of individual test case features on/off) should support swarm testing. Swarm testing is where we randomly turn the generation of various test case features on or off (such as: should a generator create Wasm test cases that use `call_indirect` or not?) so that we are more likely to generate the pathological test cases where bugs are more likely to be found. This is relatively easy to implement and should yield good results.

Fuzzing Wasmtime's Embedding API
This is a case where, unfortunately, we can't really use existing off-the-shelf solutions.
Generators
- [x] Build a custom generator that creates a sequence of API calls. It shouldn't perform the calls, just describe them. This generator should have some smarts about knowing how to generate valid API calls.
Oracles
- [x] Interpret API call descriptions and perform the actual API call. Find unexpected panics, assertion failures, and segfaults.
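To make the generator/oracle split concrete, here is a rough sketch (every name is hypothetical, and the real embedding API surface is far larger):

```rust
use arbitrary::Arbitrary;

// Generator side: a pure *description* of embedding API calls; nothing is
// executed yet. Indices refer to previously created values, so the generator
// can keep descriptions mostly well-formed.
#[derive(Arbitrary, Debug)]
enum ApiCall {
    ConfigNew,
    EngineNew { config: usize },
    ModuleNew { engine: usize, wasm: Vec<u8> },
    InstanceNew { module: usize },
}

// Oracle side: interpret the descriptions by performing the real API calls.
// The oracle's job is only to surface panics, assertion failures, and
// segfaults; `Err` results from invalid-but-well-typed calls are expected.
fn perform(calls: &[ApiCall]) {
    for call in calls {
        match call {
            ApiCall::ConfigNew => { /* wasmtime::Config::new() */ }
            ApiCall::EngineNew { .. } => { /* wasmtime::Engine::new(...) */ }
            ApiCall::ModuleNew { .. } => { /* wasmtime::Module::new(...) */ }
            ApiCall::InstanceNew { .. } => { /* wasmtime::Instance::new(...) */ }
        }
    }
}
```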
Wasm Execution Fuzzing
We should fuzz our execution of Wasm. Yes, Cranelift has some fuzzing in SpiderMonkey, but we should also make sure that all of our Wasmtime-specific JIT'ing machinery is well fuzzed, as well as our WASI implementation and sandboxing.
Generators
- [x] Use `wasm-opt -ttf` to generate random, valid Wasm files (see the sketch below).
- [ ] Write a custom generator that creates Wasm files that make sequences of WASI syscalls.
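Wrapping `wasm-opt -ttf` ("translate to fuzz", which turns arbitrary input bytes into a valid Wasm module) as a generator could be as simple as shelling out; a sketch, assuming `wasm-opt` is on `$PATH` and using illustrative file paths:

```rust
use std::process::Command;

// Turn fuzzer-provided bytes into a valid Wasm module via binaryen's
// `wasm-opt -ttf`. File paths here are illustrative; real glue should use
// unique temporary files.
fn generate_wasm(random_bytes: &[u8]) -> std::io::Result<Vec<u8>> {
    std::fs::write("seed.bin", random_bytes)?;
    let status = Command::new("wasm-opt")
        .args(["-ttf", "seed.bin", "-o", "out.wasm"])
        .status()?;
    assert!(status.success(), "wasm-opt -ttf failed");
    std::fs::read("out.wasm")
}
```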
Oracles
- [ ] Execute the file and ensure Wasmtime doesn't panic, fail any `assert!(..)`s, or segfault, regardless of whether executing the Wasm traps.
- [ ] `strace` the process (or something like it) and ensure it doesn't make any syscalls outside the preopened directory given to the WASI sandbox?
- [x] Differential fuzzing (sketched after this list) where we compare the observable results of execution between:
- [x] Cranelift without optimizations
- [x] Cranelift with opt=speed
- [x] Cranelift with opt=size
- [x] Cranelift with opt=speed_and_size
- [ ] Cranelift with a warm code cache
- [ ] Cranelift with a cold code cache
- [x] Lightbeam
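A rough sketch of such a differential oracle over Cranelift optimization levels, using Wasmtime's embedding API (exact API details shift between Wasmtime versions, and the convention that each test module exports a nullary `run` function returning an `i32` is an assumption of this sketch):

```rust
use wasmtime::{Config, Engine, Instance, Module, OptLevel, Store};

// Observe one execution: compile `wasm` at the given opt level and record the
// result of calling the exported `run` function. A trap is an observable
// outcome too; this sketch flattens it to `None`.
fn observe(wasm: &[u8], level: OptLevel) -> anyhow::Result<Option<i32>> {
    let mut config = Config::new();
    config.cranelift_opt_level(level);
    let engine = Engine::new(&config)?;
    let module = Module::new(&engine, wasm)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let run = instance.get_typed_func::<(), i32>(&mut store, "run")?;
    Ok(run.call(&mut store, ()).ok())
}

// Differential oracle: the same module must behave identically regardless of
// optimization level; any mismatch is a bug to report.
fn differential_oracle(wasm: &[u8]) -> anyhow::Result<()> {
    let baseline = observe(wasm, OptLevel::None)?;
    for level in [OptLevel::Speed, OptLevel::SpeedAndSize] {
        assert_eq!(baseline, observe(wasm, level)?, "mismatch at {level:?}");
    }
    Ok(())
}
```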
More Stuff to Explore in the Future
- Alternatively, we could MacGyver some custom code coverage scheme via instrumenting Wasm files with Walrus instead of doing this inside Cranelift at the clif level.
- Create test case generators and oracles for our Wasm interface types support? What would be involved here is not super clear to me yet.
Questions
- Should the fuzzing corpus be committed into the git repo? Or perhaps should it be a separate repo that we include as a git submodule?
- What work here should we prioritize?
  - In particular, what variants would be most valuable to compare / most likely to uncover high-priority bugs in differential fuzzing of Wasm execution?
- Is there anything here you think we should not implement?
- Are there any other WASI-targeted oracles we can create? The `strace` idea is pretty half-baked right now. I'd appreciate some more ideas from folks more involved in the WASI side of things than I am...