wasmtime / Issue #611 Initial Fuzzing Infrastructure · git-wasmtime

Wasmtime GitHub notifications bot (Apr 04 2020 at 06:21):

@fitzgen sir, this was a GSoC 2020 project idea. I worked on it during the application period and submitted a proposal. Given the time I had at hand, I wasn't able to get a complete idea of the different vulnerabilities, like ABI abstractions and heap and stack safety. I want to voluntarily contribute to the idea, but I can't do so before I clear up some doubts.
I would like to start understanding the fuzzing process more closely and contribute, perhaps by writing fuzzers. During the application process I sent emails asking for the project details, but I never got any reply, which is completely fine given the situation we have now.
Is there any way we can have a conversation about the doubts I have? I see that there used to be an IRC channel for wasmtime a year ago, but it has since migrated to Matrix, which unfortunately doesn't have any such channel. If you are available on any channel of Mozilla/(other open source org), please let me know.
Good day!

Wasmtime GitHub notifications bot (Apr 04 2020 at 07:14):

bjorn3 commented on Issue #611:

https://bytecodealliance.zulipchat.com/ is the primary discussion channel.

Wasmtime GitHub notifications bot (Feb 03 2021 at 20:43):

bjorn3 commented on Issue #611:

I think this can be closed.

Wasmtime GitHub notifications bot (Feb 03 2021 at 20:51):

cfallin closed Issue #611:

I plan on laying out some foundational fuzzing infrastructure for Wasmtime in the next few weeks. I'd like to use this issue as a kind of meta issue to keep track of this work. I'd also appreciate feedback on the plan from anyone with experience fuzzing or domain knowledge of a particular thing we plan on fuzzing.

Goals

Find bugs!

Bugs that we wouldn't otherwise find until our users hit them.

Bugs that are hard to manually write test cases for, or that you wouldn't even think of testing for.

Make bugs (fuzzer-found or otherwise) easier to debug via automatic test case reduction.

Strategy

Breadth not Depth

At least initially, let's build out a few different fuzzing approaches enough that they start identifying bugs, but not spend a ton of time building bespoke tools tailored for exactly the problems we have at hand.

My assumptions are that

we have low-hanging fruit available, since we haven't done a ton of fuzzing for a bunch of corners yet, and

different fuzzing approaches tend to uncover different sets of bugs.

Therefore, by making a bunch of different just-good-enough fuzzers, we will repeatedly discover new, unique low-hanging fruit bugs.

Additionally, this gives us a nice foundation that we can springboard off of in the future when we decide to go deeper in any particular direction.

Decouple Generators and Oracles

A generator creates test cases (usually given an RNG or a random byte stream input). An oracle determines if executing a test case uncovered a bug. In general, it is good software engineering to separate concerns, but separating these two parts specifically allows us to:

reuse oracles during automatic test case reduction (a la creduce), and

swap out existing, off-the-shelf generators with more intelligent, custom generators in the future.
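
As a rough sketch of this split (the `Generator` and `Oracle` traits, `RawBytes`, `ContainsFf`, `run_one`, and `reduce_bytes` are all hypothetical names for illustration, not part of Wasmtime), the fuzz driver and a simple reducer can both be written against the same oracle:

```rust
/// Hypothetical trait: turns raw fuzzer bytes into a test case.
trait Generator {
    type TestCase;
    fn generate(&self, bytes: &[u8]) -> Self::TestCase;
}

/// Hypothetical trait: decides whether a test case exposed a bug.
trait Oracle<T> {
    fn is_bug(&self, test_case: &T) -> bool;
}

/// Trivial generator: the test case is the raw bytes themselves.
struct RawBytes;

impl Generator for RawBytes {
    type TestCase = Vec<u8>;
    fn generate(&self, bytes: &[u8]) -> Vec<u8> {
        bytes.to_vec()
    }
}

/// Toy oracle standing in for "execution panicked": flags any input containing 0xFF.
struct ContainsFf;

impl Oracle<Vec<u8>> for ContainsFf {
    fn is_bug(&self, test_case: &Vec<u8>) -> bool {
        test_case.contains(&0xFF)
    }
}

/// The driver only knows about the two traits, so either side can be swapped out.
fn run_one<G, O>(generator: &G, oracle: &O, bytes: &[u8]) -> bool
where
    G: Generator,
    O: Oracle<G::TestCase>,
{
    let test_case = generator.generate(bytes);
    oracle.is_bug(&test_case)
}

/// A reducer reuses the same oracle: remove bytes one at a time while the bug persists.
fn reduce_bytes<O: Oracle<Vec<u8>>>(oracle: &O, mut test_case: Vec<u8>) -> Vec<u8> {
    let mut i = 0;
    while i < test_case.len() {
        let mut candidate = test_case.clone();
        candidate.remove(i);
        if oracle.is_bug(&candidate) {
            test_case = candidate; // still buggy, keep the smaller case
        } else {
            i += 1; // this byte is needed to reproduce, move on
        }
    }
    test_case
}
```

Because `reduce_bytes` never looks at how the test case was generated, the same oracle can serve fuzzing and reduction without changes.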

Implementation

In general, I recommend that we use libFuzzer to drive our fuzzing. It is coverage-guided, which means it can find interesting code paths more quickly than testing purely random inputs will. It also has a nice Rust interface in the form of cargo-fuzz.

Any custom generators we create should take libFuzzer-provided input bytes and then re-interpret that as a sequence of random values to drive choices inside the generator. This lets us combine the benefits of smart, structure-aware generators with those of coverage-guided fuzzing. We can implement this by implementing our custom generators in terms of the arbitrary crate's Arbitrary trait.
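
To illustrate the idea of re-interpreting fuzzer bytes as a sequence of choices, here is a minimal, std-only sketch. The `Choices` cursor is a hypothetical stand-in for the byte source that the arbitrary crate provides, and the `Instr` enum is a toy that does not produce valid Wasm; a real generator would implement the `Arbitrary` trait instead.

```rust
/// Hypothetical stand-in for a fuzzer-backed byte source: a cursor over the
/// input that hands out "random" choices.
struct Choices<'a> {
    bytes: &'a [u8],
}

impl<'a> Choices<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Choices { bytes }
    }

    /// Consume one byte; yield 0 once the input is exhausted so that
    /// generation always terminates.
    fn next_byte(&mut self) -> u8 {
        let bytes = self.bytes;
        match bytes.split_first() {
            Some((first, rest)) => {
                self.bytes = rest;
                *first
            }
            None => 0,
        }
    }

    /// Pick an index in `0..n`. `n` must be non-zero.
    fn choose(&mut self, n: usize) -> usize {
        self.next_byte() as usize % n
    }
}

/// Toy instruction set used only for this illustration.
#[derive(Debug, PartialEq)]
enum Instr {
    I32Const(i32),
    I32Add,
    Drop,
}

/// Every decision (length, instruction kind, immediate) is driven by input
/// bytes, so mutations made by a coverage-guided fuzzer map to structural
/// changes in the generated test case.
fn generate_instrs(bytes: &[u8]) -> Vec<Instr> {
    let mut choices = Choices::new(bytes);
    let len = choices.choose(8);
    (0..len)
        .map(|_| match choices.choose(3) {
            0 => Instr::I32Const(choices.next_byte() as i32),
            1 => Instr::I32Add,
            _ => Instr::Drop,
        })
        .collect()
}
```

The same input always produces the same test case, which keeps crashes reproducible from the saved fuzzer input alone.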

As far as test case reduction goes, when a generator is creating Wasm files, it should be relatively easy to use binaryen's wasm-reduce on the Wasm file, or use creduce on the WAT disassembly. We can, however, do some small things to make the process turnkey:

[ ] Write glue scripts for running wasm-reduce and/or creduce on a Wasm test case with any of our various oracles

For generators that are creating custom in-memory data structures by implementing the Arbitrary trait, test case reduction requires we implement some custom logic. The Arbitrary trait supports defining a custom shrink method that takes &self and returns an iterator of smaller instances of Self. We can use this to create custom test case reduction for each of our custom test case generators.
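
A minimal sketch of what such shrinking could look like, assuming a hypothetical `CallSequence` type with a local `shrink` method of the shape described above (it is defined here directly, not through the arbitrary crate), plus a hypothetical greedy `reduce_calls` loop:

```rust
/// Hypothetical test case: each `u32` stands in for one described API call.
#[derive(Clone, Debug, PartialEq)]
struct CallSequence {
    calls: Vec<u32>,
}

impl CallSequence {
    /// Takes `&self` and returns an iterator of smaller instances:
    /// this sequence with exactly one call removed.
    fn shrink(&self) -> Box<dyn Iterator<Item = CallSequence> + '_> {
        Box::new((0..self.calls.len()).map(move |skip| {
            let calls = self
                .calls
                .iter()
                .enumerate()
                .filter(|(i, _)| *i != skip)
                .map(|(_, c)| *c)
                .collect();
            CallSequence { calls }
        }))
    }
}

/// Greedy reduction: keep taking the first smaller candidate that still
/// fails, and stop when no candidate reproduces the failure.
fn reduce_calls(
    mut case: CallSequence,
    still_fails: impl Fn(&CallSequence) -> bool,
) -> CallSequence {
    loop {
        let next = case.shrink().find(|c| still_fails(c));
        match next {
            Some(smaller) => case = smaller,
            None => return case,
        }
    }
}
```

The `still_fails` closure plays the role of the oracle, so any oracle used for fuzzing can be passed in unchanged.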

Finally, any custom generator we create (and any generator we wrap that supports turning the generation of individual test case features on/off) should support swarm testing. Swarm testing is where we randomly turn on/off the generation of various test case features (such as, should a generator create Wasm test cases that use call_indirect or not?) so that we are more likely to generate pathological test cases where bugs are more likely to be found. This is relatively easy to implement and should yield
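
One possible way to sketch swarm testing: spend the first input byte on feature toggles, then generate only from the enabled features. `SwarmConfig`, its fields, and `generate_with_swarm` are hypothetical names, and the instruction names are used purely as labels.

```rust
/// Hypothetical swarm configuration: which features this run may generate.
#[derive(Debug, PartialEq)]
struct SwarmConfig {
    allow_call_indirect: bool,
    allow_memory_ops: bool,
    allow_loops: bool,
}

impl SwarmConfig {
    /// Derive the toggles from one fuzzer-provided byte, one bit per feature.
    fn from_byte(b: u8) -> Self {
        SwarmConfig {
            allow_call_indirect: (b & 0b001) != 0,
            allow_memory_ops: (b & 0b010) != 0,
            allow_loops: (b & 0b100) != 0,
        }
    }

    /// Instruction kinds the generator may pick from under this configuration.
    fn enabled_kinds(&self) -> Vec<&'static str> {
        let mut kinds = vec!["i32.const", "i32.add"]; // always available
        if self.allow_call_indirect {
            kinds.push("call_indirect");
        }
        if self.allow_memory_ops {
            kinds.push("i32.load");
            kinds.push("i32.store");
        }
        if self.allow_loops {
            kinds.push("loop");
        }
        kinds
    }
}

/// The first byte selects the swarm configuration; every remaining byte
/// picks one instruction kind from the enabled set.
fn generate_with_swarm(bytes: &[u8]) -> Vec<&'static str> {
    let (config_byte, rest) = match bytes.split_first() {
        Some((b, rest)) => (*b, rest),
        None => return Vec::new(),
    };
    let kinds = SwarmConfig::from_byte(config_byte).enabled_kinds();
    rest.iter().map(|b| kinds[*b as usize % kinds.len()]).collect()
}
```

With a feature switched off, the remaining features are exercised more densely within a single test case, which is the effect swarm testing relies on.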

Fuzzing Wasmtime's Embedding API

This is a case where, unfortunately, we can't really use existing off-the-shelf solutions.

Generators

[x] Build a custom generator that creates a sequence of API calls. It shouldn't perform the calls, just describe them. This generator should have some smarts about knowing how to generate valid API calls.

Oracles

[x] Interpret API call descriptions and perform the actual API call. Find unexpected panics, assertion failures, and segfaults.
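
The describe-then-interpret split can be sketched as follows. The `ApiCall` variants and the `World` model are hypothetical simplifications, not Wasmtime's actual API; a real oracle would perform the real embedding calls and watch for panics, assertion failures, and segfaults instead of updating a toy model.

```rust
/// Hypothetical description of embedding-API calls. Generating a value of
/// this type performs no call; it only describes one.
#[derive(Debug, Clone, PartialEq)]
enum ApiCall {
    CreateStore,
    CreateModule { store: usize },
    Instantiate { module: usize },
}

/// Generator with some smarts: a call may only refer to objects that an
/// earlier call in the sequence has already created.
fn generate_calls(bytes: &[u8]) -> Vec<ApiCall> {
    let mut stores = 0usize; // stores created so far
    let mut modules = 0usize; // modules created so far
    let mut calls = Vec::new();
    for &b in bytes {
        let call = match b % 3 {
            1 if stores > 0 => {
                modules += 1;
                ApiCall::CreateModule { store: b as usize % stores }
            }
            2 if modules > 0 => ApiCall::Instantiate { module: b as usize % modules },
            // Fallback when nothing valid can be referenced yet.
            _ => {
                stores += 1;
                ApiCall::CreateStore
            }
        };
        calls.push(call);
    }
    calls
}

/// Toy stand-in for the state behind the real API.
#[derive(Default)]
struct World {
    stores: usize,
    modules: Vec<usize>,   // owning store index per module
    instances: Vec<usize>, // source module index per instance
}

/// Interpreter: performs each described call against the toy model and
/// rejects references to objects that do not exist.
fn interpret(calls: &[ApiCall]) -> Result<World, String> {
    let mut world = World::default();
    for call in calls {
        match *call {
            ApiCall::CreateStore => world.stores += 1,
            ApiCall::CreateModule { store } => {
                if store >= world.stores {
                    return Err(format!("no store {}", store));
                }
                world.modules.push(store);
            }
            ApiCall::Instantiate { module } => {
                if module >= world.modules.len() {
                    return Err(format!("no module {}", module));
                }
                world.instances.push(module);
            }
        }
    }
    Ok(world)
}
```

Keeping the description as plain data also means a failing sequence can be printed, saved, and reduced before it is ever replayed.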

Wasm Execution Fuzzing

We should fuzz our execution of Wasm. Yes, Cranelift has some fuzzing in SpiderMonkey, but we should also make sure that all of our Wasmtime-specific JIT'ing machinery is well fuzzed, as well as our WASI implementation and sandboxing.

Generators

[x] Use wasm-opt -ttf to generate random, valid Wasm files.

[ ] Write a custom generator that creates Wasm files that make sequences of WASI syscalls.

Oracles

[ ] Execute the file and ensure Wasmtime doesn't panic, fail any assert!(..)s, or segfault, regardless of whether executing the Wasm traps.

[ ] strace the process or something and ensure it doesn't do any syscalls outside the preopened directory given to the WASI sandbox or something?

[x] Differential fuzzing where we compare the observable results of execution between:

[x] Cranelift without optimizations

[x] Cranelift with opt=speed

[x] Cranelift with opt=size

[x] Cranelift with opt=speed_and_size

[ ] Cranelift with a warm code cache

[ ] Cranelift with a cold code cache

[x] Lightbeam
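
A minimal sketch of a differential oracle over several configurations. The two "engines" here are toy functions standing in for real compiler configurations (the `Engine` alias, `unoptimized`, `buggy_optimized`, `configs`, and `disagreements` are hypothetical names), and the second one contains a deliberate bug so the oracle has something to catch:

```rust
/// Observable outcome of executing a test case: a result value or a trap.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Value(i64),
    Trap,
}

/// Hypothetical stand-in for one execution configuration.
type Engine = fn(&[i64]) -> Outcome;

/// Reference behavior: sum the inputs, trapping on overflow.
fn unoptimized(input: &[i64]) -> Outcome {
    let mut acc: i64 = 0;
    for &x in input {
        match acc.checked_add(x) {
            Some(v) => acc = v,
            None => return Outcome::Trap,
        }
    }
    Outcome::Value(acc)
}

/// "Optimized" variant with a deliberate bug: it wraps instead of trapping.
fn buggy_optimized(input: &[i64]) -> Outcome {
    Outcome::Value(input.iter().fold(0i64, |a, &x| a.wrapping_add(x)))
}

/// Named configurations to compare; the first entry is the reference.
fn configs() -> Vec<(&'static str, Engine)> {
    vec![
        ("unoptimized", unoptimized as Engine),
        ("optimized", buggy_optimized as Engine),
    ]
}

/// Differential oracle: names of the configurations whose outcome differs
/// from the reference configuration's outcome on this input.
fn disagreements(engines: &[(&'static str, Engine)], input: &[i64]) -> Vec<&'static str> {
    if engines.is_empty() {
        return Vec::new();
    }
    let reference = (engines[0].1)(input);
    let mut out = Vec::new();
    for &(name, run) in &engines[1..] {
        if run(input) != reference {
            out.push(name);
        }
    }
    out
}
```

Comparing whole outcomes, traps included, matters here: two configurations that both "succeed" with different values, or where only one traps, are both reported.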

More Stuff to Explore in the Future

Add support for code coverage in Cranelift and leverage it to build equivalence-modulo-inputs testing and coverage-guided fuzzing for Wasmtime

Alternatively, we could MacGyver some custom code coverage scheme via instrumenting Wasm files with Walrus instead of doing this inside Cranelift at the clif level.

Create test case generators and oracles for our Wasm interface types support? What would be involved here is not super clear to me yet.

Questions

Should the fuzzing corpus be committed into the git repo? Or perhaps should it be a separate repo that we include as a git submodule?

What work here should we prioritize?

In particular, what variants would be most valuable to compare / most likely to uncover high-priority bugs in differential fuzzing of Wasm execution?

Is there anything here you think we should not implement?

Are there any other WASI-targeted oracles we can create? The strace idea is pretty half-baked right now. I'd appreciate some more ideas from folks more involved in the WASI side of things than I am...

Last updated: Apr 16 2025 at 20:03 UTC