wasmtime / issue #9255 Improve libFuzzer feedback of Cran... · git-wasmtime


Wasmtime GitHub notifications bot (Sep 16 2024 at 16:16):

alexcrichton added the fuzzing label to Issue #9255.

Wasmtime GitHub notifications bot (Sep 16 2024 at 16:16):

alexcrichton opened issue #9255:

One of the things we've historically struggled with in fuzzing is generating WebAssembly modules that are interesting enough to execute corner cases without trapping almost immediately. For example, many wasm-smith modules might immediately enter an infinite loop, immediately recurse infinitely, or immediately trap with an out-of-bounds load. For all of these conditions we have various checks and balances in place to hopefully get better coverage, but I was just thinking of another possible way we could improve it.

What I'm imagining is that we can leverage libFuzzer's coverage-based feedback with a scheme such as:

Fix a constant N at a big number, maybe 10_000

Allocate N bytes extra in all VMContexts when this feature is enabled, and initialize all bytes to zero

Assign a unique number to all Cranelift lowering rules

When lowering rule I is used, then after the rule is matched, increment the byte at index I % N in the VMContext

When fuzzing, enable this option (perhaps only sometimes?). After WebAssembly execution is performed take a look at the map on each VMContext (maybe this is a per-store map then instead of per-VMContext?)

Define N empty functions at compile time with gobbledegook to make sure they don't get optimized away

For each byte in the map that has been incremented, call the corresponding empty function.

The hope is that this scheme enables libFuzzer to see what actually happened at runtime. It can know not only that the lowering rule was executed in Cranelift but additionally that the generated code was executed at runtime. With N sufficiently high we could get pretty good coverage of "actually executed this lowering rule in a fuzz test case".
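
The scheme above could be sketched roughly as follows. This is a minimal illustration and not Wasmtime code: `RuleMap`, `marker`, `MARKERS`, and `example` are hypothetical names, N is shrunk to 8 so the table fits on screen, and a saturating increment is assumed so that a slot which was hit never wraps back to zero.

```rust
use std::hint::black_box;

/// Map size. The issue suggests something like 10_000; 8 keeps the sketch short.
const N: usize = 8;

/// One non-inlined function per slot. Each instantiation is a distinct
/// function, so a coverage-guided fuzzer can tell the slots apart. Returning
/// `I` through `black_box` is a best-effort guard against the instances being
/// optimized away or merged.
#[inline(never)]
fn marker<const I: usize>() -> usize {
    black_box(I)
}

/// Table used to pick the marker for a slot at runtime.
const MARKERS: [fn() -> usize; N] = [
    marker::<0>, marker::<1>, marker::<2>, marker::<3>,
    marker::<4>, marker::<5>, marker::<6>, marker::<7>,
];

/// Stand-in for the N extra bytes that would live in each VMContext.
struct RuleMap {
    bytes: [u8; N],
}

impl RuleMap {
    fn new() -> Self {
        RuleMap { bytes: [0; N] }
    }

    /// What the injected code would do when the code for rule `rule_id` runs.
    /// (Assumption: saturate instead of wrapping.)
    fn hit(&mut self, rule_id: usize) {
        let slot = &mut self.bytes[rule_id % N];
        *slot = slot.saturating_add(1);
    }

    /// After execution: call the marker for every slot that was hit and
    /// return the slot indices that were reported.
    fn report(&self) -> Vec<usize> {
        self.bytes
            .iter()
            .enumerate()
            .filter(|(_, b)| **b != 0)
            .map(|(i, _)| MARKERS[i]())
            .collect()
    }
}

/// Pretend the executed code came from lowering rules 3, 11, and 6.
fn example() -> Vec<usize> {
    let mut map = RuleMap::new();
    for rule_id in [3, 11, 6] {
        map.hit(rule_id);
    }
    map.report()
}
```

In this example rules 3 and 11 both land in slot 3, which shows the kind of collision a small N produces and why a large N is preferable.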

I'll note that this is similar to #1151 in spirit and that there'd still be a lot of details to work out here. For example, how exactly to model this in Cranelift would be tricky, as right now there's no easy notion of "this lowering rule I was used", nor would it necessarily be easy to inject code to modify bytes during lowering. In any case, I figured I could note down the issue for possible future exploration.

Wasmtime GitHub notifications bot (Sep 16 2024 at 16:17):

github-actions[bot] commented on issue #9255:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

Wasmtime GitHub notifications bot (Sep 17 2024 at 20:38):

fitzgen commented on issue #9255:

This is a neat little hack!

I think it would be relatively straightforward to implement by

Modifying the ISLE compiler to add an on_constructor_rule_fired hook to its generated trait, along with calls to that hook at the appropriate places. This hook would have a default implementation that does nothing. The hook would be given the term name, unique rule ID, ISLE source location, etc.[^0]

We implement this hook for lowering, saving a record of which lower rule fired for a particular CLIF instruction's lowering in the ISLE context.

After the CLIF instruction is lowered, we inspect that record and emit a canned mach-inst sequence to set the (i % N)th bit in the bitmap in the vmctx[^1], where i is the global index of this constructor rule and N is the size of the bitmap. Note that because we lower in a backwards pass, emitting the bit-setting code after we lower means the bit-setting code comes before the lowered CLIF instruction. We want this, so that branching instructions or returns or whatever don't prevent their rule's associated bit from getting set.
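
The three steps above might look something like the following toy model. It is only a sketch under stated assumptions: apart from the hook name on_constructor_rule_fired, every name here (`Context`, `LowerContext`, `Emitted`, `lower_backwards`) is hypothetical, the trait is a stand-in for what the ISLE compiler generates rather than its real shape, and "emitting" is modeled by pushing onto a vector that is reversed at the end to mimic the backwards pass.

```rust
/// Hypothetical bitmap size in bits.
const N: usize = 64;

/// Stand-in for the trait generated by the ISLE compiler.
trait Context {
    /// Called whenever a constructor rule fires. The default does nothing,
    /// so contexts that do not care about the hook are unaffected.
    fn on_constructor_rule_fired(&mut self, _term: &'static str, _rule_id: usize) {}
}

/// Lowering context that records which `lower` rules fired for the
/// instruction currently being lowered.
#[derive(Default)]
struct LowerContext {
    fired: Vec<usize>,
}

impl Context for LowerContext {
    fn on_constructor_rule_fired(&mut self, term: &'static str, rule_id: usize) {
        // Keep only top-level `lower` rules, not intermediate terms' rules.
        if term == "lower" {
            self.fired.push(rule_id);
        }
    }
}

/// What ends up in the (toy) machine-code buffer.
#[derive(Debug, PartialEq)]
enum Emitted {
    Lowered(&'static str),
    SetBit(usize),
}

/// Lower `(instruction name, rule id that fires)` pairs in a backwards pass.
fn lower_backwards(insts: &[(&'static str, usize)]) -> Vec<Emitted> {
    let mut ctx = LowerContext::default();
    let mut out = Vec::new();
    for &(name, rule_id) in insts.iter().rev() {
        // Pretend ISLE matched one helper rule and one `lower` rule.
        ctx.on_constructor_rule_fired("helper", rule_id + 1);
        ctx.on_constructor_rule_fired("lower", rule_id);
        out.push(Emitted::Lowered(name));
        // Emitted *after* lowering in the backwards pass...
        for i in ctx.fired.drain(..) {
            out.push(Emitted::SetBit(i % N));
        }
    }
    // ...so it ends up *before* the instruction in final program order.
    out.reverse();
    out
}
```

Calling `lower_backwards(&[("iadd", 3), ("return", 70)])` in this model places each `SetBit` entry immediately before its instruction, so the bit for the return's rule is set before the return executes.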

Note that we also might want to make those dummy functions return a unique integer, just to try and prevent LLVM/the linker from deduplicating them all and defeating our intentions.

[^0]: Or, instead of symbolicating those things eagerly, it could be given just a unique rule ID, and then provide some mechanism for getting the other bits on demand. I think all we really need for this use case is the rule's term's name so we can filter for only lower rules and not all the intermediate terms' rules.

[^1]: Insert handwaving about new kind of CLIF global for getting this bitmap's location, similar to stack limits, to make this generic for all Cranelift users rather than specific to just Wasmtime.

Last updated: Apr 18 2025 at 09:03 UTC