cfallin opened issue #9449:
In the Wasmtime meeting today, there was a lot of good discussion about our procedures and standards around fuzzing, and how to make process improvements to make sure fuzzing catches issues in on-by-default features. I thought I'd expand a bit on an idea I proposed there.
The basic problem is: the configuration space of Wasm features that get fuzzed is the result of a fairly nontrivial interaction between feature selection logic in multiple places and crates. There are plausible reasons why this isn't all centralized; even if we do eventually get to a point where there is One Main Function that determines what is fuzzed, there is still the question of whether the plumbing that flows outward from that definition faithfully preserves the feature.
I'm suggesting as a principle that the only way to really know whether fuzzing will catch bugs is to... make fuzzing catch bugs. An "integration test" wrapped around our whole setup, as it were. The idea (which I'll maybe call a "fire drill" approach) is:
- Have a branch in the repo (
fuzzing-firedrill
) that hasmain
at some recent point plus one commit.- Update our fuzzing configuration to run one or several targets on this branch as well as
main
in oss-fuzz.- As a weekly/monthly/?? chore, recreate this branch by taking current
main
and injecting one fault. Perhaps we can keep a list/queue of ideas. They should be simple changes -- one-line commits that invert a bit or conditional, or delete a register update, or ... Ideally, this is ~10 minutes per week of work.- Wait for oss-fuzz to find the issue. Revert the fault commit (or move to the next fault?) when it does. If it doesn't, find out why.
The main idea is that we want to test the whole pipeline all the way to oss-fuzz-email-in-inboxes. Nothing else tests this whole pipeline today. We simply don't know if it works, except when we get emails. The idea here is to turn the email rate up from "0 + actual errors" to "fault-injected stream + actual errors", so it's always nonzero, so we can keep verifying it works.
There were a few objections or alternative ideas in discussions today:
- Suggestions to do automated mutation testing (the meeting notes above describe the proposal as mutation testing even). I don't think this is quite the same thing: the idea is explicitly not to have automation (which could fail), but to simulate a human making a faulty commit. It's like a firedrill: do everything except actually set the fire (put the commit on
main
).- Suggestions that this only catches bugs we think to inject: I would argue the main point is to (i) ensure the reporting pipeline works, and (ii) that code we inject faults into is as load-bearing as we think it is. As long as our sense of which code is load-bearing is tuned right, it shouldn't matter too much which faults we inject, just that we break it (and watch for the email).
- Suggestions that this would take too much time: I don't think "push one commit a week" is too bad, but on the other hand, we're getting something for it too: this adds end-to-end assurance/confidence in our setup that we don't really have today otherwise. If we get no emails from oss-fuzz for a month, is that because we're on a lucky streak and writing good code, or because something else changed? I'd argue that "push a commit that deletes a line every Monday morning" is easier than reviewing coverage carefully.
Thoughts?
alexcrichton commented on issue #9449:
One idea I had sort of along these lines (but also relatively orthogonal) which was similar to what Nick was saying about tests in wasmtime-fuzzing is https://github.com/bytecodealliance/wasmtime/pull/9452. Mostly cc'ing that as related-to, but definitely not replacing this.
I personally really like the idea of testing "oss-fuzz-email-in-inboxes" as oss-fuzz infrastructure itself has hiccups relatively frequently and this helps us diagnose that as well. The success of this testing though I think will be based in how little effort we need to apply to it. Automating as much as we can with "just hit the button" or something like that I think would be best. I wouldn't want to have to maintain a document explaining all the various steps of updating branches, how/when to revert, what cadence, etc.
That being said I think this is all quite doable. GitHub Actions gives us cron jobs and we can leave an unprotected branch able to get modified by it. Having a button to hit for "ok revert that top commit" which ideally auto-merges or just doesn't even go through PRs would be pretty nice too.
cfallin commented on issue #9449:
Ah, perhaps this is workable: note certain points in the code with a magic comment string.
// delete-to-test-fuzzing
before a line of code or somesuch. Then a GH Actions cronjob could pick one of these at random and update the branch once a week, as you say. (The selection of such lines needs a little human thought, but not much: should be deletions that won't break compilation trivially, and will do interesting things: misalign the stack, generate the wrong value, skip a GC write barrier, ...)
alexcrichton commented on issue #9449:
Oh man now I really like that. One could imagine dialing that up to 11 and fully automate this entire system. For example a branch could be made and if a maintainer doesn't flag somewhere "hey the fuzzers found the bug" then an issue would automatically get opened and filed. Otherwise the management of the "fuzz this broken wasmtime" branch would be 100% automated in theory.
Last updated: Dec 23 2024 at 13:07 UTC