Stream: git-wasmtime

Topic: wasmtime / PR #4868 fuzzgen: Statistics framework


view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:15):

afonso360 opened PR #4868 from fuzz-stats to main:

:wave: Hey,

This PR Introduces a statistics module for fuzzgen. This allows us to measure stuff and get some info about what is going on.

I don't want to commit this in its current format since it has some issues, see the comments below.

Here's the output after 50000 inputs:

== FuzzGen Statistics  ====================
Total Inputs: 50000
Valid Inputs: 29130 (58.3%)
Total Runs: 42358
Successful Runs: 17564 (41.5% of Total Runs)
Timed out Runs: 12445 (29.4% of Total Runs)
Traps:
        user code: heap_oob: 0 (0.0% of Total Runs)
        user code: icall_null: 0 (0.0% of Total Runs)
        user code: int_ovf: 0 (0.0% of Total Runs)
        user code: int_divz: 12349 (29.2% of Total Runs)
        user debug: 0 (0.0% of Total Runs)
        user code: heap_misaligned: 0 (0.0% of Total Runs)
        user code: bad_sig: 0 (0.0% of Total Runs)
        user code: bad_toint: 0 (0.0% of Total Runs)
        user code: interrupt: 0 (0.0% of Total Runs)
        user code: unreachable: 0 (0.0% of Total Runs)
        user code: table_oob: 0 (0.0% of Total Runs)
        resumable: 0 (0.0% of Total Runs)
        user code: stk_ovf: 0 (0.0% of Total Runs)

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:17):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:17):

afonso360 created PR review comment:

I think doing this looses us the nice cargo fmt output which is not great. Do we have some other way of getting the amount of inputs that the fuzzer tried, while using the Arbitrary version of the macro?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:18):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:19):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:19):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:19):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 04 2022 at 20:38):

afonso360 updated PR #4868 from fuzz-stats to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2022 at 08:37):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2022 at 08:37):

afonso360 created PR review comment:

Comparing Traps and Timeouts against runs is a correct but misleading metric.

We have many runs per input.

Every run has a chance to either trap or timeout. However when they do so we drop the entire input instead of just that run. This causes these values to be under represented in terms of percentage.

A better metric would be timeouts / valid inputs. But it also seems like a very wrong metric in its own way. However it would better represent inefficiencies in the fuzzer.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 05 2022 at 08:38):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 01:35):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 01:35):

jameysharp created PR review comment:

As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218

I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run. At the moment the option I find most appealing is if somebody sorts out https://github.com/rust-fuzz/libfuzzer/issues/46. As I noted over there, it's also possible to use libc::atexit, but only if you're careful not to use any Rust I/O functions in the at-exit handler.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 20:48):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 20:48):

fitzgen created PR review comment:

In a variety of fuzz targets I've written in the past, we have something like

static ITER: AtomicUsize = AtomicUsize::new(0);

fuzz_target!(|data| {
    let n = ITER.fetch_add(1, Ordering::Relaxed);
    if log::is_enabled(log::Level::Info) && n % 1024 == 0 {
        dump_statistics();
    }

    // ...
});

and this has worked out pretty well, with adjustments to 1024 as needed for the fuzz target's throughput, of course.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 22:29):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 07 2022 at 22:29):

jameysharp created PR review comment:

That's approximately what Afonso has done in this PR, and what Andrew did in the differential fuzz target. It's a pattern that has already helped us find important issues in both fuzzers, so that's great!

An alternative is what the metrics_printer crate does: kick off a thread and dump metrics at a periodic interval. That avoids having to separately tune for each fuzz target's throughput.

But one thing I'd like to be able to do is get these statistics for OSS-Fuzz runs. I had the thought that instead of slowing down fuzzing, maybe we could fetch the corpus from OSS-Fuzz and replay it locally with stats collection turned on. Those corpora currently range from under 2k up to 25k tests. It's possible each corpus is heavily biased in terms of the statistics we care about, compared to the full run which produced it. So maybe this isn't an effective alternative to just dumping stats to the OSS-Fuzz stderr log. But if it is: I don't see any way right now to make metrics printing useful with so few tests, except by setting an environment variable with the number of inputs and having the fuzz target print after it has processed that many inputs.

Afonso's recent testing has a related issue. I noticed the test plan was "ask libfuzzer to run 100,010 inputs" and the statistics are printed after a hardcoded number of inputs (this PR says 50k but I'm guessing it was 100k for these experiments), so there are about 10 tests that get run at the end without being counted in the stats. If you try to make these two numbers equal, it's tedious to reason about whether the current test is included in the count or not, so adding a few extra tests on the end just to be sure makes total sense.

Again, an environment variable could help since we know how many tests we're trying to run. But in both cases I'd really rather just be able to print the statistics when libFuzzer is done with its run, and not have to tune anything at all.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 13:35):

afonso360 updated PR #4868 from fuzz-stats to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:05):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:05):

afonso360 created PR review comment:

I think this is no longer very relevant timeouts are now solidly in 0.0% territory and with #4895 it now makes sense to compare traps with runs since they no longer drop an input.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:06):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:06):

afonso360 created PR review comment:

As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218

Huh so that's how they do it! I never knew. I think there's a better solution, see below.

I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run.

I think switching to iter % n == 0 for printing is a great idea! And when I'm measuring stuff for other PR's I can always adjust how many runs I do so that it prints at the end of the run. i.e. pick n=10k and run 50k when benchmarking or something like that.

It doesn't fix that exact issue, but at least for my use case it works, and it sounds like it would be useful for OSS-Fuzz as well.


And I've also been thinking maybe we can do without the total_inputs metric. total_inputs is only used for the "Valid Inputs %" and we can get that metric by other means. When looking at the fuzz log we can check the run number that printed this statistics and do some math (run 50k has 25k valid inputs, so its 50% valid). When benchmarking its also easy to calculate manually when presenting results.

Additionally we should soon be at 100% on this target (only 1 PR left! I hope), so it would become really obvious if we are missing some runs, since Valid inputs would no longer be 50000 or 100000 it would be something like 41273 and that would be something that would probably stand out.

It would still be nice to have a 100% there, but while we figure out some solution I think it would be worth merging this. (I'm also tired of cherry-picking these 3 commits everywhere :big_smile:).

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:09):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:11):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:11):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 19:59):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 20:02):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 23 2022 at 20:03):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2022 at 00:00):

jameysharp created PR review comment:

I think removing the "total inputs" metric is a good plan. It at least lets us get something merged, even if it isn't perfect for all our use cases. I'll be happy to review and approve that!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2022 at 00:00):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2022 at 05:25):

afonso360 updated PR #4868 from fuzz-stats to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2022 at 05:35):

afonso360 updated PR #4868 from fuzz-stats to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 24 2022 at 05:47):

afonso360 has marked PR #4868 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 26 2022 at 23:15):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 27 2022 at 09:43):

afonso360 updated PR #4868 from fuzz-stats to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 27 2022 at 16:01):

jameysharp has enabled auto merge for PR #4868.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 27 2022 at 16:04):

jameysharp merged PR #4868.


Last updated: Oct 23 2024 at 20:03 UTC