afonso360 opened PR #4868 from fuzz-stats to main:
:wave: Hey,
This PR introduces a statistics module for fuzzgen. This lets us measure what the fuzzer is actually doing and get some insight into its behavior.
I don't want to commit it in its current form since it has some issues; see the comments below.
Here's the output after 50000 inputs:
== FuzzGen Statistics ====================
Total Inputs: 50000
Valid Inputs: 29130 (58.3%)
Total Runs: 42358
Successful Runs: 17564 (41.5% of Total Runs)
Timed out Runs: 12445 (29.4% of Total Runs)
Traps:
    user code: heap_oob: 0 (0.0% of Total Runs)
    user code: icall_null: 0 (0.0% of Total Runs)
    user code: int_ovf: 0 (0.0% of Total Runs)
    user code: int_divz: 12349 (29.2% of Total Runs)
    user debug: 0 (0.0% of Total Runs)
    user code: heap_misaligned: 0 (0.0% of Total Runs)
    user code: bad_sig: 0 (0.0% of Total Runs)
    user code: bad_toint: 0 (0.0% of Total Runs)
    user code: interrupt: 0 (0.0% of Total Runs)
    user code: unreachable: 0 (0.0% of Total Runs)
    user code: table_oob: 0 (0.0% of Total Runs)
    resumable: 0 (0.0% of Total Runs)
    user code: stk_ovf: 0 (0.0% of Total Runs)
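(For context, a minimal sketch of what a statistics module producing output like this might look like; the struct and field names below are inferred from the report above, not the PR's actual code:)

    // Hypothetical shape of a fuzzgen statistics module, inferred from
    // the printed report above; the real PR's types may differ.
    use std::sync::atomic::{AtomicUsize, Ordering};

    #[derive(Default)]
    pub struct Statistics {
        pub total_inputs: AtomicUsize,
        pub valid_inputs: AtomicUsize,
        pub total_runs: AtomicUsize,
        pub successful_runs: AtomicUsize,
        pub timed_out_runs: AtomicUsize,
        // plus one counter per trap code for the "Traps:" table
    }

    impl Statistics {
        pub fn print(&self) {
            let runs = self.total_runs.load(Ordering::SeqCst) as f64;
            // Percentage of total runs, as used throughout the report.
            let pct = |n: usize| 100.0 * n as f64 / runs;
            let timeouts = self.timed_out_runs.load(Ordering::SeqCst);
            println!("== FuzzGen Statistics ====================");
            println!("Total Runs: {}", runs as usize);
            println!("Timed out Runs: {} ({:.1}% of Total Runs)", timeouts, pct(timeouts));
            // ... remaining counters elided ...
        }
    }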
afonso360 submitted PR review.
afonso360 created PR review comment:
I think doing this loses us the nice cargo fmt output, which is not great. Do we have some other way of getting the number of inputs the fuzzer tried while using the Arbitrary version of the macro?
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 updated PR #4868 from fuzz-stats to main.
afonso360 submitted PR review.
afonso360 created PR review comment:
Comparing traps and timeouts against Total Runs is a correct but misleading metric, because we perform many runs per input.
Every run has a chance to trap or time out, but when one does, we drop the entire input instead of just that run. A timing-out input therefore contributes at most one run to the totals, while a clean input contributes many, so these values end up underrepresented as percentages.
A better metric would be timeouts / valid inputs. That also seems like a very wrong metric in its own way, but it would better represent inefficiencies in the fuzzer.
afonso360 edited PR review comment.
jameysharp submitted PR review.
jameysharp created PR review comment:
As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218
I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run. At the moment the option I find most appealing is if somebody sorts out https://github.com/rust-fuzz/libfuzzer/issues/46. As I noted over there, it's also possible to use libc::atexit, but only if you're careful not to use any Rust I/O functions in the at-exit handler.
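(For concreteness, a rough sketch of that libc::atexit route under the no-Rust-I/O constraint; the function names below are illustrative, not code from this PR, and the handler writes only a fixed byte string through libc:)

    // Sketch: register a C at-exit handler that reports statistics.
    // The handler must not touch Rust's std I/O (its buffers may be
    // unusable during process exit), so it calls libc::write directly.
    use std::sync::Once;

    static REGISTER_ATEXIT: Once = Once::new();

    extern "C" fn print_stats_on_exit() {
        // Stand-in for real reporting: a fixed message, no allocation.
        const MSG: &[u8] = b"== FuzzGen Statistics ==\n";
        unsafe {
            libc::write(libc::STDERR_FILENO, MSG.as_ptr().cast(), MSG.len());
        }
    }

    fn ensure_exit_hook() {
        REGISTER_ATEXIT.call_once(|| unsafe {
            libc::atexit(print_stats_on_exit);
        });
    }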
fitzgen submitted PR review.
fitzgen created PR review comment:
In a variety of fuzz targets I've written in the past, we have something like
    use std::sync::atomic::{AtomicUsize, Ordering};

    static ITER: AtomicUsize = AtomicUsize::new(0);

    fuzz_target!(|data| {
        let n = ITER.fetch_add(1, Ordering::Relaxed);
        if log::log_enabled!(log::Level::Info) && n % 1024 == 0 {
            dump_statistics();
        }
        // ...
    });
and this has worked out pretty well, with adjustments to 1024 as needed for the fuzz target's throughput, of course.
jameysharp submitted PR review.
jameysharp created PR review comment:
That's approximately what Afonso has done in this PR, and what Andrew did in the differential fuzz target. It's a pattern that has already helped us find important issues in both fuzzers, so that's great!
An alternative is what the metrics_printer crate does: kick off a thread and dump metrics at a periodic interval. That avoids having to separately tune for each fuzz target's throughput.
But one thing I'd like to be able to do is get these statistics for OSS-Fuzz runs. I had the thought that instead of slowing down fuzzing, maybe we could fetch the corpus from OSS-Fuzz and replay it locally with stats collection turned on. Those corpora currently range from under 2k up to 25k tests. It's possible each corpus is heavily biased in terms of the statistics we care about, compared to the full run which produced it. So maybe this isn't an effective alternative to just dumping stats to the OSS-Fuzz stderr log. But if it is: I don't see any way right now to make metrics printing useful with so few tests, except by setting an environment variable with the number of inputs and having the fuzz target print after it has processed that many inputs.
Afonso's recent testing has a related issue. I noticed the test plan was "ask libfuzzer to run 100,010 inputs" and the statistics are printed after a hardcoded number of inputs (this PR says 50k but I'm guessing it was 100k for these experiments), so there are about 10 tests that get run at the end without being counted in the stats. If you try to make these two numbers equal, it's tedious to reason about whether the current test is included in the count or not, so adding a few extra tests on the end just to be sure makes total sense.
Again, an environment variable could help since we know how many tests we're trying to run. But in both cases I'd really rather just be able to print the statistics when libFuzzer is done with its run, and not have to tune anything at all.
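(For reference, a minimal sketch of that periodic-thread approach, in the spirit of what metrics_printer does; dump_statistics and the interval are illustrative assumptions, not code from this PR:)

    // Sketch: dump statistics from a background thread on a fixed
    // interval, independent of the fuzz target's throughput.
    use std::sync::Once;
    use std::time::Duration;

    static STATS_THREAD: Once = Once::new();

    fn dump_statistics() {
        // Placeholder: print whatever counters the stats module tracks.
        println!("== FuzzGen Statistics ==");
    }

    fn ensure_stats_thread() {
        STATS_THREAD.call_once(|| {
            std::thread::spawn(|| loop {
                std::thread::sleep(Duration::from_secs(5));
                dump_statistics();
            });
        });
    }

Calling ensure_stats_thread() at the top of the fuzz target would start the reporter exactly once, with no per-target tuning of the print count.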
afonso360 updated PR #4868 from fuzz-stats to main.
afonso360 submitted PR review.
afonso360 created PR review comment:
I think this is no longer very relevant: timeouts are now solidly in 0.0% territory, and with #4895 it now makes sense to compare traps with runs, since traps no longer drop an input.
afonso360 submitted PR review.
afonso360 created PR review comment:
As a crude hack, we could duplicate the logic in the libfuzzer crate that's triggered by the RUST_LIBFUZZER_DEBUG_PATH environment variable: https://github.com/rust-fuzz/libfuzzer/blob/63b922685a0e714d2e7d2af9050e8860d5f6dc09/src/lib.rs#L203-L218
Huh, so that's how they do it! I never knew. I think there's a better solution; see below.
I've been trying to work out how to gather statistics like this from all the fuzz targets, and the other problem that bothers me is how to report results at the end of the run.
I think switching to iter % n == 0 for printing is a great idea! And when I'm measuring stuff for other PRs I can always adjust how many runs I do so that it prints at the end of the run, i.e. pick n=10k and run 50k when benchmarking, or something like that. It doesn't fix that exact issue, but at least for my use case it works, and it sounds like it would be useful for OSS-Fuzz as well.
And I've also been thinking maybe we can do without the total_inputs metric. total_inputs is only used for the "Valid Inputs %", and we can get that metric by other means. When looking at the fuzz log we can check the run number that printed the statistics and do some math (run 50k has 25k valid inputs, so it's 50% valid). When benchmarking it's also easy to calculate manually when presenting results.
Additionally, we should soon be at 100% on this target (only 1 PR left! I hope), so it would become really obvious if we are missing some runs: Valid Inputs would no longer be 50000 or 100000, it would be something like 41273, and that would probably stand out.
It would still be nice to have the 100% there, but while we figure out some solution I think it would be worth merging this. (I'm also tired of cherry-picking these 3 commits everywhere :big_smile:)
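(One way to combine this with the environment-variable idea floated earlier, so n doesn't have to be hardcoded; FUZZGEN_STATS_INTERVAL is a made-up name for illustration:)

    // Sketch: make the print interval configurable instead of hardcoding it.
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::OnceLock;

    static ITER: AtomicUsize = AtomicUsize::new(0);
    static INTERVAL: OnceLock<usize> = OnceLock::new();

    fn print_interval() -> usize {
        *INTERVAL.get_or_init(|| {
            std::env::var("FUZZGEN_STATS_INTERVAL") // hypothetical variable name
                .ok()
                .and_then(|s| s.parse().ok())
                .unwrap_or(1024)
        })
    }

    fn maybe_print_stats() {
        let n = ITER.fetch_add(1, Ordering::Relaxed);
        if n > 0 && n % print_interval() == 0 {
            // dump_statistics(); // hypothetical reporting hook
        }
    }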
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
jameysharp created PR review comment:
I think removing the "total inputs" metric is a good plan. It at least lets us get something merged, even if it isn't perfect for all our use cases. I'll be happy to review and approve that!
jameysharp submitted PR review.
afonso360 updated PR #4868 from fuzz-stats to main.
afonso360 updated PR #4868 from fuzz-stats to main.
afonso360 has marked PR #4868 as ready for review.
jameysharp submitted PR review.
afonso360 updated PR #4868 from fuzz-stats
to main
.
jameysharp has enabled auto merge for PR #4868.
jameysharp merged PR #4868.