afonso360 opened issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [ ] Generating test inputs with control flow
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Generating memory loads/stores (stack and heap)
- [ ] Catch out of bounds memory accesses in the interpreter
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [ ] Generating test inputs with control flow
- [ ] Generate multiple blocks and basic jump instructions (#3094)
- [ ] Generate
brtable
's,brif
,brff
branches- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Generating memory loads/stores (stack and heap)
- [ ] Catch out of bounds memory accesses in the interpreter
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [ ] Generating test inputs with control flow
- [ ] Generate multiple blocks and basic jump instructions (#3094)
- [ ] Generate
brtable
's- [ ] Generate
brif
,brff
? (Are these staying after the legacy x86 backend is removed?)- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Generating memory loads/stores (stack and heap)
- [ ] Catch out of bounds memory accesses in the interpreter
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [ ] Generating test inputs with control flow
- [ ] Generate multiple blocks and basic jump instructions (#3094)
- [ ] Generate
br_table
's- [ ] Generate
brif
,brff
? (Are these staying after the legacy x86 backend is removed?)- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Generating memory loads/stores (stack and heap)
- [ ] Catch out of bounds memory accesses in the interpreter
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [ ] Generate
br_table
's and other jump table jumps- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [ ] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Cross endianness support for loads/store
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Cross endianness support for loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
akirilov-arm labeled issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [ ] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Add table support to the interpreter with virtual addresses
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
jameysharp labeled issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Add table support to the interpreter with virtual addresses
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
github-actions[bot] commented on issue #3050:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [ ] Add table support to the interpreter with virtual addresses (#4433)
- [ ] Generate Stack Load/Store instructions on fuzzgen (#4438)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [x] Add table support to the interpreter with virtual addresses (#4433)
- [ ] Generate Stack Load/Store instructions on fuzzgen (#4438)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores (stack and heap)
- [x] Add stack support to the interpreter with virtual addresses (#3187)
- [x] Add heap support to the interpreter with virtual addresses (#3302)
- [x] Add table support to the interpreter with virtual addresses (#4433)
- [x] Generate Stack Load/Store instructions on fuzzgen (#4438)
- [ ] Cross endianness loads/stores
- [ ] Generating calls (just to builtins/intrinsics? or just between multiple functions in one testcase?)
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores
- [x] Stack
- [x] Interpreter Support (#3187)
- [x] Fuzzer Support (#4438)
- [ ] Heap
- [x] Interpreter Support (#3302)
- [ ] Fuzzer Support
- [ ] Table
- [x] Interpreter Support (#4433)
- [ ] Fuzzer Support
- [ ] Symbols
- [ ] Interpreter Support
- [ ] Fuzzer Support
- [ ] Others
- [ ] Cross endianness loads/stores
- [ ]
notrap
MemFlags- [ ]
aligned
MemFlags- [ ]
readonly
MemFlags- [ ] Generating calls
- [x] Generating LibCall's (#4782)
- [ ] Generating Function Calls
- [ ] Indirect Function Calls
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
afonso360 commented on issue #3050:
We've been having performance issues with fuzzgen. So I decided to implement a statistics framework and measure :science: a bunch of stuff!
Be warned, this comment is really long. These are the notes that I took while doing these improvements.
TL;DR: We have issues with input formats, we waste about 60% of our inputs. Did some improvements that doubled fuzzgen's performance.
Setup
All of these measurements are done on AArch64 (Ampere Altra) running on a single core. Mostly because it was easier for me to run them there.
All the measurements are started from the OSS-Fuzz corpus which is 10744 files (50MB). I also don't pass any
-max_len
flags since the fuzzer is selecting a default of about 1MB which IIRC was what Alex Crichton mentioned OSS-Fuzz uses.#4868 introduces a statistics framework for fuzzgen to allow us to measure what is going on and where we are losing performance.
All of these steps cause a fuzzer input format change, but so does the baseline due to #4867 so hopefully they are all somewhat on equal standing? I'm not sure how to avoid / compensate for this but I don't think it turned out to be too relevant.
For all of these I ran them with the following command:
cargo fuzz run cranelift-fuzzgen --no-default-features -- -runs=100010 -seed=1588797004
Baseline
Baseline is
main
with the following PR's on top:
- cranelift: Implement missing i128 rotates on AArch64 (#4866)
- cranelift: Prepare fuzzgen for AArch64 (#4867)
- fuzzgen: Statistics framework (#4868)
The
aarch64
ones were necessary for stable execution.#99818 REDUCE cov: 30635 ft: 178125 corp: 5452/5451Kb lim: 48702 exec/s: 18 rss: 1510Mb L: 58/45498 MS: 2 EraseBytes-PersAutoDict- DE: "\x01\x00\x00\x00\x00\x00\x00\x0d"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 48458 (48.5%) Total Runs: 76648 Successful Runs: 38549 (50.3% of Total Runs) Timed out Runs: 32679 (42.6% of Total Runs) Traps: stk_ovf: 0 (0.0% of Total Runs) heap_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) int_divz: 5420 (7.1% of Total Runs) bad_toint: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs)
Note
Each "Well Formed Input" can have multiple "Runs", each run is an input to a fuzzgen generated function by the interpreter and if successful also a host execution with that same input.This is surprising to me in a bunch of ways!
We are indeed dropping a lot of inputs as malformed (52.5%)
* This is better than I expected, I was thinking we would get something along the lines of 80-90%.
* See @jameysharp comment about possible reasons why42.6% of runs are timeouts!
* I did not expect this!Traps are about 7.1% of runs
* Not great, not terrible, lets look at the 50%'s first
* Also important to note that when a run trap's the entire input is dropped, so this also roughly translates into 7% of inputs being dropped.0% Invalid memory accesses! Awesome!
Applying @jameysharp recent patches
The previous PR's plus:
- jameysharp:test-inputs-length (#4862)
- jameysharp:cranelift-fuzzgen (#4863)
#99897 NEW cov: 30628 ft: 177683 corp: 5448/5433Kb lim: 48702 exec/s: 19 rss: 1735Mb L: 6707/32810 MS: 1 ChangeBinInt- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 49089 (49.1%) Total Runs: 155175 Successful Runs: 116304 (75.0% of Total Runs) Timed out Runs: 33061 (21.3% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 5 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) int_divz: 5805 (3.7% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
This increases the amount of runs that we do by a lot! It also seems neutral on execs/s and coverage.
The percentage of div by zero traps probably goes down because we stop running the input function as soon as one trap is found, and don't execute the rest of the inputs. Timed out runs have the same issue.
Also we now hit a
int_ovf
trap, I suspect we could also reach that without these changes but it makes it more likeley due to more inputs, so that a nice win!Mostly Forward Branching
I haven't yet submitted a PR for this since I want to clean it up a little bit before submitting (WIP commit here).
The idea is to try to always generate forward branches with a small amount of backwards branches. This should avoid the amount of infinite loops we produce.
#99983 REDUCE cov: 29993 ft: 166091 corp: 4265/5351Kb lim: 393617 exec/s: 44 rss: 1316Mb L: 850/393581 MS: 1 EraseBytes- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 37680 (37.7%) Total Runs: 792534 Successful Runs: 785981 (99.2% of Total Runs) Timed out Runs: 893 (0.1% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) int_divz: 5660 (0.7% of Total Runs) unreachable: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs)
A 130% improvement in execs/s (19 -> 44)! We were indeed wasting a lot of time in timed out runs.
We now timeout on about 2.3% of well formed inputs, which is a lot better. The 0.1% number is misleading because as soon as a timeout is found we instantly drop the entire input, so we find much fewer of these cases when comparing to runs.
The number of runs has exploded (785k)! Eventually we may want to tune this if we find we are spending too much time in runs vs inputs.
Better Instruction Selection
This is @jameysharp's WIP commit that tries to make the input format more resilient by never trying to insert opcodes unless we have the input data types for it.
See this comment for a longer explanation
#99998 REDUCE cov: 30503 ft: 168926 corp: 4522/5062Kb lim: 1048576 exec/s: 29 rss: 1548Mb L: 367/394877 MS: 5 ChangeBit-ChangeBit-ChangeByte-CMP-EraseBytes- DE: "\x01\x00\x00q"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 40927 (40.9%) Total Runs: 739299 Successful Runs: 726769 (98.3% of Total Runs) Timed out Runs: 416 (0.1% of Total Runs) Traps: int_ovf: 9 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) int_divz: 12105 (1.6% of Total Runs) heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
A 3% improvement in Well Formed Inputs (I also ran this a few more times but it was always within 3-4%)
I don't understand why our
int_divz
numbers exploded here (Aprox 29% of "Well Formed Inputs"!), but its something we definitely have to improve.Please don't derive any sort of performance measurements from this, I implemented it in probably the worst way possible. It was just to check the "Well Formed Inputs" difference.
End notes
Here's the branch that I used for all of these, In the mean time I've iterated on some of those and separated them into individual PR's.
Generating "Well Formed Inputs" remains a big issue and I think its probably going to take a few iterations improving various parts of our input format, like @jameysharp did for instruction selection (Thanks!)
* We lose about 60% of our inputs to this!
* Is there a good way to collect stats about where we are trying to select something that is not available? Or whyarbitrary
decided our input wasn't valid?With some of the other issues solved
int_divz
is now also a big issue.
* We get 12105 on a Well Formed Input size of 40927 (29%)!
* This means that 29% of input programs were dropped due to a div by zero, and I would guess that it was on the first execution for a lot of them. Which means that we potentially never even run them in the host.
afonso360 edited a comment on issue #3050:
We've been having performance issues with fuzzgen. So I decided to implement a statistics framework and measure :science: a bunch of stuff!
Be warned, this comment is really long. These are the notes that I took while doing these improvements.
TL;DR: We have issues with input formats, we waste about 60% of our inputs. Did some improvements that doubled fuzzgen's performance.
<details>
<summary><h2>Setup</h2></summaryAll of these measurements are done on AArch64 (Ampere Altra) running on a single core. Mostly because it was easier for me to run them there.
All the measurements are started from the OSS-Fuzz corpus which is 10744 files (50MB). I also don't pass any
-max_len
flags since the fuzzer is selecting a default of about 1MB which IIRC was what Alex Crichton mentioned OSS-Fuzz uses.#4868 introduces a statistics framework for fuzzgen to allow us to measure what is going on and where we are losing performance.
All of these steps cause a fuzzer input format change, but so does the baseline due to #4867 so hopefully they are all somewhat on equal standing? I'm not sure how to avoid / compensate for this but I don't think it turned out to be too relevant.
For all of these I ran them with the following command:
cargo fuzz run cranelift-fuzzgen --no-default-features -- -runs=100010 -seed=1588797004
</details>
<details>
<summary><h2>Baseline</h2></summary>Baseline is
main
with the following PR's on top:
- cranelift: Implement missing i128 rotates on AArch64 (#4866)
- cranelift: Prepare fuzzgen for AArch64 (#4867)
- fuzzgen: Statistics framework (#4868)
The
aarch64
ones were necessary for stable execution.#99818 REDUCE cov: 30635 ft: 178125 corp: 5452/5451Kb lim: 48702 exec/s: 18 rss: 1510Mb L: 58/45498 MS: 2 EraseBytes-PersAutoDict- DE: "\x01\x00\x00\x00\x00\x00\x00\x0d"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 48458 (48.5%) Total Runs: 76648 Successful Runs: 38549 (50.3% of Total Runs) Timed out Runs: 32679 (42.6% of Total Runs) Traps: stk_ovf: 0 (0.0% of Total Runs) heap_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) int_divz: 5420 (7.1% of Total Runs) bad_toint: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs)
Note
Each "Well Formed Input" can have multiple "Runs", each run is an input to a fuzzgen generated function by the interpreter and if successful also a host execution with that same input.This is surprising to me in a bunch of ways!
We are indeed dropping a lot of inputs as malformed (52.5%)
* This is better than I expected, I was thinking we would get something along the lines of 80-90%.
* See @jameysharp comment about possible reasons why42.6% of runs are timeouts!
* I did not expect this!Traps are about 7.1% of runs
* Not great, not terrible, lets look at the 50%'s first
* Also important to note that when a run trap's the entire input is dropped, so this also roughly translates into 7% of inputs being dropped.0% Invalid memory accesses! Awesome!
</details>
<details>
<summary><h2> Applying @jameysharp recent patches</h2></summary>The previous PR's plus:
- jameysharp:test-inputs-length (#4862)
- jameysharp:cranelift-fuzzgen (#4863)
#99897 NEW cov: 30628 ft: 177683 corp: 5448/5433Kb lim: 48702 exec/s: 19 rss: 1735Mb L: 6707/32810 MS: 1 ChangeBinInt- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 49089 (49.1%) Total Runs: 155175 Successful Runs: 116304 (75.0% of Total Runs) Timed out Runs: 33061 (21.3% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 5 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) int_divz: 5805 (3.7% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
This increases the amount of runs that we do by a lot! It also seems neutral on execs/s and coverage.
The percentage of div by zero traps probably goes down because we stop running the input function as soon as one trap is found, and don't execute the rest of the inputs. Timed out runs have the same issue.
Also we now hit a
int_ovf
trap, I suspect we could also reach that without these changes but it makes it more likeley due to more inputs, so that a nice win!
</details><details>
<summary><h2>Mostly Forward Branching</h2></summary>I haven't yet submitted a PR for this since I want to clean it up a little bit before submitting (WIP commit here).
The idea is to try to always generate forward branches with a small amount of backwards branches. This should avoid the amount of infinite loops we produce.
#99983 REDUCE cov: 29993 ft: 166091 corp: 4265/5351Kb lim: 393617 exec/s: 44 rss: 1316Mb L: 850/393581 MS: 1 EraseBytes- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 37680 (37.7%) Total Runs: 792534 Successful Runs: 785981 (99.2% of Total Runs) Timed out Runs: 893 (0.1% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) int_divz: 5660 (0.7% of Total Runs) unreachable: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs)
A 130% improvement in execs/s (19 -> 44)! We were indeed wasting a lot of time in timed out runs.
We now timeout on about 2.3% of well formed inputs, which is a lot better. The 0.1% number is misleading because as soon as a timeout is found we instantly drop the entire input, so we find much fewer of these cases when comparing to runs.
The number of runs has exploded (785k)! Eventually we may want to tune this if we find we are spending too much time in runs vs inputs.
</details>
<details>
<summary><h2>Better Instruction Selection</h2></summary>This is @jameysharp's WIP commit that tries to make the input format more resilient by never trying to insert opcodes unless we have the input data types for it.
See this comment for a longer explanation
#99998 REDUCE cov: 30503 ft: 168926 corp: 4522/5062Kb lim: 1048576 exec/s: 29 rss: 1548Mb L: 367/394877 MS: 5 ChangeBit-ChangeBit-ChangeByte-CMP-EraseBytes- DE: "\x01\x00\x00q"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 40927 (40.9%) Total Runs: 739299 Successful Runs: 726769 (98.3% of Total Runs) Timed out Runs: 416 (0.1% of Total Runs) Traps: int_ovf: 9 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) int_divz: 12105 (1.6% of Total Runs) heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
A 3% improvement in Well Formed Inputs (I also ran this a few more times but it was always within 3-4%)
I don't understand why our
int_divz
numbers exploded here (Aprox 29% of "Well Formed Inputs"!), but its something we definitely have to improve.Please don't derive any sort of performance measurements from this, I implemented it in probably the worst way possible. It was just to check the "Well Formed Inputs" difference.
</details>
End notes
Here's the branch that I used for all of these, In the mean time I've iterated on some of those and separated them into individual PR's.
Generating "Well Formed Inputs" remains a big issue and I think its probably going to take a few iterations improving various parts of our input format, like @jameysharp did for instruction selection (Thanks!)
* We lose about 60% of our inputs to this!
* Is there a good way to collect stats about where we are trying to select something that is not available? Or whyarbitrary
decided our input wasn't valid?With some of the other issues solved
int_divz
is now also a big issue.
* We get 12105 on a Well Formed Input size of 40927 (29%)!
* This means that 29% of input programs were dropped due to a div by zero, and I would guess that it was on the first execution for a lot of them. Which means that we potentially never even run them in the host.
afonso360 edited a comment on issue #3050:
We've been having performance issues with fuzzgen. So I decided to implement a statistics framework and measure :science: a bunch of stuff!
Be warned, this comment is really long. These are the notes that I took while doing these improvements.
TL;DR: We have issues with input formats, we waste about 60% of our inputs. Did some improvements that doubled fuzzgen's performance.
<details>
<summary><h2>Setup</h2></summaryAll of these measurements are done on AArch64 (Ampere Altra) running on a single core. Mostly because it was easier for me to run them there.
All the measurements are started from the OSS-Fuzz corpus which is 10744 files (50MB). I also don't pass any
-max_len
flags since the fuzzer is selecting a default of about 1MB which IIRC was what Alex Crichton mentioned OSS-Fuzz uses.#4868 introduces a statistics framework for fuzzgen to allow us to measure what is going on and where we are losing performance.
All of these steps cause a fuzzer input format change, but so does the baseline due to #4867 so hopefully they are all somewhat on equal standing? I'm not sure how to avoid / compensate for this but I don't think it turned out to be too relevant.
For all of these I ran them with the following command:
cargo fuzz run cranelift-fuzzgen --no-default-features -- -runs=100010 -seed=1588797004
</details>
<details>
<summary><h2>Baseline</h2></summary>Baseline is
main
with the following PR's on top:
- cranelift: Implement missing i128 rotates on AArch64 (#4866)
- cranelift: Prepare fuzzgen for AArch64 (#4867)
- fuzzgen: Statistics framework (#4868)
The
aarch64
ones were necessary for stable execution.#99818 REDUCE cov: 30635 ft: 178125 corp: 5452/5451Kb lim: 48702 exec/s: 18 rss: 1510Mb L: 58/45498 MS: 2 EraseBytes-PersAutoDict- DE: "\x01\x00\x00\x00\x00\x00\x00\x0d"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 48458 (48.5%) Total Runs: 76648 Successful Runs: 38549 (50.3% of Total Runs) Timed out Runs: 32679 (42.6% of Total Runs) Traps: stk_ovf: 0 (0.0% of Total Runs) heap_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) int_divz: 5420 (7.1% of Total Runs) bad_toint: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs)
Note
Each "Well Formed Input" can have multiple "Runs", each run is an input to a fuzzgen generated function by the interpreter and if successful also a host execution with that same input.This is surprising to me in a bunch of ways!
We are indeed dropping a lot of inputs as malformed (52.5%)
* This is better than I expected, I was thinking we would get something along the lines of 80-90%.
* See @jameysharp comment about possible reasons why42.6% of runs are timeouts!
* I did not expect this!Traps are about 7.1% of runs
* Not great, not terrible, lets look at the 50%'s first
* Also important to note that when a run trap's the entire input is dropped, so this also roughly translates into 7% of inputs being dropped.0% Invalid memory accesses! Awesome!
</details>
<details>
<summary><h2> Applying @jameysharp recent patches</h2></summary>The previous PR's plus:
- jameysharp:test-inputs-length (#4862)
- jameysharp:cranelift-fuzzgen (#4863)
#99897 NEW cov: 30628 ft: 177683 corp: 5448/5433Kb lim: 48702 exec/s: 19 rss: 1735Mb L: 6707/32810 MS: 1 ChangeBinInt- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 49089 (49.1%) Total Runs: 155175 Successful Runs: 116304 (75.0% of Total Runs) Timed out Runs: 33061 (21.3% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) int_ovf: 5 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) int_divz: 5805 (3.7% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
This increases the amount of runs that we do by a lot! It also seems neutral on execs/s and coverage.
The percentage of div by zero traps probably goes down because we stop running the input function as soon as one trap is found, and don't execute the rest of the inputs. Timed out runs have the same issue.
Also we now hit a
int_ovf
trap, I suspect we could also reach that without these changes but it makes it more likeley due to more inputs, so that a nice win!
</details><details>
<summary><h2>Mostly Forward Branching</h2></summary>I haven't yet submitted a PR for this since I want to clean it up a little bit before submitting (WIP commit here).
The idea is to try to always generate forward branches with a small amount of backwards branches. This should reduce the amount of infinite loops we produce.
#99983 REDUCE cov: 29993 ft: 166091 corp: 4265/5351Kb lim: 393617 exec/s: 44 rss: 1316Mb L: 850/393581 MS: 1 EraseBytes- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 37680 (37.7%) Total Runs: 792534 Successful Runs: 785981 (99.2% of Total Runs) Timed out Runs: 893 (0.1% of Total Runs) Traps: heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) int_divz: 5660 (0.7% of Total Runs) unreachable: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs) int_ovf: 0 (0.0% of Total Runs)
A 130% improvement in execs/s (19 -> 44)! We were indeed wasting a lot of time in timed out runs.
We now timeout on about 2.3% of well formed inputs, which is a lot better. The 0.1% number is misleading because as soon as a timeout is found we instantly drop the entire input, so we find much fewer of these cases when comparing to runs.
The number of runs has exploded (785k)! Eventually we may want to tune this if we find we are spending too much time in runs vs inputs.
</details>
<details>
<summary><h2>Better Instruction Selection</h2></summary>This is @jameysharp's WIP commit that tries to make the input format more resilient by never trying to insert opcodes unless we have the input data types for it.
See this comment for a longer explanation
#99998 REDUCE cov: 30503 ft: 168926 corp: 4522/5062Kb lim: 1048576 exec/s: 29 rss: 1548Mb L: 367/394877 MS: 5 ChangeBit-ChangeBit-ChangeByte-CMP-EraseBytes- DE: "\x01\x00\x00q"- == FuzzGen Statistics ==================== Total Inputs: 100000 Well Formed Inputs: 40927 (40.9%) Total Runs: 739299 Successful Runs: 726769 (98.3% of Total Runs) Timed out Runs: 416 (0.1% of Total Runs) Traps: int_ovf: 9 (0.0% of Total Runs) bad_sig: 0 (0.0% of Total Runs) icall_null: 0 (0.0% of Total Runs) table_oob: 0 (0.0% of Total Runs) int_divz: 12105 (1.6% of Total Runs) heap_oob: 0 (0.0% of Total Runs) heap_misaligned: 0 (0.0% of Total Runs) stk_ovf: 0 (0.0% of Total Runs) unreachable: 0 (0.0% of Total Runs) bad_toint: 0 (0.0% of Total Runs) interrupt: 0 (0.0% of Total Runs)
A 3% improvement in Well Formed Inputs (I also ran this a few more times but it was always within 3-4%)
I don't understand why our
int_divz
numbers exploded here (Aprox 29% of "Well Formed Inputs"!), but its something we definitely have to improve.Please don't derive any sort of performance measurements from this, I implemented it in probably the worst way possible. It was just to check the "Well Formed Inputs" difference.
</details>
End notes
Here's the branch that I used for all of these, In the mean time I've iterated on some of those and separated them into individual PR's.
Generating "Well Formed Inputs" remains a big issue and I think its probably going to take a few iterations improving various parts of our input format, like @jameysharp did for instruction selection (Thanks!)
* We lose about 60% of our inputs to this!
* Is there a good way to collect stats about where we are trying to select something that is not available? Or whyarbitrary
decided our input wasn't valid?With some of the other issues solved
int_divz
is now also a big issue.
* We get 12105 on a Well Formed Input size of 40927 (29%)!
* This means that 29% of input programs were dropped due to a div by zero, and I would guess that it was on the first execution for a lot of them. Which means that we potentially never even run them in the host.
jameysharp commented on issue #3050:
I am so excited about all this work you did over the weekend! I need to dig into it all. But until then, I'll reply to https://github.com/bytecodealliance/wasmtime/pull/4824#issuecomment-1236408856 here:
You could still use something like the current opcodes table, as long as you have a way of filtering it to ensure that all the resources needed are available for each opcode. I don't think just input/output type signatures are sufficient for that.
Is there any opcode that you think could be problematic? I'm not seeing why we couldn't do that based on signature.
Any instructions with at least one operand which is not a "value". For example, stack load/store, with their need for a stack slot operand. Or function calls, which need a function reference, and also have a type signature that depends on the selected function reference. My WIP patch handles stack slots in https://github.com/bytecodealliance/wasmtime/commit/0c9689ba978276bd925b47db0d20623a118699fb#diff-226e0f113ddbbcedaa5265109a951e52d30aa35307537684f56a5e6bedc9ba0eR369-R390 but I didn't get to handling calls yet.
I think this is important because I suspect that we're currently discarding a lot of inputs due to, say, not generating any I128 variables up front but then trying to insert an I128 "add", or things like that. I'm betting that we'll find more bugs with the same number of test executions if we fix this. But I haven't done any measurements to check this guess.
Either way, I think it's nice to apply the patches that allow just borrowing slices everywhere. So I've opened #4863 with those. The remaining work-in-progress part is no longer on that branch but is still accessible as 0c9689b.I did some measurements on these! With #4863 and the WIP commit we get a 3-4% improvement in the number of valid inputs that we generate (vs total inputs). While these are not earth shattering numbers, I think we definitely should either commit this or something like it.
I hope #4863 by itself didn't affect any of these numbers. It may have changed which stack slot is selected in some tests but otherwise if it changed how things are generated then I probably introduced a bug. :sweat_smile:
But it looks like you measured #4862 and #4863 together and saw a 1% increase in well-formed inputs. I don't think that's surprising from either #4862 or random noise, so that's reassuring.
As for "Better Instruction Selection": by my count, going from 38k well-formed inputs to 41k is an 8% increase (40927 / 37680 is 108%). Going from 38% to 41% is a difference of 3 percentage points but I'm not sure that's the best way to measure that change.
The "Mostly Forward Branching" work has very promising-looking results!
Also important to note that when a run trap's the entire input is dropped, so this also roughly translates into 7% of inputs being dropped.
How about continuing on to the remaining test inputs for the given function? If any of them don't trap then at least we've successfully tested something.
As a bigger project, maybe we should try doing something like wasm-smith's new no-trapping mode.
Is there a good way to collect stats about where we are trying to select something that is not available? Or why
arbitrary
decided our input wasn't valid?I'm not sure yet; I'm thinking about it. Some observations that might help:
- There's a general goal in
arbitrary
that it shouldn't fail (https://github.com/rust-fuzz/arbitrary/commit/1d2f65360f634968b5525f4d6430a88a376277f0).Unstructured::choose
may be the only method that ever actually fails, though I haven't checked. If so, we primarily need to audit for calls tochoose
where we might pass in an empty slice, and avoid doing that. We can also try replacing?
operators with.unwrap()
and see if any fail. Or maybe print a stack trace before returning the error, with std::backtrace::Backtrace.- I've observed that we discard an input if CLIF validation fails after the NaN canonicalization pass. I think we should defer canonicalization until we're about to run the function, instead of while we're generating it. And probably panic instead of ignoring the error.
- In general, let's try to minimize the number of ways that test-case generation can fail, so that
cargo fuzz fmt
is reliable.
afonso360 commented on issue #3050:
There's a general goal in arbitrary that it shouldn't fail (rust-fuzz/arbitrary@1d2f653). Unstructured::choose may be the only method that ever actually fails, though I haven't checked. If so, we primarily need to audit for calls to choose where we might pass in an empty slice, and avoid doing that. We can also try replacing ? operators with .unwrap() and see if any fail. Or maybe print a stack trace before returning the error, with std::backtrace::Backtrace.
So I did this last one, I recorded backtraces for
Unstructured::choose
where we fail due to a empty choice. And from the numbers it appears to be most of the cases where fail to generate an input. It tracks with previous measurements of being about 50% of the inputs.See this branch for the backtrace printing.
I used this script to group and print the backtraces.With 150k execs we got the following results: (Reordered for better readability)
Loaded 71331 stacks First Line duplicates 55.8 - Count: 39787 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 15.0 - Count: 10731 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 13.4 - Count: 9554 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 9.8 - Count: 6974 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jumptables 3.3 - Count: 2381 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 2.2 - Count: 1561 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 0.5 - Count: 343 - Line: 0: cranelift_fuzzgen::function_generator::insert_call
<details>
<summary>
I also tried grouping by the first two stack frames, and about 30% come from <code>insert_opcode</code> which is not unexpected.
</summary>Two Line duplicates 29.6 - Count: 21108 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_opcode 12.8 - Count: 9163 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_bricmp 9.8 - Count: 6974 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jumptables 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate 7.9 - Count: 5615 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_load_store_address 7.3 - Count: 5209 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_bricmp 4.8 - Count: 3401 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::insert_stack_store 4.0 - Count: 2833 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_stack_store 3.7 - Count: 2642 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter 3.3 - Count: 2381 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 1: cranelift_fuzzgen::function_generator::FunctionGenerator::finalize_block 2.2 - Count: 1603 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_load_store 2.2 - Count: 1561 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 1: cranelift_fuzzgen::function_generator::FunctionGenerator::finalize_block 1.9 - Count: 1389 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: alloc::vec::Vec<T,A>::extend_desugared 1.7 - Count: 1236 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 1.6 - Count: 1144 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 1.3 - Count: 940 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br 1.1 - Count: 769 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br 0.9 - Count: 628 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jump 0.8 - Count: 564 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_const 0.8 - Count: 550 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_cmp 0.8 - Count: 538 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::insert_stack_load 0.7 - Count: 526 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate 0.5 - Count: 343 - Line: 0: cranelift_fuzzgen::function_generator::insert_call 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_instructions 0.3 - Count: 214 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_stack_load
</details>
Fixing
get_variable_of_type
Most of the issues with
get_variable_of_type
should go away when we start doing proper instruction selection.If there are tail uses elsewhere I think we can change
get_variable_of_type
to generate a variable on demand with a random default value and making sure this variable is not used anywhere else (by not including it in the variable pool).This should pretty much cover all scenarios.
Fixing issues with invalid blocks / jumptables
I've updated the "Mostly forward branching" PR to entirely fix these issues by being smarter about which terminators we can insert.
See: https://github.com/bytecodealliance/wasmtime/pull/4894
insert_call
I think we can easily fix this by not generating a call if we don't have a func ref, but its only about 0.5% so I'm going to leave this for later.
Any instructions with at least one operand which is not a "value". For example, stack load/store, with their need for a stack slot operand. Or function calls, which need a function reference, and also have a type signature that depends on the selected function reference. My WIP patch handles stack slots in 0c9689b#diff-226e0f113ddbbcedaa5265109a951e52d30aa35307537684f56a5e6bedc9ba0eR369-R390 but I didn't get to handling calls yet.
What do you think about doing something similar to what you did in the WIP patch but with a base on the opcode signature table.
After generating the resources for the function we filter the opcodes based on signature. Most opcodes are fairly easy, we just need to ensure that we have the right variable types.
For
call
andcall_indirect
, we can also filter out function references where we don't have the correct types. If we end up with 0 valid references we can also filter out these opcodes from ever being selected.
stack_load
/stack_stores
are also similar in that we can filter them out based on if we have stack slots with enough space for the type and if we have variables to store or load those types.I think this is pretty much what you were doing in the WIP patch, but instead of generating the options we filter the options. This is also closer to how I imagine it working when we move to generating the opcodes from
cranelift-meta
.I'll try to implement something along these lines during the week.
I've observed that we discard an input if CLIF validation fails after the NaN canonicalization pass. I think we should defer canonicalization until we're about to run the function, instead of while we're generating it. And probably panic instead of ignoring the error.
Panicking sounds like a good idea because it should never fail there. (I've opened #4896 for this)
I'd still like to keep canonicalization on the
fuzzgen
crate. If anyone else wants to run random clif it would be nice to be able to depend on only that crate. Insidefuzzgen
we can run those passes later, but not by much we only run them after fully generating the function.As a bigger project, maybe we should try doing something like wasm-smith's new no-trapping mode.
I think we should still allow some degree of trapping, we've caught a bunch of issues with the interpreter due to this. However I also think we shou
[message truncated]
afonso360 edited a comment on issue #3050:
There's a general goal in arbitrary that it shouldn't fail (rust-fuzz/arbitrary@1d2f653). Unstructured::choose may be the only method that ever actually fails, though I haven't checked. If so, we primarily need to audit for calls to choose where we might pass in an empty slice, and avoid doing that. We can also try replacing ? operators with .unwrap() and see if any fail. Or maybe print a stack trace before returning the error, with std::backtrace::Backtrace.
So I did this last one, I recorded backtraces for
Unstructured::choose
where we fail due to a empty choice. And from the numbers it appears to be most of the cases where fail to generate an input. It tracks with previous measurements of being about 50% of the inputs.See this branch for the backtrace printing.
I used this script to group and print the backtraces.With 150k execs we got the following results: (Reordered for better readability)
Loaded 71331 stacks First Line duplicates 55.8 - Count: 39787 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 15.0 - Count: 10731 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 13.4 - Count: 9554 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 9.8 - Count: 6974 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jumptables 3.3 - Count: 2381 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 2.2 - Count: 1561 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 0.5 - Count: 343 - Line: 0: cranelift_fuzzgen::function_generator::insert_call
<details>
<summary>
I also tried grouping by the first two stack frames, and about 30% come from <code>insert_opcode</code> which is not unexpected.
</summary>Two Line duplicates 29.6 - Count: 21108 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_opcode 12.8 - Count: 9163 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_bricmp 9.8 - Count: 6974 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jumptables 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate 7.9 - Count: 5615 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_load_store_address 7.3 - Count: 5209 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_bricmp 4.8 - Count: 3401 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::insert_stack_store 4.0 - Count: 2833 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_stack_store 3.7 - Count: 2642 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter 3.3 - Count: 2381 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 1: cranelift_fuzzgen::function_generator::FunctionGenerator::finalize_block 2.2 - Count: 1603 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_load_store 2.2 - Count: 1561 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 1: cranelift_fuzzgen::function_generator::FunctionGenerator::finalize_block 1.9 - Count: 1389 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: alloc::vec::Vec<T,A>::extend_desugared 1.7 - Count: 1236 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_switch 1.6 - Count: 1144 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br_table 1.3 - Count: 940 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br 1.1 - Count: 769 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_br 0.9 - Count: 628 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_target_block 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_jump 0.8 - Count: 564 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_const 0.8 - Count: 550 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_cmp 0.8 - Count: 538 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::stack_slot_with_size 1: cranelift_fuzzgen::function_generator::insert_stack_load 0.7 - Count: 526 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate 0.5 - Count: 343 - Line: 0: cranelift_fuzzgen::function_generator::insert_call 1: cranelift_fuzzgen::function_generator::FunctionGenerator::generate_instructions 0.3 - Count: 214 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type 1: cranelift_fuzzgen::function_generator::insert_stack_load
</details>
Fixing
get_variable_of_type
Most of the issues with
get_variable_of_type
should go away when we start doing proper instruction selection.If there are tail uses elsewhere I think we can change
get_variable_of_type
to generate a variable on demand with a random default value and making sure this variable is not used anywhere else (by not including it in the variable pool).This should pretty much cover all scenarios.
Fixing issues with invalid blocks / jumptables
I've updated the "Mostly forward branching" PR to entirely fix these issues by being smarter about which terminators we can insert.
See: https://github.com/bytecodealliance/wasmtime/pull/4894
insert_call
I think we can easily fix this by not generating a call if we don't have a func ref, but its only about 0.5% so I'm going to leave this for later.
Any instructions with at least one operand which is not a "value". For example, stack load/store, with their need for a stack slot operand. Or function calls, which need a function reference, and also have a type signature that depends on the selected function reference. My WIP patch handles stack slots in 0c9689b#diff-226e0f113ddbbcedaa5265109a951e52d30aa35307537684f56a5e6bedc9ba0eR369-R390 but I didn't get to handling calls yet.
What do you think about doing something similar to what you did in the WIP patch but with a base on the opcode signature table.
After generating the resources for the function we filter the opcodes based on signature. Most opcodes are fairly easy, we just need to ensure that we have the right variable types.
For
call
andcall_indirect
, we can also filter out function references where we don't have the correct types. If we end up with 0 valid references we can also filter out these opcodes from ever being selected.
stack_load
/stack_stores
are also similar in that we can filter them out based on if we have stack slots with enough space for the type and if we have variables to store or load those types.I think this is pretty much what you were doing in the WIP patch, but instead of generating the options we filter the options. This is also closer to how I imagine it working when we move to generating the opcodes from
cranelift-meta
.I'll try to implement something along these lines during the week.
I've observed that we discard an input if CLIF validation fails after the NaN canonicalization pass. I think we should defer canonicalization until we're about to run the function, instead of while we're generating it. And probably panic instead of ignoring the error.
Panicking sounds like a good idea because it should never fail there. (I've opened #4896 for this)
I'd still like to keep canonicalization on the
fuzzgen
crate. If anyone else wants to run random clif it would be nice to be able to depend on only that crate. Insidefuzzgen
we can run those passes later, but not by much we only run them after fully generating the function.Edit: Reading https://github.com/bytecodealliance/wasmtime/pull/4896#pullrequestreview-1104415283 made me realize why this would be important.
As a bigger project, maybe we should try doing something like [wasm-smith's new no-trapping mode](https://github.com/bytecodealliance/wasm-tools/commit/f229c4fa03d45884f95ab392f672f4477c5ab4df
[message truncated]
jameysharp commented on issue #3050:
So I did this last one, I recorded backtraces for
Unstructured::choose
where we fail due to a empty choice. [...] With 150k execs we got the following results:```
Loaded 71331 stacksFirst Line duplicates
55.8 - Count: 39787 - Line: 0: cranelift_fuzzgen::function_generator::FunctionGenerator::get_variable_of_type
...
```I also tried grouping by the first two stack frames, and about 30% come from
insert_opcode
which is not unexpected.This is so cool!
The numbers in #4894 are confusing me though. The number of valid inputs doesn't agree with the number of stack traces. For example, the baseline reported in that PR had 31,622 invalid inputs according to the stats, but only 20,139 stack traces. Were those from different runs, or are there invalid inputs that are not being measured?
Also: of failures during the baseline run, roughly 27% were due to choosing an impossible terminator, and the other 73% were from instruction selection. After the PR, you captured about 71% as many stack traces, which is close enough to 73% to seem plausibly like variation due to random chance.
So I'd assumed the PR would have 71-73% as many invalid inputs. Instead it had about 92% as many invalid inputs. So the effect of the PR on the number of valid inputs was rather smaller than I thought it would be. I think this is consistent with a bunch of invalid inputs not capturing stack traces.
I've written a throwaway patch for the
arbitrary
crate that applies the same backtrace logic you used, but does it in the few places where that crate can actually construct a new error. Apply trace-arbitrary-errors.txt to a local clone of rust-fuzz/arbitrary, then tell Cargo to override dependencies with your copy.You should get similar results as your current branch without having to wrap your
choose!
macro around every single thing, except for a couple changes that I think might be informative. For one, this means the stack trace includes calls insidearbitrary
, so you'll want to compare more stack frames. I also changed it to report which error occurred (EmptyChoose
,NotEnoughData
, orIncorrectFormat
), although you can also tell from the stack trace, so feel free to remove that.This still misses stack traces for errors which originated from
DataValue::from_integer
(which I think shouldn't fail) or other calls that don't invokearbitrary
(but with #4896 merged, I can't find any others). But I hope it will be very close to getting a stack trace for every invalid input.By the way, do you need to limit the number of stack frames at all? I'd expect comparing the entire stack trace to work fine in this case because there aren't many control flow paths in cranelift-fuzzgen. So on Linux/Cygwin/WSL I'd do something like
sha1sum stack/* | sort | uniq -cw40 | sort -rn
to get a list of representative stack traces sorted by how often they occurred.I think we can easily fix [insert_call] by not generating a call if we don't have a func ref, but its only about 0.5% so I'm going to leave this for later.
It may be easy when you're doing the rest of the instruction selection work, but I agree, at 0.5% there's no rush. That said, in #4894 you measured double-digit percentages of failures in
insert_call
. Maybe in the run above you happened to get really lucky?What do you think about doing something similar to what you did in the WIP patch but with a base on the opcode signature table. [...] I think this is pretty much what you were doing in the WIP patch, but instead of generating the options we filter the options.
That all sounds great. I don't have a strong attachment to the style I used. The main thing I liked about it was capturing borrows of the appropriate slices in each option. I liked being able to avoid looking up resources when generating an instruction, by remembering what resources I already looked up when generating the list of choices. If there's a lookup in both places, I worry that they could get out of sync. So if you find that it's easy to do that while filtering options, I'd like that; but if it's not easy, I don't care that much.
As a bigger project, maybe we should try doing something like wasm-smith's new no-trapping mode.
I think we should still allow some degree of trapping, we've caught a bunch of issues with the interpreter due to this.
I don't understand: how do we catch interpreter bugs if we silently throw out cases where the interpreter traps? Are these bugs where the interpreter should have trapped but it didn't, so we only saw the trap during native-code execution?
To make sure we're talking about the same thing: "no-trapping mode" means generating extra code to ensure that the inputs are always valid for instructions that may trap. It doesn't mean disabling instructions that could trap. For example, you could bitwise-or 1 into the denominator of a division, so that it's never 0 regardless of how the value was generated.
afonso360 commented on issue #3050:
The numbers in #4894 are confusing me though. The number of valid inputs doesn't agree with the number of stack traces. For example, the baseline reported in that PR had 31,622 invalid inputs according to the stats, but only 20,139 stack traces. Were those from different runs, or are there invalid inputs that are not being measured?
I think this was an issue with #4894. I've re run the new benchmarks with your patch and it looks like we are now getting the expected results. See: https://github.com/bytecodealliance/wasmtime/pull/4894#issuecomment-1248438739
By the way, do you need to limit the number of stack frames at all? I'd expect comparing the entire stack trace to work fine in this case because there aren't many control flow paths in cranelift-fuzzgen. So on Linux/Cygwin/WSL I'd do something like
sha1sum stack/* | sort | uniq -cw40 | sort -rn
to get a list of representative stack traces sorted by how often they occurred.No issues, it all seemed to work fine, although it was a bit slow with 70k inputs, but no big deal.
It may be easy when you're doing the rest of the instruction selection work, but I agree, at 0.5% there's no rush. That said, in #4894 you measured double-digit percentages of failures in
insert_call
. Maybe in the run above you happened to get really lucky?I have no explanation for this. The 150k run above did start from a slightly larger corpus that I kept using while developing. But the one on #4894 is the OSS-Fuzz original corpus, maybe that makes a difference? That one is also only 50k runs and maybe that also figures into it? To be honest I'm kind of at a loss here.
That all sounds great. I don't have a strong attachment to the style I used. The main thing I liked about it was capturing borrows of the appropriate slices in each option. I liked being able to avoid looking up resources when generating an instruction, by remembering what resources I already looked up when generating the list of choices. If there's a lookup in both places, I worry that they could get out of sync. So if you find that it's easy to do that while filtering options, I'd like that; but if it's not easy, I don't care that much.
I'll try to work something out where we can preserver those constraints, I agree, I don't like the lookup in both places.
I think we should still allow some degree of trapping, we've caught a bunch of issues with the interpreter due to this.
I don't understand: how do we catch interpreter bugs if we silently throw out cases where the interpreter traps? Are these bugs where the interpreter should have trapped but it didn't, so we only saw the trap during native-code execution?
Yes, that's exactly it. So far it has happened 2 or 3 times (not sure) and it was in
div
, where we didn't properly catch div by zero and inrem
where the remainder of-1
should also trap. In these cases the interpreter actually crashed, but even if it didn't hopefully we would have caught those when comparing the results with the native execution.Another reason is that eventually I want to work on correct trapping semantics for the interpreter and testing those via the fuzzer and building the possible test cases now is easier than trying to do it all later.
To make sure we're talking about the same thing: "no-trapping mode" means generating extra code to ensure that the inputs are always valid for instructions that may trap. It doesn't mean disabling instructions that could trap. For example, you could bitwise-or 1 into the denominator of a division, so that it's never 0 regardless of how the value was generated.
Yeah, and that is pretty much how I'm planing to fix
int_divz
issues, however I want to add a config where sometimes we don't insert those. Or insert them only for some instructions.
afonso360 edited issue #3050:
In #3038 we introduced the initial version of the Cranelift CLIF-level differential fuzzer.
This fuzzer generates CLIF modules that are run on the interpreter and subsequently on the host machine (assuming no traps / invalid memory accesses) comparing the outputs of each run.
Roadmap:
- [x] Initial introduction (#3038)
- [x] Generating test inputs with control flow
- [x] Generate multiple blocks and basic jump instructions (#3094)
- [x] Generate
br_table
's and other jump table jumps (#3299)- [ ] Generating memory loads/stores
- [x] Stack
- [x] Interpreter Support (#3187)
- [x] Fuzzer Support (#4438)
- [ ] Heap
- [x] Interpreter Support (#3302)
- [ ] Fuzzer Support
- [ ] Table
- [x] Interpreter Support (#4433)
- [ ] Fuzzer Support
- [ ] Symbols
- [ ] Interpreter Support
- [ ] Fuzzer Support
- [ ] Others
- [ ] Cross endianness loads/stores
- [ ]
notrap
MemFlags- [ ]
aligned
MemFlags- [ ]
readonly
MemFlags- [ ] Generating calls
- [x] Generating LibCall's (#4782)
- [ ] Generating Function Calls
- [ ] Indirect Function Calls
- [ ] Full coverage of arithmetic ops
- [ ] Extend the codegen in the
cranelift-meta
crate to provide a table of acceptable opcodes and types- [ ] Full coverage of SIMD ops
- [ ] Misc
- [ ] Add
SourceLoc
to instructions- [ ] Add ValueLabels to instructions
afonso360 commented on issue #3050:
@jameysharp @elliottt :wave:
Here's what I played around with regarding generating the Opcodes, I tried to use the constraints encoded in the opcodes to generate valid signatures. IIRC I think they were too underspecified and at some point I stopped looking into it, but I don't remember a lot.
It does work for some opcodes, I do remember that.
It also needs a custom branch of cranelift with some additional interfaces on some types.
Repository:
https://github.com/afonso360/cranelift-opcode-genCustom Branch
https://github.com/afonso360/wasmtime/commits/fuzzgen-gen-inst
Last updated: Jan 24 2025 at 00:11 UTC