cfallin opened issue #3251:
In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing
differential_wasmi
fuzzing).However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.
This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.
cc @abrown @alexcrichton
alexcrichton labeled issue #3251:
In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing
differential_wasmi
fuzzing).However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.
This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.
cc @abrown @alexcrichton
alexcrichton labeled issue #3251:
In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing
differential_wasmi
fuzzing).However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.
This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.
cc @abrown @alexcrichton
alexcrichton unlabeled issue #3251:
In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing
differential_wasmi
fuzzing).However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.
This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.
cc @abrown @alexcrichton
github-actions[bot] commented on issue #3251:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen commented on issue #3251:
My intuition is that combining multiple instructions, even if it is just two or three, would give us much more bang for our buck than testing single instructions.
cfallin commented on issue #3251:
Ah, old issue, paging back in context (I'm not sure why the labeler decided to tag you just now on a discussion from August!)...
Yup, agreed, that's the "or a straight-line sequence with no control flow or calls" above :-)
Adding some more thought just now: IMHO we should do it only up to the max depth of our instruction combining patterns. The point of this suggestion was specifically to target the fuzz throughput in a different way, at the input-value space rather than the input-program space. As we test straightline sequences of
n
instructions we haveO(|wasm opcodes|^n)
possibilities, and cut our time spent (on exploring argument values) per individual test program exponentially.In other words, if my goal is to say that all cases for (say) SIMD instruction X have been covered, then separately fuzzing
vector_add(X(x, y, z), [1.0, 1.0, 1.0, 1.0])
andvector_add(X(x, y, z), [2.0, 2.0, 2.0, 2.0])
doesn't really add any coverage. Testingvector_add(X, ...)
andvector_mul(X, ...)
might, if we have combining patterns that differ for those. But combining instructions may also mask some of the value space, for any operation that isn't bijective (e.g. extract one lane, or multiply by zero, or min/max).So, said another way, what I think would be useful is a unit-test-inspired approach where we take each individual lowering and then throw a long test-vector of argument values at it. (In other words the
Arbitrary
type is a(short inst sequence, Vec<ArgValues>)
.) That gets us "each individual lowering is good" coverage in a way that might hide when we're testingn
(forn
larger) instructions at a time.cc @abrown , any interest in looking at this as a way to get the spec interpreter in active fuzzing use?
abrown commented on issue #3251:
I would eventually like to look into this more but if anyone else has the bandwidth feel free to jump in first (read: I'm not sure when I will get to this).
abrown commented on issue #3251:
@cfallin, in a separate project I did something using
proptest
that reminded me of this. The basic idea, adapted to Wasmtime, would be to:
- construct a single-instruction WebAssembly module
- create random inputs using
proptest
- run the module's sole function in two different engines and check the results for equality
I wasn't too concerned with multiple-instruction sequences and other things mentioned above but if we have a place to put this
proptest
-guided fuzz test then I would be happy to contribute the code.
abrown commented on issue #3251:
I think this is closed by #4409 which is now in use by the new
differential
fuzz target (#4515).
abrown closed issue #3251:
In #3186 we found that the WebAssembly spec interpreter may not be suitable for high-throughput fuzzing because of certain performance characteristics. While we were able to locally patch one source of quadratic runtime, there are likely others, and the patch was not really a good fit to upstream. We are looking into other peer implementations to fuzz against for general programs (aside from existing
differential_wasmi
fuzzing).However, there is another way we could use the reference interpreter: we could validate single instructions' semantics more closely by generating short test cases that just execute an instruction (or a straight-line sequence with no control flow or calls) and return. This would be very valuable for instructions with more complex semantics, such as many of the SIMD instructions.
This is sort of a breadth-vs-depth situation: the "depth" comes from longer-running programs and tests interactions between instructions, and a good oracle there is a more optimized implementation; while the "breadth" comes from exhaustive many-inputs testing of very short sequences and tests mainly that we got the instruction semantics right. The combination seems like it should give pretty good coverage.
cc @abrown @alexcrichton
Last updated: Dec 23 2024 at 12:05 UTC