wasmtime / PR #4537 implement fuzzing for component types · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #4537 implement fuzzing for component types

Wasmtime GitHub notifications bot (Jul 29 2022 at 16:48):

dicej edited PR #4537 from component-fuzz-generator to main:

This is the first part of my work to address #4307. We now generate 1000
arbitrary types and tests for those types at build time. Each test includes a
component which imports and exports functions that take and return its
respective type. The exported function calls the imported function, which is
implemented by the host, and the host verifies that both the host function
argument and the guest function return value match the original input value.

In terms of #4307, this includes the test case generator and the static API
oracle. I'll follow up with a dynamic API oracle in a subsequent PR.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

Wasmtime GitHub notifications bot (Jul 29 2022 at 16:49):

dicej edited PR #4537 from component-fuzz-generator to main:

This addresses #4307.

For the static API we generate 100 arbitrary test cases at build time, each of
which includes 0-5 parameter types, a result type, and a WAT fragment containing
an imported function and an exported function. The exported function calls the
imported function, which is implemented by the host. At runtime, the fuzz test
selects a test case at random and feeds it zero or more sets of arbitrary
parameters and results, checking that values which flow host-to-guest and
guest-to-host make the transition unchanged.

The fuzz test for the dynamic API follows a similar pattern, the only difference
being that test cases are generated at runtime.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

Wasmtime GitHub notifications bot (Jul 29 2022 at 16:49):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

It's ok to omit this if it's empty

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Is this necessary for this PR? (I think this might affect cargo fuzz build --no-default-features as still trying to build the spec interpreter)

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Like you mentioned in the issue this can lead to too-deep recursion in the text format. I think a better strategy might be to generate a (type $name (record ...)) for each layer of a Type perhaps? That would avoid the *.wast limitations while still helping to fuzz super-deep type hierarchies in Wasmtime (which I think would actually be quite useful, so I'd rescind my other suggestion of limiting the depth here)

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Was this for correctnes or a refactoring? Fine either way of course, just curious for my own edification

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Currently the general style of this crate is that the oracles here don't take Unstructured but rather take a list of inputs. I think that would be possible to do here? This would take a TestCase as input as well as a Vec<(...)> for inputs/outputs of each run.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

as a stylistic nit could this ] go on a separate line?

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Currently the way time allotment on oss-fuzz works is that we're given a fixed time slice to balance between all of our fuzzing executables. In that sense the more fuzzers we have the more they take time away from other fuzzers. Since these are fuzzing the same thing in spirit could these two fuzzers be folded into one? The fuzz input could be used to determine whether the dynamic or static API could be fuzzed then I think.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

Could this perhaps be replaced with something that dynamically fills out more of the bytes vector when an error is encountered? Otherwise this could run into a possible issue where unrelated changes to the fuzz generator could cause this build to fail if just-the-wrong-seed is chosen.

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:53):

alexcrichton created PR review comment:

I'm a little worried about the impact that this will have on build times because it means that cargo fuzz build will build all of Wasmtime once for the host and then once again for the "target". This is a case where from Cargo's perspective when you execute cargo build --target $the_host_target it's considered a cross-compilation and packages shared between the two are built twice.

Improving this situation though is likely to not really be that easy. The "easy" thing to do is to split out the bits you need from wasmtime-fuzzing into their own separate crate. Splitting crates though is not always the easiest and ideally the pieces you're using to generate test cases here would ideally reuse the bits from wasmtime-environ as well to avoid duplicating things.

Overall though I do think that this is something that should be solved before landing due to how heavyweight this dependency is when fuzzing. This'll link in v8, for example, to the build script which is no small feat.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:15):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:19):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:19):

dicej created PR review comment:

Oops, I didn't see this until after I wrote https://github.com/bytecodealliance/wasmtime/pull/4537/commits/8fceba5be1f00c72582b5deaf6b490e421dfc887. Should I undo that and use this approach instead?

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:19):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:19):

dicej created PR review comment:

Not sure where that came from, honestly. I should be able to remove it.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:23):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:23):

dicej created PR review comment:

Correctness. The original code meant that each argument would overwrite any that came before it in the list. This was the first bug the fuzz tests found.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:26):

alexcrichton created PR review comment:

Aah yes indeed I see the issue now!

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:26):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:28):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:28):

alexcrichton created PR review comment:

Ah sorry for giving conflicting advice! I think it would be best to exercise as interesting type hierarchies as we can (including weirdly deep ones) so I think it'd be better to skip the *.wast-based limits. That being said if you'd prefer to get something working first I'm definitely ok having a follow-up issue for this.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:28):

dicej created PR review comment:

Ugh, https://github.com/bytecodealliance/wasmtime/commit/8fceba5be1f00c72582b5deaf6b490e421dfc887 didn't really address the issue anyway, so I'll try your approach.

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:28):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 19:50):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 19:50):

dicej created PR review comment:

I'm not sure how to do that, given that we can't create an instance of Val without first getting the Types from a Func from an Instance, and the fuzzer has no way to do that.

Wasmtime GitHub notifications bot (Jul 29 2022 at 19:54):

dicej created PR review comment:

I guess I do know a way -- I used to have a Value enum in generator::component_types which could be constructed independently and then later converted to a Val when the Type was available. I had removed that because it was redundant in the current approach, but I could restore it.

Wasmtime GitHub notifications bot (Jul 29 2022 at 19:54):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 20:04):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 20:04):

dicej created PR review comment:

@alexcrichton How strongly do you feel about this? I think it would require adding a substantial amount of code.

Wasmtime GitHub notifications bot (Jul 29 2022 at 21:37):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Jul 29 2022 at 22:44):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Jul 29 2022 at 22:44):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2022 at 22:44):

dicej created PR review comment:

This wasn't too painful to address. See my latest commit.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:45):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Given recent spec clarifications it's ok for thi sto have 0 flags, so I think it's ok to remove the "non empty" part here

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

I think we'll want to allow this to have zero fields as well as Flags below to exercise the zero-size case.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Similar to the cases below I think that we'll want to remove this to allow testing zero-sized tuples as well.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Ah sorry I forgot to review this in the initial first pass. We won't want to have this implementation though because the IntoComponentFunc impl is only intended for host functions where the component-type of the function can be statically inferred. As you have probably surmised with the lock/etc in typechecking for these functions the "dynamic" interface here doesn't really match the interface for static functions.

Instead what we'll want to do is something along the lines of the difference between wasmtime::Func::{wrap,new} where here you're defining the new equivalent. That will grow a new function component::Linker::func_new which will take the type of the function as its argument as well as a closure. The return values of the closure are then type-checked against the original type at runtime before yielding the result back to the original component.

This may be worth splitting to its own PR if you'd prefer. I'm also happy to review it here as well though.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Could this be updated to match the canonical definition which aligns-up the summation of the discriminant and the max variant size rather than aligning-up each individual summation? I think it probably ends up being the same thing but figure it may be best to align (heh) with the spec where we can.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Similar comment as to above about aligning the result of the summation instead of each individual component to match the spec (although again I don't think it affects the results)

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

One thing I did for the fact-valid-module fuzzer was to just specify this as a usize as well.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Could the seed be printed here to assist debugging in case the build script later returns an error?

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton created PR review comment:

Oh right sorry yes I forgot that we need the component on-hand to generate the values. Yeah in that case the refactor isn't worth it and the current structure is good :+1:

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:55):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 15:58):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 01 2022 at 16:11):

dicej created PR review comment:

I was thinking a zero-sized tuple was just a unit and thus redundant. Should we also allow zero-sized records, variants, unions, etc.? I.e. should I remove the NonEmptyArray type entirely?

Wasmtime GitHub notifications bot (Aug 01 2022 at 16:11):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 16:15):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 16:15):

alexcrichton created PR review comment:

We should allow zero-size records, yeah, but variants/unions/enums all require at least one case so the NonEmptyArray there is correct. You're right that in Rust an empty tuple will translate to a (), but for fuzzing here I think it's best to exercise as many cases as possible.

Wasmtime GitHub notifications bot (Aug 01 2022 at 17:53):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 01 2022 at 18:11):

dicej created PR review comment:

Would you mind sketching out an example of how your proposed API would be used? I'm specifically interested in where the user would get the type of the function to pass to component::Linker::func_new. Would they query the Component for the things it imports, find the function of interest, and get the type from that?

Wasmtime GitHub notifications bot (Aug 01 2022 at 18:11):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 18:13):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 01 2022 at 18:13):

alexcrichton created PR review comment:

Yeah that is indeed a good question. In core wasm users would do FuncType::new(...) and create their own function type. In the fullness of time I'd expect the same operation to be available to components where function types could be manufactured in addition to being learned from components. That's a big missing piece of the embedding API for components right now though so, yeah, getting it from a component seems the most reasonable for now.

Wasmtime GitHub notifications bot (Aug 01 2022 at 23:45):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 01 2022 at 23:46):

dicej created PR review comment:

@alexcrichton I've just pushed an update per your suggested API. Let me know what you think.

Wasmtime GitHub notifications bot (Aug 01 2022 at 23:46):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Personally I think that this needs to change in the future to taking the function type as the argument. That being said this works well enough for now and it's ok to extend this later, so I think this should stay as-is.

Otherwise though eventually this will require filling out more of the component::types submodule in the same manner as wasmtime::types which is able to describe imports/exports/etc. I think the implementation will be roughly the same as what you have already with Arc<Types> and FooIndex paired together, but again it's ok to defer this to later.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Eventually I think this will need to be filled out when the function type is an argument to func_new. Additionally for now I think it would be good to go ahead and fill out to have an "extra layer" of defense against bugs.

For now I think the new_dynamic here could take a TypeFuncIndex and the implementation could basically test that the two function indices are the same (since everything comes from within the same component right now)

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Given that func_new is somewhat rare and already fairly verbose I think it's ok to skip the variant-without-StoreContextMut and go straight to requiring the argument during func_new

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Similar to TypedFunc::call vs Func::call, is it possible to share this code with the existing entrypoint above? Given how different the parameters are represented it may be too onerous though.

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Above the 2x2 matrix of "params direct/indirect" x "results direct/indirect" is expressed with 4 different cases but the main reason for that is to get all the fiddly bits around the types to work out. Otherwise here though this may be more amenable to a straightforward "load the params, do the call, store the result" without having to specify 4 different cases (which leads to a fair bit of duplication between everything)

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Are these new 'static bounds still required?

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

This looks like it's basically copying the results of ty, but could ty be used as-is? The pairing of index: TypeFuncIndex and &Arc<Types> I think should be enough to type-check everything

Wasmtime GitHub notifications bot (Aug 02 2022 at 13:48):

alexcrichton created PR review comment:

Could this specify both the rhs and the lhs of the transmute? (stylistically I prefer to do that given how dangerous transmute is in general)

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:07):

dicej created PR review comment:

We need to represent them as component::Types in call_host_dynamic in order to use Type::flatten_count, Type::next_field, Val::lift, Val::store, etc. I figured since we'd eventually convert them to component::Types anyway, might as well do it once here rather than every time call_host_dynamic is invoked.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:07):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:20):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:20):

dicej created PR review comment:

That was my first thought, but I couldn't see an elegant way to do it. With Func::call I was able to reuse call_raw by using [ValRaw: MAX_FLAT_PARAMS] as the LowerParams generic parameter. I can't use that trick with call_host, though, since the space for the parameters has already been allocated by the caller.

There probably is a way to refactor call_host to work in both contexts, but I expect it would be pretty ugly.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:24):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:24):

alexcrichton created PR review comment:

Ok that seems reasonable, this can be iterated on later if necessary.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:25):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:25):

alexcrichton created PR review comment:

Ok no worries, but in that case can you duplicate some of the more interesting test cases for imports for func_new-defined imports as well?

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:45):

dicej created PR review comment:

It will be required now that we're making HostFunc::typecheck more than a no-op for the dynamic case, since I'll need to convert the type of that field back to a Box<dyn Fn...> so it can store a closure which captures the TypeFuncIndex.

Wasmtime GitHub notifications bot (Aug 02 2022 at 14:45):

dicej submitted PR review.

Wasmtime GitHub notifications bot (Aug 02 2022 at 16:49):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 03 2022 at 14:35):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 03 2022 at 14:57):

dicej updated PR #4537 from component-fuzz-generator to main.

Wasmtime GitHub notifications bot (Aug 04 2022 at 17:02):

alexcrichton merged PR #4537.

Last updated: Apr 16 2025 at 15:03 UTC