wasmtime / PR #3876 Add support for `async` call hooks · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3876 Add support for `async` call hooks

Wasmtime GitHub notifications bot (Mar 02 2022 at 23:46):

acw opened PR #3876 from acw/call-hook-async-variant to main:

The present runtime supports the ability to add aspect-like hooks to wasm entrance/exit via the call_hook method. In some cases, it can be useful to have this hook do asynchronous operations; to acquire a lock, for example.

This patch series provides an alternate way for users to provide a call hook that mirrors the resource limiter callbacks. To get around some trickiness with Rust types, closures, and Futures, call_hook_async takes an object on which a simple trait is implemented. This hook is then executed using the existing block_on mechanism.

Beyond just adding this capability, this patch makes two other changes to the code base to avoid an error condition in which one of these tasks becomes scheduled right as an async fiber is being dropped. The first changes the async_cx method so that it returns Option<AsyncCx> instead of just AsyncCx; this can be an early detection mechanism for the fiber being dropped. The more key one is a late check in block_on, which turns a former-panic into being a catchable an error condition.

Many thanks to @alexcrichton, who figured out the "resuming a task on a dying thread" issue.

Wasmtime GitHub notifications bot (Mar 03 2022 at 01:24):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Same as above I think this is best done as .expect() since it would be bad if wasm were able to execute and only caught it when an import happened.

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

For this I think it's ok to do .expect() because any trap raised from cancellation of a fiber should not be catch-able by wasm and wasm should never be able to initiate a call of an async import.

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Instead of returning a trap could this return Ok(()), perhaps with an assert that this is CallHook::ExitingWasm (I forget the precise name) since I think we should be guaranteed that the hook is only used in that one direction.

Additionally when I stop to think about this I think it's a small preexisting bug today where when a fiber is being torn down we shouldn't execute any user-provided code on the fiber, instead only destructors. Right now though if you have a synchronous call-hook installed then we'll still run the synchronous call hook when exiting wasm.

To handle that, perhaps my above comment about moving self.inner.async_cx() down here isn't so applicable. Perhaps if async support is enabled and if the cx isn't available then Ok(()) is unconditionally returned to avoid calling any hook infrastructure? (also having the assert about s: CallHook I think)

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Ditto to above

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Is this still applicable? It looks like self.inner.async_cx() is disjoint from self.call_hook which is part of the match so I think this should be able to move in to the Async block below?

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Like the above for Func::new I think this is safe to .expect() and/or .unwrap() since it should be impossible to hit this with a None

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

FWIW if you so desire you can get rid of the pesky ref mut on ref mut hook by having the match expression be &mut self.call_hook

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Could you add some documentation to this function as to why Option is the return value? (notably pointing out as well that this panics if async_support is not enabled, whereas the signature might naively appear to return None if async support is not enabled)

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Ditto to other places, it's ok to .unwrap() here

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

I don't think that this is necessary because the suspend() call below will return an Err(()) which will propagate the error outwards as soon as it's seen with res?, meaning that we still shouldn't ever reach this spot with a null poll_cx and if we do its indicative of a bug elsewhere

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Ditto to above

Wasmtime GitHub notifications bot (Mar 03 2022 at 15:55):

alexcrichton created PR review comment:

Ditto to above

Wasmtime GitHub notifications bot (Mar 03 2022 at 19:00):

aturon created PR review comment:

One worry here: async_trait means this will be using a Box under the hood, which means we incur an allocation for every hook invocation, i.e. two times per hostcall.

A different factoring might allow us to allocate once up front (for the hook itself), basically by making this a poll-style function rather than an async fn.

How worried are we about these costs? AIUI we care a lot about host/guest boundary crossing being cheap.

Wasmtime GitHub notifications bot (Mar 03 2022 at 19:00):

aturon submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2022 at 20:08):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2022 at 20:08):

alexcrichton created PR review comment:

Oh that's actually a good point! I haven't ever done benchmarking myself about the costs with call hooks enabled, but I suspect that even as-is it's already at least an order of magnitude slower than "raw" syscalls. That being said though even when wiggle is layered on top I suspect it's even slower as we haven't done much work to optimize wiggle bits (I know there's a Mutex somewhere right now frobbed on all entrances and exits, but the mutex is always un-contended). In that sense it might be good to benchmark this locally in something like Viceroy perhaps and see if the overhead is meaningful?

Wasmtime GitHub notifications bot (Mar 03 2022 at 21:15):

aturon submitted PR review.

Wasmtime GitHub notifications bot (Mar 03 2022 at 21:15):

aturon created PR review comment:

Gotcha. Seems like it'd be interesting data to have around -- comparing no call hook, sync call hook, and async call hook.

Wasmtime GitHub notifications bot (Mar 04 2022 at 19:04):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 04 2022 at 19:04):

alexcrichton created PR review comment:

To assist in poking around with performance bits I wrote up https://github.com/bytecodealliance/wasmtime/pull/3883 which should be a good spot we can insert an async call hook and see the performance. My expectation is that given how slow most async things already are an extra allocation here isn't going to really be all that expensive. For comparison calling into wasm itself is on the order of microseconds instead of nanoseconds because of how our stack management currently works. Calling into the host already allocates a Box'd future per call so allocating 2 more probably isn't the end of the world.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:50):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:51):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:51):

acw created PR review comment:

Cool. I've pulled these checks and replaced them with expect() (mostly; a couple unwraps) in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:51):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:51):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 19:52):

acw created PR review comment:

Addressed in a775e97.

Wasmtime GitHub notifications bot (Mar 07 2022 at 20:49):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 20:49):

alexcrichton created PR review comment:

From the benchmark added there the host-calling-wasm case has negligible overhead with this design of async hooks because the cost of that transition is on the order of a microsecond where the allocations here are more like nanoseconds. I think setting up the stack and unconditionally doing a context switch is probably the expensive part.

For wasm-calling-host this imposes a ~40ns overhead. If the host function is defined via Func::wrap ("typed" and "synchronous") then the sync hook vs async hook is 7ns vs 46ns. For functiosn defined via Func::wrapN_async ("typed" and async) then the sync hook vs async hook is 29ns vs 69ns.

Now whether those specific numbers make sense in a broader context I'm not sure, but I would suspect that most of the cost is indeed the allocation of the returned future from this trait.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:14):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:15):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:15):

acw created PR review comment:

Yes, seems right. Also, moving it removes an unnecessary test if the async feature is set but we're not actually using async callbacks.

This code was mirrored from memory_growing, so I've updated that function, too.

Addressed in fdbd3b9.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:20):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:20):

acw created PR review comment:

Let me put together a more generic test case for all of this, and make sure that everything still works if I remove this check. My memory is that this is the check that actually made sure things work, but let me confirm that. (I had a few different tests and validation runs, and I might be conflating two of them.)

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:23):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:23):

acw created PR review comment:

Is there any reason not to just make the check be for both conditions (i.e., we have async_support and we're not being torn down), rather than panicking if async_support is not enabled?

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:26):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:27):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 07 2022 at 21:27):

acw created PR review comment:

(I've added the basic docs, with the panic sill allowed, in 09d62a4)

Wasmtime GitHub notifications bot (Mar 08 2022 at 01:06):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 08 2022 at 15:59):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 08 2022 at 15:59):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 08 2022 at 15:59):

alexcrichton created PR review comment:

FWIW this might be a bit easier to do with inst.get_typed_func::<(), (), _>(&mut store, "export")

Wasmtime GitHub notifications bot (Mar 08 2022 at 15:59):

alexcrichton created PR review comment:

I think this ends up meaning that this test spin-loops for 2 seconds, right? If so could that instead be changed to something like "after 200 calls into the host return a trap" or similar? (ideally good to avoid having tests take too long)

Wasmtime GitHub notifications bot (Mar 08 2022 at 15:59):

alexcrichton created PR review comment:

Stylistic nit, but this can be struct HandlerR; (no need for the {})

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:25):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:29):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:29):

acw created PR review comment:

It turns out that the upstream check that required this test (the one in async_cx()) was not complete enough, necessitating this check. The updated version of async_cx in b9c035c addresses this issue, and allowed me to remove this code.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:39):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:39):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:39):

acw created PR review comment:

Fixed in 207f2e6.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:40):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:40):

acw created PR review comment:

Fixed in 207f2e6.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:40):

acw submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:40):

acw created PR review comment:

Fixed in 207f2e6.

Wasmtime GitHub notifications bot (Mar 23 2022 at 00:45):

acw updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 23 2022 at 14:28):

alexcrichton created PR review comment:

My thinking is that the decision about whether to do something async or not is a higher-level concern than this function, so something shouldn't unconditionally call this and then act on the return value, instead it should have already made some prior decision to call this function and then it acts on the return value as necessary. (plus having two different None states would mean we'd probably have to return some sort of custom enum which is a bit of a pain)

Wasmtime GitHub notifications bot (Mar 23 2022 at 14:28):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 14:47):

alexcrichton updated PR #3876 from acw/call-hook-async-variant to main.

Wasmtime GitHub notifications bot (Mar 23 2022 at 14:48):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 23 2022 at 16:29):

alexcrichton merged PR #3876.

Last updated: Apr 16 2025 at 17:03 UTC