Stream: git-wasmtime

Topic: wasmtime / PR #10388 Stack switching: Infrastructure and ...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich opened PR #10388 from frank-emrich:stack-switching-infra to bytecodealliance:main:

This PR is part of a series that adds support for the Wasm stack switching proposal. The explainer document for the proposal is here. There's a tracking issue describing the overall progress and limitations here: #10248
The draft PR #10177 contains the whole (initial) implementation of the proposal.

This PR contains all of the necessary infrastructure and runtime support. In other words, it contains the entire implementation except for codegen. Tests will be added at the end, once codegen has landed in a follow-up PR.

This was developed together with @dhil.

General implementation notes

In Wasm, continuations are represented by values of type (ref $ct), where $ct is a new composite type/heap type for continuations.
In the implementation, these are represented by values of type VMContObj. These are fat pointers consisting of a sequence counter and a pointer to a VMContRef; the latter type is the actual representation of a continuation.
The sequence counter part of a VMContObj is used to ensure that each continuation value can be used at most once.
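
A minimal sketch of this fat-pointer scheme (illustrative assumptions only; the field and method names here are not the PR's exact definitions):

/// Underlying continuation representation. `sequence` counts how many
/// times this continuation has been consumed (resumed, dropped, ...).
pub struct VMContRef {
    pub sequence: u64,
    // parent link, stack memory, buffered arguments, ...
}

/// Fat pointer handed out as (ref $ct): the VMContRef plus the
/// sequence number at which this particular value was created.
#[derive(Clone, Copy)]
pub struct VMContObj {
    pub sequence: u64,
    pub contref: *mut VMContRef,
}

impl VMContObj {
    /// Enforce one-shot semantics: consuming a value whose sequence
    /// number no longer matches the VMContRef's counter must trap.
    pub unsafe fn consume(self) -> Result<*mut VMContRef, ()> {
        if (*self.contref).sequence != self.sequence {
            return Err(()); // trap: continuation already used
        }
        // Bump the counter so any other copy of this value is stale.
        (*self.contref).sequence += 1;
        Ok(self.contref)
    }
}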

The VMStoreContext is extended to contain a "stack chain": It indicates what stack we are currently executing on. Logically, this is a linked list of stacks, since each continuation has a parent field. The chain stored in the VMStoreContext always ends with a value representing the initial stack. This is the stack we were on when entering Wasm, which will usually be the main stack.
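
As a hedged sketch, the chain can be pictured like this (variant names are assumptions, reusing the VMContRef sketch above):

/// The stack we are currently executing on. Following parent links
/// from the active continuation always terminates in the initial
/// stack, i.e. the stack we were on when entering Wasm.
pub enum StackChain {
    /// The initial (usually main) stack.
    InitialStack,
    /// Running inside a continuation; its VMContRef's parent field
    /// holds the next link of the chain.
    Continuation(*mut VMContRef),
}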

Memory Management

Currently, memory management is very basic: The StoreOpaque provides a new method for allocating a new VMContRef, and keeps track of all continuations created this way. Continuations are never deallocated at the moment, meaning that they live until the store itself is deallocated.
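
A sketch of that ownership scheme, continuing the sketches above (the method and field names are hypothetical):

pub struct StoreOpaque {
    /// Every continuation ever allocated; freed only when the store is.
    continuations: Vec<Box<VMContRef>>,
    // ...
}

impl StoreOpaque {
    /// Allocate a new VMContRef owned by the store. The raw pointer
    /// stays valid for the store's whole lifetime because nothing is
    /// deallocated until the store itself is dropped.
    pub fn allocate_continuation(&mut self) -> *mut VMContRef {
        let mut contref = Box::new(VMContRef { sequence: 0 });
        let ptr: *mut VMContRef = &mut *contref;
        self.continuations.push(contref);
        ptr
    }
}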

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks, or segmented stacks.
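
A Linux-flavored sketch of that scheme using the libc crate (the PR's ContinuationStack will differ in detail):

use std::ptr;

/// Reserve `size` usable bytes of stack plus one guard page at the
/// low end; stacks grow downward, so overflow hits the guard page.
unsafe fn mmap_stack(size: usize) -> *mut u8 {
    let page = 4096; // assume 4 KiB pages for this sketch
    // Map the whole region with no access rights first...
    let base = libc::mmap(
        ptr::null_mut(),
        size + page,
        libc::PROT_NONE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert!(base != libc::MAP_FAILED, "mmap failed");
    // ...then make everything above the guard page usable. Running
    // past the stack bottom faults on the PROT_NONE page instead of
    // silently corrupting adjacent memory.
    let bottom = (base as *mut u8).add(page);
    let rc = libc::mprotect(bottom as *mut _, size, libc::PROT_READ | libc::PROT_WRITE);
    assert_eq!(rc, 0, "mprotect failed");
    bottom // lowest usable address; the stack top is bottom + size
}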

Backtrace generation

The existing backtrace generation mechanism is extended to be fully aware of continuations: When generating backtraces, the entire chain of continuations is traversed, not just the stack frames of the currently active stack/continuation.

Entering/Exiting Wasm

Prior to this PR, there were two separate mechanisms that save and restore some state of the runtime when entering/exiting Wasm:

  1. The functions enter_wasm and exit_wasm in func.rs
  2. CallThreadState saves and restores (on drop) parts of the VMStoreContext

This PR consolidates these two mechanisms, because it requires some additional state to be updated and restored on enter/exit:
the type wasmtime::runtime::func::EntryStoreContext now stores all of the required runtime state and ensures that it's restored when exiting Wasm.
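
A rough sketch of the consolidated pattern (apart from the EntryStoreContext name, everything below is an illustrative assumption, reusing the StackChain sketch above):

/// Hypothetical mirror of a couple of VMStoreContext fields.
pub struct VMStoreContext {
    pub stack_limit: usize,
    pub stack_chain: StackChain,
    // ...
}

/// Snapshot taken when entering Wasm; the Drop impl restores it on
/// every exit path: normal return, trap, or a panic unwinding out of
/// a host call.
pub struct EntryStoreContext {
    ctx: *mut VMStoreContext,
    saved_stack_limit: usize,
    // ... plus the saved stack chain and the rest of the state
}

impl EntryStoreContext {
    /// Save the current state and install fresh state for the Wasm
    /// activation that is about to start.
    pub unsafe fn enter_wasm(ctx: *mut VMStoreContext, new_limit: usize) -> Self {
        let saved_stack_limit = (*ctx).stack_limit;
        (*ctx).stack_limit = new_limit;
        EntryStoreContext { ctx, saved_stack_limit }
    }
}

impl Drop for EntryStoreContext {
    fn drop(&mut self) {
        // Write the saved values back on the way out of Wasm.
        unsafe { (*self.ctx).stack_limit = self.saved_stack_limit }
    }
}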

Tables

Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested fitzgen for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-fuzz-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-default-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-core-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 23:44):

github-actions[bot] commented on PR #10388:

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:c-api", "wasmtime:config", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 01:05):

github-actions[bot] commented on PR #10388:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the .github/label-messager.json configuration file.

Learn more.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:19):

fitzgen commented on PR #10388:

Haven't dug into the details of the PR yet, but I have some initial questions after reading the OP.

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks, or segmented stacks.

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via store.engine().allocator().[de]allocate_fiber_stack() (https://github.com/bytecodealliance/wasmtime/blob/9da52ede33a9f8996832b0210579ef15296addbd/crates/wasmtime/src/engine.rs#L659)? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:20):

fitzgen edited a comment on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen submitted PR review:

(review is still in progress, but I need to context switch for a little bit and might as well share what I have so far)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Were these going to be removed in the next PR, or should they be removed now? Maybe throw an item into the meta task list if not now.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Not something that needs to be handled in this PR, but there is no reason we can't enable GC and also maintain the existing memory management for continuations, where they have the same lifetime as the Store. We don't have to implement full GC of continuations before we allow enabling both GC and stack switching at the same time.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

FWIW, we already have TRAP_INTERNAL_ASSERT that we use for debug-only assertions inside JIT code.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

I don't see a corresponding function definition in crates/c-api, I think it is missing?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Ditto to the debug printing stuff in this file.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

fyi, I tend to do log::trace!(...) for debug prints (both temporary ones that I will remove, and for ones that are useful enough to keep in tree).

when running the CLI, you can see them by setting the WASMTIME_LOG=trace env var, or only a specific crate/module via WASMTIME_LOG=wasmtime::runtime::vm=trace, or multiple crates/modules at different levels via WASMTIME_LOG=debug,regalloc2=info,cranelift_codegen::machinst=trace.

When running tests instead of the CLI, WASMTIME_LOG becomes RUST_LOG. Most tests initialize the env-logger, but if you aren't seeing the logs, then you can put let _ = env_logger::try_init(); at the start of the test (or ideally in a shared test helper function that a bunch of related tests already call at the beginning of their execution).
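
For reference, a minimal version of that pattern (a sketch; log and env_logger are the crates described above):

fn hot_path() {
    // Compiled in, but filtered at runtime via WASMTIME_LOG/RUST_LOG.
    log::trace!("about to switch stacks");
}

#[test]
fn sees_log_output() {
    // Ignore the error: another test may have installed the global
    // logger already.
    let _ = env_logger::try_init();
    hot_path();
}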

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

At least for now, I'm not sure it is worth having separate config knobs for stacks created via cont.new versus API-level fibers. This effectively creates two different kinds of stacks in the system, and I'd prefer to try and keep things simple for as long as we can by having a single kind of stack that we config and allocate and handle in the system.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Similarly, I don't see a function definition for this one either.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:52):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:52):

fitzgen created PR review comment:

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen submitted PR review:

(again, review still in progress, leaving intermediate notes)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Reminder to fill this in.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Two things:

  1. Why does stack-switching need std?

  2. We should be able to enable stack-switching for a compiler-without-runtime build to be able to compile stack-switching Wasm to be run elsewhere, and similarly should be able to enable stack-switching in a runtime-without-compiler build to run elsewhere-compiled stack-switching Wasm modules. As written, enabling the stack-switching feature will always enable the runtime, making the compiler-without-runtime version impossible. That is, unless compilation of stack-switching isn't gated on any cargo feature, but that is kind of funky given that this cargo feature exists and I expect users would think "I want to compile Wasm programs that use stack-switching, therefore I will enable this feature". This is a long way of basically saying, I think this should not unconditionally enable the runtime and, once compilation is added in the next PR, we should ultimately end up with something like this:

    stack-switching = [
        "wasmtime-cranelift?/stack-switching",
    ]

Neither of these things necessarily needs to be addressed now, before this PR merges, but it may end up being easier doing it now before more stuff gets entangled, making it harder to pull the cfgs apart.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

This seems unused. Does it get used in follow up PRs?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Because (AFAICT) these don't contain any usizes or pointers or anything else that changes with the target's word size, you could move them into the wasmtime_environ crate as

#[repr(u32)]
pub enum VMControlEffect {
    Return = 0,
    // ...
}

and avoid the wordy and kind of annoying split between the definitions of the enums and their discriminants.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

These sorts of config knobs that need to exist at the wasmtime_environ level usually go inside wasmtime_environ::Tunables. Unless there is a reason not to do that here, we should follow the existing convention.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Nitpick: both the Engine and Module variants are interned, so perhaps we can rename this method to unwrap_module_type_index?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

This mem::drop dance is kind of funky -- can we instead pass ownership into catch_traps and let it handle the dropping, if necessary? Or are there callers that need to do different things?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

You'll also want to add some CI checks for these features being enabled/disabled in various combinations like this:

https://github.com/bytecodealliance/wasmtime/blob/main/.github/workflows/main.yml#L363-L365

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

If they are written to directly from Wasm JIT code, then probably yes. This struct should also be #[repr(C)] in that case.

If it is not written to directly from Wasm, then it would only need to be unsafe cells if you are ever violating Rust's mutable/exclusive xor immutable/shared semantics.

I haven't read enough of these changes to say any more than that yet.
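
For illustration, the pattern under discussion looks roughly like this (hypothetical struct and field, not the PR's actual types):

use std::cell::UnsafeCell;

#[repr(C)] // JIT code computes field offsets, so the layout must be stable
pub struct VMExampleState {
    /// Written directly by Wasm JIT code, possibly while Rust holds a
    /// shared reference to the containing struct; routing all access
    /// through UnsafeCell keeps that sound.
    pub control_effect: UnsafeCell<u32>,
}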

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Can you run the function call microbenchmarks on main and on this branch? I want to make sure that we aren't regressing performance here, as this path is very hot, and if we are we may need to figure out how to cfg(...) some of this stuff so that it only happens when the stack-switching feature is enabled.

This will run the microbenchmarks and will check the results for statistical significance:

$ git checkout main
$ cargo bench --bench call -- --save-baseline main
$ git checkout stack-switching-infra
$ cargo bench --bench call -- --baseline main

I can help figure things out if we need to do something here.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Can you add an item to the meta task list for the embedder API, if you haven't already?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:50):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:50):

dhil created PR review comment:

I think this is a stray declaration from some of our downstream experiments. It was needed to capture the C shadow stack pointer that clang generates, which must be kept in sync with the stack switching done by the engine. To be clear, this is just due to the fact that clang doesn't know about stack switching.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil created PR review comment:

Same reason as above.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:30):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:30):

fitzgen created PR review comment:

FWIW, exposing these things the C API would be good in general, so if you want to do it here or in a new PR, you're more than welcome to!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:34):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:34):

dhil created PR review comment:

Yes sure! I am leaning towards adding this in a follow-up PR focused on the C API. What do you think @frank-emrich ?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:37):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:37):

dhil created PR review comment:

As I discussed with @alexcrichton (https://github.com/bytecodealliance/wasmtime/pull/10177/files#r1965733715), I'd be OK with no bespoke stack switching configuration for now.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 16:09):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 16:09):

dhil created PR review comment:

This is an interesting question, which Frank and I have discussed in the past. If my memory serves me right, then our layout is slightly different. However perhaps we can unify the layouts (or we can perhaps simply adopt the fibers' layout). I am not too sure about the implications, I think @frank-emrich has the key knowledge to best answer this question.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 17:59):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 17:59):

fitzgen created PR review comment:

The automatic pooling allocator integration would also get you fast allocation of new stacks from a pool, which would look nice on your benchmarks and what have you ;)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:19):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:20):

dhil created PR review comment:

My take is that they shouldn't make it into the main branch -- probably a stray include?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:21):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:21):

dhil created PR review comment:

Yes, agreed! I totally see the appeal!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:24):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:24):

dhil created PR review comment:

Yes, it gets used during the code translation. It should arguably be added along with it in the next patch.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:09):

frank-emrich commented on PR #10388:

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

@fitzgen

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via store.engine().allocator().[de]allocate_fiber_stack()? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

The reason we do not use these verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.
However, our medium-term plan was to add similar allocation and deallocation functions to InstanceAllocatorImpl. This shouldn't be too much work, but it's something we haven't seen through yet. In particular, @dhil is very interested in restoring support for pooled allocation of stacks. We had implemented our own interface for that (outside of InstanceAllocatorImpl), and it greatly improves performance in those use cases where you frequently (de-)allocate continuations.

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independently of what data is stored in those tables. Supporting tables that store continuation pointers effectively doubles the required size of the mmapped area.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:14):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:14):

frank-emrich created PR review comment:

From a technical perspective, there is no need for us to have our own config setting for the stack size.
I guess the reason we didn't just re-use the existing stack_size option is that the latter always felt like a soft limit to me: AFAIK you can still occupy stack space beyond that limit, for example with host functions.
But I have no strong feelings about re-using the existing stack_size config option, or merging our stack_switching_stack_size with async_stack_size.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:20):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:20):

frank-emrich created PR review comment:

Good question! These only exist for some hacky debug printing code that should not end up in main. Initially, when we thought that there would only be a single, big PR, the plan was to keep them during reviewing, and then remove them at the last moment before merging. Now that we will land this in smaller pieces, we could either remove them now, or once all PRs from the series have landed.

The reason why we don't want to land this is explained here: On the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:22):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:22):

frank-emrich created PR review comment:

Haha, these were in wasmtime_environ originally, but then @alexcrichton suggested moving them into the runtime. I don't mind it either way.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 20:43):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 20:43):

frank-emrich created PR review comment:

Yeah, that drop call is definitely not pretty. We could get away with not explicitly calling drop by managing the scopes here slightly differently, but that seems equally funky.

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: In catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm). I believe @alexcrichton mentioned that this split is mostly for legacy reasons due to how things were separated into different crates until recently.

For stack switching, we now require an additional update to the runtime on entering and exiting. Intuitively, it felt to me like these changes should be made in enter_wasm/exit_wasm, but that doesn't quite work: We need to undo these changes if you panic out of a Wasm execution, so I would indeed need to do all of this in catch_traps instead, so that CallThreadState's drop can do the steps needed on exiting the runtime.

But updating the stack chain in catch_traps to signal that we are entering Wasm seemed awkward to me, so I decided to just consolidate all the logic for changes that need to happen on entering and exiting Wasm into one place, which resulted in the birth of EntryStoreContext.

TL;DR Technically, all that's needed for stack switching could be done in catch_traps, but that felt like morally the wrong spot to me.

Any suggestions? Should I just move it to catch_traps?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:26):

fitzgen commented on PR #10388:

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

No worries. Congrats on the move and the new gig!

The reason we do not use these verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.

I'd like to dive more into this because I would (perhaps naively?) assume that there is no fundamental reason they couldn't be the same. This is also one of the biggest concerns I have about the stack-switching implementation: the complexity of having multiple kinds of stacks that the runtime must manage; when there are different kinds of resources, we will eventually need different config knobs for tuning them; and so on. Basically, this feels like a bunch of cognitive overhead for both maintainers and users. Keeping things uniform keeps things simpler. For example, https://github.com/bytecodealliance/wasmtime/issues/9350 (which is admittedly a beast of a PR; I have been splitting it into smaller pieces and landing those incrementally) will ultimately reduce complexity in Wasmtime by unifying our handling of Wasm linear memories and GC heaps. I just point this out so that you know I'm not trying to pick on you, and that I am attempting to hold myself to these standards as well.

But perhaps I am missing some key point that makes it so that these stack-switching stacks and fiber stacks really do have different requirements that do not overlap enough that sharing an implementation makes sense?

You say it would be quite difficult to unify them; can you expand on what difficulties you foresee?

In particular, @dhil is very interested in restoring support for pooled allocation of stacks. We had implemented our own interface for that (outside of InstanceAllocatorImpl), and it greatly improves performance in those use cases where you frequently (de-)allocate continuations.

Totally, that's exactly what I'd expect: a free list pop should be much faster than mmap, especially under load in a concurrent, multi-threaded embedding of Wasmtime.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independently of what data is stored in those tables. Supporting tables that store continuation pointers effectively doubles the required size of the mmapped area.

Ah okay, I get it now. You're right that we definitely don't want to cut the capacity of all existing tables in half in order to support a proposal that 99% of users won't be leveraging yet.

Two things:

  1. And this is a pre-existing issue: the calculation of table size should be a little more robust and use something like mem::size_of::<TableElementUnion>() so that it is more future-proof for changes like we have here, where we are adding new kinds of table elements (see the sketch at the end of this comment).
  2. We could plumb through a boolean for whether the stack-switching proposal is enabled to this method, or the whole WasmFeatures or something, and then adjust the size of a single element in the table size calculation based on that value.

In the meantime, halving the capacity just for continuation tables (rather than for all existing tables) is fine. And we can do the above suggestions in follow-up PRs.
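
A sketch of suggestion 1 (TableElementUnion and the element types below are stand-ins for illustration, not existing Wasmtime definitions):

use std::mem;

struct VMFuncRef; // stand-in for the real funcref representation
type VMGcRef = u32; // stand-in for the real GC reference representation

/// By construction, the size of this union is the size of the largest
/// element representation, so the reservation below cannot silently
/// go stale when a new kind of table element is added.
#[allow(dead_code)]
union TableElementUnion {
    funcref: *const VMFuncRef,
    gcref: VMGcRef,
    contobj: [usize; 2], // fat pointer: sequence counter + VMContRef pointer
}

fn table_pool_bytes(total_tables: usize, table_elements: usize) -> usize {
    total_tables * table_elements * mem::size_of::<TableElementUnion>()
}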

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:45):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:45):

fitzgen created PR review comment:

AFAIK you can still occupy stack space beyond that limit, for example with host functions.

We are generally a little loosey-goosey with our nomenclature here, but to make things precise for a moment:

libcalls are part of Wasmtime's trusted compute base; host functions are not (modulo some hand waving around their usage of unsafe).

Given all that: we will execute libcalls on async fibers, but will never execute host functions on async fibers. When Wasm calls an imported host function, we first switch from the async fiber to the native stack, then run the host function, and finally switch back to the Wasm's async fiber stack to pass the function results back to Wasm JIT code.

A libcall should never access stack memory beyond the async_stack_size limit (and we rely on guard pages to help enforce this). The Wasm JIT code is given max_wasm_stack bytes for its own use; it must always be true that async_stack_size > max_wasm_stack, and therefore that leaves async_stack_size - max_wasm_stack bytes of stack space for libcalls in the worst case.
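(For illustration with made-up numbers: if async_stack_size is 2 MiB and max_wasm_stack is 512 KiB, libcalls are left 1.5 MiB of headroom in the worst case.)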

This is all a little bit of an aside and I'm mostly just explaining all this to make sure we are on the same page here. I don't think any of this needs to change for stack-switching, and I would also expect that we would call libcalls, but never host functions, on stack-switching stacks, same way we do things on our async fibers.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:48):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:48):

fitzgen created PR review comment:

The reason why we don't want to land this is explained here: On the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

Ah okay! Yes, definitely should not land these bits in that case.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:50):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:50):

fitzgen created PR review comment:

Ha, let's keep it as-is for now then. I don't want to make you bounce back and forth.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 19:01):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 19:01):

fitzgen created PR review comment:

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: In catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm).

...

Any suggestions? Should I just move it to catch_traps?

Do you mind splitting out just the EntryStoreContext bits (without even the stack chain preservation) into its own PR? I think that would help me get a better grasp of these refactorings and provide better review and feedback, especially since the wasm entry/exit paths are both very fiddly and also perf sensitive.

I do like the idea of unifying these code paths a lot, but I just want to be very careful about changes to them in general. Honestly, it is fairly likely that we won't change the mechanisms any further than what you have already done in this PR here, but it would help a lot with alleviating my nibbling paranoia that we are introducing subtle bugs or perf regressions.

And I apologize for asking you to keep splitting these PRs even finer-grained. I know it can be a pain (again, see https://github.com/bytecodealliance/wasmtime/pull/10503) but it is super super super helpful for reviewers and making sure that we are following along with all the changes. Thanks!

