Stream: git-wasmtime

Topic: wasmtime / PR #10388 Stack switching: Infrastructure and ...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich opened PR #10388 from frank-emrich:stack-switching-infra to bytecodealliance:main:

This PR is part of a series that adds support for the Wasm stack switching proposal. The explainer document for the proposal is here. There's a tracking issue describing the overall progress and limitations here: #10248
The draft PR #10177 contains the whole (initial) implementation of the proposal.

This PR contains all of the necessary infrastructure and runtime support. In other words, this contains the entire implementation, except for codegen. Tests will be added at the end of the series, after codegen has been added in a follow-up PR.

This was developed together with @dhil.

General implementation notes

In Wasm, continuations are represented by values of type (ref $ct), where $ct is a new composite type/heap type for continuations.
In the implementation, these are represented by values of type VMContObj. These are fat pointers, consisting of a sequence counter and a pointer to a VMContRef. The latter type is used for the actual representation of continuations.
The sequence counter part of VMContObjs is used to check that every continuation value can only be used once.
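
As a rough illustration, such a fat pointer might be modeled along these lines (a minimal sketch based on the description above; the field names and exact definitions in the PR differ):

use core::ptr::NonNull;

/// Sketch of a continuation fat pointer: a sequence counter
/// ("revision") paired with a pointer to the actual continuation.
#[repr(C)]
pub struct VMContObj {
    pub revision: u64,
    pub contref: NonNull<VMContRef>,
}

/// Placeholder for the runtime's continuation representation.
pub struct VMContRef {
    /// Bumped every time the continuation is used, so that stale fat
    /// pointers carrying an older revision can be rejected.
    pub revision: u64,
    // ... stack memory, state, parent link, ...
}

impl VMContObj {
    /// One-shot check: a continuation value is only usable if its
    /// revision still matches the counter stored in the VMContRef.
    pub fn is_current(&self) -> bool {
        self.revision == unsafe { self.contref.as_ref() }.revision
    }
}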

The VMStoreContext is extended to contain a "stack chain": It indicates what stack we are currently executing on. Logically, this is a linked list of stacks, since each continuation has a parent field. The chain stored in the VMStoreContext always ends with a value representing the initial stack. This is the stack we were on when entering Wasm, which will usually be the main stack.
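
Continuing the sketch above, the chain might be modeled like this (again illustrative, not the PR's exact definition):

/// The stack we are currently executing on. Following `parent` links
/// from a continuation always bottoms out at the initial stack, i.e.
/// the stack we were on when entering Wasm (usually the main stack).
pub enum StackChain {
    InitialStack,
    Continuation(*mut VMContRef),
}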

Memory Management

Currently, memory management is very basic: The StoreOpaque provides a new method for allocating a new VMContRef, and keeps track of all continuations created this way. Continuations are never deallocated at the moment, meaning that they live until the store itself is deallocated.

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks or segmented stacks.
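
A minimal Unix-only sketch of that scheme using the libc crate (the PR's ContinuationStack differs in detail, e.g. in exactly where the guard page sits):

use std::io;

/// Reserve `size` usable bytes of stack plus one inaccessible guard
/// page at the low end (stacks grow downward), so that overflowing the
/// stack faults instead of silently corrupting adjacent memory.
unsafe fn mmap_stack(size: usize, page_size: usize) -> io::Result<*mut u8> {
    let total = size + page_size;
    let ptr = libc::mmap(
        std::ptr::null_mut(),
        total,
        libc::PROT_NONE,
        libc::MAP_PRIVATE | libc::MAP_ANON,
        -1,
        0,
    );
    if ptr == libc::MAP_FAILED {
        return Err(io::Error::last_os_error());
    }
    // Everything above the guard page becomes readable and writable.
    let rc = libc::mprotect(
        ptr.cast::<u8>().add(page_size).cast(),
        size,
        libc::PROT_READ | libc::PROT_WRITE,
    );
    if rc != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(ptr.cast())
}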

Backtrace generation

The existing backtrace generation mechanism is extended to be fully aware of continuations: When generating backtraces, the entire chain of continuations is traversed, not just the stack frames of the currently active stack/continuation.

Entering/Exiting Wasm

Prior to this PR, there were two separate mechanisms that save and restore some state of the runtime when entering/exiting Wasm:

  1. The functions enter_wasm and exit_wasm in func.rs
  2. CallThreadState saves and restores (on drop) parts of the VMStoreContext

This PR consolidates these two mechanisms, because it requires some additional state to be updated and restored on enter/exit:
the type wasmtime::runtime::func::EntryStoreContext now stores all of the required runtime state and ensures that it's restored when exiting Wasm.
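
The shape of that consolidation is roughly the following (a sketch with an illustrative field; the actual field set lives in EntryStoreContext itself):

/// Sketch: capture the relevant runtime state when entering Wasm and
/// restore it when this value is dropped, so that leaving Wasm normally
/// and unwinding out of it via a panic both restore the previous state.
struct EntryStoreContext {
    saved_stack_limit: usize, // illustrative field
    // ... further saved VMStoreContext fields, plus the stack chain ...
}

impl Drop for EntryStoreContext {
    fn drop(&mut self) {
        // Write the saved values back into the VMStoreContext here;
        // doing the restoration in `drop` makes it unwind-safe.
    }
}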

Tables

Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested fitzgen for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-fuzz-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-default-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 22:44):

frank-emrich requested wasmtime-core-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2025 at 23:44):

github-actions[bot] commented on PR #10388:

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:c-api", "wasmtime:config", "wasmtime:ref-types"

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 01:05):

github-actions[bot] commented on PR #10388:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:19):

fitzgen commented on PR #10388:

Haven't dug into the details of the PR yet, but I have some initial questions after reading the OP.

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks, or segmented stacks.

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via [store.engine().allocator().[de]allocate_fiber_stack()](https://github.com/bytecodealliance/wasmtime/blob/9da52ede33a9f8996832b0210579ef15296addbd/crates/wasmtime/src/engine.rs#L659)? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.
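
For reference, the shape of that suggestion (illustrative only; the real wasmtime::runtime::vm table types are more involved):

/// Illustrative: one storage variant per element representation, so a
/// two-pointer-wide VMContObj does not have to be squeezed into the
/// representation used for GC references.
enum TableElements {
    Func(Vec<*mut u8>),                // stand-in for *mut VMFuncRef
    GcRef(Vec<u32>),                   // stand-in for VMGcRef
    Cont(Vec<Option<(u64, *mut u8)>>), // stand-in for VMContObj fat pointers
}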

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:20):

fitzgen edited a comment on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen submitted PR review:

(review is still in progress, but I need to context switch for a little bit and might as well share what I have so far)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Were these going to be removed in the next PR, or should they be removed now? Maybe throw an item into the meta task list if not now.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Not something that needs to be handled in this PR, but there is no reason we can't enable GC and also maintain the existing memory management for continuations, where they have the same lifetime as the Store. We don't have to implement full GC of continuations before we allow enabling both GC and stack switching at the same time.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

FWIW, we already have TRAP_INTERNAL_ASSERT that we use for debug-only assertions inside JIT code.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

I don't see a corresponding function definition in crates/c-api, I think it is missing?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Ditto to the debug printing stuff in this file.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

fyi, I tend to do log::trace!(...) for debug prints (both temporary ones that I will remove, and for ones that are useful enough to keep in tree).

when running the CLI, you can see them by setting the WASMTIME_LOG=trace env var, or only a specific crate/module via WASMTIME_LOG=wasmtime::runtime::vm=trace, or multiple crates/modules at different levels via WASMTIME_LOG=debug,regalloc2=info,cranelift_codegen::machinst=trace.

When running tests instead of the CLI, WASMTIME_LOG becomes RUST_LOG. Most tests initialize the env-logger, but if you aren't seeing the logs, then you can put let _ = env_logger::try_init(); at the start of the test (or ideally in a shared test helper function that a bunch of related tests already call at the beginning of their execution).
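
For example (test name hypothetical):

#[test]
fn stack_switching_smoke_test() {
    // Make RUST_LOG-based filtering take effect even if no other test
    // initialized the logger first; ignore the error if one already did.
    let _ = env_logger::try_init();
    // ... rest of the test ...
}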

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

At least for now, I'm not sure it is worth having separate config knobs for stacks created via cont.new versus API-level fibers. This effectively creates two different kinds of stacks in the system, and I'd prefer to try and keep things simple for as long as we can by having a single kind of stack that we config and allocate and handle in the system.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:50):

fitzgen created PR review comment:

Similarly, I don't see a function definition for this one either.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 17:52):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen submitted PR review:

(again, review still in progress, leaving intermediate notes)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Reminder to fill this in.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Two things:

  1. Why does stack-switching need std?

  2. We should be able to enable stack-switching for a compiler-without-runtime build, to be able to compile stack-switching Wasm to be run elsewhere, and similarly should be able to enable stack-switching in a runtime-without-compiler build, to run elsewhere-compiled stack-switching Wasm modules. As written, enabling the stack-switching feature will always enable the runtime, making the compiler-without-runtime version impossible. That is, unless compilation of stack-switching isn't gated on any cargo feature, but that is kind of funky given that this cargo feature exists and I expect users would think "I want to compile Wasm programs that use stack-switching, therefore I will enable this feature". This is a long way of basically saying: I think this should not unconditionally enable the runtime, and, once compilation is added in the next PR, we should ultimately end up with something like this:

    stack-switching = [
        "wasmtime-cranelift?/stack-switching",
    ]

Neither of these things necessarily needs to be addressed now, before this PR merges, but it may end up being easier doing it now before more stuff gets entangled, making it harder to pull the cfgs apart.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

This seems unused. Does it get used in follow up PRs?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Because (AFAICT) these don't contain any usizes or pointers or anything else that changes with the target's word size, you could move them into the wasmtime_environ crate as

#[repr(u32)]
pub enum VMControlEffect {
    Return = 0,
    // ...
}

and avoid the wordy and kind of annoying split between the definitions of the enums and their discriminants.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

These sorts of config knobs that need to exist at the wasmtime_environ level usually go inside wasmtime_environ::Tunables. Unless there is a reason not to do that here, we should follow the existing convention.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Nitpick: both the Engine and Module variants are interned, so perhaps we can rename this method to unwrap_module_type_index?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

This mem::drop dance is kind of funky -- can we instead pass ownership into catch_traps and let it handle the dropping, if necessary? Or are there callers that need to do different things?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

You'll also want to add some CI checks for these features being enabled/disabled in various combinations like this:

https://github.com/bytecodealliance/wasmtime/blob/main/.github/workflows/main.yml#L363-L365

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

If they are written to directly from Wasm JIT code, then probably yes. This struct should also be #[repr(C)] in that case.

If it is not written to directly from Wasm, then it would only need to be unsafe cells if you are ever violating Rust's mutable/exclusive xor immutable/shared semantics.

I haven't read enough of these changes to say any more than that yet.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Can you run the function call microbenchmarks on main and on this branch? I want to make sure that we aren't regressing performance here, as this path is very hot, and if we are we may need to figure out how to cfg(...) some of this stuff so that it only happens when the stack-switching feature is enabled.

This will run the microbenchmarks and will check the results for statistical significance:

$ git checkout main
$ cargo bench --bench call -- --save-baseline main
$ git checkout stack-switching-infra
$ cargo bench --bench call -- --baseline main

I can help figure things out if we need to do something here.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2025 at 21:24):

fitzgen created PR review comment:

Can you add an item to the meta task list for the embedder API, if you haven't already?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:50):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:50):

dhil created PR review comment:

I think this is a stray declaration from some of our downstream experiments. It has been needed to capture the C shadow stack pointer that clang generates, which must be kept in sync with the stack switching done by the engine. To be clear, this is just due to the fact that clang doesn't know about stack switching.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil created PR review comment:

Same reason as above.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 07:51):

dhil edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:30):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:30):

fitzgen created PR review comment:

FWIW, exposing these things the C API would be good in general, so if you want to do it here or in a new PR, you're more than welcome to!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:34):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:34):

dhil created PR review comment:

Yes sure! I am leaning towards adding this in a follow-up PR focused on the C API. What do you think @frank-emrich ?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:37):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 15:37):

dhil created PR review comment:

As I discussed with @alexcrichton (https://github.com/bytecodealliance/wasmtime/pull/10177/files#r1965733715), I'd be OK with no bespoke stack switching configuration for now.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 16:09):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 16:09):

dhil created PR review comment:

This is an interesting question, which Frank and I have discussed in the past. If my memory serves me right, then our layout is slightly different. However perhaps we can unify the layouts (or we can perhaps simply adopt the fibers' layout). I am not too sure about the implications, I think @frank-emrich has the key knowledge to best answer this question.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 17:59):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2025 at 17:59):

fitzgen created PR review comment:

The automatic pooling allocator integration would also get you fast allocation of new stacks from a pool, which would look nice on your benchmarks and what have you ;)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:19):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:20):

dhil created PR review comment:

My take is that they shouldn't make it into the main branch -- probably a stray include?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:21):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:21):

dhil created PR review comment:

Yes, agreed! I totally see the appeal!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:24):

dhil submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2025 at 10:24):

dhil created PR review comment:

Yes, it gets used during the code translation. It should arguably be added along with it in the next patch.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:09):

frank-emrich commented on PR #10388:

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

@fitzgen

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via store.engine().allocator().[de]allocate_fiber_stack()? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

The reason we do not use these verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.
However, our medium-term plan was to add similar allocation and deallocation functions to InstanceAllocatorImpl. This shouldn't be too much work, but it's something we haven't seen through yet. In particular, @dhil is very interested in restoring support for pooled allocation of stacks. We had implemented our own interface for that (outside of InstanceAllocatorImpl), and it greatly improves performance in those use cases where you frequently (de)allocate continuations.

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independent of what data is stored in those tables. In order to support tables storing continuation pointers, this effectively doubles the required size of the mmapped area.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:14):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:14):

frank-emrich created PR review comment:

From a technical perspective, there is no need for us to have our own config setting for the stack size.
I guess the reason we didn't just re-use the existing stack_size option is that the latter always felt like a soft limit to me: AFAIK you can still occupy stack space beyond that limit, for example with host functions.
But I have no strong feelings about re-using the existing stack_size config option, or merging our stack_switching_stack_size with async_stack_size.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:20):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:20):

frank-emrich created PR review comment:

Good question! These only exist for some hacky debug printing code that should not end up in main. Initially, when we thought that there would only be a single, big PR, the plan was to keep them during reviewing, and then remove them at the last moment before merging. Now that we will land this in smaller pieces, we could either remove them now, or once all PRs from the series have landed.

The reason why we don't want to land this is explained here: On the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:22):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 19:22):

frank-emrich created PR review comment:

Haha, these were in wasmtime_environ originally, but then @alexcrichton suggested moving them into the runtime. I don't mind it either way.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 20:43):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2025 at 20:43):

frank-emrich created PR review comment:

Yeah, that drop call is definitely not pretty. We could get away with not explicitly calling drop by managing the scopes here slightly differently, but that seems equally funky.

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: In catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm). I believe @alexcrichton mentioned that this split is mostly for legacy reasons due to how things were separated into different crates until recently.

For stack switching, we now require an additional update to the runtime on entering and exiting. Intuitively, it felt to me like these changes should be made in enter_wasm/exit_wasm, but that doesn't quite work: We need to undo these changes if you panic out of a Wasm execution, so I would indeed need to do all of this in catch_traps instead, so that CallThreadState's drop can do the steps needed on exiting the runtime.

But updating the stack chain in catch_traps to signal that we are entering Wasm seemed awkward to me, so I decided to just consolidate all the logic for changes that need to happen on entering and exiting Wasm into one place, which resulted in the birth of EntryStoreContext.

TL;DR Technically, all that's needed for stack switching could be done in catch_traps, but that felt like it's morally the wrong spot to me.

Any suggestions? Should I just move it to catch_traps?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:26):

fitzgen commented on PR #10388:

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

No worries. Congrats on the move and the new gig!

The reason we do not use these verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.

I'd like to dive more into this because I would (perhaps naively?) assume that there is no fundamental reason they couldn't be the same. This is also one of the biggest concerns I have about the stack-switching implementation: the complexity of having multiple kinds of stacks that the runtime must manage; when there are different kinds of resources, we will eventually need different config knobs for tuning them; etc. Basically, this feels like a bunch of cognitive overhead for both maintainers and users. Keeping things uniform keeps things simpler. For example, https://github.com/bytecodealliance/wasmtime/issues/9350 (which is admittedly a beast of a PR; I have been splitting it into smaller pieces and landing those incrementally) will ultimately reduce complexity in Wasmtime by unifying our handling of Wasm linear memories and GC heaps. I just point this out so that you know I'm not trying to pick on you, and that I am attempting to hold myself to these standards as well.

But perhaps I am missing some key point that makes it so that these stack-switching stacks and fiber stacks really do have different requirements that do not overlap enough for sharing an implementation to make sense?

You say it would be quite difficult to unify them; can you expand on what difficulties you foresee?

In particular, @dhil is very interested in restoring support for pooled allocation of stack. We had implemented our own interface for that (outside of InstanceAllocatorImpl) and it greatly improves performance in those use cases where you frequently (de-) allocate continuations.

Totally, that's exactly what I'd expect: a free list pop should be much faster than mmap, especially under load in a concurrent, multi-threaded embedding of Wasmtime.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independent from what data is stored in those tables. In order to support tables storing continuations pointers, this effectively doubles the required size of the mmapped area.

Ah okay, I get it now. You're right that we definitely don't want to cut the capacity of all existing tables in half in order to support a proposal that 99% of users won't be leveraging yet.

Two things:

  1. And this is a pre-existing issue: that calculation of table size should be a little more robust, and use something like mem::size_of::<TableElementUnion>() so that it is more future-proof for changes like the one here, where we are adding new kinds of table elements (see the sketch below).
  2. We could plumb through a boolean for whether the stack-switching proposal is enabled to this method, or the whole WasmFeatures or something, and then adjust the size of a single element in the table size calculation based on that value.

In the meantime, halving the capacity just for continuation tables (rather than all other existing tables) is fine. And we can do the above suggestions in follow up PRs.
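
A sketch of suggestion (1), with TableElementUnion as the hypothetical type named above:

use core::mem;

/// Hypothetical: a union over every element representation a table can
/// hold, so the pool reservation automatically accounts for the widest
/// element kind (such as a fat-pointer VMContObj) if new ones are added.
union TableElementUnion {
    func: *mut u8,        // stand-in for *mut VMFuncRef
    gc_ref: u32,          // stand-in for VMGcRef
    cont: (u64, *mut u8), // stand-in for a VMContObj fat pointer
}

fn table_pool_reservation(total_tables: usize, table_elements: usize) -> usize {
    total_tables * table_elements * mem::size_of::<TableElementUnion>()
}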

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:45):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:45):

fitzgen created PR review comment:

AFAIK you can still occupy stack space beyond that limit, for example with host functions.

We are generally a little loosey-goosey with our nomenclature here, but to make things precise for a moment:

libcalls are part of Wasmtime's trusted compute base; host functions are not (modulo some hand waving around their usage of unsafe).

Given all that: we will execute libcalls on async fibers, but will never execute host functions on async fibers. When Wasm calls an imported host function, we first switch from the async fiber to the native stack, then run the host function, and finally switch back to the Wasm's async fiber stack to pass the function results back to Wasm JIT code.

A libcall should never access stack memory beyond the async_stack_size limit (and we rely on guard pages to help enforce this). The Wasm JIT code is given max_wasm_stack bytes for its own use, it must always be true that async_stack_size > max_wasm_stack, and therefore that leaves async_stack_size - max_wasm_stack bytes of stack space for libcalls in the worst case.
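
In Config terms, the invariant looks like this (the sizes below are arbitrary examples; max_wasm_stack and async_stack_size are the existing wasmtime::Config methods):

let mut config = wasmtime::Config::new();
config.async_support(true);
// JIT code gets this much stack for its own frames...
config.max_wasm_stack(512 * 1024);
// ...and the whole fiber stack must be strictly larger, leaving the
// difference as worst-case stack space for libcalls.
config.async_stack_size(2 * 1024 * 1024);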

This is all a little bit of an aside and I'm mostly just explaining all this to make sure we are on the same page here. I don't think any of this needs to change for stack-switching, and I would also expect that we would call libcalls, but never host functions, on stack-switching stacks, same way we do things on our async fibers.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:48):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:48):

fitzgen created PR review comment:

The reason why we don't want to land this is explained here: On the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

Ah okay! Yes, definitely should not land these bits in that case.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:50):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 18:50):

fitzgen created PR review comment:

Ha, let's keep it as-is for now then. I don't want to make you bounce back and forth.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 19:01):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2025 at 19:01):

fitzgen created PR review comment:

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: In catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm).

...

Any suggestions? Should I just move it to catch_traps?

Do you mind splitting out just the EntryStoreContext bits (without even the stack chain preservation) into its own PR? I think that would help me get a better grasp of these refactorings and provide better review and feedback, especially since the wasm entry/exit paths are both very fiddly and also perf sensitive.

I do like the idea of unifying these code paths a lot, but I just want to be very careful about changes to them in general. Honestly, it is fairly likely that we won't change the mechanisms any further than what you have already done in this PR here, but it would help a lot with alleviating my niggling paranoia that we are introducing subtle bugs or perf regressions.

And I apologize for asking you to keep splitting these PRs even finer-grained. I know it can be a pain (again, see https://github.com/bytecodealliance/wasmtime/pull/10503) but it is super super super helpful for reviewers and making sure that we are following along with all the changes. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 20 2025 at 21:00):

frank-emrich commented on PR #10388:

@fitzgen

I'd like to dive more into this because I would (perhaps naively?) assume that there is no fundamental reason they couldn't be the same. This is also one of the biggest concerns I have about the stack-switching implementation: the complexity of having multiple kinds of stacks that the runtime must manage; when there are different kinds of resources, we will eventually need different config knobs for tuning them; etc. Basically, this feels like a bunch of cognitive overhead for both maintainers and users. Keeping things uniform keeps things simpler. For example, #9350 (which is admittedly a beast of a PR; I have been splitting it into smaller pieces and landing those incrementally) will ultimately reduce complexity in Wasmtime by unifying our handling of Wasm linear memories and GC heaps. I just point this out so that you know I'm not trying to pick on you, and that I am attempting to hold myself to these standards as well.

But perhaps I am missing some key point that makes it so that these stack-switching stacks and fiber stacks really do have different requirements that do not overlap enough for sharing an implementation to make sense?

You say it would be quite difficult to unify them; can you expand on what difficulties you foresee?

I totally get the wish to make things more maintainable by avoiding duplicated stack management logic as much as possible.

The short answer is: If we had a FiberStack type that meant "just a big chunk of memory used for an independent stack, allocated with the virtual memory mechanism of the current platform", we could share it between our stack switching implementation and wasmtime_fiber. But in its current form, wasmtime_fiber::FiberStack is naturally tailored to be used together with wasmtime_fiber::Fiber, and bakes in details about how stack switching is done there.

Longer answer: Currently, wasmtime_fiber::FiberStack is just a wrapper around a platform specific implementation. If we look at unix::FiberStack, the documentation there shows how a unix::FiberStack is used by a unix::Fiber. This influences the layout of the data at the top of the stack (see the diagram in crates/fiber/src/unix.rs). In contrast, our layout of the header near the top of a stack is different (see crates/wasmtime/src/runtime/vm/stack_switching/stack/unix.rs).

For example, wasmtime_fiber::FiberStack expects that the wasmtime_fiber_switch function stores the IP and RBP at the point where we left off on the stack we depart. In contrast, when you, @alexcrichton, and I discussed the stack_switch CLIF instruction, we decided that it would be nicer to store only the data GPRs on the stack itself, and write the remaining data (like IP, RBP) elsewhere. We now write this to the header of the stack near its top.

It may be possible to move both implementations (i.e., wasmtime_fiber and our stack switching) closer to each other, so that they use the exact same layout of FiberStack, but that may require some compromises, ultimately making things more brittle. For example, both implementations would need to agree on a type for passing status information and payloads (see wasmtime_fiber::RunResult). Do let me know if you would like me to elaborate further on the details of what would be required.

If we leave the Unix world, things become even more complicated: The Windows implementation of wasmtime_fiber::FiberStack doesn't manage the stack memory directly (making wasmtime_fiber::FiberStack mostly just a dummy), and wasmtime_fiber::Fiber uses the Windows Fiber API to create and run stacks. My plan for the Windows implementation of the stack switching proposal was not to use that API, but do the stack switching in generated code. Thus, there is quite a difference between what wasmtime_fiber::FiberStack currently does on Windows, and what we'd need.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 20 2025 at 21:07):

frank-emrich commented on PR #10388:

@fitzgen Regarding tables with continuations

Ah okay, I get it now. You're right that we definitely don't want to cut the capacity of all existing tables in half in order to support a proposal that 99% of users won't be leveraging yet.

[...] We could plumb through a boolean for whether the stack-switching proposal is enabled to this method, or the whole WasmFeatures or something, and then adjust the size of a single element in the table size calculation based on that value.

In the meantime, halving the capacity just for continuation tables (rather than all other existing tables) is fine. And we can do the above suggestions in follow up PRs.

Just halving the max allowed capacity for the time being and then plumbing through whether the stack switching feature is enabled in the longer term sounds good to me.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 20 2025 at 21:23):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 20 2025 at 21:23):

frank-emrich created PR review comment:

I've run these commands, the results are here:
https://gist.github.com/frank-emrich/f7f80adc8005900a3e245d2400149b38

(Note that this compares the stack-switching-infra branch not against actual main, but against its merge base with main, since main has moved forward quite a bit since the last merge.)

I'm not quite sure how to interpret the output, would you mind having a look?

I will re-run this once I follow through with your suggestion of splitting out most of the EntryStoreContext into a separate PR.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2025 at 11:10):

frank-emrich updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2025 at 21:47):

frank-emrich updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2025 at 22:27):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2025 at 22:27):

frank-emrich created PR review comment:

No problem, making the changes to this critical bit of code more reviewable is perfectly sensible. I've created a PR with the EntryStoreContext stuff minus stack-switching fields here: #10626
Once that is landed, it will be easy to see what additional stuff this PR needs to add to EntryStoreContext (not much, really).

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2025 at 16:37):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2025 at 16:37):

fitzgen created PR review comment:

Are these benchmark results for this whole branch, or for just the EntryStoreContext changes in https://github.com/bytecodealliance/wasmtime/pull/10626?

Unfortunately, it looks like there are a bunch of regressions, and enough of them that it doesn't seem likely to be a statistical fluke. But that doesn't mean that this is all a non-starter. What this likely means is that there are some functions that need to be marked #[inline] and/or split into fast and slow paths, roughly like

#[inline]
fn foo(&self, ...) -> Option<Bar> {
    if !self.is_foo_enabled() {
        return None;
    }

    self.foo_slow_path()
}

#[inline(never)]
fn foo_slow_path(&self) -> Option<Bar> {
    // ...
}

A good place to start addressing this would be in https://github.com/bytecodealliance/wasmtime/pull/10626, since that set of changes is smaller and it is easier to get a handle on its total effect. I can help out and do a little profiling later today.

The workflow is basically: run a regressing benchmark in profiling mode and attach perf/samply/instruments/vtune/whatever your profiler of choice is, look at the top self-time functions, and then check whether each call is actually necessary (i.e. can't be made conditional on some feature being used), whether it is being inlined, whether it can be split into fast and slow paths, etc...

Again, I can do a bit of this later today, so don't stress out too much. I don't want to hold these PRs up.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2025 at 16:42):

fitzgen commented on PR #10388:

Just halving the max allowed capacity for the time being and then plumbing through whether the stack switching feature is enabled in the longer terms sounds good to me.

Just making sure we are on the same page: specifically only continuation-ref tables should have their capacity halved. funcref and VMGcRef tables should be unaffected.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2025 at 22:06):

frank-emrich updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2025 at 20:56):

fitzgen commented on PR #10388:

It may be possible to move both implementations (i.e., wasmtime_fiber and our stack switching) closer to each other, so that they used the exact same layout of FiberStack, but that may require some compromising, ultimately making things more brittle. For example, both implementations would need to agree on a type for passing status information and payloads (see wasmtime_fiber::RunResult). Do let me know if you would like me to elaborate on the details of what would be required further.

I guess that what I am imagining is that, despite whatever headers or whatnot that they layer onto the stack, they share the same underlying mechanism (and Config knobs and all that) to allocate/deallocate them and manage them in the runtime (i.e. they come from the same pool when using the pooling allocator). This would not require that we unify their stack switching code paths (since one is in a mix of Rust code and inline assembly and the other is implemented in JIT code) but it would mean that we would have some kind of VMRawStack type or some such that manages the underlying memory allocation/mmap/fiber, is allocated/deallocated via the InstanceAllocator trait, the pooling allocator (for example) would pool these objects, and then both wasmtime_fiber::FiberStack and ContinuationStack would be thin-ish wrappers around that VMRawStack allocation.
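
Spelling that out as a sketch (VMRawStack is the hypothetical name from above; everything here is illustrative):

/// One shared allocation type: owns the underlying memory (mmap, pool
/// slot, or Windows fiber) and is what the InstanceAllocator trait
/// would hand out and the pooling allocator would pool.
pub struct VMRawStack {
    base: *mut u8,
    len: usize,
}

/// Thin-ish wrappers that layer their own headers/bookkeeping on top,
/// without duplicating the allocation and pooling machinery.
pub struct FiberStack {
    raw: VMRawStack,
    // ... wasmtime_fiber's bookkeeping ...
}

pub struct ContinuationStack {
    raw: VMRawStack,
    // ... the stack-switching header near the top of the stack ...
}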

That said, this refactoring doesn't need to happen now, before this PR merges. I think it can be left for follow ups. However, we wouldn't enable the stack-switching feature by default in Wasmtime, for example, until after this was resolved.

If we leave the Unix world, things become even more complicated: The Windows implementation of wasmtime_fiber::FiberStack doesn't manage the stack memory directly (making wasmtime_fiber::FiberStack mostly just a dummy), and wasmtime_fiber::Fiber uses the Windows Fiber API to create and run stacks. My plan for the Windows implementation of the stack switching proposal was not to use that API, but do the stack switching in generated code. Thus, there is quite a difference between what wasmtime_fiber::FiberStack currently does on Windows, and what we'd need.

My understanding of the constraints for Windows is that, even if we emit inline code for bookkeeping the metadata required in Windows, the allocation and deallocation of fiber stacks must happen through CreateFiber[Ex] (or ConvertThreadToFiber) and DeleteFiber. In this case, having a shared VMRawStack that understands and abstracts this stuff away should hopefully actually make things easier for us here.


Anyways, let me know when all the TODOs and previous comments have been addressed, and this is ready for another round of review. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (May 22 2025 at 21:14):

frank-emrich updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2025 at 17:55):

posborne commented on PR #10388:

@fitzgen I (hopefully) have addressed all the review comments outstanding on this, though I'm happy to address anything I missed or new things that need to be addressed. I'm working on getting a merge of upstream back into the branch validated, which I'll use as a base to get criterion numbers to compare, to see what perf regressions we might need to fix up.

With the pooling allocator for tables, I did some small renaming -- the change will eagerly use as much space as is available in the allocated pages rather than being strictly half; in general, I don't think this will cause confusion for users. Another approach could be to move to having consumers specify the number of bytes/pages for tables rather than the number of (nominal) elements, to avoid the mildly awkward dance that is now present.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 17:16):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 17:23):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 20:01):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 20:20):

posborne requested abrown for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 20:20):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 20:20):

posborne requested wasmtime-compiler-reviewers for a review on PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 21:44):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 21:51):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2025 at 21:55):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 28 2025 at 16:04):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 28 2025 at 16:36):

posborne commented on PR #10388:

Apologies for the noise; I'm going to move testing to a fork short-term to push through what is hopefully the last set of changes: a more complete conditional-compilation pass based on what is done for gc.

view this post on Zulip Wasmtime GitHub notifications bot (May 28 2025 at 17:24):

fitzgen commented on PR #10388:

@fitzgen I (hopefully) have addressed all the review comments outstanding on this, though happy to address anything I missed or new things that need to be addressed.

Great! I'll give another round of review in that case.

Another approach could be to move to having consumers specify the number of bytes/pages for tables rather than the number of (nominal) elements, to avoid the mildly awkward dance that is now present.

Yes, this is what we should probably do (in a different / follow up PR)

view this post on Zulip Wasmtime GitHub notifications bot (May 28 2025 at 19:22):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen submitted PR review:

Looking good! I have a ton of comments below, but they are all nitpicks really, so this should be good to land as soon as they are fixed.

Thanks everyone!

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Can you add an offset assertion for this field to the test_vmstore_context::field_offsets test just below here?

And similar for all the other new offset definitions in environ, we want to make sure that they actually match our expectations at runtime.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Ditto

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Very nitty pick: weird to have an empty line after a #[cfg(...)], better to have it right on top of the thing it is conditionally turning on/off.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Slightly cleaner / more idiomatic to use if let Some to avoid the unsafe constructor and .cast::<T>() to avoid as conversions which can be a little wider and more footgun-y than desirable:

    let init_value = if let Some(contref) = NonNull::new(init_value_contref.cast::<VMContRef>()) {
        // ...
    } else {
        None
    };

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Can we rename this to something like VMHostArray so that readers do not mistake it for the wasmtime::runtime::vm::gc::VMArrayRef type?

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

I don't think this is necessary (anymore)? I don't see any casts below.

If I am just missing it, can we put the allow right on the cast, instead of on the whole function? Or alternatively define and use a helper like https://github.com/bytecodealliance/wasmtime/blob/be0ba4b83e60c17d49b3c7149334fbf587487cbc/crates/wasmtime/src/runtime/vm/gc/enabled.rs#L28

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Each (non-absent) VMContRef has a pointer to the last ancestor, not just the youngest VMContRef, right?

If so, it probably makes sense to depict that in this ascii diagram:

///                 |---|   VMContRef    |

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Hmm... This is a little scary. I think that ideally we should never be running host functions on a continuation stack (as opposed to libcalls, which can (potentially temporarily before switching to the native stack) execute on the fiber stack).

In general, our invariants for async code and our existing fiber stacks, in order to support suspension, are that we always switch off the async stack and back to the native stack to run host functions, let them suspend and do async-y / event-loop-y stuff, and when Wasm is finally resumed we switch back to the fiber stack. The invariants here around which methods expect certain stack state are pretty subtle and not always super well documented, and this is kind of making me nervous again about the split between continuation vs fiber stacks.

Growing memory, for example, might try to stack switch back to the main thread to run the host's async limiter hook (if configured) and then when that resolves will try and switch back to the wasm fiber stack to continue the memory growth. I think that pair of operations is currently safe to perform from a non-fiber stack, and I think we will switch back to the correct place. But I am not 100% confident. A test case that configures an async limiter, runs Wasm that creates and switches to a new continuation stack and then does a memory.grow from that continuation stack is something we should definitely have, at minimum.

I just pinged @alexcrichton about this and he also pointed out that if we are calling host code on continuation stacks, we won't ever be able to have Wasm stacks that look different from native stacks (no resizable stacks (modulo virtual memory tricks) or segmented stacks, continuation stacks must have a guard page at the end, etc...). That is all probably okay, but we should be aware of the constraints we are adopting for ourselves here.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

https://doc.rust-lang.org/nightly/std/primitive.usize.html#method.next_multiple_of

            size.next_multiple_of(page_size)

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

                let target = tos.sub(tos_neg_offset).cast::<usize>();

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

                usize::try_from(tos).unwrap() - 0x20 - s

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

VMRuntimeLimits was renamed to VMStoreContext, so this (and elsewhere in this comment) should be updated accordingly.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

/// Additionally, a `CommonStackInformation` object is associated with

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

            let s = usize::try_from(args_capacity).unwrap() * std::mem::size_of::<ValRaw>();

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Ugh, annoying that .max() is not const

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Mind turning this if into a match and then adding the comments for why we can skip tracing for the other states into each of the relevant match arms? This will be a little more future proof in the face of evolving states.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

"It must never return" but then at the bottom there is a comment that "after this function returns, ..."

Which is it supposed to be?

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

#[cfg(feature = "pooling-allocator")]

perhaps?

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

    #[cfg(feature = "gc")]

perhaps?

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Can you add a TODO item to the stack-switching implementation meta issue to extend our stack-capturing fuzzer to capture backtraces across stack suspensions and assert that the backtrace is still what we expect it to be?

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Can this be rewritten to use slice::chunks and zip? And also narrow the unsafe to just the relevant blocks (because I had a hard time figuring out why it was needed at a glance). I think this would make the code a bit safer and easier to read. Something like this (although preserving comments where appropriate):

        for (conts, parent_limits) in continuations_vec.chunks(2).zip(stack_limits_vec.iter().skip(1)) {
            let continuation = conts[0];
            let continuation = unsafe { &*continuation };

            let parent_limits = unsafe { &*parent_limits };

            let parent_continuation = conts.get(1).map(|p| unsafe { &*p });

            // ...

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Table changes look good to me :+1:

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:15):

fitzgen created PR review comment:

Might also make sense to have a fn VMContObj::from_raw_parts(value: *mut u8, revision: u64) -> Option<Self> helper method or something to avoid repeating these same few lines all over the place
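A sketch of what such a helper could look like (stand-in type definitions are added so the snippet is self-contained; the real types live in the runtime crate):

    use core::ptr::NonNull;

    struct VMContRef;

    struct VMContObj {
        revision: u64,
        contref: NonNull<VMContRef>,
    }

    impl VMContObj {
        // Centralizes the null check and cast that are otherwise
        // repeated at each construction site.
        fn from_raw_parts(value: *mut u8, revision: u64) -> Option<Self> {
            let contref = NonNull::new(value.cast::<VMContRef>())?;
            Some(VMContObj { revision, contref })
        }
    }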

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:49):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:49):

frank-emrich created PR review comment:

> Each (non-absent) VMContRef has a pointer to the last ancestor, not just the youngest VMContRef, right?

No, it's just the youngest one that has the last_ancestor pointer set. That may seem a bit odd, but we never read the last_ancestor field of the intermediate VMContRefs, and it would be quite cumbersome to update these pointers on the intermediate continuations when suspending.

In any case, we can express precisely when the last_ancestor field contains usable data in terms of the continuation's state (as represented by my_vmcontref.common_stack_information.state). There's a comment with the details on the last_ancestor field in the struct definition.
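A compact sketch of that invariant (names follow the thread; tying validity to the Suspended state specifically is an assumption made here for illustration):

    enum VMStackState { Fresh, Running, Suspended, Returned }

    struct CommonStackInformation { state: VMStackState }

    struct VMContRef {
        common_stack_information: CommonStackInformation,
        // Kept up to date only on the *youngest* continuation of a
        // chain; intermediate VMContRefs may hold stale values.
        last_ancestor: *mut VMContRef,
    }

    impl VMContRef {
        // Expose last_ancestor only in the state where it is known to
        // be valid.
        fn usable_last_ancestor(&self) -> Option<*mut VMContRef> {
            match self.common_stack_information.state {
                VMStackState::Suspended => Some(self.last_ancestor),
                _ => None,
            }
        }
    }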

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:55):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 21:55):

frank-emrich created PR review comment:

Ha, well spotted! I must have forgotten to update the comment here; fiber_start is allowed to return now, because wasmtime_continuation_start contains the logic for going back to the parent these days.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 22:17):

frank-emrich submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 29 2025 at 22:17):

frank-emrich created PR review comment:

Ah, sorry, I must have overlooked your last sentence in the linked thread:

> [...] and I would also expect that we would call libcalls, but never host functions, on stack-switching stacks, same way we do things on our async fibers

We do indeed allow calling host functions from our continuation stacks at the moment, and even re-entering Wasm from such a host function, which can lead to call chains of the following shape:

   Wasm function  (on main stack)
-> Wasm function  (on continuation stack 1)
-> host function  (on continuation stack 1)
-> Wasm function  (on continuation stack 1)
-> Wasm function  (on continuation stack 2)
-> host function  (on continuation stack 2)
...

I think @dhil had a use-case where allowing such re-entrance is useful, but I see how running host functions on continuation stacks has its problems.

It's easy to check if we are currently running on a continuation stack, but that would have to happen at every call to a host function, right? But I guess there must already be such a check for whether we are on a Fiber stack (for the async feature) that we can adapt?

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:24):

posborne created PR review comment:

Patch will land adding a new VMContObj::from_raw_parts impl.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:24):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:35):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:35):

fitzgen created PR review comment:

That sequence of calls should definitely be supported, but ideally we would automatically switch to the native host stack and then back again while executing the actual host functions. That said, I think this PR is fine for now, and this is something we can address in the future.

> It's easy to check if we are currently running on a continuation stack, but that would have to happen at every call to a host function, right? But I guess there must already be such a check for whether we are on a Fiber stack (for the async feature) that we can adapt?

We don't check dynamically; we have different paths for constructing async and sync wasmtime::Funcs (and we have various assertions that the constructor you use matches the engine's config):
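A minimal illustration of that split using public wasmtime APIs (a simplified sketch, not the code the review linked to; the async constructors are only named in a comment):

    use wasmtime::{Config, Engine, Func, Store};

    fn main() -> wasmtime::Result<()> {
        // Sync path: a plain engine and Func::wrap; the host closure
        // runs directly on the calling stack.
        let engine = Engine::default();
        let mut store = Store::new(&engine, ());
        let _sync = Func::wrap(&mut store, |x: i32| x + 1);

        // Async path: requires Config::async_support(true) plus the
        // *_async constructors (Func::new_async / Func::wrap_async);
        // using a constructor that doesn't match the engine's config
        // trips an assertion rather than a per-call dynamic check.
        let mut config = Config::new();
        config.async_support(true);
        let _async_engine = Engine::new(&config)?;
        Ok(())
    }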

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:36):

fitzgen edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:39):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2025 at 19:39):

posborne created PR review comment:

Looking at this again, this whole match_feature! should be dropped, as it is now covered by an earlier chunk of code (which basically does the macro by hand to support both features). Good catch.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 17:58):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 17:58):

posborne created PR review comment:

@fitzgen In addressing the latest review items, I'm trying to work out what I should target in terms of conversions of types into usize. I see a few cases with what I believe is the requested conversion.

I'll admit the TryFrom/unwrap for types like u32 feels gross to me; I'd prefer a compilation failure for types that can't always be cast correctly.

As an aside, I did a quick check in godbolt to confirm that the TryFrom for u32 -> usize does optimize out, so there shouldn't be a perf impact; I suppose this is mostly a readability/cosmetic concern.


In this particular case, tos is a *mut u8, so the TryFrom approach isn't available, and I'm assuming it should remain an as usize cast.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:37):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:37):

fitzgen created PR review comment:

Here are the general rules to follow:

First, if there are infallible From/Into impls available, use those.

For conversions involving usize, Rust deliberately doesn't add or remove infallible conversions to/from u32/u64 depending on the target, so that source code is portable by default. In these cases, we should use TryFrom/TryInto and unwrap(), and the compiler will boil it all away, as you mentioned.

For other technically-fallible conversions that shouldn't actually fail due to program invariants, like u64 to u32 where we know we never exceed u32::MAX, we should also use TryFrom/TryInto and unwrap() so that things fail loudly if our invariants aren't upheld.

For conversions from pointers to usize, we use as today, but we should eventually use ptr.addr() and the strict provenance APIs once we can rely on Rust 1.84 and greater.
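Condensed into a sketch (variable names are illustrative):

    fn conversions(len_u32: u32, len_u64: u64, ptr: *mut u8) {
        // 1. Infallible: plain From/Into.
        let widened: u64 = u64::from(len_u32);

        // 2. To/from usize: TryFrom + unwrap; portable across targets,
        //    and the compiler removes checks it can prove unnecessary.
        let len: usize = usize::try_from(len_u32).unwrap();

        // 3. Fallible in general but guarded by an invariant: unwrap
        //    so violations fail loudly.
        let narrowed: u32 = u32::try_from(len_u64).unwrap();

        // 4. Pointer to usize: `as` for now; ptr.addr() once the MSRV
        //    permits strict provenance APIs.
        let addr = ptr as usize;

        let _ = (widened, len, narrowed, addr);
    }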


If I messed up any specific review comment, feel free to just leave a comment on that explaining my whoopsie, as long as we are abiding by the above general rules.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:50):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:50):

posborne created PR review comment:

Thanks for the guidance; it makes sense to me. I talked a bit with @sunfishcode in our 1:1 about strict provenance this afternoon as well, so thanks for covering that. I'll update based on those recommendations.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:56):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2025 at 21:56):

posborne created PR review comment:

It does appear that the wasmtime MSRV on main is now 1.85.0, which should mean we can rely on the stabilized strict provenance APIs (or use sptr::Strict).
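For reference, the strict-provenance style (stable since Rust 1.84) looks like this; the aligning helper is invented for illustration:

    fn align_down_16(p: *mut u8) -> *mut u8 {
        let addr = p.addr();       // instead of `p as usize`
        p.with_addr(addr & !0xF)   // new address, same provenance
    }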

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:32):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:35):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:36):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:36):

posborne created PR review comment:

Adopted this general approach, worked out well.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:38):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:38):

posborne created PR review comment:

The tests in the environ crate for stack switching seemed to cover the other cases already; I did land a commit moving those tests into their own module, both to aid discovery and to be a bit more in line with the prevailing pattern for test inclusion.

Let me know if those don't cover what you had in mind for assertions for the environ bits.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:42):

posborne commented on PR #10388:

@fitzgen There are now changes addressing the latest round of feedback. With how GitHub permissions work, I can't directly modify the description of the tracking issue, but I have notes and will add a comment with the new items, which a maintainer or @frank-emrich should be able to pull into the description body.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:47):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 17:47):

posborne created PR review comment:

Missed this in my first pass; should land a patch shortly with this change.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 18:33):

fitzgen submitted PR review:

LGTM modulo the comment that you forgot to address

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 19:14):

posborne updated PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 19:16):

posborne submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 19:16):

posborne created PR review comment:

Updated now; there wasn't a previous doc comment on VMStackState::Returned, which I grouped with Fresh, but I didn't dig into that case in detail since I'm not particularly familiar with the GC impl at this point.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 22:47):

alexcrichton submitted PR review:

This all looks good to me as well; thanks again to everyone involved for pushing on this!

There's one minor thing I noticed while reading this over, and I also filed a few points in https://github.com/bytecodealliance/wasmtime/issues/10248 under the miscellaneous TODOs at the end.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 22:47):

alexcrichton created PR review comment:

Should this be left out?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 22:47):

alexcrichton commented on PR #10388:

(I'm going to go ahead and flag this for merge though to carry through Nick's approval, my small thing can be fixed after)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2025 at 23:08):

alexcrichton merged PR #10388.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2025 at 15:51):

posborne created PR review comment:

Yeah, missed these when I removed most of the other bespoke debug assertion bits; will add to #10248 TODOs.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2025 at 15:51):

posborne submitted PR review.

