Stream: git-wasmtime

Topic: wasmtime / issue #13322 Stack-switching: Corruption of Ca...


view this post on Zulip Wasmtime GitHub notifications bot (May 07 2026 at 23:18):

alexcrichton opened issue #13322:

This input:

use wasmtime::*;

fn main() -> Result<()> {
    let mut config = Config::new();
    config.wasm_stack_switching(true);
    config.wasm_function_references(true);
    config.wasm_exceptions(true);
    let engine = Engine::new(&config)?;
    let mut store = Store::new(&engine, ());

    let module = Module::new(
        &engine,
        r#"
        (module
          (type $ft (func))
          (tag $t (type $ft))
          (type $ct (cont $ft))

          (import "host" "h" (func $h_import))

          (func (export "inner"))

          (func $callee
            (call $h_import)
            (suspend $t)
          )
          (elem declare func $callee)

          (func (export "go") (result i32)
            (block $h (result (ref null $ct))
              (resume $ct (on $t $h) (cont.new $ct (ref.func $callee)))
              (return (i32.const 0))
            )
            (drop)
            (i32.const 1)
          )
        )
    "#,
    )?;

    let h = Func::wrap(&mut store, |mut caller: Caller<'_, ()>| -> Result<()> {
        let inner = caller
            .get_export("inner")
            .and_then(|e| e.into_func())
            .expect("inner export");
        let inner = inner.typed::<(), ()>(&caller)?;
        inner.call(&mut caller, ())?;
        Ok(())
    });

    let instance = Instance::new(&mut store, &module, &[h.into()])?;
    let go = instance.get_typed_func::<(), i32>(&mut store, "go")?;

    let result = go.call(&mut store, ())?;
    assert_eq!(result, 1);
    Ok(())
}

yields:

$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/repro`

thread 'main' (1071502) panicked at /home/alex/code/wasmtime2/crates/wasmtime/src/runtime/vm/traphandlers.rs:493:13:
assertion failed: self.unwind.replace(None).is_none()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

LLM-generated description, possibly wrong, is:

<details>

EntryStoreContext::enter_wasm saves the new stack_chain, breaking nested embedder→wasm calls during stack switching

Scope: crates/wasmtime/src/runtime/func.rs:1587-1606
(EntryStoreContext::enter_wasm),
crates/wasmtime/src/runtime/func.rs:1614-1629
(EntryStoreContext::exit_wasm),
crates/wasmtime/src/runtime/vm/traphandlers.rs:667-696
(CallThreadState::swap).

Severity: Stack switching is currently :work_in_progress: (work-in-progress) on
x86_64 Cranelift, so this is not a security issue per Wasmtime's tier
policy today. It is, however, a soundness bug that must be fixed
before stack switching can graduate to a stable feature: a guest that
does nothing more than call a normal host import — where the host
happens to call back into the same Store via the embedder API —
silently invalidates the running continuation's cx.stack_chain. The
same bug also affects any future stack-switching-aware fiber-suspension
path through CallThreadState::swap.

Reproducible: Yes. Test program in this folder
(enter-wasm-stack-chain-repro) exercises only public Wasmtime API
(Func::wrap, Caller::get_export, Func::typed, Func::call) plus
the gated wasm_stack_switching(true) config bit. Running with
RECURSIVE=1 produces Ok(0) instead of the expected Ok(1).

Summary

EntryStoreContext::enter_wasm is responsible for saving the
pre-entry values of various VMStoreContext fields so that
exit_wasm (run on Drop) and CallThreadState::swap (run during
async fiber suspension) can put the store back the way it was before
this wasm activation started. Five of the six fields it saves are
read without being written, so the saved value is the pre-entry
value as intended:

last_wasm_exit_pc:        *(*vm_store_context).last_wasm_exit_pc.get(),
last_wasm_exit_trampoline_fp: *(*vm_store_context).last_wasm_exit_trampoline_fp.get(),
last_wasm_entry_fp:       *(*vm_store_context).last_wasm_entry_fp.get(),
last_wasm_entry_sp:       *(*vm_store_context).last_wasm_entry_sp.get(),
last_wasm_entry_trap_handler: *(*vm_store_context).last_wasm_entry_trap_handler.get(),

stack_chain, however, is written before being read:

unsafe {
    let vm_store_context = store.0.vm_store_context();
    let new_stack_chain = VMStackChain::InitialStack(initial_stack_information);
    *vm_store_context.stack_chain.get() = new_stack_chain;            // (1) write

    Self {
        
        stack_chain: (*(*vm_store_context).stack_chain.get()).clone(), // (2) read AFTER (1)
        
    }
}

So Self.stack_chain ends up holding the new
InitialStack(initial_stack_information) rather than the value
stack_chain had on entry to enter_wasm. Both consumers of
Self.stack_chain (drop-time exit_wasm, async-fiber swap) write
that value back into the live vm_store_context, so any pre-entry
value (a Continuation(...) from being inside a resume body, or
the outer activation's InitialStack(...) for a nested host→wasm
call) is unrecoverable.

This is a regression from commit
192f2fcdadfec9d0cf6b58548a85a7307450cbf5 ("Replace setjmp/longjmp
usage in Wasmtime"). Before that commit enter_wasm did the read
first, then the write:

let stack_chain = (*store.0.vm_store_context().stack_chain.get()).clone();   // read FIRST
let new_stack_chain = VMStackChain::InitialStack(initial_stack_information);
*store.0.vm_store_context().stack_chain.get() = new_stack_chain;             // write AFTER

Self { , stack_chain,  }

The setjmp/longjmp removal commit collapsed the load+store into a
single expression but moved the store before the struct literal,
silently flipping the saved value from "previous" to "new".

What the contract is supposed to be

CallThreadState::swap (vm/traphandlers.rs:656-696) is the most
explicit description:

/// Swaps the state in this `CallThreadState`'s `VMStoreContext` with
/// the state in `EntryStoreContext` that was saved when this
/// activation was created.
///
/// This method is using during suspension of a fiber to restore the
/// store back to what it originally was and prepare it to be resumed
/// later on. … That restores a store to just before this activation
/// was called but saves off the fields of this activation to get
/// restored/resumed at a later time.

"Restores a store to just before this activation was called" can only
hold if Self.stack_chain is the value vm_store_context.stack_chain
had before enter_wasm ran. The same is true of exit_wasm,
which mirror-images swap for the synchronous path.

The reproducer

src/main.rs here uses only public Wasmtime API. The wasm module
contains:

The host import $h_import is Func::wrap-defined. When
RECURSIVE=1, it looks up the inner export through the
Caller<'_, T> and calls it via the embedder API. That recursive
embedder→wasm call is what triggers the inner
EntryStoreContext::enter_wasm/exit_wasm pair that overwrites the
running continuation's cx.stack_chain.

Compile with:

cargo build --release

Without recursion (h does nothing):

$ ./target/release/repro
[wasm trace] 10
[wasm trace] 1
[host] h called; recursive=false
[wasm trace] 2
[wasm trace] 12          ← suspend reached the handler
go returned: Ok(1)
OK: suspend correctly delivered to outer handler.

With recursion (h calls inner via embedder API):

$ RECURSIVE=1 ./target/release/repro
[wasm trace] 10
[wasm trace] 1
[host] h called; recursive=true
[wasm trace] 100         ← inner runs
[host] inner returned to host
[wasm trace] 2
[wasm trace] 11          ← resume took the "normal completion" path
go returned: Ok(0)
BUG OBSERVED: go returned Ok(0); …

Trace 12 is unreachable except via the suspend handler; trace 11
is unreachable except via "the resume completed without ever firing
$t". Adding nothing more than a recursive embedder call between the
two flips the outcome from one to the other — exactly the symptom
that follows from cx.stack_chain being clobbered to InitialStack
during inner's enter_wasm/exit_wasm pair, after which
search_handler's is_initial_stack short-circuit returns "no
match" and the (suspend $t) cannot find its outer handler.

(I verified in a temporarily-instrumented build that the inner
enter_wasm reads prev_chain = Continuation(...) and the
matching inner exit_wasm then writes
InitialStack(<inner csi pointer>) back into the live
VMStoreContext. That instrumentation is not part of the test
program shipped here.)

Suggested fix

Restore the pre-regression order and save the previous value of
stack_chain before writing the new one. The minimal patch:

unsafe {
    let vm_store_context = store.0.vm_store_context();
    let prev_stack_chain = mem::replace(
        &mut *vm_store_context.stack_chain.get(),
        VMStackChain::InitialStack(initial_stack_information),
    );

    Self {
        stack_limit,
        last_wasm_exit_pc:        *(*vm_store_context).last_wasm_exit_pc.get(),
        last_wasm_exit_trampoline_fp: *(*vm_store_context).last_wasm_exit_trampoline_fp.get(),
        last_wasm_entry_fp:       *(*vm_store_context).last_wasm_entry_fp.get(),
        last_wasm_entry_sp:       *(*vm_store_context).last_wasm_entry_sp.get(),
        last_wasm_entry_trap_handler: *(*vm_store_context).last_wasm_entry_trap_handler.get(),
        stack_chain:              prev_stack_chain,
        vm_store_context,
    }
}

This mirrors the way
[message truncated]

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2026 at 23:18):

alexcrichton added the wasm-proposal:stack-switching label to Issue #13322.


Last updated: Jun 01 2026 at 09:49 UTC