alexcrichton opened issue #13298:
This input:
```
;;! stack_switching = true
;;! exceptions = true
;;! function_references = true

(module
  (type $ft (func))
  (tag $t (type $ft))
  (type $ct (cont $ft))

  (func $callee
    (suspend $t))
  (elem declare func $callee)

  (func (export "go")
    (local $k (ref null $ct))
    (local.set $k (cont.new $ct (ref.func $callee)))
    (block $h (result (ref null $ct))
      (resume $ct (on $t $h) (local.get $k))
      (unreachable)
    )
    (drop)
    (unreachable)
  )
)

(assert_trap (invoke "go") "unreachable")
```

fails with:
```
$ cargo run --release wast ./reports/001-stack-switching-stale-trap-handler/repro.wast -Wgc,exceptions,function-references,stack-switching
    Finished `release` profile [optimized] target(s) in 0.12s
     Running `target/release/wasmtime wast ./reports/001-stack-switching-stale-trap-handler/repro.wast -Wgc,exceptions,function-references,stack-switching`
zsh: segmentation fault (core dumped)  cargo run --release wast  -Wgc,exceptions,function-references,stack-switching
```

An LLM-generated summary, possibly incorrect, of this issue is:
<details>
Stack switching: parent-stack trap after `resume` reads stale `last_wasm_entry_sp` / `last_wasm_entry_trap_handler`

Scope:
`crates/wasmtime/src/runtime/vm/stack_switching.rs`,
`crates/cranelift/src/func_environ/stack_switching/instructions.rs`,
`crates/wasmtime/src/runtime/vm/traphandlers.rs`.

Severity: Crash (SIGSEGV) on a code path that should produce a clean wasm
trap. Stack switching is currently work-in-progress on x86_64 Cranelift,
so this is not yet a security issue per the stability tiers, but it is a
soundness/runtime bug that must be fixed before stack switching can graduate
to a tier-1 feature.

Required configuration:
`Config::wasm_stack_switching(true)` (and its
prerequisites: `wasm_function_references(true)`, `wasm_exceptions(true)`).
The bug only manifests on `unix + x86_64`, which is the only platform on
which stack switching currently compiles.

Summary
`VMStackLimits` (the per-stack snapshot of `VMStoreContext` taken on
`stack_switch`) only contains `stack_limit` and `last_wasm_entry_fp`. It is
missing `last_wasm_entry_sp` and `last_wasm_entry_trap_handler`. As a
result, when a continuation runs and then hands control back to its parent
(via `suspend` or by returning normally), `VMStoreContext.last_wasm_entry_sp`
and `VMStoreContext.last_wasm_entry_trap_handler` still hold the values that
were written by the continuation's array-to-wasm trampoline. Those values
point into the (now suspended or torn-down) continuation's stack frame.

The next time the parent's wasm traps via a hardware signal (e.g.
`unreachable` → SIGILL, OOB memory access → SIGSEGV), the wasmtime signal
handler reads `entry_trap_handler()` and uses those stale `sp`/`pc` values
to set RSP and RIP via `store_handler_in_ucontext`. The kernel resumes the
process with RSP and RIP pointing into the continuation's stack while RBP
points into the parent's stack. The result is an immediate SIGSEGV (the
observed symptom in the reproducer) or, depending on what is left in the
continuation's stack, silent corruption or a confused stack-switch back to
the parent that swallows the trap.

The broken invariant
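As a hedged illustration of the mismatch described in the summary, the invariant can be phrased as "the `sp` and `fp` handed to the signal handler must lie on the same stack". The sketch below checks that property on plain address ranges. `Handler` here is a local stand-in for the triple returned by `entry_trap_handler`, while `StackRegion`, `owner`, `is_consistent`, and all addresses are hypothetical names and values made up for this example; none of them are Wasmtime APIs.

```rust
use std::ops::Range;

/// Local stand-in for the `Handler { pc, sp, fp }` triple (not the real type).
#[derive(Debug, Clone, Copy)]
struct Handler {
    pc: usize,
    sp: usize,
    fp: usize,
}

/// Hypothetical description of one native stack as an address range.
#[derive(Debug, Clone)]
struct StackRegion {
    name: &'static str,
    range: Range<usize>,
}

/// Returns the name of the stack that contains `addr`, if any.
fn owner(stacks: &[StackRegion], addr: usize) -> Option<&'static str> {
    stacks.iter().find(|s| s.range.contains(&addr)).map(|s| s.name)
}

/// The invariant: `sp` and `fp` must both lie on the same known stack.
fn is_consistent(stacks: &[StackRegion], h: Handler) -> bool {
    match (owner(stacks, h.sp), owner(stacks, h.fp)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}

fn main() {
    // Made-up address ranges for the two stacks.
    let stacks = vec![
        StackRegion { name: "parent", range: 0x7000_0000..0x7010_0000 },
        StackRegion { name: "continuation", range: 0x5000_0000..0x5001_0000 },
    ];
    // All fields come from the parent's entry into wasm.
    let good = Handler { pc: 0x1000, sp: 0x7008_0000, fp: 0x7008_0040 };
    // `sp` and `pc` are left over from the continuation, `fp` is the parent's.
    let stale = Handler { pc: 0x2000, sp: 0x5000_8000, fp: 0x7008_0040 };

    for h in [good, stale] {
        println!("pc={:#x} consistent={}", h.pc, is_consistent(&stacks, h));
    }
}
```

A check like this only detects the mixed-stack state; it does not say which field is stale or how the snapshot should be repaired.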
The contract, written into the doc comments of
`write_limits_to_vmcontext` and `load_limits_from_vmcontext`, says that on
resume/suspend, `last_wasm_entry_sp` is saved and restored along with
`stack_limit`:

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:718`

```
/// Sets `last_wasm_entry_sp` and `stack_limit` fields in
/// `VMRuntimelimits` using the values from the `VMStackLimits` of this
/// object.
pub fn write_limits_to_vmcontext<'a>(...)
```

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:1343`

```
// Note that the resume_contref libcall a few lines further below
// manipulates the stack limits as follows:
// 1. Copy stack_limit, last_wasm_entry_sp and last_wasm_exit* values from
// VMRuntimeLimits into the currently active continuation (i.e., the
// one that will become the parent of the to-be-resumed one)
//
// 2. Copy `stack_limit` and `last_wasm_entry_sp` in the
// `VMStackLimits` of `resume_contref` into the `VMRuntimeLimits`.
```

But the actual
`VMStackLimits` struct only holds two fields:

`crates/wasmtime/src/runtime/vm/stack_switching.rs:73`

```
#[repr(C)]
#[derive(Debug, Default, Clone)]
pub struct VMStackLimits {
    /// Saved version of `stack_limit` field of `VMStoreContext`
    pub stack_limit: usize,
    /// Saved version of `last_wasm_entry_fp` field of `VMStoreContext`
    pub last_wasm_entry_fp: usize,
}
```

…and the cranelift lowering of
`write_limits_to_vmcontext` and
`load_limits_from_vmcontext` only copies those two fields:

`crates/cranelift/src/func_environ/stack_switching/instructions.rs:746-756`

```
let pointer_size = u8::try_from(env.pointer_type().bytes()).unwrap();
let stack_limit_offset = env.offsets.ptr.vmstack_limits_stack_limit();
let last_wasm_entry_fp_offset = env.offsets.ptr.vmstack_limits_last_wasm_entry_fp();
copy_to_vm_runtime_limits(
    stack_limit_offset,
    pointer_size.vmstore_context_stack_limit(),
);
copy_to_vm_runtime_limits(
    last_wasm_entry_fp_offset,
    pointer_size.vmstore_context_last_wasm_entry_fp(),
);
```
`last_wasm_entry_sp` and `last_wasm_entry_trap_handler`, however, are
written by the array-to-wasm trampoline every time wasm is entered:

`crates/cranelift/src/compiler.rs:1700-1726` (`save_last_wasm_entry_context`)

```
let fp = builder.ins().get_frame_pointer(pointer_type);
builder.ins().store(MemFlags::trusted(), fp, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_fp());
let sp = builder.ins().get_stack_pointer(pointer_type);
builder.ins().store(MemFlags::trusted(), sp, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_sp());
let trap_handler = builder.ins()
    .get_exception_handler_address(pointer_type, block, 0);
builder.ins().store(MemFlags::trusted(), trap_handler, vm_store_context,
    ptr_size.vmstore_context_last_wasm_entry_trap_handler());
```
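Putting the two excerpts above side by side (a restore that copies only two fields, and a trampoline that writes three), the effect can be reproduced in a toy model. This is a minimal sketch, not Wasmtime code: `EntryState`, `Snapshot`, `trampoline_enter`, `run`, and the constant addresses are hypothetical, `stack_limit` is left out because it is not part of the problem, and the `full` flag only contrasts the two-field behaviour with a snapshot that covers every entry field. It is not a claim about how the bug is fixed upstream.

```rust
/// Toy stand-in for the three "last wasm entry" fields of `VMStoreContext`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct EntryState {
    fp: usize,
    sp: usize,
    trap_handler: usize,
}

/// Toy stand-in for the per-stack snapshot. With `full == false` only `fp`
/// is restored, mirroring the two-field copy quoted above.
#[derive(Debug, Clone, Copy)]
struct Snapshot {
    saved: EntryState,
    full: bool,
}

impl Snapshot {
    fn save(ctx: &EntryState, full: bool) -> Self {
        Snapshot { saved: *ctx, full }
    }

    fn restore(&self, ctx: &mut EntryState) {
        ctx.fp = self.saved.fp;
        if self.full {
            // Hypothetical extension: also restore the other two fields.
            ctx.sp = self.saved.sp;
            ctx.trap_handler = self.saved.trap_handler;
        }
    }
}

// Made-up values standing in for addresses on the two stacks.
const PARENT: EntryState = EntryState { fp: 0x7008_0040, sp: 0x7008_0000, trap_handler: 0x1000 };
const CONT: EntryState = EntryState { fp: 0x5000_8040, sp: 0x5000_8000, trap_handler: 0x2000 };

/// Models the array-to-wasm trampoline: it overwrites all three fields.
fn trampoline_enter(ctx: &mut EntryState, on_entry: EntryState) {
    *ctx = on_entry;
}

/// Host enters wasm, parent resumes a continuation, continuation hands back.
fn run(full: bool) -> EntryState {
    let mut ctx = EntryState::default();
    trampoline_enter(&mut ctx, PARENT); // host enters wasm on the parent stack
    let parent = Snapshot::save(&ctx, full); // `resume` snapshots the parent
    trampoline_enter(&mut ctx, CONT); // continuation body starts via the trampoline
    parent.restore(&mut ctx); // continuation suspends or returns
    ctx
}

fn main() {
    println!("two-field snapshot: {:x?}", run(false));
    println!("full snapshot:      {:x?}", run(true));
}
```

With the two-field snapshot the final state mixes the parent's `fp` with the continuation's `sp` and `trap_handler`; with the full snapshot it equals the parent's original state.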
`fiber_start` (which runs on every continuation's stack just before the
continuation's wasm body) reaches the wasm body via
`VMFuncRef::array_call`, which goes through that trampoline:

`crates/wasmtime/src/runtime/vm/stack_switching/stack/unix.rs:298`

```
unsafe extern "C" fn fiber_start(
    func_ref: *mut VMFuncRef,
    caller_vmctx: *mut VMContext,
    args: *mut VMHostArray<ValRaw>,
    return_value_count: u32,
) {
    ...
    VMFuncRef::array_call(func_ref, None, caller_vmxtx, params_and_returns);
    ...
}
```

So the timeline of
`VMStoreContext.last_wasm_entry_{sp,fp,trap_handler}` is:
1. Host enters wasm via `array_call` on the parent stack. Trampoline writes
   `parent_sp`, `parent_fp`, `parent_trap_pc` to `VMStoreContext`.
2. Parent wasm executes `resume`. Cranelift IR saves the parent's
   `last_wasm_entry_fp` into `parent_csi` (line 1366) and overwrites
   `VMStoreContext.last_wasm_entry_fp` with the resumed continuation's
   value (line 1367). `last_wasm_entry_sp` and `last_wasm_entry_trap_handler`
   are not touched here.
3. `stack_switch` to the continuation's stack. `wasmtime_continuation_start`
   runs `fiber_start` → `VMFuncRef::array_call` → array trampoline. The
   trampoline writes `cont_sp`, `cont_fp`, `cont_trap_pc` to
   `VMStoreContext`.
4. Continuation wasm runs and either suspends (back into the parent's
   `resume` IR) or returns (back into the parent's `resume` IR). Either
   path reaches code that calls `parent_csi.write_limits_to_vmcontext`
   (lines 1477 and 1586). Only `stack_limit` and `last_wasm_entry_fp` are
   restored.
5. Parent wasm continues. `VMStoreContext.last_wasm_entry_sp` is still
   `cont_sp`. `VMStoreContext.last_wasm_entry_trap_handler` is still
   `cont_trap_pc`.
6. Parent wasm traps. The signal handler in `signals.rs:163-185` calls
   `info.test_if_trap(...)` which calls `set_jit_trap` followed by
   `entry_trap_handler` (`traphandlers.rs:953-961`):
   ```rust
   pub(crate) fn entry_trap_handler(&self) -> Handler {
       unsafe {
           let vm_store_context = self.vm_store_context.get().as_ref();
           let fp = *vm_store_context.last_wasm_entry_fp.get();
           let sp = *vm_store_context.last_wasm_entry_sp.get();
           let pc = *vm_store_context.last_wasm_entry_trap_handler.get();
           Handler { pc, sp, fp }
       }
   }
   ```

   This returns
   `Handler { pc: cont_trap_pc, sp: cont_sp, fp: parent_fp }`,
   three values from two different stacks.
   `store_handler_in_ucontext` writes those into the kernel's `ucontext`,
   so the kernel resumes the process with `RSP=cont_sp`, `RBP=parent_fp`,
   `RIP=cont_trap_pc`.

The net effect is a longjmp to a PC in the continuation's array trampoline
exception block, but with RBP from a different stack. Pushes/pops via RSP
go to the continuation's stack while local-variable accesses via [RBP] go
to the parent's. In the simplest case the very first such access (or the
trampoline's epilogue pop) faults, which is what the reproducer below
exhibits.

Reproducer
`repro.wast` (preferred form):

```
;;! stack_switching = true
;;! exceptions = true
;;! function_references = true

(module
  (type $ft (func))
  (tag $t (type $ft))
  (type $ct (cont $ft))

  (func $callee
    (suspend $t))
  (elem declare func $callee)

  (func (export "go")
    (local $k (ref null $ct))
    (local.set $k (cont.new $ct (ref.func $callee)))
    (block $h (result (ref null $ct))
      (resume $ct (on $t $h) (local.get $k))
      (unreachable)
    )
    (drop)
    (unreachable)
  )
)

(assert_trap (invoke "go") "unreachable")
```

A standal
[message truncated]
alexcrichton added the wasm-proposal:stack-switching label to Issue #13298.
Last updated: Jun 01 2026 at 09:49 UTC