fitzgen labeled issue #4535:
This is a follow up to https://github.com/bytecodealliance/wasmtime/pull/4431
In that PR we don't save entry SP and exit FP/return pointer for calls into/out of components because they use a different set of trampolines. However, saving the entry SP and exit FP/return pointer isn't something we can simply add to the existing component trampolines, because they are defined in CLIF and CLIF doesn't have a way to talk about these architecture-specific details. Mach insts do, via operand constraints given to regalloc, but CLIF itself doesn't. So we would either need two layered trampolines that bounce from the first to the second when calling into / out of components (very not ideal), or we need to add an instruction to CLIF or something to grab the current SP/FP/return pointer (probably we should do this, but it requires some thought/design).
fitzgen opened issue #4535:
fitzgen commented on issue #4535:
Also when fixing this we need to re-enable the `attempt_to_leave_during_malloc` component model test.
alexcrichton commented on issue #4535:
To elaborate a bit more on the issue here -- this will be a repeat for me/@fitzgen but wanted to write stuff down anyway.
The stack unwinding in #4431 relies on precisely knowing the stack pointer when we enter WebAssembly along with the frame pointer and last program counter when we exit WebAssembly. This is not generally available in Rust itself so we are relying on handwritten assembly trampolines for these purposes instead.
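To make the dependency on those three values concrete, here is a minimal sketch of a frame-pointer walk over a simulated stack. Everything in it is hypothetical and invented for illustration (the `Stack` type, the slot layout, `walk`, and `example_stack`); it is not Wasmtime's actual unwinder. It assumes each frame stores the caller's frame pointer at `mem[fp]` and the return address at `mem[fp + 1]`, and that the stack grows down, so every wasm frame sits below the recorded entry SP.

```rust
/// A toy stack: `mem[i]` is the word at address `i`.
/// Assumed layout (hypothetical): `mem[fp]` holds the caller's frame pointer
/// and `mem[fp + 1]` holds the return address.
struct Stack {
    mem: Vec<usize>,
}

/// Collect the program counters of the wasm frames between exit and entry.
/// `exit_fp` and `exit_pc` stand for what an exit trampoline would record,
/// `entry_sp` for what an entry trampoline would record.
fn walk(stack: &Stack, exit_fp: usize, exit_pc: usize, entry_sp: usize) -> Vec<usize> {
    let mut pcs = Vec::new();
    let mut fp = exit_fp;
    let mut pc = exit_pc;
    // The stack grows down, so wasm frames live at addresses below the entry SP.
    while fp < entry_sp {
        pcs.push(pc);
        pc = stack.mem[fp + 1];
        fp = stack.mem[fp];
    }
    pcs
}

/// Two wasm frames (at addresses 4 and 8) called from a host frame at 12.
fn example_stack() -> Stack {
    let mut mem = vec![0; 16];
    mem[4] = 8; // innermost wasm frame: saved FP of its caller
    mem[5] = 0x200; // ...and the return address into the outer wasm function
    mem[8] = 12; // outer wasm frame: saved FP of the host frame
    mem[9] = 0x300; // ...and the return address into the host
    Stack { mem }
}
```

With an entry SP of 10, starting the walk at `exit_fp = 4` visits both wasm frames and stops before the host frame. Without a trustworthy exit FP/PC the walk has nowhere to start, and without the entry SP it has no way to know where to stop.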
Entry into WebAssembly
Entry into WebAssembly happens via one of two routes:
- A "typed" route using the `wasmtime::TypedFunc` API or when invoking a core instance's `start` function (which has a known fixed signature of no inputs and no outputs). In these cases Rust does an indirect call directly to the Cranelift-generated code for the corresponding wasm function.
- An "untyped" route which is used by `wasmtime::Func::call` as well as `wasmtime::component::{Func,TypedFunc}::call`. In this situation Rust will call a Cranelift-compiled trampoline. The Cranelift trampoline will load arguments from a stack parameter and then make an indirect call to the actual Cranelift-compiled wasm function which is also supplied as an argument.
Today this all records the entry stack pointer via the `host_to_wasm_trampoline` defined in inline assembly. Concretely Wasmtime will "prepare" an invocation which stores the Cranelift-generated function to call (be it a raw function in case (1) or a trampoline for case (2)) into the `VMContext::callee` field and then invoke the `host_to_wasm_trampoline` inline asm symbol.
This entry isn't too relevant to the component model since we're already doing what's necessary for the stack unwinding, recording the sp on entry. Nevertheless I want to describe the situation, including some oddities:
- The actual trampoline used in (2) to load arguments from the stack is not actually always defined by Cranelift. Instead sometimes it's a monomorphized Rust function `host_to_wasm_trampoline` from the `Func::wrap` API. This means we unfortunately cannot rely on Cranelift to supply all these trampolines which means we can't rely on the trampolines to do things that Rust itself can't do.
- The entry trampoline currently requires the ability to tail-call to the actual callee. This is a technical limitation due to using the exact same trampoline for every single entry point, regardless of signature.
Ideally we would always enter WebAssembly via a Cranelift-compiled trampoline. That would mean we could do anything in the trampoline that Cranelift would do and ideally remove the need to have inline asm for this. We might still need multiple trampolines for untyped entry points and typed entry points, but overall we should ideally be able to do better here.
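As a rough model of the "prepare, then enter through one shared trampoline" flow described above, here is a small sketch. All names (`Limits`, `Context`, `prepare`, `host_to_wasm`, `wasm_add_one`) are hypothetical stand-ins, not Wasmtime's types. The real shared trampoline is inline assembly that reads the actual stack pointer and tail-calls the callee regardless of its signature; this sketch passes the stack pointer in as a plain number and fixes one signature.

```rust
/// Hypothetical stand-in for the state that holds the saved entry SP.
#[derive(Default)]
struct Limits {
    last_wasm_entry_sp: usize,
}

/// Hypothetical stand-in for a context with a `callee` slot.
#[derive(Default)]
struct Context {
    callee: Option<fn(i32) -> i32>,
    limits: Limits,
}

/// Step 1: stash the function to call (a raw wasm function or a trampoline).
fn prepare(ctx: &mut Context, callee: fn(i32) -> i32) {
    ctx.callee = Some(callee);
}

/// Step 2: the single shared entry point. It records the (simulated) stack
/// pointer and then forwards to whatever `prepare` stored.
fn host_to_wasm(ctx: &mut Context, simulated_sp: usize, arg: i32) -> i32 {
    ctx.limits.last_wasm_entry_sp = simulated_sp;
    let callee = ctx.callee.take().expect("prepare() must run first");
    callee(arg)
}

/// Pretend this is Cranelift-generated code for a wasm function.
fn wasm_add_one(x: i32) -> i32 {
    x + 1
}
```

Calling `prepare(&mut ctx, wasm_add_one)` followed by `host_to_wasm(&mut ctx, 0x7000, 41)` leaves `0x7000` in the limits and forwards to the callee. Because the entry point never looks at the callee's arguments, one copy can serve every signature, which is the reason the real one needs to tail-call.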
Exiting WebAssembly
Exiting back to the host happens in a few locations, and this is the focus of this issue where it's missing support in the component model:
- Exiting from core wasm will either end up in something defined by `Func::wrap` or `Func::new` (roughly). Both of these use a `VMHostFunctionContext` which internally has two function pointers. One is the `VMCallerCheckedAnyfunc` which wasm actually calls and the other is the actual host function pointer defined in Rust being invoked. The function pointer contained within the `VMCallerCheckedAnyfunc` is a trampoline written in inline assembly which spills the fp/pc combo into `VMRuntimeLimits`. The function pointer to invoke contained within the `VMHostFunctionContext` has the "system-v ABI" since it receives arguments in native platform registers. For `Func::wrap` this is a Rust function and for `Func::new` this is a Cranelift-generated trampoline which spills arguments to the stack and then calls a static address specified at compile time (using `Func::new` requires Cranelift at runtime).
- Exiting from a component will always exit via a lowered host function. Concretely what happens is that a `VMComponentContext` has an array `lowering_anyfuncs: [VMCallerCheckedAnyfunc; component.num_lowerings]`. This array is what core wasm actually calls and is exclusively populated by Cranelift-compiled trampolines (via `compile_lowered_trampoline`). These trampolines are similar to the Cranelift-compiled trampolines for `Func::new` but call a host function of type signature `VMLoweringCallee`. This is where fp/pc are not recorded while we exit wasm. There's no clear way to use the same trick as `Func::{wrap,new}`, which have a singular inline asm trampoline for all signatures, since the callee to defer to depends on the `LoweringIndex`.
- Finally exiting wasm can also happen via libcalls implemented in Wasmtime. Currently each libcall gets a unique inline-asm-defined trampoline that records the pc/fp combo and then does a direct tail-call to the actual libcall itself.
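The two-function-pointer arrangement for `Func::wrap`/`Func::new` can be modeled roughly as follows. This is only a sketch under stated assumptions: `Limits`, `HostFunctionContext`, `exit_trampoline`, `host_double`, and `new_context` are hypothetical names, and the fp/pc values are passed in as plain numbers where the real trampoline reads them from registers in inline assembly.

```rust
/// Hypothetical model of the slots that receive the fp/pc combo.
#[derive(Default, Debug, PartialEq)]
struct Limits {
    last_wasm_exit_fp: usize,
    last_wasm_exit_pc: usize,
}

/// Hypothetical model of a host function context with two function pointers.
struct HostFunctionContext {
    /// What wasm calls: the shared trampoline that saves fp/pc.
    wasm_callee: fn(&mut HostFunctionContext, usize, usize, i32) -> i32,
    /// What the trampoline forwards to: the host function proper.
    host_func: fn(i32) -> i32,
    limits: Limits,
}

/// The shared exit trampoline: spill fp/pc, then defer to the host function
/// stored in the context.
fn exit_trampoline(ctx: &mut HostFunctionContext, fp: usize, pc: usize, arg: i32) -> i32 {
    ctx.limits.last_wasm_exit_fp = fp;
    ctx.limits.last_wasm_exit_pc = pc;
    (ctx.host_func)(arg)
}

/// A stand-in host function.
fn host_double(x: i32) -> i32 {
    x * 2
}

fn new_context(host_func: fn(i32) -> i32) -> HostFunctionContext {
    HostFunctionContext {
        wasm_callee: exit_trampoline,
        host_func,
        limits: Limits::default(),
    }
}
```

The single trampoline works here because the context it receives already says which host function to call. A lowered component function has no such per-function context to consult, which is where the `LoweringIndex` dependency comes from.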
Proposal to fix this issue
Overall I find the current trampoline story pretty complicated and also pretty inefficient. There's typically at least one extra indirect call for all of these transitions and additionally there's very little cache-locality. The fix I'm going to propose here isn't a silver bullet though and will only solve some issues, but I think is still worth pursuing.
I think we should add a few new pseudo-instructions to Cranelift:
- Something to get the current frame pointer
- Something to get the current stack pointer
- Something to get the return address of the current function
- Something to get the address of a label in a function (this may already exist, not sure)
With these tools we can start trying to eventually move all of the trampolines above to Cranelift exclusively and remove both Rust-defined and inline-asm defined trampolines:
- For components, and this issue, `compile_lowered_trampoline` could be updated to use the cranelift instructions to record the pc/fp combo into the `VMRuntimeLimits`. This would remove the need for any extra trampoline when exiting a component and would solve the issue at hand.
- For libcalls we could use the cranelift instructions to manually save fp/pc just before a libcall out to the runtime. This would remove all trampolines related to libcalls.
- For `Func::new` the cranelift-generated trampoline could act similarly to `compile_lowered_trampoline` and store the fp/pc combo to `VMRuntimeLimits` and avoid the need for two trampolines.
- Untyped host-to-wasm trampolines could do the sp-saving internally rather than relying on the external trampoline to do so.
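For the component case, the shape of the fix can be sketched like this: one trampoline per lowering, each of which records fp/pc itself before calling the shared host callee with its index baked in. Every name here (`Limits`, `ComponentContext`, `LoweringCallee`, `lowered_trampoline`, `host_callee`, `lowering_anyfuncs`) is a hypothetical model, and the `fp`/`ret_addr` parameters stand in for values that the proposed Cranelift instructions would produce inside the trampoline.

```rust
#[derive(Default, Debug, PartialEq)]
struct Limits {
    last_wasm_exit_fp: usize,
    last_wasm_exit_pc: usize,
}

/// Hypothetical host callee shared by all lowerings; it dispatches on the index.
type LoweringCallee = fn(usize, u32) -> u32;

struct ComponentContext {
    limits: Limits,
    callee: LoweringCallee,
}

/// Signature of what core wasm would call in this model.
type Trampoline = fn(&mut ComponentContext, usize, usize, u32) -> u32;

/// One trampoline per lowering: `INDEX` is baked in, the way a compiled
/// trampoline would bake in its lowering index.
fn lowered_trampoline<const INDEX: usize>(
    ctx: &mut ComponentContext,
    fp: usize,
    ret_addr: usize,
    arg: u32,
) -> u32 {
    // Record the exit state in the same trampoline that makes the call,
    // so no second, layered trampoline is needed.
    ctx.limits.last_wasm_exit_fp = fp;
    ctx.limits.last_wasm_exit_pc = ret_addr;
    (ctx.callee)(INDEX, arg)
}

/// A stand-in host callee with different behavior per lowering index.
fn host_callee(index: usize, arg: u32) -> u32 {
    match index {
        0 => arg + 1,
        _ => arg * 10,
    }
}

/// The table core wasm calls through, one entry per lowering.
fn lowering_anyfuncs() -> [Trampoline; 2] {
    [lowered_trampoline::<0>, lowered_trampoline::<1>]
}
```

Each table entry makes exactly one call out to the host and leaves the exit fp/pc behind for the unwinder, which is the property the existing component trampolines are missing.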
Those are at least the easy ones we could knock out with more Cranelift features. Otherwise there are still a number of places that we are requiring trampolines:
- Exit trampolines with `Func::wrap` could ideally be generated by Cranelift but would still require two indirect calls. One call to get to the trampoline from the original core wasm and then a second call from the trampoline to the host function itself. The main problem here is getting a trampoline. Assuming trampolines are provided by Cranelift then they become available at runtime when modules are loaded, which means `Func::wrap` needs to, at some point, dynamically look up a trampoline and find a corresponding one in a previous module's compiled image. This is not trivial.
- Entry trampolines to `TypedFunc` are similarly somewhat nontrivial, but I think surmountable. Today a `Store` has a registry of untyped trampolines per-function signature, and I think it could also have a registry of typed trampolines per-function signature. This typed trampoline would then be used to enter wasm instead of today's calling the raw wasm function. In this situation the callee would be passed as an argument to the trampoline in the same manner untyped trampolines receive the callee.
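The per-signature registry idea could look something like the following sketch. It is an illustration only: `ValType`, `Signature`, `Trampoline`, and `TrampolineRegistry` are hypothetical types made up here, and a trampoline is represented by a bare id where a real one would be a code pointer into a module's compiled image.

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum ValType {
    I32,
    I64,
}

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Signature {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

/// Hypothetical trampoline handle: an id standing in for a code pointer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Trampoline(usize);

/// Two registries keyed by signature: the untyped one that exists today in
/// some form, and the typed one floated above.
#[derive(Default)]
struct TrampolineRegistry {
    untyped: HashMap<Signature, Trampoline>,
    typed: HashMap<Signature, Trampoline>,
}

impl TrampolineRegistry {
    /// Called when a module is loaded; the first trampoline registered for a
    /// signature wins and later duplicates are ignored.
    fn register(&mut self, sig: Signature, untyped: Trampoline, typed: Trampoline) {
        self.untyped.entry(sig.clone()).or_insert(untyped);
        self.typed.entry(sig).or_insert(typed);
    }

    /// Look up the typed entry trampoline for a signature, if any loaded
    /// module has supplied one.
    fn typed_entry(&self, sig: &Signature) -> Option<Trampoline> {
        self.typed.get(sig).copied()
    }
}
```

The `None` case of `typed_entry` is the hard part of the design: a signature for which no loaded module has supplied a trampoline yet.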
Anyway that's a long winded way of saying that we need a few cranelift instructions to modify `compile_lowered_trampoline` to fix the original issue here. I do not want to lose sight of how complicated our trampoline story is today though. We're already taking a hit to call overhead into and out of wasm as part of #4431 which we have no means of recovering right now, and I think reducing the trampolines in play and focusing more on Cranelift-generated trampolines is the way forward (e.g. inlining two trampolines into one). Otherwise I also think we will need fancier trampolines for other features such as the out-of-band fuel checking (requires a pinned register) and exceptions (which may require before/after stuff in the trampoline instead of just the "before stuff" they do today).
fitzgen commented on issue #4535:
You kind of mentioned this above, but to be super explicit: the hard part in my mind is deciding what we want to do when
- the `cranelift` feature is not enabled, so we don't have a JIT at our disposal,
- and then the embedder does `let f = Func::wrap(...); f.call(...)`
In this scenario, there is no already-compiled Wasm module for us to pluck trampolines from, and because we don't have a JIT available, we can't just create the necessary trampolines.
But also, in this scenario we don't actually need any trampolines because there isn't actually any Wasm involved (in #4431, this would show up as an empty contiguous sequence of Wasm frames). So maybe we can somehow relax things a bit (waves hands) to allow skipping the trampolines when both caller and callee are the host?
If one of caller or callee was Wasm, then we would be able to use trampolines from that Wasm. We would just need to figure out how we would lazily connect the trampolines to the `Func` if the `Func::wrap` happened before the Wasm module was loaded into the engine.
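One very rough way to picture the hand-waved relaxation: a wrapped function starts without a trampoline, host-to-host calls bypass trampolines entirely, and a trampoline is connected later when a module that has one is loaded. This is purely a sketch of the idea, not a design; `Caller`, `CallError`, `WrappedFunc`, `connect`, `module_trampoline`, and `double` are all hypothetical names.

```rust
#[derive(Clone, Copy)]
enum Caller {
    Host,
    Wasm,
}

#[derive(Debug, PartialEq)]
enum CallError {
    NoTrampolineYet,
}

/// Hypothetical model of a `Func::wrap`-style function.
struct WrappedFunc {
    host_fn: fn(i32) -> i32,
    /// Filled in lazily once some loaded module supplies a matching trampoline.
    trampoline: Option<fn(fn(i32) -> i32, i32) -> i32>,
}

impl WrappedFunc {
    fn wrap(host_fn: fn(i32) -> i32) -> Self {
        WrappedFunc { host_fn, trampoline: None }
    }

    /// Lazily connect a trampoline; the first one connected is kept.
    fn connect(&mut self, trampoline: fn(fn(i32) -> i32, i32) -> i32) {
        if self.trampoline.is_none() {
            self.trampoline = Some(trampoline);
        }
    }

    fn call(&self, caller: Caller, arg: i32) -> Result<i32, CallError> {
        match (caller, self.trampoline) {
            // Host calling host: no wasm frames involved, so skip trampolines.
            (Caller::Host, _) => Ok((self.host_fn)(arg)),
            // Wasm calling host: go through the connected trampoline.
            (Caller::Wasm, Some(t)) => Ok(t(self.host_fn, arg)),
            // Should not arise if wasm always brings its own trampolines.
            (Caller::Wasm, None) => Err(CallError::NoTrampolineYet),
        }
    }
}

/// Stand-in for a trampoline found in a loaded module's image.
fn module_trampoline(callee: fn(i32) -> i32, arg: i32) -> i32 {
    callee(arg)
}

fn double(x: i32) -> i32 {
    x * 2
}
```

The error arm marks the open question: the state in which wasm holds a reference to the function but nothing has connected a trampoline to it yet.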
But yeah, agreed that we should simplify and improve our trampolines story, but this issue was originally supposed to just track support for saving entry SP and exit FP/return pointer for component trampolines at all. Might need to split this into two issues.
alexcrichton commented on issue #4535:
Definitely agreed that I went overboard and should split this into a separate issue. While we're here talking about it, though, the other issue we identified was `FuncRef::from(Func::wrap(...))`, because right now a `FuncRef` is a glorified `*mut VMCallerCheckedAnyfunc` which is "ready to be called by wasm", and that's not possible to do with a statically available trampoline today since wasm, if it calls the `funcref`, must call the trampoline, which we won't have until that `FuncRef` makes its way into a module.
(I know `FuncRef` isn't really a type in Wasmtime but it's basically that we currently have to be able to get a `*mut VMCallerCheckedAnyfunc` from a `Func` at any time, which isn't possible if trampolines are required to be in `Module` images)
alexcrichton closed issue #4535:
Last updated: Dec 23 2024 at 13:07 UTC