fitzgen opened PR #6262 from fitzgen:trampoline-overhaul
to bytecodealliance:main
:
This commit splits
VMCallerCheckedFuncRef::func_ptr
into three new function pointers:VMCallerCheckedFuncRef::{wasm,array,native}_call
. Each one has a dedicated calling convention, so callers just choose the version that works for them. This is as opposed to the previous behavior where we would chain together many trampolines that converted between calling conventions, sometimes up to four on the way into Wasm and four more on the way back out. See [0] for details.Thanks to @bjorn3 for the initial idea of having multiple function pointers for different calling conventions.
This is generally a nice ~5-10% speed up to our call benchmarks across the board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from Wasm to the host, which have a minor regression. We hypothesize that this is because the old hand-written assembly trampolines did not maintain a call frame and do a tail call, but the new Cranelift-generated trampolines do maintain a call frame and do a regular call. The regression is only a couple nanoseconds, which seems well-explained by these differences explain, and ultimately is not a big deal.
However, this does lead to a ~5% code size regression for compiled modules. Before, we compiled a trampoline per escaping function's signature and we deduplicated these trampolines by signature. Now we compile two trampolines per escaping function: one for if the host calls via the array calling convention and one for it the host calls via the native calling convention. Additionally, we compile a trampoline for every type in the module, in case there is a native calling convention function from the host that we
call_indirect
of that type. Much of this is in the.eh_frame
section in the compiled module, because each of our trampolines needs an entry there. Note that the.eh_frame
section is not required for Wasmtime's correctness, and you can disable its generation to shrink compiled module code size; we just emit it to play nice with external unwinders and profilers. We believe there are code size gains available for follow up work to offset this code size regression in the future.Backing up a bit: the reason each Wasm module needs to provide these Wasm-to-native trampolines is because
wasmtime::Func::wrap
and friends allow embedders to create functions even when there is no compiler available, so they cannot bring their own trampoline. Instead the Wasm module has to supply it. This in turn means that we need to look up and patch in these Wasm-to-native trampolines during roughly instantiation time. But instantiation is super hot, and we don't want to add more passes over imports or any extra work on this path. So we integrate withwasmtime::InstancePre
to patch these trampolines in ahead of time.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested abrown for a review on PR #6262.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #6262.
fitzgen requested pchickey for a review on PR #6262.
fitzgen requested wasmtime-core-reviewers for a review on PR #6262.
fitzgen requested wasmtime-default-reviewers for a review on PR #6262.
fitzgen requested alexcrichton for a review on PR #6262.
fitzgen updated PR #6262.
fitzgen updated PR #6262.
fitzgen updated PR #6262.
fitzgen updated PR #6262.
fitzgen updated PR #6262.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I feel like I'm surely going to forget to write this down, so could you update
RELEASES.md
in this PR with the various changes made? I'm imagining the## Changed
section for this for example.
alexcrichton created PR review comment:
Could this construction of the callee be deduplicated with the above? (and perhaps with
call
too)
alexcrichton created PR review comment:
Mind adding a comment why these are added? Or otherwise leaving them out of this PR?
alexcrichton created PR review comment:
Should this perhaps be
#[doc(hidden)]
? (this'll end up in thewasmtime
crate)
alexcrichton created PR review comment:
Could this be preemptively extracted to a helper function?
alexcrichton created PR review comment:
I think this may be the s390x issue as this'll want to be a native-endian load
alexcrichton created PR review comment:
Mind adding a note here about this being sorted as well?
alexcrichton created PR review comment:
TODO
alexcrichton created PR review comment:
To cut down on the encoding space necessary for this, could these two arrays get fused? And to go a bit further could this perhaps be stored in the
funcs
array directly?
alexcrichton created PR review comment:
It's probably ok to jettison this now
alexcrichton created PR review comment:
Could this and the two types below have some more documentation about the ABI? E.g. for the structs below a definition of the ABI and for this function pointer a definition of what each of the arguments are?
alexcrichton created PR review comment:
Could the inner field here be made private?
alexcrichton created PR review comment:
Looks like this file became executable?
alexcrichton created PR review comment:
The large doc block above about why this can be
None
I think might be good to insert here instead?
alexcrichton created PR review comment:
Could the reformatting here and below be undone?
alexcrichton created PR review comment:
Shouldn't this always be
Some(...)
by the time it reaches theVMFunctionImport
phase?
alexcrichton created PR review comment:
I worry about this because it can accidentally paper over non-send/sync in
kind: FuncKind
andty: Option<Box<FuncType>>
. The latter is probably tested elsewhere but there's not a ton of protection here againstFuncKind
.I don't have a great way to solve this other than unergonomic wrappers, but I'd probably still prefer that to the impl here on the whole struct.
alexcrichton created PR review comment:
I know we split this into a two-phase operation, but given that the push/fill is always back-to-back it may be better to just remove this
fill_func_refs
perhaps? (folding it into the push operation that is)
alexcrichton created PR review comment:
In retrospect given that this isn't really perf critical or used all that often, perhaps this would be better as a simple
Vec<Box<....>>
?
alexcrichton created PR review comment:
I think the only use case for this is the profilers, and they don't have a great use for this. The profilers already have a lot of duplication of looking at all of these trampolines which isn't great, too.
This is fine to defer to another PR, but I wanted to write this down: these should all get removed and all the profilers should switch to using this logic which generically reads an ELF file and registers everything. There's specific support for DWARF in the jitdump profiler but I think that's all best removed at this point.
Again, future PR, just want to leave a reminder to myself to do this or open an issue later.
alexcrichton created PR review comment:
Like with
FuncData
, this feels a bit too broad to me. Could these impls be placed onVMCallerCheckedFuncRef
instead?
alexcrichton created PR review comment:
cc @saulecabrera we opted to leave these as
todo!()
in winch, but this effectively breaks compilation with winch in Wasmtime (and tests are additionally disabled lower in this PR).Is that ok with you though?
alexcrichton created PR review comment:
Like the above comments, I think this should be removed in favor of something more targeted ideally.
alexcrichton created PR review comment:
Also, could you add documentation about this parameter? Mostly that this isn't "number of arguments" but rather "capacity of the
args_and_results
area" which I realize is perhaps obvious given the name but is not necessarily obvious given that this is generally a "call a function" function where one typically expects to pass a list of arguments.
alexcrichton created PR review comment:
Or should we perhaps
abort()
if this ever gets seen?
saulecabrera created PR review comment:
That's ok with me, thanks for the heads up. I'll work on the new version of the trampolines for winch once this PR lands.
fitzgen created PR review comment:
Because we don't have a trampoline for every function, these have to become
Option
s in thefuncs
array, and while it is a code size win, it is only 0.0160605% savings onspidermonkey.wasm
. Nonetheless I kind of like the rearranged structure anyways, since looking up a trampoline is an O(1) operation now.
fitzgen created PR review comment:
I'll do this in a follow up.
fitzgen created PR review comment:
Eh, I'd rather just keep it around until we completely remove the asm trampolines.
fitzgen created PR review comment:
Oh good call, because the case that the above comment is talking about can't happen with function imports, only
funcref
s.
fitzgen created PR review comment:
Ah unfortunately the borrows don't really work out becuase
store
is already borrowed immutably for thefunc_refs()
method and so we can't also get and pass in the store's module registry.
fitzgen created PR review comment:
Eh, it is still used whenever passing
funcref
s into Wasm via any method other than function imports that are pre-resolved inInstancePre
. That is a lot of paths, even if it isn't the hottest path. Since we've already got it, I'm inclined to leave it.
fitzgen created PR review comment:
Ah yes, sorry I meant to tag you @saulecabrera but I forgot. Thanks for your help getting this PR landed, and willingness to work on follow ups!
fitzgen updated PR #6262.
fitzgen updated PR #6262.
alexcrichton submitted PR review.
fitzgen has enabled auto merge for PR #6262.
fitzgen updated PR #6262.
fitzgen merged PR #6262.
Last updated: Dec 23 2024 at 12:05 UTC