cfallin opened PR #12061 from cfallin:patchable-abi-is-happy-to-preserve-all-your-registers to bytecodealliance:main:
This ABI is intended for use in scenarios where we want a very lightweight callsite that can be turned on and off by patching in one instruction. (The actual patchable call instruction is not in this PR; that will be a separate PR.)
The idea is that we define a call to clobber no registers -- not even the arguments! And we restrict signatures such that on all of our supported architectures, all arguments go into registers only. Those two requirements together mean that all callsites for this ABI should have only a raw call instruction, with no loads/stores to stackslots; and have the minimum possible impact on regalloc, by only imposing constraints on args to ensure they are in certain registers but not altering those registers.
Given this, we could implement, e.g., breakpoints with patchable callsites (off by default) at every sequence point in compiled code. In a typical use-case with Wasmtime-compiled Wasm, that would put a bunch of uses of vmctx constrained to the first argument register in every code path, but vmctx likely already sits there most of the time anyway (for any call to other Wasm functions or for libcalls). Thus, the impact is just the one instruction and nothing else.
This PR adds the calling convention itself and tests that show that two consecutive callsites can be compiled with no register setup re-occurring from one call to the next (thus demonstrating no clobbers).
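To make the regalloc argument above concrete, here is a minimal sketch (not Cranelift's or regalloc2's actual code). It models registers as bits in a `u32` and treats the values that must be saved around a call as the intersection of the live set and the call's clobber set. The names `RegSet` and `regs_to_save`, and the register numbering, are hypothetical.

```rust
/// Hypothetical register set: bit `i` set means register `i` is in the set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegSet(u32);

impl RegSet {
    const EMPTY: RegSet = RegSet(0);

    /// Return a copy of the set with register `reg` (0..32) added.
    fn with(self, reg: u8) -> RegSet {
        RegSet(self.0 | (1u32 << reg))
    }

    fn intersect(self, other: RegSet) -> RegSet {
        RegSet(self.0 & other.0)
    }

    fn count(self) -> u32 {
        self.0.count_ones()
    }
}

/// Registers holding values that are live across a call and that the call
/// may overwrite. In this toy model, these are the values the caller would
/// have to spill or move around the callsite.
fn regs_to_save(live_across_call: RegSet, clobbered_by_call: RegSet) -> RegSet {
    live_across_call.intersect(clobbered_by_call)
}
```

With an empty clobber set, as this ABI defines, `regs_to_save` is empty for any live set, which is the property the consecutive-callsite tests are meant to demonstrate.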
cfallin requested wasmtime-compiler-reviewers for a review on PR #12061.
cfallin requested alexcrichton for a review on PR #12061.
cfallin commented on PR #12061:
(Switching to draft actually -- want to implement the callee side in this one too, sorry)
cfallin updated PR #12061.
cfallin has marked PR #12061 as ready for review.
cfallin commented on PR #12061:
OK, good to go now!
cfallin updated PR #12061.
cfallin updated PR #12061.
cfallin requested wasmtime-core-reviewers for a review on PR #12061.
cfallin submitted PR review.
cfallin created PR review comment:
One thing to note here: riscv64 now counts upward rather than downward when saving clobbers, because it has to worry about 16-aligning vector regs (which were never previously callee-saves) and this is simpler to do (and to match the size computation) with a forward order. This is why a bunch of riscv64 tests were reblessed.
Pulley copied this logic but (i) I didn't want to alter the special "pulley-managed clobber saves" stuff and (ii) I think its vector stores/loads allow unaligned access, unlike riscv64 in the base case.
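As a rough illustration of why a forward (upward-counting) order makes the size computation easy to match, here is a minimal sketch of laying out a clobber-save area with per-register alignment. It is not the riscv64 backend's code; `SavedReg`, `layout_clobbers`, and the sizes used (8-byte integer registers, 16-byte 16-aligned vector registers) are assumptions for the example.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Hypothetical description of one register to save.
#[derive(Clone, Copy)]
struct SavedReg {
    size: u32,
    align: u32,
}

/// Assign offsets in forward order, aligning each slot as needed.
/// Returns the per-register offsets and the total area size, rounded
/// up to 16 bytes so the area itself stays 16-aligned.
fn layout_clobbers(regs: &[SavedReg]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(regs.len());
    let mut next = 0;
    for reg in regs {
        let offset = align_up(next, reg.align);
        offsets.push(offset);
        next = offset + reg.size;
    }
    (offsets, align_up(next, 16))
}
```

Because the same loop yields both the offsets and the total size, the save/restore code and the frame-size computation cannot disagree in this model.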
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Question for you, which I realize is basically unrelated to this PR -- this function feels like it's duplicated logic relative to `get_regs_clobbered_by_call`, where this more-or-less returns the inverse of the other. How come in the backends this has its own implementation vs having one function delegate to the other?
alexcrichton created PR review comment:
Similar question to aarch64 (or feel free to resolve with no comment if you respond to aarch64)
alexcrichton created PR review comment:
Similar to my question about aarch64, would it be possible to implement this method in terms of `get_regs_clobbered_by_call`?
alexcrichton created PR review comment:
Similar question to aarch64 (or feel free to resolve with no comment if you respond to aarch64)
alexcrichton created PR review comment:
Similar question to aarch64 (or feel free to resolve with no comment if you respond to aarch64)
alexcrichton created PR review comment:
Is it worth clarifying here that the return register, if applicable, is clobbered? I realize that's sort of like a "system" register which isn't tracked by regalloc, but I believe that's the one register that's clobbered with this ABI? (not that it matters, we always save it in the prologue anyway)
fitzgen submitted PR review.
fitzgen created PR review comment:
Okay great, this is exactly what I was hoping would be here. We should document these constraints in the `ir::CallConv` doc comments for cranelift embedders tho.
fitzgen created PR review comment:
Alternatively it probably makes sense to restrict this calling convention to only as many arguments as fit in registers (portably?) and disallow returns. We can enforce this in CLIF validation.
cfallin created PR review comment:
Right, yep, this is documented in this doc-comment (see 11 lines up): "only up to four arguments of integer type, and no return values".
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, I believe it is documented: from line 55 in `call_conv.rs` I have
/// The ABI is based on the native register-argument ABI on each
/// respective platform, but puts severe restrictions on allowable
/// signatures: only up to four arguments of integer type, and no
/// return values. It does not support tail-calls, and disallows
/// any extension modes on arguments.
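The signature restrictions quoted above could be checked along the following lines. This is a sketch under stated assumptions, not Cranelift's verifier: `Ty`, `Ext`, `Param`, `Sig`, `SigError`, and `check_patchable_sig` are all hypothetical stand-ins for the real IR types. The tail-call restriction is a property of the call instruction rather than the signature, so it is not modeled here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty { I8, I16, I32, I64, F32, F64 }

impl Ty {
    fn is_int(self) -> bool {
        matches!(self, Ty::I8 | Ty::I16 | Ty::I32 | Ty::I64)
    }
}

/// Hypothetical argument extension mode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ext { None, Sext, Uext }

#[derive(Clone, Copy, Debug)]
struct Param { ty: Ty, ext: Ext }

struct Sig { params: Vec<Param>, returns: Vec<Ty> }

#[derive(Debug, PartialEq, Eq)]
enum SigError { TooManyArgs, NonIntegerArg, HasExtension, HasReturn }

/// Check the documented restrictions: at most four integer arguments,
/// no extension modes on arguments, and no return values.
fn check_patchable_sig(sig: &Sig) -> Result<(), SigError> {
    if sig.params.len() > 4 {
        return Err(SigError::TooManyArgs);
    }
    for param in &sig.params {
        if !param.ty.is_int() {
            return Err(SigError::NonIntegerArg);
        }
        if param.ext != Ext::None {
            return Err(SigError::HasExtension);
        }
    }
    if !sig.returns.is_empty() {
        return Err(SigError::HasReturn);
    }
    Ok(())
}
```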
cfallin submitted PR review.
cfallin created PR review comment:
That's a great question! They are mostly but not entirely the same: witness the beauty that is the aarch64 calling convention, where it is specified that half of the vector registers (the low/f64 half) is callee-saved while half is caller-saved. The result of that is that we have the special "caller and callee conventions match so ignore callsites" logic that gives us the right code in the common case, but we have to conservatively over-approximate otherwise.
In non-aarch64 cases I agree it could be refactored, though that's another cleanup project I'd rather defer till after this PR if that's OK :-)
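A toy model may help show why the two functions are not always exact inverses. This is only a sketch: it uses a made-up 16-register machine and the hypothetical names `ToyAbi`, `clobbered_by_call`, and `callee_saved`, not the backend code. A "partially preserved" register, like an aarch64 vector register whose low 64 bits are callee-saved, has to appear in both sets: the caller must conservatively treat it as clobbered, and the callee must still save it.

```rust
/// Toy machine with 16 registers, one bit per register.
const ALL_REGS: u16 = 0xFFFF;

/// Hypothetical ABI description. The two sets are assumed to be disjoint.
struct ToyAbi {
    /// Registers the callee preserves in full.
    fully_preserved: u16,
    /// Registers the callee preserves only in part (e.g. the low half).
    partially_preserved: u16,
}

impl ToyAbi {
    /// What a caller must assume is overwritten: everything that is not
    /// preserved in full, including partially preserved registers.
    fn clobbered_by_call(&self) -> u16 {
        ALL_REGS & !self.fully_preserved
    }

    /// What a callee must save if it writes to it: anything preserved
    /// in full or in part.
    fn callee_saved(&self) -> u16 {
        self.fully_preserved | self.partially_preserved
    }

    /// True when one set is exactly the complement of the other.
    fn is_simple_inverse(&self) -> bool {
        self.callee_saved() == ALL_REGS & !self.clobbered_by_call()
    }
}
```

In this model, an ABI with no partially preserved registers could derive one set from the other, while one with partial preservation cannot.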
fitzgen submitted PR review.
fitzgen created PR review comment:
D'oh -- thanks!
cfallin commented on PR #12061:
(There is something weird going on with riscv64 which I spent a bit of time trying to debug with qemu+gdb today -- will resolve that before this moves further)
cfallin updated PR #12061.
cfallin updated PR #12061.
cfallin updated PR #12061.
cfallin commented on PR #12061:
riscv64 issue was some weirdness with non-aligned stackslot area sizes causing problems with the new alignment of clobber-saves with the bottom-up ordering. I've reverted the changes that flipped the order to bottom-up.
cfallin has enabled auto merge for PR #12061.
cfallin merged PR #12061.
Last updated: Dec 06 2025 at 06:05 UTC