cfallin opened PR #12148 from cfallin:clobber-not-lest-ye-be-clobbered to bytecodealliance:main:
It turns out that the s390x ABI is special wrt our others: the s390x System-V ABI provides an area to all callees to save clobbered GPRs. However this is only large enough for r6-r15, the callee-saves in the standard ABI.
The implementation implicitly assumed this, and when I adjusted the definition of clobbered registers for the
patchableABI, stating that r0-r15 are clobbered instead, the code happily generated a store-multiple (stmg) instruction that saved too much data in too little space, overwriting other bits of the stack.Also, the clobber-save/restore sequence code only saved the bottom 64 bits of clobbered vector/float registers (which are 128 bits), implicitly encoding the fact that the SysV ABI specifies only the bottom half as callee-saved.
This PR implements
patchableproperly on s390x by open-coding the sequences to save all vector registers and r0-r5 in the explicit clobber-save area (still using the SysV-defined one for r6-r15).<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin requested alexcrichton for a review on PR #12148.
cfallin requested wasmtime-compiler-reviewers for a review on PR #12148.
cfallin commented on PR #12148:
Note that this is spun out of #12133, and makes tests pass there, in lieu of hacking up a Cranelift runtest harness for editing patchable code here (the failure would only be observable with large enough code to cause unexpected register clobbers anyway).
alexcrichton submitted PR review:
cc @uweigand on this as well, you're probably a more apt reviewer than I.
I'll go ahead and approve though to help unblock https://github.com/bytecodealliance/wasmtime/pull/12133 and I'm assuming follow-ups here are reasonable and feasible
uweigand submitted PR review:
We do have 32 VRs, so if the point is to save them all, this is not correct (it only saves 16). Apart from that LGTM from a correctness perspective.
To improve performance, we could use a second STORE MULTIPLE (STMG) for GPRs 0..5, and in fact we could use a VECTOR STORE MULTIPLE (VSTM) - not yet used in cranelift elsewhere - for VRs 0..31. That would certainly reduce code size, and also improve execution time a bit.
cfallin updated PR #12148.
cfallin commented on PR #12148:
Ah, yep, sorry, I forgot that we have 32 vector registers on this platform. Thanks for the review!
I've updated to use
STMG/LMG; opting not to try to add vector store/load multiple for now since the encoding isn't there but we can do that as followup if needed. (As we discussed in the Cranelift weekly, this isn't too performance-critical on the callee side; in Wasmtime, at least, the save/restore sequence will execute only when hitting a breakpoint.)
cfallin has enabled auto merge for PR #12148.
cfallin merged PR #12148.
Last updated: Dec 13 2025 at 19:03 UTC