uweigand opened PR #10531 from uweigand:s390x-retval-loads to bytecodealliance:main:
This is a follow-on to https://github.com/bytecodealliance/wasmtime/pull/10502 implementing the same logic for s390x.
Like other platforms, we now no longer emit any instructions handling return values after the call instruction; instead, everything is done within a pseudo Call instruction. Unlike other platforms, this also has to handle vector lane swapping when calling between ABIs that mix BE and LE lane orders. The pseudo Call instruction needs enough type information to make this choice, therefore I had to add a type field to the RetLocation::Reg case (the ::Stack case already had one).
All other changes are contained within the s390x back-end. To simplify the implementation, the patch also implements a number of clean-ups:
- Introduce a MemArg::SpillOffset variant
- Remove the (unnecessary on this platform) island code size check
- Merge the CallInd instructions into the base Call instructions, using a new CallInstDest type to carry the call destination
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
uweigand requested cfallin for a review on PR #10531.
uweigand requested wasmtime-compiler-reviewers for a review on PR #10531.
uweigand commented on PR #10531:
Hi @cfallin , this is the s390x implementation of the new return-value logic. Note that I'm seeing some slight code-gen (performance) regressions, e.g. in
vec-abi-128.clif:; basr %r14, %r4 -; vl %v20, 0xb0(%r15) +; vl %v1, 0xb0(%r15) +; vst %v1, 0xc0(%r15) ; lgr %r2, %r6 -; vst %v20, 0(%r2) -; lmg %r6, %r15, 0xf0(%r15) +; vl %v19, 0xc0(%r15) +; vst %v19, 0(%r2) +; lmg %r6, %r15, 0x100(%r15) ; br %r14Looks like the register allocator does not allocate the vector return value into a register but to a spill slot, even though most registers would be available. Maybe the problem is that on the platform all vector registers are clobbered by calls, and therefore in the clobber list of the pseudo Call instruction. But that shouldn't prevent the allocator from chosing the register as output - not sure if this can be modeled ...
uweigand updated PR #10531.
bjorn3 commented on PR #10531:
A register can only be in the clobber list or the output list, not both at the same time.
github-actions[bot] commented on PR #10531:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen submitted PR review:
LGTM but I will fully admit I am least familiar with s390x out of all of our backends.
Not sure if you also want to take a look at this, @cfallin.
cfallin commented on PR #10531:
Yep, this is on my queue as well; I'll make a pass over it. Thanks Ulrich for getting this together so quickly!
cfallin submitted PR review:
Looks great to me as well. Thanks!
cfallin merged PR #10531.
Last updated: Dec 06 2025 at 06:05 UTC