beetrees opened PR #10691 from beetrees:f16-f128-s390x-mvp to bytecodealliance:main:
This PR adds initial support for passing
f16andf128values around to the s390x backend. Support is added for theload,store,bitcast,f16constandf128constCLIF instructions.Note that the s390x ABI specification currently does not specify the ABI for
f16. However, Clang recently added support forf16in https://github.com/llvm/llvm-project/pull/109164 (as opposed to LLVM just supporting it at the LLVM IR level) using a straightforward extrapolation of the ABI (passingf16in floating point registers just likef32andf64), so on that basis I've not put thef16ABI behind theenable_llvm_abi_extensionssetting.
f16/f128issue: https://github.com/bytecodealliance/wasmtime/issues/8312
beetrees requested wasmtime-compiler-reviewers for a review on PR #10691.
beetrees requested cfallin for a review on PR #10691.
cfallin commented on PR #10691:
cc @uweigand -- would you mind reviewing the s390x backend changes here? (Thanks!)
github-actions[bot] commented on PR #10691:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
uweigand commented on PR #10691:
Thanks for working on s390x! For f128, current hardware actually has two different ways of representing them - there's the "traditional" way of splitting an f128 value across a pair of 64-bit floating-point registers, but since z14 we can also hold f128 values in a single vector register, with a full set of arithmetic operations available.
This patch is based on the traditional FPR pair approach, but I would much prefer using single vector registers, for several reasons:
- z14 is the minimum architecture level for cranelift, so the VR-based instructions are always available
- Handling a single VR is much simpler than handling a pair of FPRs
- Specifically, while your patch hasn't run into this problem yet, once you actually want to use any of the traditional f128 arithmetic instructions, those only operate on architected pairs (N, N+2), not any two random FPRs. But the cranelift regalloc actually isn't capable of allocating pairs, so those would have to be hardcoded everywhere
Note that the one place where we'd still require pairs (even when using VRs everywhere else) is for function arguments and returns, as the ABI was defined for older systems. But for the ABI we have to hard-code registers anyway, so that should be much less of an issue. (For our own non-system ABIs, we could even chose to pass f128 in VR as well.)
LLVM (and GCC) will also use VRs to hold f128 if you build with -march=z14 or higher.
beetrees updated PR #10691.
beetrees updated PR #10691.
beetrees commented on PR #10691:
Done.
f128is now stored in a single vector register.Note that the one place where we'd still require pairs (even when using VRs everywhere else) is for function arguments and returns, as the ABI was defined for older systems. But for the ABI we have to hard-code registers anyway, so that should be much less of an issue. (For our own non-system ABIs, we could even chose to pass f128 in VR as well.)
I'm unsure what you mean by this; according to the s390x ABI specification
f128is always passed and returned indirectly.
uweigand commented on PR #10691:
Done.
f128is now stored in a single vector register.Thanks, I'll have a closer look shortly.
I'm unsure what you mean by this; according to the s390x ABI specification
f128is always passed and returned indirectly.Sorry, I must have gotten confused. You're correct, of course.
uweigand commented on PR #10691:
OK, this all LGTM now. Thanks again!
cfallin submitted PR review:
Thanks for the review. @uweigand! Giving this an approval based on that, and merging.
cfallin submitted PR review:
Thanks for the review, @uweigand! Giving this an approval based on that, and merging.
cfallin commented on PR #10691:
@alexcrichton it looks like the merge queue checks failed on a diff of a C++ header -- IIRC you had worked on this recently?
alexcrichton commented on PR #10691:
I think that was a failure to download
wasm.hhand it manifested as a weird error, so I'm going to re-queue, but I could very well be wrong...
alexcrichton merged PR #10691.
tgross35 commented on PR #10691:
For posterity, the most recently published principles of operation doc now specifies
f16https://www.ibm.com/docs/en/module_1678991624569/pdf/SA22-7832-14.pdf
uweigand commented on PR #10691:
For posterity, the most recently published principles of operation doc now specifies
f16https://www.ibm.com/docs/en/module_1678991624569/pdf/SA22-7832-14.pdfTo be clear, this was already in the previous (z16) version. Both for z16 and z17, the "BFP tiny" (
f16) format is only used in a few special operations involving the AI Unit (NNPA and related instructions); there is no general arithmetic (or even conversion) support for this data type.
Last updated: Dec 06 2025 at 07:03 UTC