uweigand opened PR #2874 from s390x-backend
to main
:
This adds support for the IBM z/Architecture (s390x-ibm-linux).
The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin assigned PR #2874 to cfallin.
uweigand updated PR #2874 (assigned to cfallin) from s390x-backend
to main
.
uweigand updated PR #2874 (assigned to cfallin) from s390x-backend
to main
.
uweigand updated PR #2874 (assigned to cfallin) from s390x-backend
to main
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
It might be good to have a note here about the fact that no FP is used; this is slightly different than e.g. what the ASCII-art doc comment in
abi_impl.rs
describes for vanilla/common ABI details. Specifically, documenting how the SP is decremented once, including the maximum-needed outgoing args space, would be helpful, I think.
cfallin created PR Review Comment:
AArch64-ism left in the comments here (but I think the same thing is true of this platform, namely that no addressing modes that modify registers are used?).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Same comment here as above -- assert our assumption re: register range.
cfallin created PR Review Comment:
Can we assert that this is only used in a way that is consistent with the regalloc metadata we generate (if I understand correctly, that would be
rt2 == rt + 1
or similar)?
cfallin created PR Review Comment:
It might be useful to factor out helpers for bit-manipulation like this (this is a sign-extend IIUC, so maybe
sign_extend_to_u64
?); I know we've been far from consistent about this in the rest of the backends, but there is no better time to start :-)Same comment to the masking below, too (
mask_to_bits
or similar).
cfallin created PR Review Comment:
Should this be
Load64SExt8
? (I'm not sure if a 1-to-64-bit extension is common -- perhaps this arises inBextend
? -- but this caught my eye, and deserves a comment if correct to note why it differs)
cfallin created PR Review Comment:
This and the below load-op fusion cases are I think similar to this bug we saw on x64: #2576.
The fundamental issue was that we were lowering (and thus matching on the load and sinking it) a compare multiple times, whenever its flags output was used, because we do not handle flags the same way we handle other registers (and it looks like this backend is the same). The input-matching and sinking support in the core lowering logic assumes that one will only match and then sink once; in certain cases the repeats interact poorly with instruction ordering and cause undefined values to be used.
We really should come up with an impossible-to-do-by-construction way of encoding this restriction in the
LowerCtx
API, but until then, if this is also the case here (again, I'm pattern-matching on the bug but am not certain everything is the same in this backend!), the simplest fix is just to avoid load-op merging with compares.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
I would assume that
Load64SExt8
is wrong here too. The loaded bool is likely stored as a zero extended byte, not sign extended byte and as such it would need to be a load + arithmetic shift left of 63 bits I think.
uweigand updated PR #2874 (assigned to cfallin) from s390x-backend
to main
.
uweigand submitted PR Review.
uweigand created PR Review Comment:
OK, added comment.
uweigand submitted PR Review.
uweigand created PR Review Comment:
This comment is obsolete anyway (copied from a old version) - the current mapper design doesn't even distinguish between a pre- and post-map any more. I've removed the comment.
uweigand submitted PR Review.
uweigand created PR Review Comment:
No, it will actually be used with larger ranges. I've now just implemented the correct semantics for all cases.
uweigand submitted PR Review.
uweigand created PR Review Comment:
Here, we cannot implement the correct semantics, because the remapper will not map one contiguous range to another contiguous range. But here this really doesn't matter -- this code will never get called since these instructions are never present before register allocation. (The only user of these instructions is the prologue/epilogue code.)
uweigand submitted PR Review.
uweigand created PR Review Comment:
This is also moot. There actually are no 1-bit memory access instructions in cranelift, so this can never happen. I've removed all the 1-bit cases here.
uweigand created PR Review Comment:
OK, done.
uweigand submitted PR Review.
uweigand created PR Review Comment:
Good point. It actually depends on the caller -- if
lower_icmp_to_flags
is used to lower a plainIcmp
, then it is actually correct to sink memory loads (just like for any other instruction where we're sinking memory accesses). However,lower_icmp_to_flags
is also used to combine longer instruction sequences, and in those cases, sinking memory loads is only legal if all intermediate instructions only have a single user (there is currently no obvious way to check for that condition in the back-end, unfortunately). I've now added amay_sink_memory
flag to the routine, which is set by the caller (somewhat conservatively at the moment, this can be improved once we have support for better checks).
uweigand submitted PR Review.
uweigand edited PR Review Comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Right, I agree -- the issue is precisely when
icmp
's result is matched more than once as an operand of another instruction. The fix looks good -- thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yes, this makes sense -- thanks.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Perfect, this is very helpful -- thank you!
cfallin submitted PR Review.
cfallin merged PR #2874.
Last updated: Jan 24 2025 at 00:11 UTC