elliottt opened PR #5073 from trevor/ssa-s390x
to main
:
Remove uses of
reg_mod
from the s390x backend. This required moving away from usingr0
/r1
as the result registers from a few different pseudo instructions, standardizing instead onr2/r3
. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, whichr0
/r1
are not.
<!--Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt edited PR #5073 from trevor/ssa-s390x
to main
:
Remove uses of
reg_mod
from the s390x backend. This required moving away from usingr0
/r1
as the result registers from a few different pseudo instructions, standardizing instead onr2/r3
. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, whichr0
/r1
are not.cc @uweigand, this is the same set of changes you and @cfallin iterated on in #4856.
<!--Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt has marked PR #5073 as ready for review.
uweigand submitted PR review.
uweigand submitted PR review.
uweigand created PR review comment:
Doesn't this swap the two inputs now? The original logic results in
src1
if the condition is true andsrc2
if it is false. The new logic appears to result insrc1
if the condition is false ...
uweigand created PR review comment:
Similarly, these helpers shouldn't be needed anymore.
uweigand created PR review comment:
The whole reason for using an
emit_producer
/emit_consumer
pair here, instead of just doing(with_flags_reg producer consumer)
, is that we needed to place theemit_mov
in between. If this is no longer necessary, I guess we could just usewith_flags_reg
. (But that can also be a standalone follow-up cleanup.)
uweigand created PR review comment:
Why the
extern
when it is already fully defined here?
uweigand created PR review comment:
Again the question, why have another definition here when it is already defined in ISLE? I think it should be one place or the other, not both.
uweigand created PR review comment:
So these changes I really do not like, for the same reason originally discussed here: https://github.com/bytecodealliance/wasmtime/pull/4856#discussion_r962932362
I really think the final assembly output needs to remain valid s390x assembler text, that could be run through the system assembler. We fixed that problem originally by using a
pretty_print_reg_mod
helper that prints both virtual regs before regalloc, and just the single physreg after regalloc.Similarly, I think we should define
pretty_print_regpair
and/orpretty_print_regpair_mod
helpers that print all participating virtual regs before regalloc, and just the single physreg that ends up in the assembler text after regalloc.That should also remove many of the changes to the filetests.
uweigand created PR review comment:
I believe this shouldn't be needed at all anymore, see my patch attached here: https://github.com/bytecodealliance/wasmtime/pull/4856#issuecomment-1237240252
uweigand created PR review comment:
I think there should still be a FIXME somewhere that these aren't actually allocated but hard-coded. But maybe that FIXME should now go into
s390x_get_operands
instead?
uweigand created PR review comment:
Might as well use
rd1
for clarity then.
uweigand created PR review comment:
Chris' original patch always uses
ri
for the input side of tied operands. I think it would be preferable to be consistent with that everywhere.
elliottt submitted PR review.
elliottt created PR review comment:
How about adding this FIXME where the allocatable registers are defined?
elliottt submitted PR review.
elliottt created PR review comment:
This is so that we have the constructors available in the ISLE source, but don't emit the actual declaration.
elliottt submitted PR review.
elliottt created PR review comment:
It's a little unfortunate to have the redundancy here, but this is currently the best way to have the constructors from an enum available when the declaration is in rust.
elliottt submitted PR review.
elliottt created PR review comment:
The other backends had drifted from being valid assembly, and I had assumed that the same was true of the s390x backend; I'll remove the changes to the pretty-printer :+1:
elliottt created PR review comment:
Great point, that will be more resilient long-term as well.
elliottt submitted PR review.
uweigand created PR review comment:
Fine with me, thanks!
uweigand submitted PR review.
uweigand submitted PR review.
uweigand created PR review comment:
I guess my question is, why does the declaration have to be in rust? Why not just have it in ISLE, and use the auto-generated definition from Rust code, just like with the other enums defined in ISLE?
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
The precise-output test changes should be mostly register changes now.
elliottt submitted PR review.
elliottt created PR review comment:
Unfortunately the reason in this case is that it allows us to derive
Copy
forRegPair
andWritableRegPair
. Without derivingCopy
the register pair values get moved into the instructions that use them, and borrow errors are generated for any subsequent use of those values. This might be a pattern that we should enable in ISLE, but for now it's easiest to define those types in rust instead.
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I've moved this comment to the first fixed allocation of
%r2
/%r3
instead, as that made it easier to provide some context.
elliottt submitted PR review.
elliottt created PR review comment:
Now that I've reverted everything back to how it was before, I'll have a look at adding the helper that you mentioned to print the additional regs when they don't have a physical allocation.
elliottt submitted PR review.
elliottt created PR review comment:
You're right, it absolutely does. However, it's not used anywhere, so I'll remove it.
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt edited PR review comment.
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I migrated
select_bool_reg
andselect_bool_imm
, which removed the need foremit_consumer
.
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I made the changes to introduce
pretty_print_regpair
,pretty_print_regpair_mod
, andpretty_print_regpair_mod_lo
. This definitely cleaned up the pretty-printing code quite a bit, so thank you for the suggestion!I also opted to make
RegPair
andWritableRegPair
structs instead of enums, and marking them asprimitive
types in ISLE. This required moving some functions to rust likeregpair
,regpair_lo
, etc, but made working with their values in rust much nicer.
uweigand submitted PR review.
uweigand submitted PR review.
uweigand created PR review comment:
Not sure if there can be a phase where only one of
hi
andlo
are real regs? Also, it would be good to assert in the real reg case that we have a valid even/odd pair.
uweigand created PR review comment:
Should be
ri
in the comment now.
uweigand created PR review comment:
This also now seems to generate an extra
lgr
(see thectz_64
test case).
uweigand created PR review comment:
There's actually no need now to assert we get the specific r2/r3 pair here. The emit code might as well accept any valid register pair, i.e. only assert that we have an even/odd GPR pair, and that the tied input (if any) matches the output.
Then the only place where r2/r3 are still hardcoded is in
s390x_get_operands
.
uweigand created PR review comment:
So the old
select_bool_imm/reg
were written the way they were specifically to avoid unnecessary register moves. It appears this new way brings those unnecessary moves back (see e.g. the many tests with an extralgr
inicmp.clif
. This is not a show-stopper for this PR, we can look into it afterwards -- just wondering if there's something we can to do improve on this again ...
uweigand created PR review comment:
Perfect!
uweigand created PR review comment:
Ah, I see. Thanks for the explanation.
uweigand created PR review comment:
cmov_imm_regpair_lo/hi
were only used by the old division code, so they're really not needed anymore
uweigand submitted PR review.
uweigand created PR review comment:
Yes, this is looking really nice now, thanks!
elliottt submitted PR review.
elliottt created PR review comment:
That's much cleaner, thanks!
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I changed this to require that both
hi
andlo
are real registers, and to assert that they are valid.
elliottt updated PR #5073 from trevor/ssa-s390x
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I went ahead and reverted this commit, as it's not strictly necessary for removing uses of
reg_mod
.
elliottt submitted PR review.
elliottt created PR review comment:
I dug into this a bit, and have an explanation. Here's the program before register allocation:
Inst 0: args %v128=%r2 Inst 1: lcgr %v131, %v128 Inst 2: ngrk %v132, %v128, %v131 Inst 3: flogr %v133/%v134, %v132 Inst 4: locghie %v135<-%v133, -1 Inst 5: lghi %v136, 63 Inst 6: sgrk %v137, %v136, %v135 Inst 7: lgr %v130, %v129 Inst 8: lgr %r2, %v130 Inst 9: br %r14
There are two live-ranges that contribute to the move: one that covers the output of instruction 3 to where it's consumed in instruction 4 (
a
), and one that covers the output of instruction 4 to where it's consumed in instruction 6 (b
). As the second live-rangeb
is longer thana
, it will be allocated first. As it has no fixed register constraints, the allocator is free to choose anything, and in the test output it chooses%r4
. As theflogr
instruction has a constraint that says it must use%r2
/%r3
for its output register, allocating a register for%v133
produces a move from%r2
to%r4
once the live-rangea
is processed.It would be great to fix this problem by merging those two live-ranges into a single bundle, but regalloc2 is currently unable to merge live ranges when one contains a fixed register constraint, like
a
. It's not all bad news though, getting cranelift producing lowered output that's in SSA will enable some simplifications in regalloc2 that will enable merging these two live-ranges, so these sort of unnecessary moves should be less common in the future.
elliottt requested cfallin for a review on PR #5073.
cfallin submitted PR review.
uweigand created PR review comment:
I see, thanks for having a look. This looks all good to me now.
uweigand submitted PR review.
uweigand submitted PR review.
elliottt merged PR #5073.
Last updated: Dec 23 2024 at 13:07 UTC