Stream: git-wasmtime

Topic: wasmtime / PR #5073 Remove uses of `reg_mod` from s390x


view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 00:00):

elliottt opened PR #5073 from trevor/ssa-s390x to main:

Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not.
<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 00:48):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 00:49):

elliottt edited PR #5073 from trevor/ssa-s390x to main:

Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not.

cc @uweigand, this is the same set of changes you and @cfallin iterated on in #4856.
<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 03:37):

elliottt has marked PR #5073 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Doesn't this swap the two inputs now? The original logic results in src1 if the condition is true and src2 if it is false. The new logic appears to result in src1 if the condition is false ...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Similarly, these helpers shouldn't be needed anymore.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

The whole reason for using an emit_producer / emit_consumer pair here, instead of just doing (with_flags_reg producer consumer), is that we needed to place the emit_mov in between. If this is no longer necessary, I guess we could just use with_flags_reg. (But that can also be a standalone follow-up cleanup.)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Why the extern when it is already fully defined here?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Again the question, why have another definition here when it is already defined in ISLE? I think it should be one place or the other, not both.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

So these changes I really do not like, for the same reason originally discussed here: https://github.com/bytecodealliance/wasmtime/pull/4856#discussion_r962932362

I really think the final assembly output needs to remain valid s390x assembler text, that could be run through the system assembler. We fixed that problem originally by using a pretty_print_reg_mod helper that prints both virtual regs before regalloc, and just the single physreg after regalloc.

Similarly, I think we should define pretty_print_regpair and/or pretty_print_regpair_mod helpers that print all participating virtual regs before regalloc, and just the single physreg that ends up in the assembler text after regalloc.

That should also remove many of the changes to the filetests.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

I believe this shouldn't be needed at all anymore, see my patch attached here: https://github.com/bytecodealliance/wasmtime/pull/4856#issuecomment-1237240252

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

I think there should still be a FIXME somewhere that these aren't actually allocated but hard-coded. But maybe that FIXME should now go into s390x_get_operands instead?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Might as well use rd1 for clarity then.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 12:13):

uweigand created PR review comment:

Chris' original patch always uses ri for the input side of tied operands. I think it would be preferable to be consistent with that everywhere.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:49):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:49):

elliottt created PR review comment:

How about adding this FIXME where the allocatable registers are defined?

https://github.com/bytecodealliance/wasmtime/blob/0667a412d70c6b4d9f97502a351db54373c18c91/cranelift/codegen/src/isa/s390x/inst/regs.rs#L94-L95

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:54):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:54):

elliottt created PR review comment:

This is so that we have the constructors available in the ISLE source, but don't emit the actual declaration.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:55):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:55):

elliottt created PR review comment:

It's a little unfortunate to have the redundancy here, but this is currently the best way to have the constructors from an enum available when the declaration is in rust.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:56):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:56):

elliottt created PR review comment:

The other backends had drifted from being valid assembly, and I had assumed that the same was true of the s390x backend; I'll remove the changes to the pretty-printer :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:57):

elliottt created PR review comment:

Great point, that will be more resilient long-term as well.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 16:57):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 17:37):

uweigand created PR review comment:

Fine with me, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 17:37):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 17:38):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 17:38):

uweigand created PR review comment:

I guess my question is, why does the declaration have to be in rust? Why not just have it in ISLE, and use the auto-generated definition from Rust code, just like with the other enums defined in ISLE?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 18:20):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 18:24):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 18:24):

elliottt created PR review comment:

The precise-output test changes should be mostly register changes now.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:08):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:08):

elliottt created PR review comment:

Unfortunately the reason in this case is that it allows us to derive Copy for RegPair and WritableRegPair. Without deriving Copy the register pair values get moved into the instructions that use them, and borrow errors are generated for any subsequent use of those values. This might be a pattern that we should enable in ISLE, but for now it's easiest to define those types in rust instead.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:15):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:16):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:16):

elliottt created PR review comment:

I've moved this comment to the first fixed allocation of %r2/%r3 instead, as that made it easier to provide some context.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:20):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 19:20):

elliottt created PR review comment:

Now that I've reverted everything back to how it was before, I'll have a look at adding the helper that you mentioned to print the additional regs when they don't have a physical allocation.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:06):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:06):

elliottt created PR review comment:

You're right, it absolutely does. However, it's not used anywhere, so I'll remove it.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:10):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:17):

elliottt edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:32):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:33):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 20:33):

elliottt created PR review comment:

I migrated select_bool_reg and select_bool_imm, which removed the need for emit_consumer.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 22:18):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 22:24):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 22:24):

elliottt created PR review comment:

I made the changes to introduce pretty_print_regpair, pretty_print_regpair_mod, and pretty_print_regpair_mod_lo. This definitely cleaned up the pretty-printing code quite a bit, so thank you for the suggestion!

I also opted to make RegPair and WritableRegPair structs instead of enums, and marking them as primitive types in ISLE. This required moving some functions to rust like regpair, regpair_lo, etc, but made working with their values in rust much nicer.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

Not sure if there can be a phase where only one of hi and lo are real regs? Also, it would be good to assert in the real reg case that we have a valid even/odd pair.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

Should be ri in the comment now.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

This also now seems to generate an extra lgr (see the ctz_64 test case).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

There's actually no need now to assert we get the specific r2/r3 pair here. The emit code might as well accept any valid register pair, i.e. only assert that we have an even/odd GPR pair, and that the tied input (if any) matches the output.

Then the only place where r2/r3 are still hardcoded is in s390x_get_operands.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

So the old select_bool_imm/reg were written the way they were specifically to avoid unnecessary register moves. It appears this new way brings those unnecessary moves back (see e.g. the many tests with an extra lgr in icmp.clif. This is not a show-stopper for this PR, we can look into it afterwards -- just wondering if there's something we can to do improve on this again ...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

Perfect!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

Ah, I see. Thanks for the explanation.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:36):

uweigand created PR review comment:

cmov_imm_regpair_lo/hi were only used by the old division code, so they're really not needed anymore

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:37):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 11:37):

uweigand created PR review comment:

Yes, this is looking really nice now, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 16:32):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 16:32):

elliottt created PR review comment:

That's much cleaner, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:17):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:18):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:18):

elliottt created PR review comment:

I changed this to require that both hi and lo are real registers, and to assert that they are valid.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:56):

elliottt updated PR #5073 from trevor/ssa-s390x to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:57):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2022 at 17:57):

elliottt created PR review comment:

I went ahead and reverted this commit, as it's not strictly necessary for removing uses of reg_mod.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 00:00):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 00:00):

elliottt created PR review comment:

I dug into this a bit, and have an explanation. Here's the program before register allocation:

  Inst 0: args %v128=%r2
  Inst 1: lcgr %v131, %v128
  Inst 2: ngrk %v132, %v128, %v131
  Inst 3: flogr %v133/%v134, %v132
  Inst 4: locghie %v135<-%v133, -1
  Inst 5: lghi %v136, 63
  Inst 6: sgrk %v137, %v136, %v135
  Inst 7: lgr %v130, %v129
  Inst 8: lgr %r2, %v130
  Inst 9: br %r14

There are two live-ranges that contribute to the move: one that covers the output of instruction 3 to where it's consumed in instruction 4 (a), and one that covers the output of instruction 4 to where it's consumed in instruction 6 (b). As the second live-range b is longer than a, it will be allocated first. As it has no fixed register constraints, the allocator is free to choose anything, and in the test output it chooses %r4. As the flogr instruction has a constraint that says it must use %r2/%r3 for its output register, allocating a register for %v133 produces a move from %r2 to %r4 once the live-range a is processed.

It would be great to fix this problem by merging those two live-ranges into a single bundle, but regalloc2 is currently unable to merge live ranges when one contains a fixed register constraint, like a. It's not all bad news though, getting cranelift producing lowered output that's in SSA will enable some simplifications in regalloc2 that will enable merging these two live-ranges, so these sort of unnecessary moves should be less common in the future.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 00:00):

elliottt requested cfallin for a review on PR #5073.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 00:02):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 06:19):

uweigand created PR review comment:

I see, thanks for having a look. This looks all good to me now.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 06:19):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 06:19):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2022 at 16:22):

elliottt merged PR #5073.


Last updated: Nov 22 2024 at 17:03 UTC