Stream: git-wasmtime

Topic: wasmtime / issue #3785 [RFC] ISLE: Migrate call and retur...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 09 2022 at 13:49):

github-actions[bot] commented on issue #3785:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2022 at 00:42):

uweigand commented on issue #3785:

Rebased now that both prerequisite PRs were merged. This is probably still not quite ready to merge, but I'd certainly appreciate any comments!

view this post on Zulip Wasmtime GitHub notifications bot (May 16 2022 at 17:02):

uweigand commented on issue #3785:

Hi @cfallin, this doesn't implement the ABISig refactor yet, it's just a rebase to current mainline.

However, this shows the regalloc issue I was mentioning earlier today. You'll notice the patch changes the reftypes.clif file, which shows the regalloc regression I've been seeing. Note the one additional move in (new) line 80, and the extra use of call-clobbered register %r10.

view this post on Zulip Wasmtime GitHub notifications bot (May 16 2022 at 17:08):

uweigand edited a comment on issue #3785:

Hi @cfallin, this doesn't implement the ABISig refactor yet, it's just a rebase to current mainline.

However, this shows the regalloc issue I was mentioning earlier today. You'll notice the patch changes the reftypes.clif file, which shows the regalloc regression I've been seeing. Note the one additional move in (new) line 80, and the extra use of call-saved register %r10.

view this post on Zulip Wasmtime GitHub notifications bot (May 16 2022 at 21:26):

cfallin commented on issue #3785:

@uweigand I spent a few hours digging into exactly why the extra move occurs here, to see if there is anything actionable. To summarize, here is what's going on:

So how could we address this?

So there's a plausible path forward (last option above), but it's not an easy fix.

In general, there is going to be "noise" like this with any shift in complex heuristics; I do apologize, and wish we could always avoid regressions like this, but it's also an impossible request (not saying you are making it, but stating this out loud since it's the topic at hand) to always avoid regressions. In general the best we can do is measure and hill-climb toward better performance in aggregate. In any case, I'll file an issue over in regalloc2 to note the above idea, and at some point I might have a slice of time (it would probably take ~a few days to a week) to get to it.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2022 at 00:42):

cfallin commented on issue #3785:

OK, so given all the above, I got nerd-sniped for the rest of the afternoon and did bytecodealliance/regalloc2#49. It seems to address the issue above (in fact the code is a little shorter than with regalloc.rs).

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2022 at 09:43):

uweigand commented on issue #3785:

Hi @cfallin, thanks for the detailed analysis and the quick fix! I can confirm that with regalloc2 0.1.3 the code generated for the reftypes.clif test case is improved, both without this PR and with this PR applied. So this regalloc change looks like a clear improvement to me.

However, I'm still wondering why we are seeing any regalloc differences from this PR - both with and without your regalloc2 change applied. Without the regalloc2 change, we're seeing the regression shown in the PR. But even with the regalloc2 change, we are seeing a change in generated code (not a regression, but still a change):

7,8c7,8
<         "  bras %r1, 12 ; data %f + 0 ; lg %r5, 0(%r1)",
<         "  basr %r14, %r5",
---
>         "  bras %r1, 12 ; data %f + 0 ; lg %r4, 0(%r1)",
>         "  basr %r14, %r4",
12,13c12,13
<         "  llcr %r5, %r2",
<         "  chi %r5, 0",
---
>         "  llcr %r3, %r2",
>         "  chi %r3, 0",
28,29c28,29
<         "  la %r4, 160(%r15)",
<         "  lg %r4, 0(%r4)",
---
>         "  la %r5, 160(%r15)",
>         "  lg %r4, 0(%r5)",

This seems strange given that the code regalloc sees before and after this PR is nearly identical.
Before this PR:

  Entry block: 0
  v131 := v146
  v134 := v140
  v135 := v139
Block 0:
    (original IR block: block0)
    (successor: Block 1)
    (successor: Block 3)
    (instruction range: 0 .. 11)
  Inst 0: lgr %v128, %r2
  Inst 1: lgr %v129, %r3
  Inst 2: lgr %r2, %v128
  Inst 3: bras %r1, 12 ; data %f + 0 ; lg %v147, 0(%r1)
  Inst 4: basr %r14, %v147
  Inst 5: lr %v130, %r2
  Inst 6: la %v146, 0(%r15)
  Inst 7: stg %v128, 0(%v131)
  Inst 8: llcr %v145, %v130
  Inst 9: chi %v145, 0
  Inst 10: jgnlh label1 ; jg label3

After this PR:

  Entry block: 0
  v130 := v148
  v131 := v146
  v134 := v140
  v135 := v139
Block 0:
    (original IR block: block0)
    (successor: Block 1)
    (successor: Block 3)
    (instruction range: 0 .. 11)
  Inst 0: lgr %v128, %r2
  Inst 1: lgr %v129, %r3
  Inst 2: lgr %r2, %v128
  Inst 3: bras %r1, 12 ; data %f + 0 ; lg %v147, 0(%r1)
  Inst 4: basr %r14, %v147
  Inst 5: lr %v148, %r2
  Inst 6: la %v146, 0(%r15)
  Inst 7: stg %v128, 0(%v131)
  Inst 8: llcr %v145, %v130
  Inst 9: chi %v145, 0
  Inst 10: jgnlh label1 ; jg label3

The only difference is that before the PR, insts 5 and 8 use the same vreg (v130), while after the PR, they use two different vregs (v130 and v148), which are marked as aliases.

If aliased vregs are indeed treated identically by regalloc, why does this change still appear to make a difference?

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2022 at 17:40):

cfallin commented on issue #3785:

Ah, so I think what is happening is that the aliasing rewrites toward the new vreg, not the old one: v130 is made an alias of v148, so all instances of v130 become v148 as seen by regalloc.

This means that the order of vregs seen by regalloc2 is not quite the same, so allocation decisions may be made in a slightly different order.

This aligns with what is seen in your diff above, I think: the instruction sequence is exactly the same, but the specific register numbers are different in a few cases.

view this post on Zulip Wasmtime GitHub notifications bot (May 17 2022 at 17:56):

uweigand commented on issue #3785:

This means that the order of vregs seen by regalloc2 is not quite the same, so allocation decisions may be made in a slightly different order.

I see, that does indeed look like it explains the difference. I think with the regalloc2 patch we should be all good then. Thanks again!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 28 2022 at 13:04):

uweigand commented on issue #3785:

Hi @cfallin, this latest version now has the ABISig refactoring and a bunch of other general interface cleanups.

As far as I am concerned, this should now be ready to be merged, so I'd appreciate another review - any comments welcome!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 28 2022 at 17:12):

uweigand commented on issue #3785:

Updated to resolve merge conflicts due to the uses_defs_clobbers change.


Last updated: Jan 24 2025 at 00:11 UTC