wasmtime / PR #6805 aarch64: Move AMode computation into... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6805 aarch64: Move AMode computation into...

Wasmtime GitHub notifications bot (Aug 04 2023 at 23:13):

alexcrichton opened PR #6805 from alexcrichton:aarch64-amode to bytecodealliance:main:

This commit moves the computation of the AMode enum for addressing
from Rust into ISLE. This enables deleting a good deal of Rust code in
favor of (hopefully) more readable ISLE code.

This does not mirror the preexisting logic exactly but instead takes a
different approach for generating the AMode. Previously the entire
chain of iadds input into an address were unfolded into 32-bit and
64-bit operations and then those were re-combined as possible into an
AMode (possibly emitting add instructions. Instead now pattern
matching is used to represent this. The net result is that amodes are
emitted slightly differently here and there in a number of updated test
cases.

I've tried to verify in all test cases that the number of instructions
has not increased and the same logical operation is happening. The exact
AMode may differ but at least instruction-wise this shouldn't be a
regression. My hope is that if anything needs changing that can be
represented with updates to the rule precedence in ISLE or various other
special cases.

One part I found a little surprising was that the immediate with a
load/store instruction is not actually used much of the time. I naively
thought that the mid-end optimizations would move iadd immediates into
the load/store immediate but that is not the case. This necessitated two
extra ISLE rules to manually peel off immediates and fold them into the
load/store immediate.

I'll also note here that there were a number of related refactorings and such to enable this change, so I'd recommend reading commit-by-commit to get a better idea of why things are changed and why what got updated go updated.

Wasmtime GitHub notifications bot (Aug 04 2023 at 23:13):

alexcrichton requested abrown for a review on PR #6805.

Wasmtime GitHub notifications bot (Aug 04 2023 at 23:13):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6805.

Wasmtime GitHub notifications bot (Aug 09 2023 at 04:47):

cfallin requested cfallin for a review on PR #6805.

Wasmtime GitHub notifications bot (Aug 09 2023 at 04:51):

cfallin created PR review comment:

This is a little odd -- we no longer fold the sign-extend into its two uses with builtin extend modes on the add and in the AMode; and also we have two separate adds of constants that don't get folded together. Any idea what's causing this?

Wasmtime GitHub notifications bot (Aug 09 2023 at 04:51):

cfallin submitted PR review:

I'll yoink this review from @abrown as I'm happy to take aarch64 stuff and am probably a little more familiar...

Overall this PR is fantastic -- the pattern-matching is so much cleaner. Thanks for porting this over!

I just spotted one possibly questionable regression below but if we can explain it and justify it as a rare case then I don't think it should prevent us from merging; just want to be sure.

Wasmtime GitHub notifications bot (Aug 09 2023 at 04:51):

cfallin submitted PR review:

I'll yoink this review from @abrown as I'm happy to take aarch64 stuff and am probably a little more familiar...

Overall this PR is fantastic -- the pattern-matching is so much cleaner. Thanks for porting this over!

I just spotted one possibly questionable regression below but if we can explain it and justify it as a rare case then I don't think it should prevent us from merging; just want to be sure.

Wasmtime GitHub notifications bot (Aug 09 2023 at 18:02):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 09 2023 at 18:02):

alexcrichton created PR review comment:

For the constants I know those aren't combined because one is an iconst operand to an add and one is the offset listed on the load instruction. Those aren't currently combined with egraphs (or anywhere else I think) so it ends up coming out here as two constants. Looks like wasm translation doesn't use the offset on load and store though so this at least shouldn't affect that.

The sign-extend I think doens't pattern-match here due to its depth in the iadd tree which looks like:
                +
             -------
            /        \

           +          +
          / \        / \

         +  v0      +  v0
        / \        / \
       /   32     /   32
      /          /
  sext(v1)   sext(v1)
there the sextend nodes are so deep in the tree the pattern matching isn't picking them out.

That's at least why it's not matching, but as to the impact of it I'm not really sure. I don't think this affects wasm much since wasm typically has one main sign-extend to pattern match, but I'm also not super-certain about that.

Wasmtime GitHub notifications bot (Aug 09 2023 at 18:23):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Aug 09 2023 at 18:23):

cfallin created PR review comment:

OK, yeah, that makes sense. The old handwritten code would traverse the tree arbitrarily deep and gather all the constant offsets, extends, etc., which is why it can do better here I think. But it's possible that this doesn't matter too much in practice.

If we can just spot-check performance (bz2 and spidermonkey from Sightglass or whatnot) to make sure, I think I'm happy with this regression at the fringe.

Wasmtime GitHub notifications bot (Aug 09 2023 at 19:13):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Aug 09 2023 at 19:13):

alexcrichton created PR review comment:

Sightglass shows:

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 142380.50 ± 5778.82 (confidence = 99%)

  new-amode.so is 1.02x to 1.02x faster than main.so!

  [8232969 8267115.57 8308418] main.so
  [8072036 8124735.07 8159535] new-amode.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 1706947.06 ± 1093497.57 (confidence = 99%)

  new-amode.so is 1.00x to 1.02x faster than main.so!

  [151473126 155506902.82 158095691] main.so
  [145110211 153799955.76 160548952] new-amode.so

and everything else is "No difference in performance."

I haven't done much to make this machine super reliable about performance gathering, so I'd interpret this as "no difference" across the board and the 2% here is probably noise.

Wasmtime GitHub notifications bot (Aug 09 2023 at 19:13):

alexcrichton edited PR review comment.

Wasmtime GitHub notifications bot (Aug 09 2023 at 19:33):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Aug 09 2023 at 19:33):

cfallin created PR review comment:

Sounds good -- ship it!

Wasmtime GitHub notifications bot (Aug 09 2023 at 21:17):

alexcrichton merged PR #6805.

Last updated: Apr 17 2025 at 18:04 UTC