alexcrichton opened PR #6805 from alexcrichton:aarch64-amode
to bytecodealliance:main
:
This commit moves the computation of the
AMode
enum for addressing
from Rust into ISLE. This enables deleting a good deal of Rust code in
favor of (hopefully) more readable ISLE code.This does not mirror the preexisting logic exactly but instead takes a
different approach for generating theAMode
. Previously the entire
chain ofiadd
s input into an address were unfolded into 32-bit and
64-bit operations and then those were re-combined as possible into an
AMode
(possibly emittingadd
instructions. Instead now pattern
matching is used to represent this. The net result is that amodes are
emitted slightly differently here and there in a number of updated test
cases.I've tried to verify in all test cases that the number of instructions
has not increased and the same logical operation is happening. The exact
AMode
may differ but at least instruction-wise this shouldn't be a
regression. My hope is that if anything needs changing that can be
represented with updates to the rule precedence in ISLE or various other
special cases.One part I found a little surprising was that the immediate with a
load/store instruction is not actually used much of the time. I naively
thought that the mid-end optimizations would move iadd immediates into
the load/store immediate but that is not the case. This necessitated two
extra ISLE rules to manually peel off immediates and fold them into the
load/store immediate.
I'll also note here that there were a number of related refactorings and such to enable this change, so I'd recommend reading commit-by-commit to get a better idea of why things are changed and why what got updated go updated.
alexcrichton requested abrown for a review on PR #6805.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6805.
cfallin requested cfallin for a review on PR #6805.
cfallin created PR review comment:
This is a little odd -- we no longer fold the sign-extend into its two uses with builtin extend modes on the
add
and in theAMode
; and also we have two separate adds of constants that don't get folded together. Any idea what's causing this?
cfallin submitted PR review:
I'll yoink this review from @abrown as I'm happy to take aarch64 stuff and am probably a little more familiar...
Overall this PR is fantastic -- the pattern-matching is so much cleaner. Thanks for porting this over!
I just spotted one possibly questionable regression below but if we can explain it and justify it as a rare case then I don't think it should prevent us from merging; just want to be sure.
cfallin submitted PR review:
I'll yoink this review from @abrown as I'm happy to take aarch64 stuff and am probably a little more familiar...
Overall this PR is fantastic -- the pattern-matching is so much cleaner. Thanks for porting this over!
I just spotted one possibly questionable regression below but if we can explain it and justify it as a rare case then I don't think it should prevent us from merging; just want to be sure.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
For the constants I know those aren't combined because one is an
iconst
operand to an add and one is the offset listed on theload
instruction. Those aren't currently combined with egraphs (or anywhere else I think) so it ends up coming out here as two constants. Looks like wasm translation doesn't use the offset onload
andstore
though so this at least shouldn't affect that.The sign-extend I think doens't pattern-match here due to its depth in the
iadd
tree which looks like:+ ------- / \ + + / \ / \ + v0 + v0 / \ / \ / 32 / 32 / / sext(v1) sext(v1)
there the
sextend
nodes are so deep in the tree the pattern matching isn't picking them out.That's at least why it's not matching, but as to the impact of it I'm not really sure. I don't think this affects wasm much since wasm typically has one main sign-extend to pattern match, but I'm also not super-certain about that.
cfallin submitted PR review.
cfallin created PR review comment:
OK, yeah, that makes sense. The old handwritten code would traverse the tree arbitrarily deep and gather all the constant offsets, extends, etc., which is why it can do better here I think. But it's possible that this doesn't matter too much in practice.
If we can just spot-check performance (
bz2
andspidermonkey
from Sightglass or whatnot) to make sure, I think I'm happy with this regression at the fringe.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Sightglass shows:
execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 142380.50 ± 5778.82 (confidence = 99%) new-amode.so is 1.02x to 1.02x faster than main.so! [8232969 8267115.57 8308418] main.so [8072036 8124735.07 8159535] new-amode.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 1706947.06 ± 1093497.57 (confidence = 99%) new-amode.so is 1.00x to 1.02x faster than main.so! [151473126 155506902.82 158095691] main.so [145110211 153799955.76 160548952] new-amode.so
and everything else is "No difference in performance."
I haven't done much to make this machine super reliable about performance gathering, so I'd interpret this as "no difference" across the board and the 2% here is probably noise.
alexcrichton edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Sounds good -- ship it!
alexcrichton merged PR #6805.
Last updated: Jan 24 2025 at 00:11 UTC