elliottt commented on issue #6743:
@alexcrichton or @cfallin, could you take a look at this?
alexcrichton commented on issue #6743:
Sure yeah I can help review. Looking over this I can't help but personally feel that a bit of a refactor is in order (preexisting issue, not a new one) as I find it a bit odd to follow. This feels similar to the addressing that historically happened in x64 but I changed awhile back where it was recursive and walked thorugh an entire
iadd
chain to compute the final amode. To me the problem here is that all add operands are collected into a vector but if the vector is larger than 1 or 2 elements everything falls back to the normal operands and amodes instead.Instead I liked the idea in the x64 backend where you start with an
AMode
and then add items to it which might extend the amode. This is theto_amode_add
function in the x64 lowering where it's iteratively modified as possible through subsequent "peeling" of add operands. This avoided the x64-specific problem of recursing entirely and blowing the stack on largeiadd
chains, but I think would be appropriate for aarch64 as well since it avoids collecting large 32/64-bit operand lists and collects only precisely what's needed (in theory).Now I realize though that the refactoring may be a bit bigger than desired though on this PR. I bring this up though because this PR has a few bugs I'll list below. I think the bugs stem from the fact that currently the only pattern matched is "a bunch of adds" and the pattern matching add here should only match a single shift in a very specific location as opposed to any location.
Multiple shifted operands
function %a(i64, i64, i64) -> i32 { block0(v0: i64, v1: i64, v2: i64): v3 = iconst.i64 3 v4 = ishl v0, v3 v5 = ishl v1, v3 v6 = iadd v4, v5 v7 = iadd v6, v2 v8 = load.i32 v7 return v8 }
yields:
$ cargo run -q compile foo.clif --target aarch64 thread 'main' panicked at 'assertion failed: shifted.is_none()', cranelift/codegen/src/isa/aarch64/lower.rs:303:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
A single shifted operand
function %a(i64) -> i32 { block0(v0: i64): v1 = iconst.i64 3 v2 = ishl v0, v1 v3 = load.i32 v2 return v3 }
yields:
$ cargo run -q compile foo.clif --target aarch64 thread 'main' panicked at 'assertion failed: `(left == right)` left: `32`, right: `64`', cranelift/codegen/src/isa/aarch64/inst/emit.rs:1030:25 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Shift not actually taken into account
function %a(i64, i32, i64) -> i32 { block0(v0: i64, v1: i32, v2: i64): v3 = uextend.i64 v1 v4 = iadd v0, v3 v5 = ishl_imm v2, 2 v6 = iadd v4, v5 v7 = load.i32 v6 return v7 }
yields:
$ cargo run -q compile foo.clif --target aarch64 -D .byte 0, 72, 97, 184, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 48 61 b8 ldr w0, [x0, w1, uxtw] 4: c0 03 5f d6 ret
(note that processing of
x2
is missing here)There's also a number of other cases here where
shifted
is only taken into account in one case rather than all of them so the fall-through is ignoringshifted
in more cases than just this.
Those are some examples which are currently incorrect with this PR, and at least personally I don't see an "easy fix" with a small piece of handling here-or-there to fix these issues. That's why I think that it may be best to refactor this amode computation to be a bit simpler than before (probably in ISLE rather than Rust) and then I think it might be less brittle to add a special case. For example a
to_amode_add
function for AArch64 could pattern match an existing amode ofUnsignedOffset
with offset 0 with a shifting operation to produceRegScaled
. OnceRegScaled
is produced, however, no further amode massaging would be done.I could very well be wrong though and there may be an easy fix to these issues.
maekawatoshiki commented on issue #6743:
I appreciate your detailed comment.
I totally agree that we need to refactor this amode computation, since I thought so when I first read the code... Also the bugs you mentioned are really helpful (and I should have tested on such cases).
I plan on adding ISLE rules that replaces the existing amode computation.
maekawatoshiki commented on issue #6743:
I tried adding ISLE rules that replace existing
lower_address
, while referring to https://github.com/bytecodealliance/wasmtime/pull/5986.
But I figured out that it's hard to be done for me and, for someone, we should open an issue for it :)
alexcrichton commented on issue #6743:
I've made a first attempt in https://github.com/bytecodealliance/wasmtime/pull/6805 to move
lower_address
into ISLE
Last updated: Jan 24 2025 at 00:11 UTC