Stream: git-wasmtime

Topic: wasmtime / issue #6743 Cranelift: Utilize `AMode::RegScal...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 20:51):

elliottt commented on issue #6743:

@alexcrichton or @cfallin, could you take a look at this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 12:14):

alexcrichton commented on issue #6743:

Sure, yeah, I can help review. Looking over this I can't help but personally feel that a bit of a refactor is in order (a preexisting issue, not a new one), as I find the current structure a bit odd to follow. This feels similar to the addressing-mode computation that historically happened in x64 but that I changed a while back, where it was recursive and walked through an entire `iadd` chain to compute the final amode. To me the problem here is that all add operands are collected into a vector, but if the vector is larger than one or two elements everything falls back to the normal operands and amodes instead.

Instead I liked the idea in the x64 backend where you start with an `AMode` and then add items to it which might extend the amode. This is the `to_amode_add` function in the x64 lowering, where the amode is iteratively modified as far as possible through subsequent "peeling" of add operands. This entirely avoided the x64-specific problem of recursing and blowing the stack on large `iadd` chains, but I think it would be appropriate for aarch64 as well since it avoids collecting large 32/64-bit operand lists and collects only precisely what's needed (in theory).
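To illustrate the shape of that approach, here is a minimal, self-contained Rust sketch of iterative "peeling" with an explicit worklist. The `Expr` and `AModeSketch` types and the `to_amode` function are simplified stand-ins invented for this example; they are not Cranelift's actual types or the real `to_amode_add` signature.

// A toy expression tree standing in for Cranelift IR values.
#[derive(Debug)]
enum Expr {
    Reg(u8),                   // a value already in a register
    Add(Box<Expr>, Box<Expr>), // an `iadd` of two sub-expressions
    Const(i32),                // a small constant
}

// A toy amode: base register + optional index register + constant offset.
#[derive(Debug)]
struct AModeSketch {
    base: Option<u8>,
    index: Option<u8>,
    offset: i32,
}

// Iteratively fold an add-chain into an addressing mode using an explicit
// worklist, so deep `iadd` chains cannot blow the stack the way a
// recursive walk would.
fn to_amode(root: Expr) -> AModeSketch {
    let (mut base, mut index, mut offset) = (None, None, 0i32);
    let mut worklist = vec![root];
    while let Some(expr) = worklist.pop() {
        match expr {
            // Peel adds: push both operands back onto the worklist.
            Expr::Add(lhs, rhs) => {
                worklist.push(*rhs);
                worklist.push(*lhs);
            }
            // Fold constants into the displacement.
            Expr::Const(c) => offset = offset.wrapping_add(c),
            // Fill the base slot first, then the index slot.
            Expr::Reg(r) if base.is_none() => base = Some(r),
            Expr::Reg(r) if index.is_none() => index = Some(r),
            // Out of amode slots: a real lowering would materialize the
            // remaining operands with explicit adds; the sketch just stops.
            Expr::Reg(_) => break,
        }
    }
    AModeSketch { base, index, offset }
}

fn main() {
    // (r0 + (r1 + 16)) peels into base = r0, index = r1, offset = 16.
    let e = Expr::Add(
        Box::new(Expr::Reg(0)),
        Box::new(Expr::Add(Box::new(Expr::Reg(1)), Box::new(Expr::Const(16)))),
    );
    println!("{:?}", to_amode(e));
}

The key property is that the worklist bounds the work per operand and never recurses, so arbitrarily deep `iadd` chains cannot overflow the compiler's stack.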

Now I realize that the refactoring may be a bit bigger than desired for this PR. I bring it up, though, because this PR has a few bugs, which I'll list below. I think the bugs stem from the fact that currently the only pattern matched is "a bunch of adds", whereas the pattern matching added here should only match a single shift in one very specific location, as opposed to any location.

Multiple shifted operands

function %a(i64, i64, i64) -> i32 {
block0(v0: i64, v1: i64, v2: i64):
  v3 = iconst.i64 3
  v4 = ishl v0, v3
  v5 = ishl v1, v3
  v6 = iadd v4, v5
  v7 = iadd v6, v2
  v8 = load.i32 v7
  return v8
}

yields:

$ cargo run -q compile foo.clif --target aarch64
thread 'main' panicked at 'assertion failed: shifted.is_none()', cranelift/codegen/src/isa/aarch64/lower.rs:303:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

A single shifted operand

function %a(i64) -> i32 {
block0(v0: i64):
  v1 = iconst.i64 3
  v2 = ishl v0, v1
  v3 = load.i32 v2
  return v3
}

yields:

$ cargo run -q compile foo.clif --target aarch64
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `32`,
 right: `64`', cranelift/codegen/src/isa/aarch64/inst/emit.rs:1030:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Shift not actually taken into account

function %a(i64, i32, i64) -> i32 {
block0(v0: i64, v1: i32, v2: i64):
  v3 = uextend.i64 v1
  v4 = iadd v0, v3
  v5 = ishl_imm v2, 2
  v6 = iadd v4, v5
  v7 = load.i32 v6
  return v7
}

yields:

$ cargo run -q compile foo.clif --target aarch64 -D
.byte 0, 72, 97, 184, 192, 3, 95, 214

Disassembly of 8 bytes:
   0:   00 48 61 b8             ldr     w0, [x0, w1, uxtw]
   4:   c0 03 5f d6             ret

(note that `x2` — the shifted operand — is not processed at all here)

There are also a number of other cases here where `shifted` is only taken into account in one branch rather than all of them, so the fall-through ignores `shifted` in more cases than just this one.


Those are some examples which are currently handled incorrectly with this PR, and at least personally I don't see an "easy fix" via a small piece of handling here or there. That's why I think it may be best to refactor this amode computation to be a bit simpler than before (probably in ISLE rather than Rust); then I think it would be less brittle to add a special case. For example, a `to_amode_add` function for AArch64 could pattern-match an existing amode of `UnsignedOffset` with offset 0 against a shifting operation to produce `RegScaled`. Once `RegScaled` is produced, however, no further amode massaging would be done.
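As a rough illustration of that last suggestion, here is a hedged Rust sketch of absorbing a shift only from the one position where it is valid. The types and the function below are simplified stand-ins invented for this example, not Cranelift's actual `AMode` definition or helper signatures.

// Simplified stand-ins for illustration only.
#[derive(Debug)]
enum AMode {
    UnsignedOffset { base: u8, offset: u32 },
    RegScaled { base: u8, index: u8, shift: u8 },
}

#[derive(Debug)]
enum Operand {
    Reg(u8),
    // A value known to be `ishl reg, shift`.
    Shifted { reg: u8, shift: u8 },
}

// Try to fold one more add operand into an existing amode. A shift is
// absorbed only when the current amode is `UnsignedOffset` with offset 0;
// once `RegScaled` is produced, no further massaging happens. A real
// AArch64 lowering must also check that `shift` equals log2 of the access
// size, which is likely why the single-shifted-operand example above
// panics during emission.
fn to_amode_add(amode: AMode, operand: Operand) -> AMode {
    match (amode, operand) {
        (AMode::UnsignedOffset { base, offset: 0 }, Operand::Shifted { reg, shift }) => {
            AMode::RegScaled { base, index: reg, shift }
        }
        // Anything else leaves the amode unchanged; a real lowering would
        // fall back to an explicit add instruction here.
        (amode, _) => amode,
    }
}

fn main() {
    let a = AMode::UnsignedOffset { base: 0, offset: 0 };
    // Folds to RegScaled { base: 0, index: 1, shift: 2 }.
    println!("{:?}", to_amode_add(a, Operand::Shifted { reg: 1, shift: 2 }));

    let b = AMode::RegScaled { base: 0, index: 1, shift: 2 };
    // Already RegScaled: a plain register operand is not absorbed.
    println!("{:?}", to_amode_add(b, Operand::Reg(2)));
}

With this structure the "multiple shifted operands" example above would naturally fall back to explicit adds, because once the amode is `RegScaled` no second shift can be absorbed.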

I could very well be wrong, though, and there may be an easy fix to these issues.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 13:02):

maekawatoshiki commented on issue #6743:

I appreciate your detailed comment.

I totally agree that we need to refactor this amode computation; I thought so myself when I first read the code... Also, the bugs you mentioned are really helpful (and I should have tested such cases).

I plan on adding ISLE rules that replace the existing amode computation.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2023 at 16:52):

maekawatoshiki commented on issue #6743:

I tried adding ISLE rules that replace the existing `lower_address`, referring to https://github.com/bytecodealliance/wasmtime/pull/5986.
But I found it too hard for me to finish, so we should probably open an issue so that someone else can pick it up :)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2023 at 23:14):

alexcrichton commented on issue #6743:

I've made a first attempt in https://github.com/bytecodealliance/wasmtime/pull/6805 to move `lower_address` into ISLE.


Last updated: Dec 23 2024 at 12:05 UTC