Stream: git-wasmtime

Topic: wasmtime / issue #6742 Cranelift: Generate load/store wit...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 04:38):

maekawatoshiki opened issue #6742:

Feature

Currently, on aarch64 backend, the following piece of CLIF instructions...

; Equivalent to: int64_t *v9; int64_t v10; v4 = v9[v10];
v1 = iconst.i64 3
v2 = ishl.i64 v10, v1  ; v1 = 3
v3 = iadd v9, v2
v4 = load.i64 v3

... will generate the assembly like below:

adrp    x4, 0x780000
ldr     x4, [x4]
lsl     x5, x3, #3
ldr     x4, [x4, x5]

However, the assembly can be converted into more efficient one like this:

adrp    x4, 0x780000
ldr     x4, [x4]
ldr     x4, [x4, x3, lsl #3]

Benefit

The shorter instruction sequence will help improve the performance.
In fact, this problem was found when I was diffing the assembly generated by cranelift and llvm, where llvm was around 20% faster than cranelift in my case.

Implementation

I've walked through the cranelift codebase and figured out that such addressing mode seems to be represented as AMode::RegScaled, but not sure how I can teach the code generator to use RegScaled for ldr.
Editing isle rules or something like that?


Last updated: Dec 23 2024 at 12:05 UTC