maekawatoshiki opened issue #6742:
Feature
Currently, on the aarch64 backend, the following CLIF instructions...
    ; Equivalent to: int64_t *v9; int64_t v10; v4 = v9[v10];
    v1 = iconst.i64 3
    v2 = ishl.i64 v10, v1  ; v1 = 3
    v3 = iadd v9, v2
    v4 = load.i64 v3
... generate assembly like this:
    adrp x4, 0x780000
    ldr x4, [x4]
    lsl x5, x3, #3
    ldr x4, [x4, x5]
However, the assembly could be replaced with a more efficient sequence like this, which folds the shift into the load's addressing mode:
    adrp x4, 0x780000
    ldr x4, [x4]
    ldr x4, [x4, x3, lsl #3]
Benefit
The shorter instruction sequence should help improve performance.
In fact, I found this problem while diffing the assembly generated by Cranelift and LLVM, where LLVM's output was around 20% faster than Cranelift's in my case.
Implementation
I've walked through the Cranelift codebase and figured out that such an addressing mode seems to be represented as AMode::RegScaled, but I'm not sure how I can teach the code generator to use RegScaled for ldr.
Would that mean editing ISLE rules or something like that?
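To make the request concrete, here is a minimal sketch of the pattern match I have in mind, written as a standalone toy model in Rust. None of these types are Cranelift's real definitions: Expr, ToyAMode and select_amode are hypothetical names made up for illustration, and the actual lowering presumably lives in ISLE rules rather than in code like this. The idea is only that iadd(base, ishl(index, iconst k)) can become a scaled-register address when k equals log2 of the access size, which is the only non-zero shift an aarch64 register-offset ldr accepts.

```rust
// Toy model only: hypothetical types, NOT Cranelift's IR or its AMode enum.

/// A tiny expression tree standing in for CLIF values.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Value(u32),                 // an SSA value such as v9 or v10
    Iconst(i64),                // integer constant
    Ishl(Box<Expr>, Box<Expr>), // shift left: (value, amount)
    Iadd(Box<Expr>, Box<Expr>), // addition
}

/// A tiny stand-in for the addressing modes discussed above.
#[derive(Debug, Clone, PartialEq)]
enum ToyAMode {
    /// [base, index, lsl #log2(access size)]
    RegScaled { base: u32, index: u32 },
    /// Fallback: the address is computed into a register first.
    Reg { addr: Expr },
}

/// Pick an addressing mode for a load of `access_bytes` bytes from `addr`.
fn select_amode(addr: &Expr, access_bytes: u32) -> ToyAMode {
    if let Expr::Iadd(lhs, rhs) = addr {
        // iadd is commutative, so try both operand orders.
        for (a, b) in [(lhs, rhs), (rhs, lhs)] {
            if let (Expr::Value(base), Expr::Ishl(x, amt)) = (&**a, &**b) {
                if let (Expr::Value(index), Expr::Iconst(k)) = (&**x, &**amt) {
                    // The shift must match the access size: 3 for an 8-byte load.
                    if access_bytes.is_power_of_two()
                        && *k == i64::from(access_bytes.trailing_zeros())
                    {
                        return ToyAMode::RegScaled { base: *base, index: *index };
                    }
                }
            }
        }
    }
    // Anything else keeps the separately computed address.
    ToyAMode::Reg { addr: addr.clone() }
}
```

With the example from this issue (v9 + (v10 << 3), 8-byte load), select_amode returns RegScaled { base: 9, index: 10 }. For a 4-byte load of the same address the shift no longer matches the access size, so the sketch falls back to Reg and the lsl would still be emitted separately.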
Last updated: Nov 22 2024 at 17:03 UTC