alexcrichton opened issue #9942:
Given this input CLIF:
    function u0:0(i32) -> i32 tail {
        sig0 = (i32) -> i32 tail
        fn0 = colocated u0:0 sig0

    block0(v2: i32):
        v19 = call fn0(v2)
        v20 = iadd v2, v19
        return v20
    }
I'm currently seeing on main:

    $ cargo run --features pulley compile --target x86_64 -D ../clif/wasm_func_0.clif --set opt_level=speed
    .byte 157, 16, 0, 0, 0, 0, 0, 32, 0, 64, 21, 0, 2, 21, 0, 0, 0, 0, 64, 5, 21, 69, 160, 0, 158, 16, 0, 0, 0, 0, 0, 32, 0, 0

    Disassembly of 34 bytes <u0:0>:
       0: 9d 10 00 00 00 00 00 20 00    push_frame_save 16, x21
       9: 40 15 00                      xmov x21, x0
       c: 02 15 00 00 00 00             call1 x21, 0x0    // target = 0xc
      12: 40 05 15                      xmov x5, x21
      15: 45 a0 00                      xadd32 x0, x5, x0
      18: 9e 10 00 00 00 00 00 20 00    pop_frame_restore 16, x21
      21: 00                            ret
Notably instruction at 12, xmov x5, x21, I'm not sure why that exists. In theory that should not be there and the subsequent xadd32 should be xadd32 x0, x21, x0 and that would remove the need for the insertion of the xmov. This xmov is being inserted as part of register allocation as logging shows that the input program is:

    VCode {
      Entry block: 0
    Block 0([]):
        (original IR block: block0)
      Inst 0: args v192=x0
      Inst 1: call CallInfo { dest: PulleyCall { name: User(userextname0), args: [XReg(v192)] }, uses: [], defs: [CallRetPair { vreg: Writable { reg: v196 }, preg: p0i }], clobbers: PRegSet { bits: [65534, 65535, 4294967295, 0] }, callee_conv: Tail, caller_conv: Tail, callee_pop_size: 0 }
      Inst 2: xadd32 v195, v192, v196
      Inst 3: rets v195=x0
    }
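As a sanity check on the claim that the xmov is redundant, here is a minimal sketch of a toy register machine. It is not Pulley's real interpreter: the `Op` enum, the `run` and `compare` functions, and the modelling of the call as "x0 holds the result, x21 still holds the argument" are all assumptions made for illustration. It runs the post-call instructions with and without the extra move and compares what ends up in x0.

```rust
// Toy model only: these types are hypothetical and are not part of Pulley or Cranelift.
#[derive(Clone, Copy)]
enum Op {
    /// dst = src
    Xmov { dst: usize, src: usize },
    /// dst = wrapping 32-bit sum of src1 and src2 (the toy zero-extends the result)
    Xadd32 { dst: usize, src1: usize, src2: usize },
}

/// Executes `ops` over a copy of the 32-entry register file and returns it.
fn run(mut regs: [u64; 32], ops: &[Op]) -> [u64; 32] {
    for op in ops {
        match *op {
            Op::Xmov { dst, src } => regs[dst] = regs[src],
            Op::Xadd32 { dst, src1, src2 } => {
                let sum = (regs[src1] as u32).wrapping_add(regs[src2] as u32);
                regs[dst] = u64::from(sum);
            }
        }
    }
    regs
}

/// Returns the value left in x0 as (current output, proposed output).
fn compare(arg: u64, call_result: u64) -> (u64, u64) {
    // Assumed state right after the call: x21 = saved argument, x0 = call result.
    let mut regs = [0u64; 32];
    regs[21] = arg;
    regs[0] = call_result;

    // What is emitted today: xmov x5, x21; xadd32 x0, x5, x0
    let current = [
        Op::Xmov { dst: 5, src: 21 },
        Op::Xadd32 { dst: 0, src1: 5, src2: 0 },
    ];
    // What the issue expects instead: xadd32 x0, x21, x0
    let proposed = [Op::Xadd32 { dst: 0, src1: 21, src2: 0 }];

    (run(regs, &current)[0], run(regs, &proposed)[0])
}

fn main() {
    let (current, proposed) = compare(7, 35);
    println!("current = {current}, proposed = {proposed}");
}
```

Both sequences leave the same value in x0 in this toy model; the only difference is that the current output also writes x5.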
On aarch64 I see:

    $ cargo run -q --features pulley compile --target aarch64 -D ../clif/wasm_func_0.clif --set opt_level=speed
    .byte 253, 123, 191, 169, 253, 3, 0, 145, 248, 15, 31, 248, 248, 3, 2, 170, 0, 0, 0, 148, 2, 3, 2, 11, 248, 7, 65, 248, 253, 123, 193, 168, 192, 3, 95, 214

    Disassembly of 36 bytes <u0:0>:
       0: fd 7b bf a9    stp x29, x30, [sp, #-0x10]!
       4: fd 03 00 91    mov x29, sp
       8: f8 0f 1f f8    str x24, [sp, #-0x10]!
       c: f8 03 02 aa    mov x24, x2
      10: 00 00 00 94    bl #0x10
      14: 02 03 02 0b    add w2, w24, w2
      18: f8 07 41 f8    ldr x24, [sp], #0x10
      1c: fd 7b c1 a8    ldp x29, x30, [sp], #0x10
      20: c0 03 5f d6    ret
which does the right thing after the call instruction at 10 and the add instruction at 14 doesn't have a mov in front. Additionally for x64 it also looks "correct":

    $ cargo run -q --features pulley compile --target x86_64 -D ../clif/wasm_func_0.clif --set opt_level=speed
    .byte 85, 72, 137, 229, 72, 131, 236, 16, 76, 137, 60, 36, 73, 137, 255, 232, 0, 0, 0, 0, 68, 1, 248, 76, 139, 60, 36, 72, 131, 196, 16, 72, 137, 236, 93, 195

    Disassembly of 36 bytes <u0:0>:
       0: 55                pushq %rbp
       1: 48 89 e5          movq %rsp, %rbp
       4: 48 83 ec 10       subq $0x10, %rsp
       8: 4c 89 3c 24       movq %r15, (%rsp)
       c: 49 89 ff          movq %rdi, %r15
       f: e8 00 00 00 00    callq 0x14
      14: 44 01 f8          addl %r15d, %eax
      17: 4c 8b 3c 24       movq (%rsp), %r15
      1b: 48 83 c4 10       addq $0x10, %rsp
      1f: 48 89 ec          movq %rbp, %rsp
      22: 5d                popq %rbp
      23: c3                retq
Notably the addl has no preceding mov instruction.

Is this the Pulley backend perhaps missing something with respect to regalloc metadata?
The generated get_operands method for the instruction here is:

    RawInst::Xadd32 { dst, src1, src2, .. } => {
        collector.reg_use(src1);
        collector.reg_use(src2);
        collector.reg_def(dst);
    }
which is in theory pretty similar to how x64/aarch64 both work. I'm not sure if this is falling off a regalloc heuristic or a meaningful difference between x64/aarch64 though.
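To make concrete what that generated arm reports, here is a minimal sketch using a toy collector. `ToyCollector`, `Operand`, and `collect_xadd32` are hypothetical stand-ins, not the real Cranelift or regalloc2 types; only the three method names and the order in which they are called come from the arm quoted above.

```rust
// Hypothetical stand-ins for illustration; not Cranelift's or regalloc2's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operand {
    Use(u32),
    Def(u32),
}

#[derive(Default)]
struct ToyCollector {
    operands: Vec<Operand>,
}

impl ToyCollector {
    /// Records that the instruction reads virtual register `vreg`.
    fn reg_use(&mut self, vreg: u32) {
        self.operands.push(Operand::Use(vreg));
    }

    /// Records that the instruction writes virtual register `vreg`.
    fn reg_def(&mut self, vreg: u32) {
        self.operands.push(Operand::Def(vreg));
    }
}

/// Mirrors the order of calls in the generated `Xadd32` arm.
fn collect_xadd32(dst: u32, src1: u32, src2: u32) -> Vec<Operand> {
    let mut collector = ToyCollector::default();
    collector.reg_use(src1);
    collector.reg_use(src2);
    collector.reg_def(dst);
    collector.operands
}

fn main() {
    // Inst 2 from the VCode dump: xadd32 v195, v192, v196
    let ops = collect_xadd32(195, 192, 196);
    println!("{ops:?}");
}
```

In this toy model the instruction reports two uses followed by one def and nothing else, so any extra move would have to come from decisions made later, not from the operand list itself.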
alexcrichton added the pulley label to Issue #9942.
Last updated: Jan 24 2025 at 00:11 UTC