fitzgen labeled issue #6301:
I would not expect
movq 16(%rbp), %rax
to be lifted up intoblock0
, as it is only used inblock1
and notblock2
. This results in a load that is not necessary for all possible executions.test compile precise-output target x86_64 function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 { block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32): brif v0, block1, block2 block1: return v5 block2: return v6 } ; VCode: ; pushq %rbp ; movq %rsp, %rbp ; block0: ; movq %r9, %r11 ; movq 16(%rbp), %rax ; testl %edi, %edi ; jnz label2; j label1 ; block1: ; movq %rbp, %rsp ; popq %rbp ; ret ; block2: ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; ret ; ; Disassembled: ; block0: ; offset 0x0 ; pushq %rbp ; movq %rsp, %rbp ; block1: ; offset 0x4 ; movq %r9, %r11 ; movq 0x10(%rbp), %rax ; testl %edi, %edi ; jne 0x18 ; block2: ; offset 0x13 ; movq %rbp, %rsp ; popq %rbp ; retq ; block3: ; offset 0x18 ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; retq
fitzgen opened issue #6301:
I would not expect
movq 16(%rbp), %rax
to be lifted up intoblock0
, as it is only used inblock1
and notblock2
. This results in a load that is not necessary for all possible executions.test compile precise-output target x86_64 function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 { block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32): brif v0, block1, block2 block1: return v5 block2: return v6 } ; VCode: ; pushq %rbp ; movq %rsp, %rbp ; block0: ; movq %r9, %r11 ; movq 16(%rbp), %rax ; testl %edi, %edi ; jnz label2; j label1 ; block1: ; movq %rbp, %rsp ; popq %rbp ; ret ; block2: ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; ret ; ; Disassembled: ; block0: ; offset 0x0 ; pushq %rbp ; movq %rsp, %rbp ; block1: ; offset 0x4 ; movq %r9, %r11 ; movq 0x10(%rbp), %rax ; testl %edi, %edi ; jne 0x18 ; block2: ; offset 0x13 ; movq %rbp, %rsp ; popq %rbp ; retq ; block3: ; offset 0x18 ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; retq
fitzgen labeled issue #6301:
I would not expect
movq 16(%rbp), %rax
to be lifted up intoblock0
, as it is only used inblock1
and notblock2
. This results in a load that is not necessary for all possible executions.test compile precise-output target x86_64 function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 { block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32): brif v0, block1, block2 block1: return v5 block2: return v6 } ; VCode: ; pushq %rbp ; movq %rsp, %rbp ; block0: ; movq %r9, %r11 ; movq 16(%rbp), %rax ; testl %edi, %edi ; jnz label2; j label1 ; block1: ; movq %rbp, %rsp ; popq %rbp ; ret ; block2: ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; ret ; ; Disassembled: ; block0: ; offset 0x0 ; pushq %rbp ; movq %rsp, %rbp ; block1: ; offset 0x4 ; movq %r9, %r11 ; movq 0x10(%rbp), %rax ; testl %edi, %edi ; jne 0x18 ; block2: ; offset 0x13 ; movq %rbp, %rsp ; popq %rbp ; retq ; block3: ; offset 0x18 ; movq %r11, %rax ; movq %rbp, %rsp ; popq %rbp ; retq
cfallin commented on issue #6301:
To add more context why this happens: this is actually not a "lifting" (the load was never down in block1 or block2), but rather all args on the stack are loaded into vregs at the start of the function. That happens in this function in the ABI code:
Callee::gen_copy_arg_to_regs
.In order to do better, we could constrain the vreg to the location on the stack (FP + offset). That would require some additional plumbing in the constraint machinery: we'd need a way to specify "this offset from FP" as a constraint (rather than "any stack slot") and a third kind of alloc (not indexed stackslot or physical reg, but FP + offset). It would complicate the spillslot allocation code a bit too. But it could be done!
Last updated: Nov 22 2024 at 16:03 UTC