Stream: git-wasmtime

Topic: wasmtime / issue #6301 Lifting load of stack arg into dom...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2023 at 22:04):

fitzgen labeled issue #6301:

I would not expect movq 16(%rbp), %rax to be lifted up into block0, as it is only used in block1 and not block2. This results in a load that is not necessary for all possible executions.

test compile precise-output
target x86_64

function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 {
block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32):
    brif v0, block1, block2

block1:
    return v5

block2:
    return v6
}

; VCode:
;   pushq   %rbp
;   movq    %rsp, %rbp
; block0:
;   movq    %r9, %r11
;   movq    16(%rbp), %rax
;   testl   %edi, %edi
;   jnz     label2; j label1
; block1:
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
; block2:
;   movq    %r11, %rax
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
;
; Disassembled:
; block0: ; offset 0x0
;   pushq %rbp
;   movq %rsp, %rbp
; block1: ; offset 0x4
;   movq %r9, %r11
;   movq 0x10(%rbp), %rax
;   testl %edi, %edi
;   jne 0x18
; block2: ; offset 0x13
;   movq %rbp, %rsp
;   popq %rbp
;   retq
; block3: ; offset 0x18
;   movq %r11, %rax
;   movq %rbp, %rsp
;   popq %rbp
;   retq

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2023 at 22:04):

fitzgen opened issue #6301:

I would not expect movq 16(%rbp), %rax to be lifted up into block0, as it is only used in block1 and not block2. This results in a load that is not necessary for all possible executions.

test compile precise-output
target x86_64

function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 {
block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32):
    brif v0, block1, block2

block1:
    return v5

block2:
    return v6
}

; VCode:
;   pushq   %rbp
;   movq    %rsp, %rbp
; block0:
;   movq    %r9, %r11
;   movq    16(%rbp), %rax
;   testl   %edi, %edi
;   jnz     label2; j label1
; block1:
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
; block2:
;   movq    %r11, %rax
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
;
; Disassembled:
; block0: ; offset 0x0
;   pushq %rbp
;   movq %rsp, %rbp
; block1: ; offset 0x4
;   movq %r9, %r11
;   movq 0x10(%rbp), %rax
;   testl %edi, %edi
;   jne 0x18
; block2: ; offset 0x13
;   movq %rbp, %rsp
;   popq %rbp
;   retq
; block3: ; offset 0x18
;   movq %r11, %rax
;   movq %rbp, %rsp
;   popq %rbp
;   retq

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2023 at 22:04):

fitzgen labeled issue #6301:

I would not expect movq 16(%rbp), %rax to be lifted up into block0, as it is only used in block1 and not block2. This results in a load that is not necessary for all possible executions.

test compile precise-output
target x86_64

function %foo(i32, i32, i32, i32, i32, i32, i32) -> i32 {
block0(v0: i32, v1: i32, v2: i32, v3: i32, v4: i32, v5: i32, v6: i32):
    brif v0, block1, block2

block1:
    return v5

block2:
    return v6
}

; VCode:
;   pushq   %rbp
;   movq    %rsp, %rbp
; block0:
;   movq    %r9, %r11
;   movq    16(%rbp), %rax
;   testl   %edi, %edi
;   jnz     label2; j label1
; block1:
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
; block2:
;   movq    %r11, %rax
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret
;
; Disassembled:
; block0: ; offset 0x0
;   pushq %rbp
;   movq %rsp, %rbp
; block1: ; offset 0x4
;   movq %r9, %r11
;   movq 0x10(%rbp), %rax
;   testl %edi, %edi
;   jne 0x18
; block2: ; offset 0x13
;   movq %rbp, %rsp
;   popq %rbp
;   retq
; block3: ; offset 0x18
;   movq %r11, %rax
;   movq %rbp, %rsp
;   popq %rbp
;   retq

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2023 at 22:17):

cfallin commented on issue #6301:

To add more context why this happens: this is actually not a "lifting" (the load was never down in block1 or block2), but rather all args on the stack are loaded into vregs at the start of the function. That happens in this function in the ABI code: Callee::gen_copy_arg_to_regs.

In order to do better, we could constrain the vreg to the location on the stack (FP + offset). That would require some additional plumbing in the constraint machinery: we'd need a way to specify "this offset from FP" as a constraint (rather than "any stack slot") and a third kind of alloc (not indexed stackslot or physical reg, but FP + offset). It would complicate the spillslot allocation code a bit too. But it could be done!


Last updated: Jan 24 2025 at 00:11 UTC