Stream: git-wasmtime

Topic: wasmtime / issue #3744 Cranelift: `mov` not removed when ...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2022 at 01:43):

abrown opened issue #3744:

.clif Test Case

test compile
target x86_64

function %select_eq_f32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp eq v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}

Steps to Reproduce

RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif

Expected Results

No extra movl instruction after the ucomiss comparison.

Actual Results

There is no need for instruction 7, movl %edi, %edi, but for some reason it is not being elided.

 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 4:   movl    $1, %edi
  Inst 5:   xorl    %esi, %esi
  Inst 6:   ucomiss %xmm0, %xmm1
  Inst 7:   movl    %edi, %edi
  Inst 8:   cmovnzl %esi, %edi; cmovpl  %esi, %edi
  Inst 9:   movq    %rdi, %rax
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

Versions and Environment

Cranelift version or commit: this redundant instruction is present in the ISLE changes to select currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2022 at 01:43):

abrown labeled issue #3744:

.clif Test Case

test compile
target x86_64

function %select_eq_f32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp eq v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}

Steps to Reproduce

RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif

Expected Results

No extra movl instruction after the ucomiss comparison.

Actual Results

There is no need for instruction 7, movl %edi, %edi, but for some reason it is not being elided.

 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 4:   movl    $1, %edi
  Inst 5:   xorl    %esi, %esi
  Inst 6:   ucomiss %xmm0, %xmm1
  Inst 7:   movl    %edi, %edi
  Inst 8:   cmovnzl %esi, %edi; cmovpl  %esi, %edi
  Inst 9:   movq    %rdi, %rax
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

Versions and Environment

Cranelift version or commit: this redundant instruction is present in the ISLE changes to select currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2022 at 01:43):

abrown labeled issue #3744:

.clif Test Case

test compile
target x86_64

function %select_eq_f32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp eq v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}

Steps to Reproduce

RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif

Expected Results

No extra movl instruction after the ucomiss comparison.

Actual Results

There is no need for instruction 7, movl %edi, %edi, but for some reason it is not being elided.

 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 4:   movl    $1, %edi
  Inst 5:   xorl    %esi, %esi
  Inst 6:   ucomiss %xmm0, %xmm1
  Inst 7:   movl    %edi, %edi
  Inst 8:   cmovnzl %esi, %edi; cmovpl  %esi, %edi
  Inst 9:   movq    %rdi, %rax
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

Versions and Environment

Cranelift version or commit: this redundant instruction is present in the ISLE changes to select currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 31 2022 at 18:08):

fitzgen commented on issue #3744:

As we discussed, this is regalloc failing to coalesce/elide the mov in this case. Funny because it is the "easiest" kind of move to elide. Should be fixed with regalloc2, or at least once we switch to regalloc2 we can make sure that it handles this and update regalloc2 if it doesn't.

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 15:44):

alexcrichton commented on issue #3744:

The current output is:

Disassembly of 28 bytes:
   0:   55                      push    rbp
   1:   48 89 e5                mov     rbp, rsp
   4:   45 31 c0                xor     r8d, r8d
   7:   b8 01 00 00 00          mov     eax, 1
   c:   0f 2e c8                ucomiss xmm1, xmm0
   f:   41 0f 45 c0             cmovne  eax, r8d
  13:   41 0f 4a c0             cmovp   eax, r8d
  17:   48 89 ec                mov     rsp, rbp
  1a:   5d                      pop     rbp
  1b:   c3                      ret

which I think means that this is fixed. Is this something we'll want a test for though before closing?

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 16:27):

abrown commented on issue #3744:

Yeah, I guess we could add a CLIF compile test for this but I think that if we use the debug output it could be unwittingly overwritten in a BLESSed run and this issue could resurface. We can still use test compile instead of test compile precise-output, I think.

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 16:35):

cfallin commented on issue #3744:

if we use the debug output it could be unwittingly overwritten in a BLESSed run and this issue could resurface

Indeed, this was one of my concerns with the auto-blessing (though to be clear it's a big convenience improvement overall, so I don't think it was a mistake). I think the right answer is probably to use non-autoblessed tests for cases where we're asserting a certain property about the output, and possibly be a bit looser in the matching on details we don't care about (prologue/epilogue, specific registers), then reduce the number of precise-output tests overall (and make heavier use of run-tests).

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 16:40):

abrown commented on issue #3744:

I can add a test compile later today if no one else wants to take this?

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 16:49):

cfallin commented on issue #3744:

Go for it, happy to review, and thank you!

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2022 at 22:04):

abrown closed issue #3744:

.clif Test Case

test compile
target x86_64

function %select_eq_f32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp eq v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}

Steps to Reproduce

RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif

Expected Results

No extra movl instruction after the ucomiss comparison.

Actual Results

There is no need for instruction 7, movl %edi, %edi, but for some reason it is not being elided.

 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
  Inst 2:   movq    %rsp, %rbp
  Inst 3:   unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
  Inst 4:   movl    $1, %edi
  Inst 5:   xorl    %esi, %esi
  Inst 6:   ucomiss %xmm0, %xmm1
  Inst 7:   movl    %edi, %edi
  Inst 8:   cmovnzl %esi, %edi; cmovpl  %esi, %edi
  Inst 9:   movq    %rdi, %rax
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

Versions and Environment

Cranelift version or commit: this redundant instruction is present in the ISLE changes to select currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.


Last updated: Jan 24 2025 at 00:11 UTC