abrown opened issue #3744:
.clif
Test Casetest compile target x86_64 function %select_eq_f32(f32, f32) -> i32 { block0(v0: f32, v1: f32): v2 = fcmp eq v0, v1 v3 = iconst.i32 1 v4 = iconst.i32 0 v5 = select v2, v3, v4 return v5 }
Steps to Reproduce
RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif
Expected Results
No extra
movl
instruction after theucomiss
comparison.Actual Results
There is no need for instruction 7,
movl %edi, %edi
, but for some reason it is not being elided.TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 13) Inst 0: pushq %rbp Inst 1: unwind PushFrameRegs { offset_upward_to_caller_sp: 16 } Inst 2: movq %rsp, %rbp Inst 3: unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 } Inst 4: movl $1, %edi Inst 5: xorl %esi, %esi Inst 6: ucomiss %xmm0, %xmm1 Inst 7: movl %edi, %edi Inst 8: cmovnzl %esi, %edi; cmovpl %esi, %edi Inst 9: movq %rdi, %rax Inst 10: movq %rbp, %rsp Inst 11: popq %rbp Inst 12: ret }}
Versions and Environment
Cranelift version or commit: this redundant instruction is present in the ISLE changes to
select
currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.
abrown labeled issue #3744:
.clif
Test Casetest compile target x86_64 function %select_eq_f32(f32, f32) -> i32 { block0(v0: f32, v1: f32): v2 = fcmp eq v0, v1 v3 = iconst.i32 1 v4 = iconst.i32 0 v5 = select v2, v3, v4 return v5 }
Steps to Reproduce
RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif
Expected Results
No extra
movl
instruction after theucomiss
comparison.Actual Results
There is no need for instruction 7,
movl %edi, %edi
, but for some reason it is not being elided.TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 13) Inst 0: pushq %rbp Inst 1: unwind PushFrameRegs { offset_upward_to_caller_sp: 16 } Inst 2: movq %rsp, %rbp Inst 3: unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 } Inst 4: movl $1, %edi Inst 5: xorl %esi, %esi Inst 6: ucomiss %xmm0, %xmm1 Inst 7: movl %edi, %edi Inst 8: cmovnzl %esi, %edi; cmovpl %esi, %edi Inst 9: movq %rdi, %rax Inst 10: movq %rbp, %rsp Inst 11: popq %rbp Inst 12: ret }}
Versions and Environment
Cranelift version or commit: this redundant instruction is present in the ISLE changes to
select
currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.
abrown labeled issue #3744:
.clif
Test Casetest compile target x86_64 function %select_eq_f32(f32, f32) -> i32 { block0(v0: f32, v1: f32): v2 = fcmp eq v0, v1 v3 = iconst.i32 1 v4 = iconst.i32 0 v5 = select v2, v3, v4 return v5 }
Steps to Reproduce
RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif
Expected Results
No extra
movl
instruction after theucomiss
comparison.Actual Results
There is no need for instruction 7,
movl %edi, %edi
, but for some reason it is not being elided.TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 13) Inst 0: pushq %rbp Inst 1: unwind PushFrameRegs { offset_upward_to_caller_sp: 16 } Inst 2: movq %rsp, %rbp Inst 3: unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 } Inst 4: movl $1, %edi Inst 5: xorl %esi, %esi Inst 6: ucomiss %xmm0, %xmm1 Inst 7: movl %edi, %edi Inst 8: cmovnzl %esi, %edi; cmovpl %esi, %edi Inst 9: movq %rdi, %rax Inst 10: movq %rbp, %rsp Inst 11: popq %rbp Inst 12: ret }}
Versions and Environment
Cranelift version or commit: this redundant instruction is present in the ISLE changes to
select
currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.
fitzgen commented on issue #3744:
As we discussed, this is
regalloc
failing to coalesce/elide themov
in this case. Funny because it is the "easiest" kind of move to elide. Should be fixed with regalloc2, or at least once we switch to regalloc2 we can make sure that it handles this and update regalloc2 if it doesn't.
alexcrichton commented on issue #3744:
The current output is:
Disassembly of 28 bytes: 0: 55 push rbp 1: 48 89 e5 mov rbp, rsp 4: 45 31 c0 xor r8d, r8d 7: b8 01 00 00 00 mov eax, 1 c: 0f 2e c8 ucomiss xmm1, xmm0 f: 41 0f 45 c0 cmovne eax, r8d 13: 41 0f 4a c0 cmovp eax, r8d 17: 48 89 ec mov rsp, rbp 1a: 5d pop rbp 1b: c3 ret
which I think means that this is fixed. Is this something we'll want a test for though before closing?
abrown commented on issue #3744:
Yeah, I guess we could add a CLIF compile test for this but I think that if we use the debug output it could be unwittingly overwritten in a
BLESS
ed run and this issue could resurface. We can still usetest compile
instead oftest compile precise-output
, I think.
cfallin commented on issue #3744:
if we use the debug output it could be unwittingly overwritten in a BLESSed run and this issue could resurface
Indeed, this was one of my concerns with the auto-blessing (though to be clear it's a big convenience improvement overall, so I don't think it was a mistake). I think the right answer is probably to use non-autoblessed tests for cases where we're asserting a certain property about the output, and possibly be a bit looser in the matching on details we don't care about (prologue/epilogue, specific registers), then reduce the number of precise-output tests overall (and make heavier use of run-tests).
abrown commented on issue #3744:
I can add a
test compile
later today if no one else wants to take this?
cfallin commented on issue #3744:
Go for it, happy to review, and thank you!
abrown closed issue #3744:
.clif
Test Casetest compile target x86_64 function %select_eq_f32(f32, f32) -> i32 { block0(v0: f32, v1: f32): v2 = fcmp eq v0, v1 v3 = iconst.i32 1 v4 = iconst.i32 0 v5 = select v2, v3, v4 return v5 }
Steps to Reproduce
RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- compile -dDp --target x86_64 scratch.clif
Expected Results
No extra
movl
instruction after theucomiss
comparison.Actual Results
There is no need for instruction 7,
movl %edi, %edi
, but for some reason it is not being elided.TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 13) Inst 0: pushq %rbp Inst 1: unwind PushFrameRegs { offset_upward_to_caller_sp: 16 } Inst 2: movq %rsp, %rbp Inst 3: unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 } Inst 4: movl $1, %edi Inst 5: xorl %esi, %esi Inst 6: ucomiss %xmm0, %xmm1 Inst 7: movl %edi, %edi Inst 8: cmovnzl %esi, %edi; cmovpl %esi, %edi Inst 9: movq %rdi, %rax Inst 10: movq %rbp, %rsp Inst 11: popq %rbp Inst 12: ret }}
Versions and Environment
Cranelift version or commit: this redundant instruction is present in the ISLE changes to
select
currently under review at https://github.com/bytecodealliance/wasmtime/pull/3682.
Last updated: Nov 22 2024 at 16:03 UTC