abrown commented on Issue #2489:
As mentioned on Zulip, this runs into issues. When I run
    RUST_LOG=cranelift_codegen=debug cargo run --features experimental_x64 -- wast --disable-cache --enable-simd scratch.wast
on:
    (module
      (memory 1)
      (data (i32.const 0) "\00\01\02\03\04\05\06\07\08\09\0A\0B\0C\0D\0E\0F\80\81\82\83\84\85\86\87\88\89")
      (data (i32.const 65520) "\0A\0B\0C\0D\0E\0F\80\81\82\83\84\85\86\87\88\89")
      (func (export "v128.load32_zero") (param $0 i32) (result v128)
        (v128.load32_zero (local.get $0))
      )
    )
    (assert_return (invoke "v128.load32_zero" (i32.const 0)) (v128.const i32x4 0x03020100 0x00000000 0x00000000 0x00000000))
The test fails with:
    Error: failed to run script file 'scratch.wast'
    Caused by:
        0: failed directive on scratch.wast:9:1
        1: expected V128(I32x4([50462976, 0, 0, 0])) (0x00000000000000000000000000010203), got V128(2595974132930443020774722140989752) (0x00007ffdccd1730000007ffdccd16938)
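For reference, the value the assertion expects can be reproduced with a small model of `v128.load32_zero`: read four bytes little-endian from linear memory, place them in lane 0, and zero the other three lanes. This is only a sketch of the instruction's semantics; the function name `load32_zero` and the use of a plain byte slice as memory are assumptions made for illustration, not Wasmtime or Cranelift code.

```rust
/// Illustrative model of `v128.load32_zero` (hypothetical helper, not a Wasmtime API).
/// Reads 4 bytes little-endian at `addr` into lane 0 and zeroes lanes 1..=3.
/// Returns `None` when the access would fall outside `memory`.
fn load32_zero(memory: &[u8], addr: usize) -> Option<[u32; 4]> {
    let end = addr.checked_add(4)?;
    let s = memory.get(addr..end)?;
    let lane0 = u32::from_le_bytes([s[0], s[1], s[2], s[3]]);
    Some([lane0, 0, 0, 0])
}

fn main() {
    // One 64 KiB page, initialised like the first data segment in scratch.wast.
    let mut memory = vec![0u8; 65536];
    let data: [u8; 26] = [
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C,
        0x0D, 0x0E, 0x0F, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
    ];
    memory[..data.len()].copy_from_slice(&data);

    let lanes = load32_zero(&memory, 0).expect("address 0 is in bounds");
    // 0x03020100 == 50462976, the lane-0 value named in the expected result.
    assert_eq!(lanes, [0x0302_0100, 0, 0, 0]);
    println!("{:?}", lanes);
}
```

Any result whose upper three lanes are non-zero, like the one in the failure above, cannot come from a correct `load32_zero`.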
After lowering, the MOVSS is present (as it should be):
    DEBUG cranelift_codegen::machinst::compile > vcode from lowering: VCode_ShowWithRRU {{
      Entry block: 0
    Block 0:
      (original IR block: block0)
      (successor: Block 1)
      (instruction range: 0 .. 9)
      Inst 0:   movq    %rdi, %v0J
      Inst 1:   movq    %rsi, %v1J
      Inst 2:   movq    %rdx, %v2J
      Inst 3:   movl    %v2Jl, %v3Jl
      Inst 4:   movq    28(%v0J), %v4J
      Inst 5:   movl    0(%v4J,%v3J,1), %v6Jl
      Inst 6:   movss   %v6J, %v7V
      Inst 7:   movdqa  %v7V, %v8V
      Inst 8:   jmp     label1
    Block 1:
      (original IR block: block1)
      (instruction range: 9 .. 12)
      Inst 9:   movdqa  %v8V, %v9V
      Inst 10:   movdqa  %v9V, %xmm0
      Inst 11:   ret
    }}
But after regalloc it is gone!
    DEBUG cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{
      Entry block: 0
    Block 0:
      (original IR block: block0)
      (successor: Block 1)
      (instruction range: 0 .. 6)
      Inst 0:   pushq   %rbp
      Inst 1:   movq    %rsp, %rbp
      Inst 2:   movl    %edx, %esi
      Inst 3:   movq    28(%rdi), %rdi
      Inst 4:   movl    0(%rdi,%rsi,1), %xmm0
      Inst 5:   jmp     label1
    Block 1:
      (original IR block: block1)
      (instruction range: 6 .. 9)
      Inst 6:   movq    %rbp, %rsp
      Inst 7:   popq    %rbp
      Inst 8:   re
abrown commented on Issue #2489:
Weirdly, a CLIF filetest like the following will produce the expected code:
    test compile
    set enable_simd
    target x86_64 has_ssse3 has_sse41
    feature "experimental_x64"

    function %load32_zero(i64) -> i32x4 {
    block0(v0: i64):
        v1 = load.i32 v0
        v2 = scalar_to_vector.i32x4 v1
        ; check: movss 0(%rdi), %xmm0
        return v2
    }
When I run that with
    cargo run --features experimental_x64 -p cranelift-tools -- compile --target="x86_64" -dDp scratch.clif
I see the following before regalloc:
    DEBUG cranelift_codegen::machinst::compile > vcode from lowering: VCode_ShowWithRRU {{
      Entry block: 0
    Block 0:
      (original IR block: block0)
      (instruction range: 0 .. 5)
      Inst 0:   movq    %rdi, %v0J
      Inst 1:   movss   0(%v0J), %v2V
      Inst 2:   movdqa  %v2V, %v3V
      Inst 3:   movdqa  %v3V, %xmm0
      Inst 4:   ret
    }}
And after, the correct MOVSS:
    DEBUG cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{
      Entry block: 0
    Block 0:
      (original IR block: block0)
      (instruction range: 0 .. 6)
      Inst 0:   pushq   %rbp
      Inst 1:   movq    %rsp, %rbp
      Inst 2:   movss   0(%rdi), %xmm0
      Inst 3:   movq    %rbp, %rsp
      Inst 4:   popq    %rbp
      Inst 5:   ret
    }}
abrown commented on Issue #2489:
@cfallin, I am bringing this out of draft since:
- it seems like we are still merging x64 SIMD code despite the tests being disabled (I checked locally and this passes the tests)
- I think this side-steps most of the stuff we found in #2545, though we should still fix that separately.
cfallin commented on Issue #2489:
Will review soon; re: merging SIMD, yes, at least my position right now is that we should be able to keep making progress on SIMD despite the CI heisenbug and disabled tests; when we resolve that bug and turn the tests back on, we'll see any issues and be able to trace them back to one of N PRs merged recently. IMHO that's better than holding all SIMD work until we track down the issue, given that it's proven elusive so far.
cfallin commented on Issue #2489:
@abrown I think this broke the build on main, as it went in after the multi-reg refactor but tests had run before it.
Last updated: Jan 24 2025 at 00:11 UTC