iwanders added the cranelift label to Issue #10118.
iwanders added the bug label to Issue #10118.
iwanders opened issue #10118:
.clif
Test CaseThe Cranelift IR
average
example code from the docs/ir.md. I've already added that to the filetests test directory in caeafe27c69804e14c10d28d25ce511488afd212.Using
clif-util
;../target/debug/clif-util bugpoint filetests/filetests/isa/x64/average.clif x86_64 After pass 0, remaining insts/blocks: 22/3 (will keep reducing) After pass 1, remaining insts/blocks: 13/3 (will keep reducing) After pass 2, remaining insts/blocks: 13/3 (stop reducing) ████████████████████████████████████████████████████████████ pass 2 phase merge blocks 2/ 2 Remove unused global values Crash message: assertion `left == right` failed left: types::I32 right: types::I64 function %average() -> f32 system_v { ss0 = explicit_slot 8 block1: v0 = iconst.i32 0 v1 = iconst.i32 0 v5 = iconst.i32 0 v6 = iadd v0, v5 ; v0 = 0, v5 = 0 v9 = f64const 0.0 v100 = f32const +NaN brif v1, block2, block5 ; v1 = 0 block2: v7 = load.f32 v6 v8 = fpromote.f64 v7 v10 = fadd v8, v9 ; v9 = 0.0 stack_store v10, ss0 trap user1 block5: return v100 ; v100 = +NaN } 5 blocks 22 insts -> 3 blocks 13 insts
Manually, I reduced it to the following:
function %average(i32, i32) -> f32 system_v { ss0 = explicit_slot 8 ; Stack slot for `sum` block1(v0: i32, v1: i32): v20 = f64const 0x0.0 ; Create a f64 as 0.0, just to initialise ss0. stack_store v20, ss0 ; Store f64 into the ss0 stack slot, just to initialise ss0. v6 = iadd v0, v1 ; Adds the input together as integers, output is i32? v7 = load.f32 v6 ; Converts v7 to f32 from i32. v8 = fpromote.f64 v7 ; converts v7 to f64 into v8. v9 = stack_load.f64 ss0 ; Loads ss0 from the stack into v9, v9 is now f64 that's 0.0 v10 = fadd.f64 v8, v9 ; Adds v8 and v9 together, both are f64's, so v10 is an f64. stack_store v10, ss0 ; Write the 8 bytes back to ss0. v100 = f32const +NaN ; Just creating a dummy to return. return v100 ; Return the dummy. }
Instructions themselves seem fine to me, but today is the first day I'm looking at Cranelift at all, so I absolutely may be missing something obvious.
Steps to Reproduce
- Cherrypick caeafe27c69804e14c10d28d25ce511488afd212
cd cranelift
cargo t
(note without--release
, test will pass with--release
).Note, the problem is with
x86_64
, this function compiles fine withaarch64
.Expected Results
I would expect the example function provided in the documentation to compile, it compiles on
aarch64
, which (to me at least) indicates the function is fine. I would expect this to also compile onx86_64
.Actual Results
A debug assert triggers on this line;
<details>
<summary>Backtrace</summary>---- filetests stdout ---- thread 'worker #7' panicked at cranelift/codegen/src/isa/x64/lower.rs:233:9: assertion `left == right` failed left: types::I32 right: types::I64 stack backtrace: 0: rust_begin_unwind at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5 1: core::panicking::panic_fmt at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14 2: core::panicking::assert_failed_inner 3: core::panicking::assert_failed at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5 4: cranelift_codegen::isa::x64::lower::lower_to_amode at ./codegen/src/isa/x64/lower.rs:233:9 5: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::sink_load at ./codegen/src/isa/x64/lower/isle.rs:313:20 6: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_reg_mem at ./codegen/src/isa/x64/lower/isle.rs:148:23 7: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_xmm_mem at ./codegen/src/isa/x64/lower/isle.rs:132:28 8: cranelift_codegen::isa::x64::lower::isle::generated_code::constructor_lower at /home/ivor/Documents/Code/rust/wasmtime/wasmtime/target/debug/build/cranelift-codegen-8f3d42bba10fabf3/out/isle_x64.rs:21098:42 9: cranelift_codegen::isa::x64::lower::isle::lower at ./codegen/src/isa/x64/lower/isle.rs:55:5 10: cranelift_codegen::isa::x64::lower::<impl cranelift_codegen::machinst::lower::LowerBackend for cranelift_codegen::isa::x64::X64Backend>::lower at ./codegen/src/isa/x64/lower.rs:311:9 11: cranelift_codegen::machinst::lower::Lower<I>::lower_clif_block at ./codegen/src/machinst/lower.rs:679:39 12: cranelift_codegen::machinst::lower::Lower<I>::lower at ./codegen/src/machinst/lower.rs:1020:17 13: cranelift_codegen::machinst::compile::compile at ./codegen/src/machinst/compile.rs:42:9 14: cranelift_codegen::isa::x64::X64Backend::compile_vcode at ./codegen/src/isa/x64/mod.rs:62:9 15: <cranelift_codegen::isa::x64::X64Backend as cranelift_codegen::isa::TargetIsa>::compile_function at ./codegen/src/isa/x64/mod.rs:74:40 16: cranelift_codegen::context::Context::compile_stencil at ./codegen/src/context.rs:138:9 17: cranelift_codegen::context::Context::compile at ./codegen/src/context.rs:204:23 18: <cranelift_filetests::test_compile::TestCompile as cranelift_filetests::subtest::SubTest>::run at ./filetests/src/test_compile.rs:59:29 19: cranelift_filetests::subtest::SubTest::run_target at ./filetests/src/subtest.rs:101:13 20: cranelift_filetests::runone::run at ./filetests/src/runone.rs:97:9 21: cranelift_filetests::concurrent::worker_thread::{{closure}}::{{closure}} at ./filetests/src/concurrent.rs:149:46 22: std::panicking::try::do_call at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40 23: __rust_try 24: std::panicking::try at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19 25: std::panic::catch_unwind at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14 26: cranelift_filetests::concurrent::worker_thread::{{closure}} at ./filetests/src/concurrent.rs:149:30 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
</details>
The relevant section from
isle_x64.rs
;&Opcode::Fpromote => { let v1 = C::first_result(ctx, arg0); if let Some(v2) = v1 { let v3 = C::value_type(ctx, v2); if v3 == F64 { let v1809 = constructor_xmm_zero(ctx, F64X2); let v1804 = &C::put_in_xmm_mem(ctx, v520); // <- Line isle_x64.rs:21098:42 let v1819 = constructor_x64_cvtss2sd(ctx, v1809, v1804); let v1820 = constructor_output_xmm(ctx, v1819); let v1821 = Some(v1820); // Rule at src/isa/x64/lower.isle line 2661. return v1821; } } }
Which seems to indicate the problem is originating from
Fpromote
.Versions and Environment
Cranelift version or commit: Current released;
0.116.1
, also tested on current main; 5dfccc078bb3dcad152d37013dde8067f197e6e7.Operating system: Ubuntu, stable rustc 1.84.0 (9fc6b4312 2025-01-07)
Architecture: x86_64
Extra Info
If I use a release build, target x86_64 and write the output to an object file using
cranelift_object::ObjectModule
. The provided symbol does work and correctly calculates averages.<details>
<summary>Disassembly of object and usage of it to calculate average</summary>$ objdump -d average.o average.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <average>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 48 83 ec 10 sub $0x10,%rsp 8: 66 0f 57 e4 xorpd %xmm4,%xmm4 c: 48 8d 04 24 lea (%rsp),%rax 10: f2 0f 11 20 movsd %xmm4,(%rax) 14: 85 f6 test %esi,%esi 16: 0f 85 12 00 00 00 jne 2e <average+0x2e> 1c: b9 00 00 c0 7f mov $0x7fc00000,%ecx 21: 66 0f 6e c1 movd %ecx,%xmm0 25: 48 83 c4 10 add $0x10,%rsp 29: 48 89 ec mov %rbp,%rsp 2c: 5d pop %rbp 2d: c3 retq 2e: 31 c9 xor %ecx,%ecx 30: 44 6b c9 04 imul $0x4,%ecx,%r9d 34: 66 0f 57 ed xorpd %xmm5,%xmm5 38: f3 42 0f 5a 2c 0f cvtss2sd (%rdi,%r9,1),%xmm5 3e: 4c 8d 0c 24 lea (%rsp),%r9 42: f2 41 0f 58 29 addsd (%r9),%xmm5 47: 4c 8d 0c 24 lea (%rsp),%r9 4b: [message truncated]
cfallin commented on issue #10118:
In this snippet
block1(v0: i32, v1: i32): v20 = f64const 0x0.0 ; Create a f64 as 0.0, just to initialise ss0. stack_store v20, ss0 ; Store f64 into the ss0 stack slot, just to initialise ss0. v6 = iadd v0, v1 ; Adds the input together as integers, output is i32? v7 = load.f32 v6 ; Converts v7 to f32 from i32.
you are loading from memory at address
v6
, butv6
is an I32-typed value (iadd
is polymorphic so the type is implicit fromv0
andv1
args, which are bothi32
). Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide. That this worked on aarch64 is a little surprising but I guess the lowering rules are more lenient in that backend (they probably auto-extend).(The comments aren't load-bearing for the bug but I note too that you wrote "Converts ... to f32 from i32" which is inconsistent with a load -- usually I would reach for an int-to-float opcode to convert in the usual sense.)
Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.
I'll note as well that the example IR you took from
ir.md
is written without a target in mind, and address width is one of the few ways in which CLIF is not totally target-independent -- it must match the target ISA. I suspect that example was written long ago when riscv32 was the original target of proto-Cranelift.Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!
iwanders commented on issue #10118:
@cfallin , thank you for the detailed answer.
Note that x86-64 is a 64-bit architecture, so we require addresses to be 64 bits wide.
Hopefully this is a simple fix (use i64 types for addresses) -- let us know if that addresses your problem!Ah, I didn't think of that, but that does make sense!
I can indeed get the example from the documentation to work by changing
v0
,v1
,v3
andv4
toi64
. Would you like me to file a PR to change the documentation to usei64
s for those variables? I expect most people who try to read through the docs and build something to use thecranelift_native
crate and be on 64 bit architectures?Our assertion failure message could absolutely be better here -- perhaps we could add "address with unexpected" or something like that.
Yeah, I think just something simple as changing that debug assert to be;
debug_assert_eq!(ctx.output_ty(add, 0), types::I64, "address width of 64 expected, got 32");
would go a long way towards steering people in the right direction? I can file a pr with this change if you'd like me to. The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.
cfallin commented on issue #10118:
Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!
bjorn3 commented on issue #10118:
The discrepancy between debug and release builds is also a bit surprising to me, but that may be less of an easy fix as it probably requires adding an explicit check.
Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.
iwanders commented on issue #10118:
Does the clif ir verifier catch this mistake? You need to explicitly enable it as it is kinda slow.
I tried adding the same
average.clif
file with antest verifier
stanza at the top in e132b3bdbe4144800df54ac78a01c7473e62a927, this does not cause a unit test failure, so I don't think it does. I could not find a way to specify atest verifier_fail
or something to indicate the verifier should report a failure in the test case, so I'm not sure how to really express this as a test.My exact setup is available in 70af5ceb5122121a670181615fb733cc55bce571 on (branch issue-10118-reproduction), it adds a crate
cranelift/compile_average_ir
in the wasmtime repo for easy reproduction (functionattempt_two
inmain.rs
). This prints the flags as:enable_alias_analysis = true enable_verifier = true enable_pcc = false
Which I take to mean the verifier is enabled, the documentation here also states it is enabled by default? Reproduction should be a matter of going to the crate and doing
cargo r
, that gives the assert, whilecargo r --release
shows the object file bytes.Context; I'm considering creating a toy compiler, just to learn more about compilers. On Sunday I was exploring Cranelift to see how approachable it is and whether I'd like to use it as the frontend and share datastructures (effectively I think I would attempt making a
TargetIsa
). So that's why I ended up reading the IR documentation and tried to compile that function from the documentation. Overall though, the fact that I got working assembly within an hour or two of reading, experimenting, and in like 100 lines, makes it very approachable :+1:Yes, if you're willing, a PR to make both of those changes would be very much appreciated! Happy to review. Thanks!
Cool, I'll put something together.
iwanders commented on issue #10118:
I took another look at this, now that I have a bit more understanding of the cranelift data structures.
In efd56cbfc6f068de82e0cbb4ebba014a584d5499 of the issue-10118-reproduction branch, I added some printing of the types that we get in the clif:
This indeed shows the problem with the load on I32.
inst: inst2 instruction_data: Binary { opcode: Iadd, args: [v0, v1] } typevar_operand: Some(v0) opcode: Iadd args: [v0, v1] types: [types::I32, types::I32] results? true -> [v6] (types: [types::I32]) inst: inst3 instruction_data: Load { opcode: Load, arg: v6, flags: MemFlags { bits: 32384 }, offset: Offset32(0) } typevar_operand: Some(v6) opcode: Load args: [v6] types: [types::I32] results? true -> [v7] (types: [types::F32])
So at first I looked at the
isle
files and tried to figure out if there needed to be a type constraint on the argument to prevent it from being used... but I think I was looking in the wrong place.Next I looked at the verifier... so in instructions.rs the
iAddr
is defined, but this is just an int of 32 or 64 bits.In d7e53eadae6e7933a20628750caca1757be0e0b2 I added some prints to the verifier, and this does indeed show that the int-set for the Value that's the argument has the 32 and 64 bit integers set;
typecheck: inst3 Load { opcode: Load, arg: v6, flags: MemFlags { bits: 32384 }, offset: Offset32(0) } ctrl: types::F32 arg_type: types::I32 type_set: ValueTypeSet { lanes: cranelift_bitset::scalar::ScalarBitSet<u16> { 0: true, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false, 8: false, 9: false, 10: false, 11: false, 12: false, 13: false, 14: false, 15: false }, ints: cranelift_bitset::scalar::ScalarBitSet<u8> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: true, 6: true, 7: false }, floats: cranelift_bitset::scalar::ScalarBitSet<u8> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false }, dynamic_lanes: cranelift_bitset::scalar::ScalarBitSet<u16> { 0: false, 1: false, 2: false, 3: false, 4: false, 5: false, 6: false, 7: false, 8: false, 9: false, 10: false, 11: false, 12: false, 13: false, 14: false, 15: false } } arg_type: types::I32
So I think this means that even though we are on a 64 bit architecture, the load can take a 32 or 64 bit argument according to the verifier.
I'm not sure what the best approach is here, by the time we reach the verifier, we cannot distinguish between an
i32
,i64
andiAddr
anymore, given my (limited) understanding the ideal would be to use a bit in theValueTypeSet
to denote something as an address integer, and then the verifier can confirm that that the value is of the appropriate type for the current isa.In c75db15dd4a64ca71d245057be12aeaa08ff9400 I just added a
verify_is_address
method to the verifier, and call for load, that makes the verifier complain with this bad function;Defining function er: VerifierError { location: inst3, context: Some("v7 = load.f32 v6"), message: "invalid pointer width (got 32, expected 64) encountered v6" } Error: "Compilation error: Verifier errors"
Unfortunately, that commit does not run the test suite cleanly, there are 5 tests that fail:
<details>
<summary>Test failures with pointer width checking</summary>running 1 test test filetests ... FAILED failures: ---- filetests stdout ---- FAIL filetests/filetests/runtests/simd-extractlane.clif: function %extractlane_i8x16_through_stack(i8x16) -> i8 system_v { ss0 = explicit_slot 8 block0(v0: i8x16): v2 = stack_addr.i64 ss0 v3 = extractlane v0, 1 store v3, v2 v4 = load.i8 v2 ; ^~~~~~~~~~~~~~~ ; error: inst3 (v4 = load.i8 v2): invalid pointer width (got 64, expected 32) encountered v2 return v4 } ; 1 verifier error detected (see above). Compilation aborted. Stack backtrace: <snip> FAIL filetests/filetests/runtests/fdemote.clif: function %fdemote_load(i64, f64) -> f32 system_v { ss0 = explicit_slot 16 block0(v1: i64, v2: f64): v3 = stack_addr.i64 ss0 store v2, v3 v4 = load.f64 v3 ; ^~~~~~~~~~~~~~~~ ; error: inst2 (v4 = load.f64 v3): invalid pointer width (got 64, expected 32) encountered v3 v5 = fdemote.f32 v4 return v5 } ; 1 verifier error detected (see above). Compilation aborted. Stack backtrace: <snip> FAIL filetests/filetests/runtests/simd-insertlane.clif: function %insertlane_i8x16_through_stack(i8x16, i8) -> i8x16 system_v { ss0 = explicit_slot 8 block0(v0: i8x16, v1: i8): v2 = stack_addr.i64 ss0 store v1, v2 v3 = load.i8 v2 ; ^~~~~~~~~~~~~~~ ; error: inst2 (v3 = load.i8 v2): invalid pointer width (got 64, expected 32) encountered v2 v4 = insertlane v0, v3, 1 return v4 } ; 1 verifier error detected (see above). Compilation aborted. Stack backtrace: <snip> FAIL filetests/filetests/runtests/fpromote.clif: function %fpromote_load(i64, f32) -> f64 system_v { ss0 = explicit_slot 16 block0(v1: i64, v2: f32): v3 = stack_addr.i64 ss0 store v2, v3 v4 = load.f32 v3 ; ^~~~~~~~~~~~~~~~ ; error: inst2 (v4 = load.f32 v3): invalid pointer width (got 64, expected 32) encountered v3 v5 = fpromote.f64 v4 return v5 } ; 1 verifier error detected (see above). Compilation aborted. Stack backtrace: <snip> FAIL filetests/filetests/isa/x64/average.clif: function %average(i32, i32) -> f32 system_v { ss0 = explicit_slot 8 block1(v0: i32, v1: i32): v2 = f64const 0.0 stack_store v2, ss0 ; v2 = 0.0 brif v1, block2, block5 block2: v3 = iconst.i32 0 jump block3(v3) ; v3 = 0 block3(v4: i32): v5 = imul_imm v4, 4 v6 = iadd.i32 v0, v5 v7 = load.f32 v6 ; ^~~~~~~~~~~~~~~~ ; error: inst7 (v7 = load.f32 v6): invalid pointer width (got 32, expected 64) encountered v6 v8 = fpromote.f64 v7 v9 = stack_load.f64 ss0 v10 = fadd v8, v9 stack_store v10, ss0 v11 = iadd_imm v4, 1 v12 = icmp ult v11, v1 brif v12, block3(v11), block4 block4: v13 = stack_load.f64 ss0 v14 = fcvt_from_uint.f64 v1 v15 = fdiv v13, v14 v16 = fdemote.f32 v15 return v16 block5: v100 = f32const +NaN return v100 ; v100 = +NaN } ; 1 verifier error detected (see above). Compilation aborted. Stack backtrace: <snip> 1065 tests Error: 5 failures Stack backtrace: <snip> failures: filetests test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.82s
</details>
Problem here is, that that simd-extractlane.clif test file has
pulley32
, which probably creates an isa that uses 32 bits pointer widths, but then line 49 presumably gets a 64 bit stack address, which is then passed to the load on line 52, where the verifier now raises that this is incorrect.I can't really see a good solution for this that does not involve either introducing an address type, or having to split the clif files into 32 and 64 bit tests.
cfallin commented on issue #10118:
Problem here is, that that simd-extractlane.clif test file has pulley32, which probably creates an isa that uses 32 bits pointer widths, but then line 49 presumably gets a 64 bit stack address, which is then passed to the load on line 52, where the verifier now raises that this is incorrect.
I can't really see a good solution for this that does not involve either introducing an address type, or having to split the clif files into 32 and 64 bit tests.
It's fine in this case to remove
pulley32
from the test, I think -- given that the CLIF needs to be platform-pointer-width-aware, any fixed handwritten CLIF will necessarily be tied to one pointer width. For tests where loads and stores are an important part of the tested lowerings, we can split out 32-bit versions. And we should try to write tests in ways that minimize uses of loads and stores except where that is an essential part of the tested lowering (in this case it is, since we have extract-lane-from-memory lowerings on x86-64 at least).
iwanders commented on issue #10118:
Okay, interesting. Part of me was even wondering why it would be illegal to do a load from an i32, given that it's just pointing to a memory address. The docs for it state
An integer address type
, I guess theaddress type
also implies width must be equal to the pointer width?@cfallin , given that you stated it would be okay to split out the clif files (where necessary) to 32 and 64 bits... should we consider doing that in combination with the (kind of hardcoded) improvement to the verifier I did in c75db15dd4a64ca71d245057be12aeaa08ff9400? I can probably find some time over the weekend to explore how that solution would end up looking. We may want to extend it to also check
store
to keep it symmetrical?
cfallin commented on issue #10118:
Yes, updating the verifier and then updating tests that it finds violate the constraint is the right path here, I think (and for stores as well as loads; and make sure all the SIMD variants and atomics are covered, too). Thanks!
The docs for it state
An integer address type
, I guess the address type also implies width must be equal to the pointer width?Yes, the wording is somewhat ambiguous in that method documentation, but we've explicitly decided before (when adding the assert that you're hitting) that we define correct IR to use platform-width pointers only.
cfallin closed issue #10118:
.clif
Test CaseThe Cranelift IR
average
example code from the docs/ir.md. I've already added that to the filetests test directory in caeafe27c69804e14c10d28d25ce511488afd212.Using
clif-util
;../target/debug/clif-util bugpoint filetests/filetests/isa/x64/average.clif x86_64 After pass 0, remaining insts/blocks: 22/3 (will keep reducing) After pass 1, remaining insts/blocks: 13/3 (will keep reducing) After pass 2, remaining insts/blocks: 13/3 (stop reducing) ████████████████████████████████████████████████████████████ pass 2 phase merge blocks 2/ 2 Remove unused global values Crash message: assertion `left == right` failed left: types::I32 right: types::I64 function %average() -> f32 system_v { ss0 = explicit_slot 8 block1: v0 = iconst.i32 0 v1 = iconst.i32 0 v5 = iconst.i32 0 v6 = iadd v0, v5 ; v0 = 0, v5 = 0 v9 = f64const 0.0 v100 = f32const +NaN brif v1, block2, block5 ; v1 = 0 block2: v7 = load.f32 v6 v8 = fpromote.f64 v7 v10 = fadd v8, v9 ; v9 = 0.0 stack_store v10, ss0 trap user1 block5: return v100 ; v100 = +NaN } 5 blocks 22 insts -> 3 blocks 13 insts
Manually, I reduced it to the following:
function %average(i32, i32) -> f32 system_v { ss0 = explicit_slot 8 ; Stack slot for `sum` block1(v0: i32, v1: i32): v20 = f64const 0x0.0 ; Create a f64 as 0.0, just to initialise ss0. stack_store v20, ss0 ; Store f64 into the ss0 stack slot, just to initialise ss0. v6 = iadd v0, v1 ; Adds the input together as integers, output is i32? v7 = load.f32 v6 ; Converts v7 to f32 from i32. v8 = fpromote.f64 v7 ; converts v7 to f64 into v8. v9 = stack_load.f64 ss0 ; Loads ss0 from the stack into v9, v9 is now f64 that's 0.0 v10 = fadd.f64 v8, v9 ; Adds v8 and v9 together, both are f64's, so v10 is an f64. stack_store v10, ss0 ; Write the 8 bytes back to ss0. v100 = f32const +NaN ; Just creating a dummy to return. return v100 ; Return the dummy. }
Instructions themselves seem fine to me, but today is the first day I'm looking at Cranelift at all, so I absolutely may be missing something obvious.
Steps to Reproduce
- Cherrypick caeafe27c69804e14c10d28d25ce511488afd212
cd cranelift
cargo t
(note without--release
, test will pass with--release
).Note, the problem is with
x86_64
, this function compiles fine withaarch64
.Expected Results
I would expect the example function provided in the documentation to compile, it compiles on
aarch64
, which (to me at least) indicates the function is fine. I would expect this to also compile onx86_64
.Actual Results
A debug assert triggers on this line;
<details>
<summary>Backtrace</summary>---- filetests stdout ---- thread 'worker #7' panicked at cranelift/codegen/src/isa/x64/lower.rs:233:9: assertion `left == right` failed left: types::I32 right: types::I64 stack backtrace: 0: rust_begin_unwind at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5 1: core::panicking::panic_fmt at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14 2: core::panicking::assert_failed_inner 3: core::panicking::assert_failed at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:373:5 4: cranelift_codegen::isa::x64::lower::lower_to_amode at ./codegen/src/isa/x64/lower.rs:233:9 5: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::sink_load at ./codegen/src/isa/x64/lower/isle.rs:313:20 6: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_reg_mem at ./codegen/src/isa/x64/lower/isle.rs:148:23 7: cranelift_codegen::isa::x64::lower::isle::<impl cranelift_codegen::isa::x64::lower::isle::generated_code::Context for cranelift_codegen::machinst::isle::IsleContext<cranelift_codegen::isa::x64::lower::isle::generated_code::MInst,cranelift_codegen::isa::x64::X64Backend>>::put_in_xmm_mem at ./codegen/src/isa/x64/lower/isle.rs:132:28 8: cranelift_codegen::isa::x64::lower::isle::generated_code::constructor_lower at /home/ivor/Documents/Code/rust/wasmtime/wasmtime/target/debug/build/cranelift-codegen-8f3d42bba10fabf3/out/isle_x64.rs:21098:42 9: cranelift_codegen::isa::x64::lower::isle::lower at ./codegen/src/isa/x64/lower/isle.rs:55:5 10: cranelift_codegen::isa::x64::lower::<impl cranelift_codegen::machinst::lower::LowerBackend for cranelift_codegen::isa::x64::X64Backend>::lower at ./codegen/src/isa/x64/lower.rs:311:9 11: cranelift_codegen::machinst::lower::Lower<I>::lower_clif_block at ./codegen/src/machinst/lower.rs:679:39 12: cranelift_codegen::machinst::lower::Lower<I>::lower at ./codegen/src/machinst/lower.rs:1020:17 13: cranelift_codegen::machinst::compile::compile at ./codegen/src/machinst/compile.rs:42:9 14: cranelift_codegen::isa::x64::X64Backend::compile_vcode at ./codegen/src/isa/x64/mod.rs:62:9 15: <cranelift_codegen::isa::x64::X64Backend as cranelift_codegen::isa::TargetIsa>::compile_function at ./codegen/src/isa/x64/mod.rs:74:40 16: cranelift_codegen::context::Context::compile_stencil at ./codegen/src/context.rs:138:9 17: cranelift_codegen::context::Context::compile at ./codegen/src/context.rs:204:23 18: <cranelift_filetests::test_compile::TestCompile as cranelift_filetests::subtest::SubTest>::run at ./filetests/src/test_compile.rs:59:29 19: cranelift_filetests::subtest::SubTest::run_target at ./filetests/src/subtest.rs:101:13 20: cranelift_filetests::runone::run at ./filetests/src/runone.rs:97:9 21: cranelift_filetests::concurrent::worker_thread::{{closure}}::{{closure}} at ./filetests/src/concurrent.rs:149:46 22: std::panicking::try::do_call at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:557:40 23: __rust_try 24: std::panicking::try at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:520:19 25: std::panic::catch_unwind at /home/ivor/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:358:14 26: cranelift_filetests::concurrent::worker_thread::{{closure}} at ./filetests/src/concurrent.rs:149:30 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
</details>
The relevant section from
isle_x64.rs
;&Opcode::Fpromote => { let v1 = C::first_result(ctx, arg0); if let Some(v2) = v1 { let v3 = C::value_type(ctx, v2); if v3 == F64 { let v1809 = constructor_xmm_zero(ctx, F64X2); let v1804 = &C::put_in_xmm_mem(ctx, v520); // <- Line isle_x64.rs:21098:42 let v1819 = constructor_x64_cvtss2sd(ctx, v1809, v1804); let v1820 = constructor_output_xmm(ctx, v1819); let v1821 = Some(v1820); // Rule at src/isa/x64/lower.isle line 2661. return v1821; } } }
Which seems to indicate the problem is originating from
Fpromote
.Versions and Environment
Cranelift version or commit: Current released;
0.116.1
, also tested on current main; 5dfccc078bb3dcad152d37013dde8067f197e6e7.Operating system: Ubuntu, stable rustc 1.84.0 (9fc6b4312 2025-01-07)
Architecture: x86_64
Extra Info
If I use a release build, target x86_64 and write the output to an object file using
cranelift_object::ObjectModule
. The provided symbol does work and correctly calculates averages.<details>
<summary>Disassembly of object and usage of it to calculate average</summary>$ objdump -d average.o average.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <average>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 48 83 ec 10 sub $0x10,%rsp 8: 66 0f 57 e4 xorpd %xmm4,%xmm4 c: 48 8d 04 24 lea (%rsp),%rax 10: f2 0f 11 20 movsd %xmm4,(%rax) 14: 85 f6 test %esi,%esi 16: 0f 85 12 00 00 00 jne 2e <average+0x2e> 1c: b9 00 00 c0 7f mov $0x7fc00000,%ecx 21: 66 0f 6e c1 movd %ecx,%xmm0 25: 48 83 c4 10 add $0x10,%rsp 29: 48 89 ec mov %rbp,%rsp 2c: 5d pop %rbp 2d: c3 retq 2e: 31 c9 xor %ecx,%ecx 30: 44 6b c9 04 imul $0x4,%ecx,%r9d 34: 66 0f 57 ed xorpd %xmm5,%xmm5 38: f3 42 0f 5a 2c 0f cvtss2sd (%rdi,%r9,1),%xmm5 3e: 4c 8d 0c 24 lea (%rsp),%r9 42: f2 41 0f 58 29 addsd (%r9),%xmm5 47: 4c 8d 0c 24 lea (%rsp),%r9 4b: [message truncated]
Last updated: Feb 28 2025 at 03:10 UTC