I got a few verifier errors with cg_clif saying that the domtree is not valid. I can't reproduce it with clif-util even when enabling the exact same flags.
thread 'rustc' panicked at 'called `Result::unwrap()` on an `Err` value: Compilation(Verifier(VerifierErrors([VerifierError { location: block3, context: None, message: "invalid domtree, expected idom(block3) = Some(inst11), got Some(inst10)" }])))', src/base.rs:133:14
thread 'rustc' panicked at 'called `Result::unwrap()` on an `Err` value: Compilation(Verifier(VerifierErrors([VerifierError { location: block4, context: None, message: "invalid domtree, expected idom(block4) = Some(inst4), got Some(inst5)" }])))', src/base.rs:133:14
thread 'rustc' panicked at 'called `Result::unwrap()` on an `Err` value: Compilation(Verifier(VerifierErrors([VerifierError { location: block4, context: None, message: "invalid domtree, expected idom(block4) = Some(inst15), got Some(inst14)" }])))', src/base.rs:133:14
Any idea how I can debug this?
The manual verifier pass just after the generation of clif ir doesn't give any error. Only once compilation happens does this verifier error get hit.
Weird, do you have all the CLIF + same settings? And using the same backends in both cases?
(same CPU flags too)
Yes, I used the same cranelift commit, matched all flags and used the old backend in both cases.
Currently testing if the latest cranelift version has the same problem.
Updating doesn't help.
If you want to reproduce, just run ./prepare.sh && ./test.sh --debug on the latest commit of cg_clif.
Removing https://github.com/bjorn3/rustc_codegen_cranelift/blob/3ea8915d4a247b5b3c4cfb3424c230ccd2645b17/src/base.rs#L114-L119 doesn't help.
One thing that would help is the backtrace: the verifier runs several times during compilation, so knowing which run fails would narrow this down.
If you have the CLIF graph around, might be valuable too?
This is the clif that gets compiled:
target x86_64-unknown-linux-gnu haswell
function u0:117(f32) -> i8 system_v {
; symbol _ZN4core3f3221_$LT$impl$u20$f32$GT$8classify17h334bc58d8063a2b5E
; instance Instance { def: Item(WithOptConstParam { did: DefId(0:125 ~ core[25e5]::f32::{impl#0}::classify), const_param_did: None }), substs: [] }
ss0 = explicit_slot 1
sig0 = (f32) -> i32 system_v
fn0 = colocated u0:118 sig0
block0(v0: f32):
v1 -> v0
nop
jump block1
block1:
nop
@0002 v2 = call fn0(v1)
v3 -> v2
@0002 jump block2
block2:
@0002 nop
@0005 v4 = iconst.i32 0x007f_ffff
@0005 v5 = band.i32 v3, v4
v11 -> v5
v23 -> v5
@0008 v6 = iconst.i32 0x7f80_0000
@0008 v7 = band.i32 v3, v6
v8 -> v7
@000c brz v5, block3
@000c jump block4(v7)
block3:
@000c nop
@000d brz.i32 v8, block7
@000d jump block4(v8)
block4(v9: i32):
@000d nop
@000e v10 = icmp_imm eq v9, 0x7f80_0000
@000e brnz v10, block5
@000e jump block12
block12:
@000e brz.i32 v9, block8
@000e jump block6
block5:
@000e nop
@000f brz.i32 v11, block9
@000f jump block10
block6:
@000f nop
@0010 v12 = iconst.i8 4
@0010 stack_store v12, ss0
@0011 v13 = stack_load.i8 ss0
@0011 return v13
block7:
@0011 nop
@0012 v14 = iconst.i8 2
@0012 stack_store v14, ss0
@0011 v15 = stack_load.i8 ss0
@0011 return v15
block8:
@0011 nop
@0013 v16 = iconst.i8 3
@0013 stack_store v16, ss0
@0011 v17 = stack_load.i8 ss0
@0011 return v17
block9:
@0011 nop
@0014 v18 = iconst.i8 1
@0014 stack_store v18, ss0
@0011 v19 = stack_load.i8 ss0
@0011 return v19
block10:
@0011 nop
@0015 v20 = iconst.i8 0
@0015 stack_store v20, ss0
@0011 v21 = stack_load.i8 ss0
@0011 return v21
block11:
@0011 nop
@0017 v22 = stack_load.i8 ss0
@0017 return v22
}
did cg_clif insert these nops?
for instance for block3, the expected dominator value is wrong -- it's the actual value that's correct
Yes, cg_clif inserts these nops at the start of the codegen of each mir block. I have code writing comments after specified instructions when printing the clif ir. The comment for the first mir statement is attached to this nop.
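For anyone who wants to sanity-check the dominators by hand, here is a minimal sketch that recomputes block-level immediate dominators for the CFG in the dump above. It is not Cranelift code: the edge list is transcribed by hand from the CLIF, the names dump_cfg and immediate_dominators are made up for illustration, and it uses the simple set-based data-flow formulation rather than whatever Cranelift implements. Cranelift reports idom as a branch instruction rather than a block, so this only tells you which block that instruction has to live in.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Block-level CFG transcribed by hand from the CLIF dump (block -> successors).
/// block11 is left out because nothing in the dump jumps to it.
fn dump_cfg() -> BTreeMap<u32, Vec<u32>> {
    let mut g = BTreeMap::new();
    g.insert(0, vec![1]);
    g.insert(1, vec![2]);
    g.insert(2, vec![3, 4]);
    g.insert(3, vec![7, 4]);
    g.insert(4, vec![5, 12]);
    g.insert(5, vec![9, 10]);
    g.insert(12, vec![8, 6]);
    for leaf in 6..=10 {
        g.insert(leaf, Vec::new());
    }
    g
}

/// Hypothetical helper: iterate dom(b) = {b} ∪ ⋂ dom(pred) to a fixed point,
/// then pick the closest strict dominator of each block as its idom.
fn immediate_dominators(g: &BTreeMap<u32, Vec<u32>>, entry: u32) -> BTreeMap<u32, u32> {
    let all: BTreeSet<u32> = g.keys().copied().collect();

    // Invert the successor lists into predecessor lists.
    let mut preds: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for (&b, succs) in g {
        for &s in succs {
            preds.entry(s).or_default().push(b);
        }
    }

    // Start from "everything dominates everything" except at the entry block.
    let mut dom: BTreeMap<u32, BTreeSet<u32>> =
        all.iter().map(|&b| (b, all.clone())).collect();
    dom.insert(entry, std::iter::once(entry).collect());

    let mut changed = true;
    while changed {
        changed = false;
        for &b in &all {
            if b == entry {
                continue;
            }
            // Intersect the dominator sets of all predecessors.
            let mut acc: Option<BTreeSet<u32>> = None;
            for p in preds.get(&b).into_iter().flatten() {
                let d = &dom[p];
                acc = Some(match acc {
                    None => d.clone(),
                    Some(a) => a.intersection(d).copied().collect(),
                });
            }
            let mut new_set = acc.unwrap_or_default();
            new_set.insert(b);
            if new_set != dom[&b] {
                dom.insert(b, new_set);
                changed = true;
            }
        }
    }

    // The immediate dominator is the strict dominator with the largest dom set.
    let mut idom = BTreeMap::new();
    for (&b, ds) in &dom {
        if b == entry {
            continue;
        }
        if let Some(&d) = ds.iter().filter(|&&d| d != b).max_by_key(|&&d| dom[&d].len()) {
            idom.insert(b, d);
        }
    }
    idom
}
```

Calling immediate_dominators(&dump_cfg(), 0) puts the immediate dominator of both block3 and block4 in block2, so the instruction the verifier names for those blocks should be one of the two branches that end block2.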
When making Context::verify panic on verifier errors, I got the following backtrace:
0: rust_begin_unwind
at /rustc/44e3daf5eee8263dfc3a2509e78ddd1f6f783a0e/library/std/src/panicking.rs:493:5
1: std::panicking::begin_panic_fmt
at /rustc/44e3daf5eee8263dfc3a2509e78ddd1f6f783a0e/library/std/src/panicking.rs:435:5
2: cranelift_codegen::context::Context::verify
at /home/bjorn/Documenten/wasmtime/cranelift/codegen/src/context.rs:293:13
3: cranelift_codegen::context::Context::verify_if
at /home/bjorn/Documenten/wasmtime/cranelift/codegen/src/context.rs:302:13
4: cranelift_codegen::context::Context::preopt
at /home/bjorn/Documenten/wasmtime/cranelift/codegen/src/context.rs:347:9
5: cranelift_codegen::context::Context::compile
at /home/bjorn/Documenten/wasmtime/cranelift/codegen/src/context.rs:173:13
6: <cranelift_object::backend::ObjectModule as cranelift_module::module::Module>::define_function
at /home/bjorn/Documenten/wasmtime/cranelift/object/src/backend.rs:263:13
7: rustc_codegen_cranelift::base::codegen_fn::{{closure}}
at /home/bjorn/Documenten/cg_clif/src/base.rs:126:9
The verifier error happens after preopt.
Calling context.domtree.clear() after the optimizations done by cg_clif itself fixes it.
Some optimizations probably don't expect it to already be computed and thus don't invalidate it as necessary.
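To illustrate the failure mode, here is a toy model of a cached analysis going stale. Everything in it is hypothetical (ToyContext, compute_analysis, remove_nops and so on are not Cranelift API), and it does not claim that nop removal is the actual culprit. It only shows the pattern: a pass edits the body without invalidating a cache that was computed earlier, and a later verifier compares the cache against a fresh computation.

```rust
/// Hypothetical toy context; none of these names are Cranelift API.
struct ToyContext {
    /// Stand-in for the function body: one opcode name per instruction.
    body: Vec<&'static str>,
    /// Stand-in for the cached domtree: positions of the branch instructions.
    cached_branches: Option<Vec<usize>>,
}

impl ToyContext {
    fn new(body: Vec<&'static str>) -> Self {
        ToyContext { body, cached_branches: None }
    }

    /// The "analysis": recompute branch positions from scratch.
    fn fresh_branches(&self) -> Vec<usize> {
        let mut out = Vec::new();
        for (i, op) in self.body.iter().enumerate() {
            if *op == "brz" || *op == "jump" {
                out.push(i);
            }
        }
        out
    }

    /// Compute the analysis and keep it cached.
    fn compute_analysis(&mut self) {
        self.cached_branches = Some(self.fresh_branches());
    }

    /// Drop the cached result so that nothing stale can be observed.
    fn clear_analysis(&mut self) {
        self.cached_branches = None;
    }

    /// A pass that edits the body but forgets to invalidate the cache.
    fn remove_nops(&mut self) {
        self.body.retain(|op| *op != "nop");
    }

    /// Only checks the cache if one is present, comparing it to a fresh result.
    fn verify(&self) -> Result<(), String> {
        match &self.cached_branches {
            Some(cached) if *cached != self.fresh_branches() => Err(format!(
                "invalid cached analysis, expected {:?}, got {:?}",
                self.fresh_branches(),
                cached
            )),
            _ => Ok(()),
        }
    }
}
```

With a body of nop, brz, jump, calling compute_analysis and then remove_nops makes verify return an error. Calling clear_analysis in between makes it pass, because there is no stale cache left to compare.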
Fixed in https://github.com/bjorn3/rustc_codegen_cranelift/commit/cfedad1f75bf22468fce59f754daf1501fa2827d
Nice to know there's a workaround. I'd be really interested in extracting a clif test case that reproduces it. Maybe time to eat the frog and set up cg_clif :smile:
I think the repro would be something like context.compute_cfg(); context.compute_domtree(); context.preopt(isa).unwrap(); with the verifier enabled on a function containing a nop instruction.
Last updated: Dec 23 2024 at 12:05 UTC