jameysharp added the cranelift label to Issue #8197.
jameysharp added the bug label to Issue #8197.
jameysharp opened issue #8197:
.clif
Test CaseWasmtime currently translates this WebAssembly module:
(module (type $fn (func (result i32))) (table $fnptrs 2 2 funcref) (func (param i32) loop local.get 0 call_indirect $fnptrs (type $fn) br 0 end))
To this CLIF, before optimization:
function u0:0(i64 vmctx, i64, i32) fast { gv0 = vmctx gv1 = load.i64 notrap aligned readonly gv0+8 gv2 = load.i64 notrap aligned gv1 gv3 = vmctx gv4 = load.i64 notrap aligned gv3+72 sig0 = (i64 vmctx, i64) -> i32 fast sig1 = (i64 vmctx, i32 uext, i32 uext) -> i64 system_v stack_limit = gv2 block0(v0: i64, v1: i64, v2: i32): v3 -> v2 v28 -> v3 @0027 jump block2 block2: @002b v4 = iconst.i32 2 @002b v5 = icmp.i32 uge v3, v4 ; v4 = 2 @002b v6 = uextend.i64 v3 @002b v7 = global_value.i64 gv4 @002b v8 = ishl_imm v6, 3 @002b v9 = iadd v7, v8 @002b v10 = iconst.i64 0 @002b v11 = select_spectre_guard v5, v10, v9 ; v10 = 0 @002b v12 = load.i64 table_oob aligned table v11 @002b v13 = band_imm v12, -2 @002b brif v12, block5(v13), block4 block4 cold: @002b v15 = iconst.i32 0 @002b v16 = global_value.i64 gv3 @002b v17 = load.i64 notrap aligned readonly v16+56 @002b v18 = load.i64 notrap aligned readonly v17+72 @002b v19 = call_indirect sig1, v18(v16, v15, v3) ; v15 = 0 @002b jump block5(v19) block5(v14: i64): @002b v20 = global_value.i64 gv3 @002b v21 = load.i64 notrap aligned readonly v20+64 @002b v22 = load.i32 notrap aligned readonly v21 @002b v23 = load.i32 icall_null aligned readonly v14+24 @002b v24 = icmp eq v23, v22 @002b trapz v24, bad_sig @002b v25 = load.i64 notrap aligned readonly v14+16 @002b v26 = load.i64 notrap aligned readonly v14+32 @002b v27 = call_indirect sig0, v25(v26, v0) @002e jump block2 }
Steps to Reproduce
Optimize this CLIF with the egraph pass enabled using
opt_level=speed
.Expected Results
I think the loads of
v17
andv18
should stay in thecold
block.Actual Results
The loads of
v17
andv18
are hoisted to the entry block, since they have thenotrap
andreadonly
flags set and depend only on the vmctx parameter,v0
.Versions and Environment
Cranelift version or commit: tested on afaf1c73f67c42e638590608579b2b02c7f65ca3
Extra Info
This is not a correctness bug but might be a performance problem. I haven't measured it, just noticed it while looking at the CLIF that Wasmtime currently generates for
call_indirect
.cc: @cfallin
cfallin commented on issue #8197:
I suppose we could add a(nother) special case to the block-placement logic, where LICM etc is implemented: if the original block is cold, keep the op in the same block. That's probably limited in scope enough that it's both (i) tractable and (ii) shouldn't have unforeseen impacts?
Last updated: Nov 22 2024 at 16:03 UTC