Stream: git-wasmtime

Topic: wasmtime / issue #8197 Cranelift: don't hoist instruction...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 20 2024 at 20:59):

jameysharp added the cranelift label to Issue #8197.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 20 2024 at 20:59):

jameysharp added the bug label to Issue #8197.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 20 2024 at 20:59):

jameysharp opened issue #8197:

.clif Test Case

Wasmtime currently translates this WebAssembly module:

(module
  (type $fn (func (result i32)))
  (table $fnptrs 2 2 funcref)
  (func (param i32)
        loop
          local.get 0
          call_indirect $fnptrs (type $fn)
          br 0
        end))

To this CLIF, before optimization:

function u0:0(i64 vmctx, i64, i32) fast {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned readonly gv0+8
    gv2 = load.i64 notrap aligned gv1
    gv3 = vmctx
    gv4 = load.i64 notrap aligned gv3+72
    sig0 = (i64 vmctx, i64) -> i32 fast
    sig1 = (i64 vmctx, i32 uext, i32 uext) -> i64 system_v
    stack_limit = gv2

                                block0(v0: i64, v1: i64, v2: i32):
                                    v3 -> v2
                                    v28 -> v3
@0027                               jump block2

                                block2:
@002b                               v4 = iconst.i32 2
@002b                               v5 = icmp.i32 uge v3, v4  ; v4 = 2
@002b                               v6 = uextend.i64 v3
@002b                               v7 = global_value.i64 gv4
@002b                               v8 = ishl_imm v6, 3
@002b                               v9 = iadd v7, v8
@002b                               v10 = iconst.i64 0
@002b                               v11 = select_spectre_guard v5, v10, v9  ; v10 = 0
@002b                               v12 = load.i64 table_oob aligned table v11
@002b                               v13 = band_imm v12, -2
@002b                               brif v12, block5(v13), block4

                                block4 cold:
@002b                               v15 = iconst.i32 0
@002b                               v16 = global_value.i64 gv3
@002b                               v17 = load.i64 notrap aligned readonly v16+56
@002b                               v18 = load.i64 notrap aligned readonly v17+72
@002b                               v19 = call_indirect sig1, v18(v16, v15, v3)  ; v15 = 0
@002b                               jump block5(v19)

                                block5(v14: i64):
@002b                               v20 = global_value.i64 gv3
@002b                               v21 = load.i64 notrap aligned readonly v20+64
@002b                               v22 = load.i32 notrap aligned readonly v21
@002b                               v23 = load.i32 icall_null aligned readonly v14+24
@002b                               v24 = icmp eq v23, v22
@002b                               trapz v24, bad_sig
@002b                               v25 = load.i64 notrap aligned readonly v14+16
@002b                               v26 = load.i64 notrap aligned readonly v14+32
@002b                               v27 = call_indirect sig0, v25(v26, v0)
@002e                               jump block2
}

Steps to Reproduce

Optimize this CLIF with the egraph pass enabled using opt_level=speed.

Expected Results

I think the loads of v17 and v18 should stay in the cold block.

Actual Results

The loads of v17 and v18 are hoisted to the entry block, since they have the notrap and readonly flags set and depend only on the vmctx parameter, v0.

Versions and Environment

Cranelift version or commit: tested on afaf1c73f67c42e638590608579b2b02c7f65ca3

Extra Info

This is not a correctness bug but might be a performance problem. I haven't measured it, just noticed it while looking at the CLIF that Wasmtime currently generates for call_indirect.

cc: @cfallin

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2024 at 19:29):

cfallin commented on issue #8197:

I suppose we could add a(nother) special case to the block-placement logic, where LICM etc is implemented: if the original block is cold, keep the op in the same block. That's probably limited in scope enough that it's both (i) tractable and (ii) shouldn't have unforeseen impacts?


Last updated: Dec 23 2024 at 13:07 UTC