Stream: git-wasmtime

Topic: wasmtime / Issue #2165 Optimize cranelift-reader a bit


Wasmtime GitHub notifications bot (Aug 26 2020 at 11:32):

github-actions[bot] commented on Issue #2165:

Subscribe to Label Action

cc @bnjbvr

<details>
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

* bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

Wasmtime GitHub notifications bot (Aug 26 2020 at 18:17):

bjorn3 commented on Issue #2165:

I used perf record -e instructions:u -e cycles:u and then perf report, looking at the estimated values in the header after selecting an event. I know there is an easier way to do this with perf, but I can't find the exact command.

Wasmtime GitHub notifications bot (Aug 26 2020 at 18:37):

abrown commented on Issue #2165:

I usually use perf stat for that type of thing, but I meant something different: what did you use as input for cranelift-reader when you measured?

Wasmtime GitHub notifications bot (Aug 26 2020 at 19:22):

bjorn3 commented on Issue #2165:

The two test files I was using were:

<details>

; ptrs: 100

function u0:0(i64) system_v {
; symbol _ZN9mini_core13drop_in_place17hf38f2fd3a61ef36bE
; instance Instance { def: DropGlue(DefId(1:231 ~ mini_core[8787]::drop_in_place[0]), Some([NoisyDropInner; 2])), substs: [[NoisyDropInner; 2]] }
; sig ([*mut [NoisyDropInner; 2]]; c_variadic: false)->()

    ss0 = explicit_slot 8
    ss1 = explicit_slot 8
    ss2 = explicit_slot 8
    sig0 = (i64) system_v
    sig1 = (i64) system_v
    fn0 = colocated u0:0 sig0
    fn1 = colocated u0:0 sig1

block0(v0: i64):
    stack_store v0, ss0
    jump block1

block1:
    v1 = iconst.i64 0
    v2 = stack_load.i64 ss0
    brz v1, block5
    jump block8

block5:
    v16 = iconst.i64 0
    jump block4(v16)

block8:
    v27 = stack_load.i64 ss0
    v29 = iconst.i64 0
    v30 = iadd v27, v29
    jump block7(v27)

block4(v11: i64):
    v13 = icmp_imm eq v11, 2
    v14 = bint.i8 v13
    v15 = uextend.i32 v14
    brz v15, block3
    jump block2

block3:
    v4 = stack_load.i64 ss0
    v6 = imul_imm.i64 v11, 0
    v7 = iadd v4, v6
    v8 = iconst.i64 1
    v9 = iadd.i64 v11, v8
    stack_store v7, ss1
    v10 = stack_load.i64 ss1
    call fn0(v10)
    jump block4(v9)

block7(v22: i64):
    v24 = icmp eq v22, v30
    v25 = bint.i8 v24
    v26 = uextend.i32 v25
    brz v26, block6
    jump block2

block6:
    v18 = iconst.i64 1
    v19 = imul_imm v18, 0
    v20 = iadd.i64 v22, v19
    stack_store.i64 v22, ss2
    v21 = stack_load.i64 ss2
    call fn1(v21)
    jump block7(v20)

block2:
    v100 = stack_addr.i64 ss0
    return
}

and

; ptrs: 0 1 3 10 11 13

function u0:0(i64) system_v {
    ss0 = explicit_slot 8

block0(v0: i64):
    v1 = iadd_imm v0, 1
    v2 = iconst.i64 2
    v3 = iadd v1, v2
    v4 = iconst.i8 0
    store v4, v3
    v10 = stack_addr.i64 ss0
    v11 = iadd_imm v10, 1
    v12 = iconst.i64 2
    v13 = iadd v11, v12
    v14 = iconst.i8 0
    store v14, v13
    return
}

</details>

Parsing accounts for ~25% of the measured time. I took these two tests from a library I am working on and repeated them 10_000x in a row inside the test process.
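
For anyone wanting to reproduce a similar setup, the "repeat the inputs 10_000x in a row inside one process" approach could be sketched roughly as below. This is only an illustration under stated assumptions: `repeat_timed` and `count_instruction_lines` are hypothetical names, and `count_instruction_lines` is a trivial stand-in for the real cranelift-reader parse call, which is not shown in this thread.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` on every input, `repeats` times in a row, and returns the
/// number of calls made together with the total elapsed wall-clock time.
/// (Hypothetical helper, not part of cranelift-reader.)
fn repeat_timed<T>(
    inputs: &[&str],
    repeats: usize,
    mut work: impl FnMut(&str) -> T,
) -> (usize, Duration) {
    let start = Instant::now();
    let mut calls = 0;
    for _ in 0..repeats {
        for &input in inputs {
            // black_box keeps the optimizer from discarding the work.
            black_box(work(black_box(input)));
            calls += 1;
        }
    }
    (calls, start.elapsed())
}

/// Hypothetical stand-in for the parser under test: counts lines that are
/// neither empty nor `;` comments. Swap in the real parse call here.
fn count_instruction_lines(text: &str) -> usize {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with(';'))
        .count()
}
```

Calling `repeat_timed(&[first_test, second_test], 10_000, count_instruction_lines)` from a test or a small binary gives a process that runs long enough to profile with perf record or perf stat; the returned `Duration` is only a coarse sanity check, not a replacement for the perf counters.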


Last updated: Oct 23 2024 at 20:03 UTC