Stream: git-wasmtime

Topic: wasmtime / issue #5998 cranelift: Fuzz mid-end optimizations


view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2023 at 18:44):

github-actions[bot] commented on issue #5998:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 20:11):

jameysharp commented on issue #5998:

Oh, and when you write a more complete commit message, please include the phrase "Fixes #5967" in it somewhere so GitHub understands that these two issues are related.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2023 at 05:10):

ghostway0 commented on issue #5998:

I still don't understand why this fails on the optimized case, on the first test...

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2023 at 11:23):

afonso360 commented on issue #5998:

I still don't understand why this fails on the optimized case, on the first test...

So, I took a look at this, and it found a bug!

More specifically, #5922. Although that bug is tagged as x64, it is really a bug in our mid end: we translate stack_store into store notrap aligned, but it probably shouldn't be unconditionally marked as aligned.

The reason this only triggers in the second run is that the interpreter allows unaligned stores via stack_store, but if you pass it a store marked aligned and the address isn't actually aligned, it complains.
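The invariant involved can be sketched as follows (hypothetical Rust for illustration only; these names and types are not Cranelift's actual lowering code): a lowered store can only safely carry the aligned flag when the slot's guaranteed base alignment and the constant access offset together imply the access address is aligned.

```rust
// Hypothetical sketch of the alignment rule (not the actual #6016 patch):
// only mark a lowered store as aligned when both the slot's guaranteed
// base alignment and the offset preserve the access's required alignment.
#[derive(Debug, PartialEq)]
enum MemFlag {
    Aligned,
    Unaligned,
}

fn lowered_store_flag(slot_align: u64, offset: u64, access_align: u64) -> MemFlag {
    if slot_align >= access_align && offset % access_align == 0 {
        MemFlag::Aligned
    } else {
        MemFlag::Unaligned
    }
}

fn main() {
    // An i128 store (16-byte access) to a slot with only 1-byte guaranteed
    // alignment must not be marked aligned:
    assert_eq!(lowered_store_flag(1, 0, 16), MemFlag::Unaligned);
    // An i64 store at offset 8 into a 16-byte-aligned slot is fine:
    assert_eq!(lowered_store_flag(16, 8, 8), MemFlag::Aligned);
    println!("ok");
}
```

Under this rule, the unconditional `store notrap aligned` translation above is exactly the unsound case: it claims alignment the slot layout does not guarantee.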

Here are the test cases:

<details>
<summary>Original</summary>

test interpret
test run
set opt_level=speed_and_size
set enable_alias_analysis=false
set use_egraphs=false
set enable_simd=true
set enable_llvm_abi_extensions=true
set unwind_info=false
set machine_code_cfg_info=true
set enable_jump_tables=false
set enable_heap_access_spectre_mitigation=false
set enable_table_access_spectre_mitigation=false
set enable_incremental_compilation_cache_checks=true
target x86_64

function %a() system_v {
    ss0 = explicit_slot 51
    ss1 = explicit_slot 68
    sig0 = (f32) -> f32 system_v
    sig1 = (f64) -> f64 system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    fn0 = %CeilF32 sig0
    fn1 = %CeilF64 sig1
    fn2 = %FloorF32 sig2
    fn3 = %FloorF64 sig3
    fn4 = colocated %TruncF32 sig4
    fn5 = %TruncF64 sig5

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    stack_store v4, ss0
    stack_store v4, ss0+16
    stack_store v4, ss0+32
    stack_store v1, ss0+48  ; v1 = 0
    stack_store v0, ss0+50  ; v0 = 0
    stack_store v4, ss1
    stack_store v4, ss1+16
    stack_store v4, ss1+32
    stack_store v4, ss1+48
    stack_store v2, ss1+64  ; v2 = 0
    return
}

</details>

<details>

<summary>Optimized</summary>

function u1:0() system_v {
    ss0 = explicit_slot 51
    ss1 = explicit_slot 68
    sig0 = (f32) -> f32 system_v
    sig1 = (f64) -> f64 system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    fn0 = %CeilF32 sig0
    fn1 = %CeilF64 sig1
    fn2 = %FloorF32 sig2
    fn3 = %FloorF64 sig3
    fn4 = colocated %TruncF32 sig4
    fn5 = %TruncF64 sig5

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    v5 = stack_addr.i64 ss0
    store notrap aligned v4, v5
    v6 = stack_addr.i64 ss0+16
    store notrap aligned v4, v6
    v7 = stack_addr.i64 ss0+32
    store notrap aligned v4, v7
    v8 = stack_addr.i64 ss0+48
    store notrap aligned v1, v8  ; v1 = 0
    v9 = stack_addr.i64 ss0+50
    store notrap aligned v0, v9  ; v0 = 0
    v10 = stack_addr.i64 ss1
    store notrap aligned v4, v10
    v11 = stack_addr.i64 ss1+16
    store notrap aligned v4, v11
    v12 = stack_addr.i64 ss1+32
    store notrap aligned v4, v12
    v13 = stack_addr.i64 ss1+48
    store notrap aligned v4, v13
    v14 = stack_addr.i64 ss1+64
    store notrap aligned v2, v14  ; v2 = 0
    return
}

</details>

This crashes on the first store to ss1: the store expects its address to be aligned to 16 bytes, but it isn't, because the preceding 51-byte stack slot ends at an unaligned offset and the interpreter never inserts padding between stack slots.
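The arithmetic behind that claim can be checked directly (a standalone sketch; the back-to-back, no-padding slot layout is the interpreter behavior described above, not code taken from it):

```rust
// Back-of-the-envelope check: with slots laid out back-to-back and no
// padding, ss1 starts right where the 51-byte ss0 ends.
fn main() {
    let ss0_size: u64 = 51; // ss0 = explicit_slot 51
    let ss1_offset = ss0_size; // no padding inserted between slots
    // The first store to ss1 is an i128 `store aligned`, which expects
    // 16-byte alignment:
    assert_eq!(ss1_offset % 16, 3); // 51 = 3 * 16 + 3, so ss1 is misaligned
    println!("ss1 offset {ss1_offset} is 16-aligned: {}", ss1_offset % 16 == 0);
}
```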

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2023 at 11:45):

afonso360 edited a comment on issue #5998:

I still don't understand why this fails on the optimized case, on the first test...

So, I took a look at this, and it found a bug! :tada:

More specifically, #5922. Although that bug is tagged as x64, it is really a bug in our mid end: we translate stack_store into store notrap aligned, but it probably shouldn't be unconditionally marked as aligned.

The reason this only triggers in the second run is that the interpreter allows unaligned stores via stack_store, but if you pass it a store marked aligned and the address isn't actually aligned, it complains.

Here are the test cases:

<details>
<summary>Original</summary>

test interpret
test run
set opt_level=speed_and_size
set enable_alias_analysis=false
set use_egraphs=false
set enable_simd=true
set enable_llvm_abi_extensions=true
set unwind_info=false
set machine_code_cfg_info=true
set enable_jump_tables=false
set enable_heap_access_spectre_mitigation=false
set enable_table_access_spectre_mitigation=false
set enable_incremental_compilation_cache_checks=true
target x86_64

function %a() system_v {
    ss0 = explicit_slot 51
    ss1 = explicit_slot 68
    sig0 = (f32) -> f32 system_v
    sig1 = (f64) -> f64 system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    fn0 = %CeilF32 sig0
    fn1 = %CeilF64 sig1
    fn2 = %FloorF32 sig2
    fn3 = %FloorF64 sig3
    fn4 = colocated %TruncF32 sig4
    fn5 = %TruncF64 sig5

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    stack_store v4, ss0
    stack_store v4, ss0+16
    stack_store v4, ss0+32
    stack_store v1, ss0+48  ; v1 = 0
    stack_store v0, ss0+50  ; v0 = 0
    stack_store v4, ss1
    stack_store v4, ss1+16
    stack_store v4, ss1+32
    stack_store v4, ss1+48
    stack_store v2, ss1+64  ; v2 = 0
    return
}

</details>

<details>

<summary>Optimized</summary>

function u1:0() system_v {
    ss0 = explicit_slot 51
    ss1 = explicit_slot 68
    sig0 = (f32) -> f32 system_v
    sig1 = (f64) -> f64 system_v
    sig2 = (f32) -> f32 system_v
    sig3 = (f64) -> f64 system_v
    sig4 = (f32) -> f32 system_v
    sig5 = (f64) -> f64 system_v
    fn0 = %CeilF32 sig0
    fn1 = %CeilF64 sig1
    fn2 = %FloorF32 sig2
    fn3 = %FloorF64 sig3
    fn4 = colocated %TruncF32 sig4
    fn5 = %TruncF64 sig5

block0:
    v0 = iconst.i8 0
    v1 = iconst.i16 0
    v2 = iconst.i32 0
    v3 = iconst.i64 0
    v4 = uextend.i128 v3  ; v3 = 0
    v5 = stack_addr.i64 ss0
    store notrap aligned v4, v5
    v6 = stack_addr.i64 ss0+16
    store notrap aligned v4, v6
    v7 = stack_addr.i64 ss0+32
    store notrap aligned v4, v7
    v8 = stack_addr.i64 ss0+48
    store notrap aligned v1, v8  ; v1 = 0
    v9 = stack_addr.i64 ss0+50
    store notrap aligned v0, v9  ; v0 = 0
    v10 = stack_addr.i64 ss1
    store notrap aligned v4, v10
    v11 = stack_addr.i64 ss1+16
    store notrap aligned v4, v11
    v12 = stack_addr.i64 ss1+32
    store notrap aligned v4, v12
    v13 = stack_addr.i64 ss1+48
    store notrap aligned v4, v13
    v14 = stack_addr.i64 ss1+64
    store notrap aligned v2, v14  ; v2 = 0
    return
}

</details>

This crashes on the first store to ss1: the store expects its address to be aligned to 16 bytes, but it isn't, because the preceding 51-byte stack slot ends at an unaligned offset and the interpreter never inserts padding between stack slots.

Edit: I've opened #6016 to address this.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2023 at 19:09):

ghostway0 commented on issue #5998:

That's cool. If there are no further comments, after #6016 merges, this can be merged as well!

view this post on Zulip Wasmtime GitHub notifications bot (Mar 14 2023 at 19:13):

ghostway0 edited a comment on issue #5998:

That's cool. If there are no further comments, after #6016 merges, this can be merged as well (after pretty-printing is fixed)!
Interestingly, it is failing for me on a different test case, so there might be another bug fix needed soon.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 14:05):

ghostway0 commented on issue #5998:

Should I rebase or would you squash all the commits?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 14:06):

ghostway0 edited a comment on issue #5998:

Should I rebase or would you squash all the commits?
Anyway, a diff-like tool would be great to have, if PrintableTestCase can be extended to do that.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 15:01):

afonso360 commented on issue #5998:

Should I rebase or would you squash all the commits?

I think it will squash all commits once the PR gets merged.

Anyway, a diff-like tool would be great to use, if PrintableTestCase can be extended to do that

I don't think we have anything like that right now (though it would be a good addition!), but if you run something like RUST_LOG=trace cargo run test ./output-from-fuzzer.clif, you should see the optimized function printed in the trace log. It's pretty much my go-to every time.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 07:24):

ghostway0 commented on issue #5998:

Oh, it appears to be the same problem! stack_stores that are optimized to store notrap aligned. But when I try to fuzz with #6016, it still fails

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2023 at 07:29):

ghostway0 edited a comment on issue #5998:

Oh, it appears to be the same problem! stack_stores that are optimized to store notrap aligned. But when I try to fuzz with #6016 (main), it still fails
No! Apparently I didn't pull recently enough. It works!


Last updated: Nov 22 2024 at 16:03 UTC