Stream: git-wasmtime

Topic: wasmtime / Issue #2432 Investigate non-deterministic fail...


view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2020 at 04:37):

abrown edited Issue #2432:

@jlb6740, I am opening this issue to track a failure I noticed with x64 SIMD: https://github.com/bytecodealliance/wasmtime/pull/2428/checks?check_run_id=1421136203#step:7:1582

view this post on Zulip Wasmtime GitHub notifications bot (Dec 08 2020 at 05:33):

jlb6740 commented on Issue #2432:

FYI, I am trying to see if I can pinpoint the patch that caused the issue. Basically running:

cargo test -p wasmtime-cli --features experimental_x64 wast::Cranelift::spec::simd::simd_

In a loop like so:

COUNTER=0 && while [ $COUNTER -lt 1000 ] && ./run_test.sh && COUNTER=$((COUNTER+1)) && echo -e "\n\nIteration: $COUNTER\n\n" ; do :; done | tee run_test.out

where the cargo test command is in ./run_test.sh.

It seems the failure usually occurs within 100 iterations but can take as long as 200-300.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 08 2020 at 17:25):

abrown commented on Issue #2432:

That's some progress!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 08 2020 at 17:40):

julian-seward1 commented on Issue #2432:

@jlb6740 Can you extract from that, the actual non-shell-script program invokation that ./run_test.sh` does, that is failing, and then run that on valgrind, to see if something is reading uninitialised memory?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 07:26):

jlb6740 commented on Issue #2432:

I've been attempting to both (1) isolate the patch where the issue starts and (2) simplify the reproducer but both are proving to be more difficult than anticipated. Some observations ..

W.r.t pinpointing the patch that introduced the issue, I am finding it impossible to reproduce this patch using the script above before https://github.com/bytecodealliance/wasmtime/pull/2365 . However I am not thinking this patch introduces this bug since it was seen as earlier as October here: https://github.com/bytecodealliance/wasmtime/runs/1323686030

W.r.t. simplifying the reproducer I am finding the error almost always occurs with the same tests involving negation or abs .. test such as:

(assert_return (invoke "add-neg" (v128.const f64x2 1.125 1.125)
(v128.const f64x2 0.125 0.125))
(v128.const f64x2 -1.0 -1.0))

or

(assert_return (invoke "f64x2.abs" (v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))
(v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))

But when I try to isolate just these tests and run, the problem is not reproduced. It is hard for me to tell if the issue is with something in the lowering or something in the wast testing path.

I did run valgrind against a277cf for example and there is a conspicuous error:

==4490== Conditional jump or move depends on uninitialised value(s)
==4490== at 0x48A056: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==4490== by 0x484EBD: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==4490== by 0x46FECB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==4490== by 0x484945: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==4490== by 0x4842C0: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==4490== by 0x4791DD: wasmtime_wast::wast::v128_matches (wast.rs:471)
==4490== by 0x478B77: wasmtime_wast::wast::val_matches (wast.rs:411)
==4490== by 0x47456B: wasmtime_wast::wast::WastContext::assert_return (wast.rs:185)
==4490== by 0x476693: wasmtime_wast::wast::WastContext::run_directive (wast.rs:272)
==4490== by 0x47530B: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:229)
==4490== by 0x478201: wasmtime_wast::wast::WastContext::run_file (wast.rs:353)
==4490== by 0x333EF2: all::wast::run_wast (wast.rs:46)
==4490== Uninitialised value was created by a stack allocation
==4490== at 0x56B530: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)

Which I think actually involves these lines here: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast/src/wast.rs#L507-L514 in head.

It is not obvious to me though how to correct this yet. I think this bug should be understood and remove this bug as a next step.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 09:03):

julian-seward1 commented on Issue #2432:

Just to be on the safe side .. is this with valgrind 3.16.0 or later? That has the lowest false-positive rate so far; I would recommend against any investigation based on output from a version before 3.16.0.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 10:07):

cfallin commented on Issue #2432:

So I was nerdsniped by this and chased it for a while -- with valgrind 3.16.1 (Fedora 33) I'm seeing the uninit value in the values_vec through which the trampoline passes the wasm func's return value (crates/wasmtime/src/func.rs) -- insert a println there and the uninit value is used right away, problem isn't in wast checks. The value's origin (valgrind --track-origins=yes) is:

==473121==  Uninitialised value was created by a stack allocation
==473121==    at 0xEFC9F0: wasmtime_runtime::traphandlers::setup_unix_sigaltstack::TLS::__init (local.rs:157)

So what I can gather from that is that we're reading uninitialized stack somewhere. I spent a while staring at the generated trampoline functions and am not seeing anything obviously wrong, but is it possible that maybe we're calling the wrong trampoline (too many/too few args or returns)?

Another oddity I noticed: there is no WasmTy impl for u128. Most vector code seems to work fine without this, though perhaps this is A Problem anyway (@alexcrichton any thoughts on this, plus @jlb6740's note above that the uninit behavior seems to go away prior to #2365?).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 10:24):

julian-seward1 commented on Issue #2432:

insert a println there and the uninit value is used right away

That reduces the chance that it's a false positive. The clincher would however be if you can demonstrate that multiple runs print different values at that point.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 15:21):

alexcrichton commented on Issue #2432:

To confirm @jlb6740, are you testing recent master? #2365 had a use-after-free but if modules/stores were deallocated in the wrong order (later fixed though). That might explain why just after that commit valgrind finds issues.

As for WasmTy for u128, the reason we haven't implemented that yet is mainly that there's no native ABI for Rust to call a cranelift-generated function that takes u128 right now. Or at least no one's said 'yep this works let us add it'. Once that's possible we can add WasmTy for u128, however.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 17:44):

jlb6740 commented on Issue #2432:

@julian-seward1 I have valgrind-3.15.0 installed. I will update to a 3.16 or higher. @alexcrichton the valgrind run is based on commit a277cf which was the last time I could readily reproduce. I've tried patches though since @cfallin disabled the tests and could reproduce up until then. Let me rerun the latest valgrind on the latest build with these tests enabled.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 17:45):

jlb6740 edited a comment on Issue #2432:

I've been attempting to both (1) isolate the patch where the issue starts and (2) simplify the reproducer but both are proving to be more difficult than anticipated. Some observations ..

W.r.t pinpointing the patch that introduced the issue, I am finding it impossible to reproduce this issue using the script above before https://github.com/bytecodealliance/wasmtime/pull/2365 . However I am not thinking this patch introduces this bug since it was seen as earlier as October here: https://github.com/bytecodealliance/wasmtime/runs/1323686030

W.r.t. simplifying the reproducer I am finding the error almost always occurs with the same tests involving negation or abs .. test such as:

(assert_return (invoke "add-neg" (v128.const f64x2 1.125 1.125)
(v128.const f64x2 0.125 0.125))
(v128.const f64x2 -1.0 -1.0))

or

(assert_return (invoke "f64x2.abs" (v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))
(v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))

But when I try to isolate just these tests and run, the problem is not reproduced. It is hard for me to tell if the issue is with something in the lowering or something in the wast testing path.

I did run valgrind against a277cf for example and there is a conspicuous error:

==4490== Conditional jump or move depends on uninitialised value(s)
==4490== at 0x48A056: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==4490== by 0x484EBD: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==4490== by 0x46FECB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==4490== by 0x484945: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==4490== by 0x4842C0: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==4490== by 0x4791DD: wasmtime_wast::wast::v128_matches (wast.rs:471)
==4490== by 0x478B77: wasmtime_wast::wast::val_matches (wast.rs:411)
==4490== by 0x47456B: wasmtime_wast::wast::WastContext::assert_return (wast.rs:185)
==4490== by 0x476693: wasmtime_wast::wast::WastContext::run_directive (wast.rs:272)
==4490== by 0x47530B: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:229)
==4490== by 0x478201: wasmtime_wast::wast::WastContext::run_file (wast.rs:353)
==4490== by 0x333EF2: all::wast::run_wast (wast.rs:46)
==4490== Uninitialised value was created by a stack allocation
==4490== at 0x56B530: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)

Which I think actually involves these lines here: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast/src/wast.rs#L507-L514 in head.

It is not obvious to me though how to correct this yet. I think this bug should be understood and remove this bug as a next step.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 17:46):

jlb6740 edited a comment on Issue #2432:

I've been attempting to both (1) isolate the patch where the issue starts and (2) simplify the reproducer but both are proving to be more difficult than anticipated. Some observations ..

W.r.t pinpointing the patch that introduced the issue, I am finding it impossible to reproduce this issue using the script above before https://github.com/bytecodealliance/wasmtime/pull/2365 . However I am not thinking this patch introduces this bug since it was seen as earlier as October here: https://github.com/bytecodealliance/wasmtime/runs/1323686030

W.r.t. simplifying the reproducer I am finding the error almost always occurs with the same tests involving negation or abs .. test such as:

(assert_return (invoke "add-neg" (v128.const f64x2 1.125 1.125)
(v128.const f64x2 0.125 0.125))
(v128.const f64x2 -1.0 -1.0))

or

(assert_return (invoke "f64x2.abs" (v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))
(v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))

But when I try to isolate just these tests and run, the problem is not reproduced. It is hard for me to tell if the issue is with something in the lowering or something in the wast testing path.

I did run valgrind against a277cf for example and there is a conspicuous error:

==4490== Conditional jump or move depends on uninitialised value(s)
==4490==    at 0x48A056: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==4490==    by 0x484EBD: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==4490==    by 0x46FECB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==4490==    by 0x484945: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==4490==    by 0x4842C0: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==4490==    by 0x4791DD: wasmtime_wast::wast::v128_matches (wast.rs:471)
==4490==    by 0x478B77: wasmtime_wast::wast::val_matches (wast.rs:411)
==4490==    by 0x47456B: wasmtime_wast::wast::WastContext::assert_return (wast.rs:185)
==4490==    by 0x476693: wasmtime_wast::wast::WastContext::run_directive (wast.rs:272)
==4490==    by 0x47530B: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:229)
==4490==    by 0x478201: wasmtime_wast::wast::WastContext::run_file (wast.rs:353)
==4490==    by 0x333EF2: all::wast::run_wast (wast.rs:46)
==4490==  Uninitialised value was created by a stack allocation
==4490==    at 0x56B530: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)

Which I think actually involves these lines here: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast/src/wast.rs#L507-L514 in head.

It is not obvious to me though how to correct this yet. I think this bug should be understood and remove this bug as a next step.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 17:46):

jlb6740 edited a comment on Issue #2432:

I've been attempting to both (1) isolate the patch where the issue starts and (2) simplify the reproducer but both are proving to be more difficult than anticipated. Some observations ..

W.r.t pinpointing the patch that introduced the issue, I am finding it impossible to reproduce this issue using the script above before https://github.com/bytecodealliance/wasmtime/pull/2365 . However I am not thinking this patch introduces this bug since it was seen as earlier as October here: https://github.com/bytecodealliance/wasmtime/runs/1323686030

W.r.t. simplifying the reproducer I am finding the error almost always occurs with the same tests involving negation or abs .. test such as:

(assert_return (invoke "add-neg" (v128.const f64x2 1.125 1.125)
(v128.const f64x2 0.125 0.125))
(v128.const f64x2 -1.0 -1.0))

or

(assert_return (invoke "f64x2.abs" (v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))
(v128.const f64x2 0x1.fffffffffffffp+1023 0x1.fffffffffffffp+1023))

But when I try to isolate just these tests and run, the problem is not reproduced. It is hard for me to tell if the issue is with something in the lowering or something in the wast testing path.

I did run valgrind against a277cf for example and there is a conspicuous error:

==4490== Conditional jump or move depends on uninitialised value(s)
==4490==    at 0x48A056: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==4490==    by 0x484EBD: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==4490==    by 0x46FECB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==4490==    by 0x484945: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==4490==    by 0x4842C0: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==4490==    by 0x4791DD: wasmtime_wast::wast::v128_matches (wast.rs:471)
==4490==    by 0x478B77: wasmtime_wast::wast::val_matches (wast.rs:411)
==4490==    by 0x47456B: wasmtime_wast::wast::WastContext::assert_return (wast.rs:185)
==4490==    by 0x476693: wasmtime_wast::wast::WastContext::run_directive (wast.rs:272)
==4490==    by 0x47530B: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:229)
==4490==    by 0x478201: wasmtime_wast::wast::WastContext::run_file (wast.rs:353)
==4490==    by 0x333EF2: all::wast::run_wast (wast.rs:46)
==4490==  Uninitialised value was created by a stack allocation
==4490==    at 0x56B530: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)

Which I think actually involves these lines here: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast/src/wast.rs#L507-L514 in head.

It is not obvious to me though how to correct this yet. I think this bug should be understood and remove as a next step.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 19:52):

alexcrichton commented on Issue #2432:

Ah ok in that case I'm not entirely sure what would be causing this unfortunately :(

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 20:11):

jlb6740 commented on Issue #2432:

@alexcrichton yes, just needs more investigation. I've confirmed that use of unitialized values still occurs with the latest build with valgrind 3.16.1

==22915== Conditional jump or move depends on uninitialised value(s)
==22915==    at 0x50D676: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==22915==    by 0x52297D: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==22915==    by 0x50B1EB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==22915==    by 0x522405: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==22915==    by 0x51E660: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==22915==    by 0x4FEAB7: wasmtime_wast::wast::v128_matches (wast.rs:507)
==22915==    by 0x4FE497: wasmtime_wast::wast::val_matches (wast.rs:451)
==22915==    by 0x4FC757: wasmtime_wast::wast::WastContext::assert_return (wast.rs:186)
==22915==    by 0x518A7F: wasmtime_wast::wast::WastContext::run_directive (wast.rs:287)
==22915==    by 0x4FD713: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:236)
==22915==    by 0x4FDB01: wasmtime_wast::wast::WastContext::run_file (wast.rs:368)
==22915==    by 0x404274: all::wast::run_wast (wast.rs:48)
==22915==  Uninitialised value was created by a stack allocation
==22915==    at 0x8244E0: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)
==22915==

view this post on Zulip Wasmtime GitHub notifications bot (Dec 09 2020 at 20:11):

jlb6740 edited a comment on Issue #2432:

@alexcrichton yes, just needs more investigation. I've confirmed that use of unitialized values still occurs with the latest build (e09b9400) with valgrind 3.16.1

==22915== Conditional jump or move depends on uninitialised value(s)
==22915==    at 0x50D676: core::iter::traits::iterator::Iterator::all::check::{{closure}} (iterator.rs:2114)
==22915==    by 0x52297D: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold::enumerate::{{closure}} (mod.rs:1413)
==22915==    by 0x50B1EB: core::iter::traits::iterator::Iterator::try_fold (iterator.rs:1888)
==22915==    by 0x522405: <core::iter::adapters::Enumerate<I> as core::iter::traits::iterator::Iterator>::try_fold (mod.rs:1420)
==22915==    by 0x51E660: core::iter::traits::iterator::Iterator::all (iterator.rs:2117)
==22915==    by 0x4FEAB7: wasmtime_wast::wast::v128_matches (wast.rs:507)
==22915==    by 0x4FE497: wasmtime_wast::wast::val_matches (wast.rs:451)
==22915==    by 0x4FC757: wasmtime_wast::wast::WastContext::assert_return (wast.rs:186)
==22915==    by 0x518A7F: wasmtime_wast::wast::WastContext::run_directive (wast.rs:287)
==22915==    by 0x4FD713: wasmtime_wast::wast::WastContext::run_buffer (wast.rs:236)
==22915==    by 0x4FDB01: wasmtime_wast::wast::WastContext::run_file (wast.rs:368)
==22915==    by 0x404274: all::wast::run_wast (wast.rs:48)
==22915==  Uninitialised value was created by a stack allocation
==22915==    at 0x8244E0: <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::get_unchecked (zip.rs:291)
==22915==

view this post on Zulip Wasmtime GitHub notifications bot (Jan 14 2021 at 18:57):

jlb6740 commented on Issue #2432:

@abrown @cfallin Perhaps we can close this now? Please re-open if someone disagrees.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 14 2021 at 18:57):

jlb6740 closed Issue #2432:

@jlb6740, I am opening this issue to track a failure I noticed with x64 SIMD: https://github.com/bytecodealliance/wasmtime/pull/2428/checks?check_run_id=1421136203#step:7:1582


Last updated: Oct 23 2024 at 20:03 UTC