Stream: git-wasmtime

Topic: wasmtime / issue #5680 `wasi-nn` test broken on `main`


Wasmtime GitHub notifications bot (Feb 01 2023 at 00:53):

cfallin opened issue #5680:

In this CI run on main, the wasi-nn test job appears to fail with a Wasm trap in wasm-nn-example.wasm. The same test passed on the PR that just merged and triggered the CI run on main.

It appears that the wasi-nn test is influenced by model data or other bits downloaded from the Internet; as such, I suspect there is an issue due to changing inputs.

@abrown / @jlb6740, would you be able to look into this? Since wasi-nn support is tier 3, we'll need to disable the test tomorrow if not fixed urgently, since it blocks any other PR from merging.

Wasmtime GitHub notifications bot (Feb 01 2023 at 00:56):

elliottt commented on issue #5680:

Let's go ahead and disable the test for now so that we can unblock any other in-flight work. I'll make a PR.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:04):

abrown commented on issue #5680:

From a first glance, it doesn't look like this is wasi-nn related... at least the bits that would do any inference with a model. The backtrace points to wasi-libc as the issue, right? I think that might bear some looking into. Maybe this test is the only one exercising some aspect of wasi-libc?

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:08):

cfallin commented on issue #5680:

Possibly, but passing and then failing later on the same tree indicates to me at least that it's not a simple broken-codegen sort of issue... I'll poke at this locally a little bit.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:10):

cfallin commented on issue #5680:

Hmm, never mind, I don't have a functioning OpenVINO setup.

I think the most reasonable call is: given flake + tier 3, let's disable then debug off the critical path so main isn't blocked.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:22):

abrown commented on issue #5680:

I think this is a bit more worrying than that. This install.sh script is how one could get an OpenVINO installation.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:29):

cfallin commented on issue #5680:

OK, @elliottt noticed that the failure started on main after #5676, and actually failed on the PR itself too. #5605 had clean CI because it didn't include #5676 while running. #5676 merged despite one failing test because it had auto-merge enabled, and the wasi-nn test was erroneously not included in our list of required tests for main's branch protection.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:32):

abrown commented on issue #5680:

So there must be something in wasi-libc's memset that doesn't play nicely with that optimization? That optimization seemed reasonable to me...

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:33):

cfallin commented on issue #5680:

I've added wasi-nn to the list of required tests for main; if #5682 turns green then we can revert and @fitzgen can investigate tomorrow, otherwise let's dig a bit more. I agree that there's something non-wasi-nn-specific that seems to be going on since the backtrace is in libc startup before any app-specific code runs.

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:42):

abrown commented on issue #5680:

(thought: we should have more tests exercising wasi-libc if the wasi-nn test is the only one that found this... still puzzled at what is going on)

Wasmtime GitHub notifications bot (Feb 01 2023 at 01:46):

cfallin commented on issue #5680:

I suspect if it's a really subtle codegen thing then fuzzing would have caught it eventually too... though I agree, and I hope a few good tests of a reduced kernel of the bug come out of this!

Wasmtime GitHub notifications bot (Feb 01 2023 at 02:53):

elliottt closed issue #5680.


Last updated: Nov 22 2024 at 17:03 UTC