cfallin opened issue #5680:
In this CI run on main, the wasi-nn test job appears to fail with a Wasm trap in wasm-nn-example.wasm. The same test passed on the PR that just merged and triggered the CI run on main.
It appears that the wasi-nn test is influenced by model data or other bits downloaded from the Internet; as such, I suspect there is an issue due to changing inputs.
@abrown / @jlb6740, would you be able to look into this? Since wasi-nn support is tier 3, we'll need to disable the test tomorrow if not fixed urgently, since it blocks any other PR from merging.
elliottt commented on issue #5680:
Let's go ahead and disable the test for now so that we can unblock any other in-flight work. I'll make a PR.
abrown commented on issue #5680:
From a first glance, it doesn't look like this is wasi-nn related... at least the bits that would do any inference with a model. The backtrace points to wasi-libc as the issue, right? I think that might bear some looking into. Maybe this test is the only one exercising some aspect of wasi-libc?
cfallin commented on issue #5680:
Possibly, but passing and then failing later on the same tree indicates to me at least that it's not a simple broken-codegen sort of issue... I'll poke at this locally a little bit.
cfallin commented on issue #5680:
Hmm, never mind, I don't have a functioning OpenVINO setup.
I think the most reasonable call is: given flake + tier 3, let's disable then debug off the critical path so main isn't blocked.
abrown commented on issue #5680:
I think this is a bit more worrying than that. This install.sh script is how one could get an OpenVINO installation.
cfallin commented on issue #5680:
OK, @elliottt noticed that the failure started on main after #5676, and actually failed on the PR itself too. #5605 had clean CI because it didn't include #5676 while running. #5676 merged despite one failing test because it had auto-merge enabled, and the wasi-nn test was erroneously not included in our list of required tests for main's branch protection.
abrown commented on issue #5680:
So there must be something in wasi-libc's memset that doesn't play nicely with that optimization? That optimization seemed reasonable to me...
cfallin commented on issue #5680:
I've added wasi-nn to the list of required tests for main; if #5682 turns green then we can revert and @fitzgen can investigate tomorrow, otherwise let's dig a bit more. I agree that there's something non-wasi-nn-specific that seems to be going on since the backtrace is in libc startup before any app-specific code runs.
abrown commented on issue #5680:
(thought: we should have more tests exercising wasi-libc if the wasi-nn test is the only one that found this... still puzzled at what is going on)
cfallin commented on issue #5680:
I suspect if it's a really subtle codegen thing then fuzzing would have caught it eventually too... though I agree, and I hope a few good tests of a reduced kernel of the bug come out of this!
elliottt closed issue #5680.
Last updated: Dec 23 2024 at 12:05 UTC