Stream: git-wasmtime

Topic: wasmtime / issue #2991 s390x: Add z14 support


view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 16:40):

alexcrichton commented on issue #2991:

I was curious so I ran the test suite in qemu and I ran into an issue that looks like:

---- wasi_cap_std_sync::fd_readdir stdout ----
preopen: "/tmp/wasi_common_fd_readdirfr5CGv"
guest stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `2`: expected two entries in an empty directory', src/bin/fd_readdir.rs:76:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

===
Error: error while testing Wasm module 'fd_readdir'

Caused by:
    wasm trap: call stack exhausted
    wasm backtrace:
        0: 0xaafb - <unknown>!<std::sys::wasi::stdio::Stderr as std::io::Write>::is_write_vectored::hf152121ba89ed5c9
        1: 0xa867 - <unknown>!rust_panic
        2: 0xa3fd - <unknown>!std::panicking::rust_panic_with_hook::hf735cc98c0f3e6f4
        3: 0x9b35 - <unknown>!std::panicking::begin_panic_handler::{{closure}}::hb082d09953c1ceec
        4: 0x9a76 - <unknown>!std::sys_common::backtrace::__rust_end_short_backtrace::hac58197bca415fd5
        5: 0xa2a1 - <unknown>!rust_begin_unwind
        6: 0xff0b - <unknown>!core::panicking::panic_fmt::hf8b3045973a2d1f9
        7: 0x10b73 - <unknown>!core::panicking::assert_failed::inner::h4a10935a4d4a4d0d
        8: 0x3581 - <unknown>!core::panicking::assert_failed::hf88aca872cdb2b11
        9: 0x167b - <unknown>!fd_readdir::main::hb00a7e1c801281d4
       10: 0x2dcf - <unknown>!std::sys_common::backtrace::__rust_begin_short_backtrace::hf92d88d850fed84d
       11: 0x2e06 - <unknown>!std::rt::lang_start::{{closure}}::hc83ff37db6c562d6
       12: 0xa912 - <unknown>!std::rt::lang_start_internal::h3bc712c5a299b4e4
       13: 0x1ec4 - <unknown>!__original_main
       14:  0x545 - <unknown>!_start
       15: 0x13c08 - <unknown>!_start.command_export
    note: run with `WASMTIME_BACKTRACE_DETAILS=1` environment variable to display more information

Something seems off there if it's saying that the call stack is exhausted. Perhaps a qemu bug? Maybe a backend bug? In any case I was just curious how qemu would run, although it unfortunately didn't make it to the meat of the tests.

I was also a little surprised at how slow the compile was: our aarch64 build finishes building tests in ~18m, but the s390x tests took ~27m to build. Presumably this comes down to the speed of the LLVM backend for s390x, so nothing related to Wasmtime; just curious!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 16:56):

uweigand commented on issue #2991:

Note that mainline qemu doesn't quite support z14 yet. Support has been merged into the qemu s390x maintainer repo (branch s390-next in https://gitlab.com/cohuck/qemu.git) but not yet mainline. Not sure if this explains this particular crash.

I was also a little surprised at how slow the compile was: our aarch64 build finishes building tests in ~18m, but the s390x tests took ~27m to build. Presumably this comes down to the speed of the LLVM backend for s390x, so nothing related to Wasmtime; just curious!

Is this running as a cross-compiler, or running the native LLVM under qemu? I don't see any particular reason why the s390x back-end should be significantly slower than the aarch64 back-end when running as a cross-compiler ...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 16:58):

uweigand commented on issue #2991:

Also, please hold off merging this a bit -- I just noticed that there seems to be a bug in the auxv crate that causes getauxval to sometimes return a wrong value, so the native platform is mis-detected. I'm currently testing a fix that just uses getauxval from the libc crate, which works correctly (and seems more straightforward anyway).
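For context, the libc crate exposes a getauxval binding directly, so feature detection can query the AT_HWCAP auxiliary vector without the separate auxv dependency. Below is a minimal sketch of that kind of check, not the actual code in this PR: the helper name is made up, and the HWCAP bit value is the one the Linux kernel defines for the z15 "vector enhancements facility 2" (HWCAP_S390_VXRS_EXT2 in arch/s390/include/asm/elf.h).

// Illustrative sketch (not the code in this PR): query AT_HWCAP via the libc
// crate's getauxval binding and test the bit the kernel sets when the z15
// vector-enhancements facility 2 is available.
// Requires the `libc` crate; Linux-only.

fn has_vxrs_ext2() -> bool {
    // Assumed value, taken from arch/s390/include/asm/elf.h (HWCAP_S390_VXRS_EXT2).
    const HWCAP_S390_VXRS_EXT2: libc::c_ulong = 32768;
    // getauxval(AT_HWCAP) returns 0 rather than failing if the entry is absent.
    let hwcap = unsafe { libc::getauxval(libc::AT_HWCAP) };
    hwcap & HWCAP_S390_VXRS_EXT2 != 0
}

fn main() {
    println!("vector enhancements facility 2: {}", has_vxrs_ext2());
}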

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 17:01):

alexcrichton commented on issue #2991:

Ah yeah, it was using stock qemu 6.0.0, and "stack overflow" also happens to be reported for illegal instructions, so that would indeed explain it!

For the slowness, it's LLVM running natively but compiling to s390x. It could perhaps also just be variance in GitHub Actions, but afaik the only thing affecting the speed of compiling the test suite in this case would be the s390x backend in LLVM. In any case it's not something we'll fix here, just something I was curious about.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 17:07):

uweigand commented on issue #2991:

Ah yeah, it was using stock qemu 6.0.0, and "stack overflow" also happens to be reported for illegal instructions, so that would indeed explain it!

For the slowness, it's LLVM running natively but compiling to s390x. It could perhaps also just be variance in GitHub Actions, but afaik the only thing affecting the speed of compiling the test suite in this case would be the s390x backend in LLVM. In any case it's not something we'll fix here, just something I was curious about.

Is there a simple way to reproduce this process outside of GitHub actions? I could have a look ...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 17:39):

alexcrichton commented on issue #2991:

While not exactly easy, one possible way to reproduce this is to run the same steps locally that CI does, which basically just downloads QEMU, builds it, and then configures some env vars for cargo's build.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2021 at 18:32):

uweigand commented on issue #2991:

Also, please hold off merging this a bit -- I just noticed that there seems to be a bug in the auxv crate that causes getauxval to sometimes return a wrong value, so the native platform is mis-detected. I'm currently testing a fix that just uses getauxval from the libc crate, which works correctly (and seems more straightforward anyway).

OK, this is fixed now. The current version passes the full test suite on both z14 and z15, and it will indeed use the z15 instructions on the latter. As far as I can see, this should be good to merge now. FYI @cfallin.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 18 2021 at 18:25):

uweigand commented on issue #2991:

While not exactly easy, one possible way to reproduce this is to run the same steps locally that CI does, which basically just downloads QEMU, builds it, and then configures some env vars for cargo's build.

Turns out this has nothing to do with qemu; I'm seeing the same failure natively. It is related to the --features "test-programs/test_programs" argument used by ./ci/run-tests.sh -- I hadn't been using this argument in my testing, which means I apparently never even attempted to execute some of those tests.

I'll have a look at why those tests are failing.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 22 2021 at 12:08):

uweigand commented on issue #2991:

Turns out this was an endian bug in the handling of the Dirent data type: https://github.com/bytecodealliance/wasmtime/pull/3016

With this, I can now successfully run ./ci/run-tests.sh (at least natively).
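Since wasm linear memory is always little-endian while s390x is big-endian, any record copied into guest memory field by field has to be byte-swapped explicitly on a big-endian host. The sketch below shows that general pattern only; the struct layout, field names, and helper are illustrative stand-ins, not the actual wasi-common Dirent definition or the patch in #3016.

// Illustrative only: serialize a dirent-like record into guest memory in
// little-endian order, regardless of host endianness.
struct GuestDirent {
    d_next: u64,   // cookie of the next entry
    d_ino: u64,    // inode number
    d_namlen: u32, // length of the name that follows
    d_type: u8,    // file type
}

fn write_dirent_le(buf: &mut Vec<u8>, d: &GuestDirent) {
    // A plain memcpy of the struct on a big-endian host would leave these
    // fields byte-swapped as seen by the little-endian wasm guest, which is
    // the kind of bug an endianness fix addresses.
    buf.extend_from_slice(&d.d_next.to_le_bytes());
    buf.extend_from_slice(&d.d_ino.to_le_bytes());
    buf.extend_from_slice(&d.d_namlen.to_le_bytes());
    buf.push(d.d_type);
    buf.extend_from_slice(&[0u8; 3]); // padding to a 24-byte record (illustrative)
}

fn main() {
    let mut buf = Vec::new();
    let entry = GuestDirent { d_next: 1, d_ino: 42, d_namlen: 3, d_type: 4 };
    write_dirent_le(&mut buf, &entry);
    assert_eq!(buf.len(), 24);
}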

view this post on Zulip Wasmtime GitHub notifications bot (Jun 22 2021 at 14:20):

alexcrichton commented on issue #2991:

The trap was originally reported as stack exhaustion, but given the wasm backtrace that doesn't seem to be the case; was the trap classification fixed by https://github.com/bytecodealliance/wasmtime/pull/3014? I could definitely imagine that switching endianness would cause some random traps on reads/writes in wasm, though...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 22 2021 at 14:44):

uweigand commented on issue #2991:

The trap was originally reported as stack exhaustion, but given the wasm backtrace that doesn't seem to be the case; was the trap classification fixed by #3014?

Looks like this is indeed the case! I now get wasm trap: unreachable, which seems reasonable for a rust_panic.

