Stream: git-wasmtime

Topic: wasmtime / Issue #1893 AArch64 CI tests: qemu hits memory...


Wasmtime GitHub notifications bot (Jun 17 2020 at 21:54):

cfallin edited Issue #1893:

The AArch64 CI test that runs under QEMU fails consistently for PR #1871, and the reasons are not clear. Here is the relevant excerpt from the log:

2020-06-13T16:29:49.3730503Z test wast::Cranelift::spec::simd::simd_i32x4_cmp ... ok
2020-06-13T16:29:57.9345959Z test wast::Cranelift::spec::simd::simd_i8x16_sat_arith ... ignored
2020-06-13T16:30:08.5287111Z test wast::Cranelift::spec::simd::simd_lane ... ignored
2020-06-13T16:30:15.8261749Z test wast::Cranelift::spec::simd::simd_load ... ignored
2020-06-13T16:49:23.7624987Z error: test failed, to rerun pass '-p wasmtime-cli --test all'
2020-06-13T16:49:23.7648421Z
2020-06-13T16:49:23.7651248Z Caused by:
2020-06-13T16:49:23.7664954Z   process didn't exit successfully: `/home/runner/qemu/bin/qemu-aarch64 -L /usr/aarch64-linux-gnu /home/runner/work/wasmtime/wasmtime/target/aarch64-unknown-linux-gnu/release/deps/all-0af4aa3748ec4770` (signal: 9, SIGKILL: kill)
2020-06-13T16:49:24.0613948Z ##[error]Process completed with exit code 101.
2020-06-13T16:49:25.4620071Z Post job cleanup.
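
The `(signal: 9, SIGKILL: kill)` part of the log says that the QEMU process was killed from outside rather than exiting on its own. That is consistent with, though not proof of, the out-of-memory explanation discussed later in this thread. As a minimal sketch that is not part of the original report, the snippet below shows how a POSIX shell reports a child that died from SIGKILL: the exit status is 128 plus the signal number.

```shell
# Spawn a child shell that sends SIGKILL to itself, mimicking an external kill.
status=0
sh -c 'kill -9 $$' || status=$?

# POSIX shells report death-by-signal as 128 + signal number.
echo "exit status: $status"
echo "signal number: $((status - 128))"
```

Cargo reports the signal directly, as in the log above, while a plain shell wrapper would only see the numeric status.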

I have reproduced the test environment locally using the following commands:

rm -rf qemu-5.0.0 ${HOME}/qemu
curl https://download.qemu.org/qemu-5.0.0.tar.xz | tar xJf -
cd qemu-5.0.0
./configure --target-list=aarch64-linux-user --prefix=${HOME}/qemu --disable-tools --disable-slirp --disable-fdt --disable-capstone --disable-docs
make -j$(nproc) install
cd ..
RUSTFLAGS="-D warnings" \
  CARGO_INCREMENTAL=0 \
  CARGO_PROFILE_DEV_DEBUG=1 \
  CARGO_PROFILE_TEST_DEBUG=1 \
  CARGO_BUILD_TARGET=aarch64-unknown-linux-gnu \
  CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_RUNNER="${HOME}/qemu/bin/qemu-aarch64 -L /usr/aarch64-linux-gnu" \
  CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \
  RUST_BACKTRACE=1 \
  cargo test \
  --features test-programs/test_programs \
  --release \
  --all \
  --exclude lightbeam \
  --exclude peepmatic \
  --exclude peepmatic-automata \
  --exclude peepmatic-fuzzing \
  --exclude peepmatic-macro \
  --exclude peepmatic-runtime \
  --exclude peepmatic-test \
  --exclude wasmtime-fuzz

However, I don't see any test failures locally. I also don't see any issues when I run the tests natively in an AArch64 environment, in which case the list of commands can be simplified to:

cargo test --release --all --exclude lightbeam

Note that the --features test-programs/test_programs parameter is omitted because it requires rust-lld, which appears not to be a part of the native AArch64 toolchain.

This issue has also been discussed in PR #1802.

cc @cfallin

Wasmtime GitHub notifications bot (Jun 17 2020 at 23:18):

akirilov-arm commented on Issue #1893:

@cfallin What is your preference for PRs that implement AArch64 functionality? One option is to leave the relevant tests disabled and document their names in the description, so that people can run them manually. The other is to enable all relevant tests and disable them afterwards only if CI fails (apparently because of running out of memory). I like the second option more: we have already merged a couple of changes since I tried to push the first iteration of #1871, so evidently it works.

Honestly, it's a little bizarre that the spec::simd::simd_align test triggers the issue, because from a quick look there is nothing special about it, with one exception: it has the highest number of linear memory definitions of all SIMD tests. Running grep -R '(memory' tests/spec_testsuite/proposals/simd | cut -d: -f1 | sort | uniq -c | sort -rn shows that it has more than the next 5 tests combined:

     92 tests/spec_testsuite/proposals/simd/simd_align.wast
     22 tests/spec_testsuite/proposals/simd/simd_load.wast
     20 tests/spec_testsuite/proposals/simd/simd_load_extend.wast
     16 tests/spec_testsuite/proposals/simd/simd_bit_shift.wast
     14 tests/spec_testsuite/proposals/simd/simd_load_splat.wast
     12 tests/spec_testsuite/proposals/simd/simd_i32x4_arith2.wast
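
To make the counting step concrete, here is a small sketch of the same pipeline run against a throwaway directory. The two `.wast` files and their contents are made up for illustration and are not taken from the spec test suite. Note that the pipeline counts lines containing `(memory`, so a line with two definitions would count once.

```shell
# Create a temporary directory with two hypothetical .wast files.
dir=$(mktemp -d)
printf '(module (memory 1))\n(module (memory 1))\n(module (memory 1))\n' > "$dir/a.wast"
printf '(module (memory 1))\n' > "$dir/b.wast"

# Same pipeline as in the comment above:
#   grep -R     prints "file:matching line" for every match
#   cut         keeps only the file name
#   sort|uniq   counts the matches per file
#   sort -rn    puts the largest count first
counts=$(grep -R '(memory' "$dir" | cut -d: -f1 | sort | uniq -c | sort -rn)
echo "$counts"

# Remove the temporary files again.
rm -rf "$dir"
```

The file with three matching lines is listed first, in the same way that simd_align.wast tops the real list.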

On the other hand, I have the feeling that we may run out of luck soon and start seeing consistent failures with any test.

cc @jgouly

Wasmtime GitHub notifications bot (Jun 18 2020 at 00:30):

cfallin commented on Issue #1893:

> enable all relevant tests, but disable them afterwards in case of CI failures (whose cause seems to be running out of memory)?

Yes, I think this is the best option -- let's do this for now, and reference this issue when we have to disable a test to get a green CI to merge.

Wasmtime GitHub notifications bot (Jun 18 2020 at 00:50):

alexcrichton commented on Issue #1893:

> it has the highest number of linear memory definitions of all SIMD tests

Whoa, nice find! That gives me an idea, and in local testing it drastically reduces the memory usage of qemu (10GB -> 600MB). I think that means we can fix our CI quite easily, actually!

Wasmtime GitHub notifications bot (Jun 18 2020 at 02:05):

alexcrichton closed Issue #1893.


Last updated: Oct 23 2024 at 20:03 UTC