abrown labeled Issue #1407:
What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?
wasmtime traps with an OOB memory access; Node does not. In Node:
    $ node --version
    v13.9.0
    $ node --experimental-wasm-simd emscripten-built-for-js.js
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    decode: vertex 16.32 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
    decode: vertex 16.33 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
    decode: vertex 16.57 ms (1.80 GB/sec), index 11.23 ms (1.99 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.35 ms (1.97 GB/sec)
    decode: vertex 16.18 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
    decode: vertex 16.12 ms (1.85 GB/sec), index 11.19 ms (2.00 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.15 ms (2.01 GB/sec)
    decode: vertex 16.16 ms (1.85 GB/sec), index 11.14 ms (2.01 GB/sec)
    decode: vertex 16.15 ms (1.85 GB/sec), index 11.17 ms (2.00 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
    pass 1: vertex data 18518204 bytes, index data 2001016 bytes
    decode: vertex 16.12 ms (1.85 GB/sec), index 11.07 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.08 ms (2.02 GB/sec)
    decode: vertex 16.11 ms (1.85 GB/sec), index 11.11 ms (2.01 GB/sec)
    decode: vertex 16.21 ms (1.84 GB/sec), index 11.09 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
    decode: vertex 16.07 ms (1.86 GB/sec), index 11.13 ms (2.01 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.06 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
    decode: vertex 16.04 ms (1.86 GB/sec), index 11.14 ms (2.01 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.07 ms (2.02 GB/sec)
    filters: oct8 data 4000000 bytes, oct12/quat12 data 8000000 bytes
    filter: oct8 2.12 ms (1.76 GB/sec), oct12 2.26 ms (3.29 GB/sec), quat12 2.84 ms (2.63 GB/sec)
    filter: oct8 2.11 ms (1.76 GB/sec), oct12 2.19 ms (3.40 GB/sec), quat12 2.79 ms (2.67 GB/sec)
    filter: oct8 2.11 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.79 ms (2.67 GB/sec)
    filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.25 ms (3.32 GB/sec), quat12 2.86 ms (2.61 GB/sec)
    filter: oct8 2.10 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.80 ms (2.66 GB/sec)
    filter: oct8 2.09 ms (1.78 GB/sec), oct12 2.16 ms (3.45 GB/sec), quat12 2.81 ms (2.65 GB/sec)
    filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.28 ms (3.26 GB/sec), quat12 2.82 ms (2.64 GB/sec)
    filter: oct8 2.23 ms (1.67 GB/sec), oct12 2.16 ms (3.44 GB/sec), quat12 2.81 ms (2.65 GB/sec)
    filter: oct8 2.10 ms (1.78 GB/sec), oct12 2.15 ms (3.47 GB/sec), quat12 2.83 ms (2.63 GB/sec)
    filter: oct8 2.14 ms (1.74 GB/sec), oct12 2.17 ms (3.44 GB/sec), quat12 2.80 ms (2.66 GB/sec)
In wasmtime (on branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift which implements needed instructions), I tried various versions of the same code built with different tools:
    $ ls ../oob/*.wasm | xargs -I{} sh -c "cargo run -- run --enable-simd --disable-cache {}"
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built-for-js.wasm`
    Error: failed to run main module `../oob/emscripten-built-for-js.wasm`
    Caused by:
        import module `a` was not found
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/emscripten-built.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @7d97
    wasm backtrace:
        0: <unknown>!<wasm function 74>
        1: <unknown>!<wasm function 37>
        2: <unknown>!<wasm function 75>
        3: <unknown>!<wasm function 28>
        4: <unknown>!<wasm function 67>
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built-extra-memory.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/wasi-sdk-built-extra-memory.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @22a5
    wasm backtrace:
        0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
        1: <unknown>!meshopt_decodeVertexBuffer
        2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
        3: <unknown>!__original_main
        4: <unknown>!_start
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/wasi-sdk-built.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @22a4
    wasm backtrace:
        0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
        1: <unknown>!meshopt_decodeVertexBuffer
        2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
        3: <unknown>!__original_main
        4: <unknown>!_start
Which Wasmtime version / commit hash / branch are you using?
On branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift which implements needed instructions.
What are the steps to reproduce the issue?
See above. Also, here are steps for building the Wasm modules from https://github.com/zeux/meshoptimizer/tree/9047ac1936351d0508bb26b5b82ec1101f9735b4:
    # wasi-sdk-built.wasm (2^28 bytes of memory, 4096x64K pages)
    $ /opt/wasi-sdk/bin/clang++ --version
    clang version 11.0.0 (https://github.com/llvm/llvm-project 46bb6613a31fd43b6d4485ce7e71a387dc22cbc7)
    Target: wasm32-unknown-wasi
    Thread model: posix
    InstalledDir: /opt/wasi-sdk/bin
    $ make clean && make codecbench-simd.wasm
    /opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=268435456 -msimd128 -o codecbench-simd.wasm
    # wasi-sdk-built-extra-memory.wasm (2^30 bytes, 16384x64K pages)
    $ make clean && make codecbench-simd.wasm
    /opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=1073741824 -msimd128 -o codecbench-simd.wasm
    # emscripten-built.wasm
    $ emcc --version
    emcc (Emscripten gcc/clang-like replacement) 1.39.10 (commit 1bd7d547598f3fc74699c172f6c9c59a1e8484f1)
    $ make clean && make codecbench-simd.wasm
    emcc tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -s TOTAL_MEMORY=268435456 -msimd128 -o codecbench-simd.wasm
    # then generated wat and dump files with
    ls *.wasm | xargs -I{} sh -c "wasm2wat --enable-all {} > {}.wat"
    ls *.wasm | xargs -I{} sh -c "wasm-objdump -d {} > {}.dump"
emscripten-built.wasm.txt
emscripten-built.wasm.wat.txt
emscripten-built-for-js.js.txt
emscripten-built-for-js.wasm.dump.txt
emscripten-built-for-js.wasm.txt
emscripten-built-for-js.wasm.wat.txt
wasi-sdk-built.wasm.dump.txt
wasi-sdk-built.wasm.txt
wasi-sdk-built.wasm.wat.txt
[wasi-sdk-built-extra-memory.wasm.dump.txt](https://github.com/bytecodealliance/wasmti
[message truncated]
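The page counts quoted in the build comments above follow from the 64 KiB WebAssembly page size. A quick way to check them, as a minimal Python sketch (the helper name `pages_for` is made up for illustration and is not part of any tool mentioned in this thread):

```python
WASM_PAGE_SIZE = 64 * 1024  # bytes per WebAssembly linear-memory page


def pages_for(memory_bytes: int) -> int:
    """Return how many 64 KiB pages a memory of `memory_bytes` occupies."""
    pages, remainder = divmod(memory_bytes, WASM_PAGE_SIZE)
    if remainder:
        raise ValueError("memory size must be a multiple of the page size")
    return pages


# Value passed via --initial-memory / TOTAL_MEMORY in the builds above (2**28).
print(pages_for(268435456))
# Value used for the "extra-memory" build (2**30).
print(pages_for(1073741824))
```

Both builds trap, so the amount of initial memory alone does not appear to explain the difference from Node.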
abrown commented on Issue #1407:
It is interesting that the OOB is triggered in different places:
    $ wasm-objdump -d emscripten-built.wasm | grep -A5 -B5 7d97
     007d8a: 0e 03 01 02 03 00 | br_table 1 2 3 0
     007d90: 0b | end
     007d91: 20 07 | local.get 7
     007d93: 41 00 | i32.const 0
     007d95: fd 0c | i32x4.splat
    >007d97: fd 01 04 10 | v128.store 4 16
     007d9b: 0c 03 | br 3
     007d9d: 0b | end
     007d9e: 20 07 | local.get 7
     007da0: 20 00 | local.get 0
     007da2: fd 00 00 04 | v128.load 0 4
    $ wasm-objdump -d wasi-sdk-built.wasm | grep -A5 -B5 22a4
     002297: fd 06 00 | i8x16.extract_lane_u 0
     00229a: 6a | i32.add
     00229b: 20 17 | local.get 23
     00229d: 41 f0 b0 80 80 00 | i32.const 6256
     0022a3: 6a | i32.add
    >0022a4: 2d 00 00 | i32.load8_u 0 0
     0022a7: 6a | i32.add
     0022a8: 21 00 | local.set 0
     0022aa: 0c 02 | br 2
     0022ac: 0b | end
     0022ad: 20 0f | local.get 15
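Both marked instructions are plain memory accesses, and a WebAssembly memory access traps when the dynamic base address plus the static offset plus the access width runs past the end of linear memory. A minimal sketch of that check, assuming wasm-objdump prints the alignment before the offset (so `v128.store 4 16` would carry offset 16), and using made-up base addresses since the thread does not show the real ones:

```python
def access_in_bounds(base: int, offset: int, width: int, memory_bytes: int) -> bool:
    """True if a `width`-byte access at base + offset fits in linear memory.

    `base` is the dynamic i32 address operand and `offset` the static
    immediate. The sum is formed without 32-bit wraparound.
    """
    if not (0 <= base < 2**32 and 0 <= offset < 2**32):
        raise ValueError("base and offset must be unsigned 32-bit values")
    effective = base + offset  # Python ints do not overflow
    return effective + width <= memory_bytes


MEMORY = 4096 * 65536  # 2**28 bytes, matching the 4096-page builds

# A 16-byte vector store with offset 16 (hypothetical base addresses).
print(access_in_bounds(MEMORY - 32, 16, 16, MEMORY))  # ends exactly at the limit
print(access_in_bounds(MEMORY - 31, 16, 16, MEMORY))  # one byte past the end
# A 1-byte load with offset 0 at the last byte of memory.
print(access_in_bounds(MEMORY - 1, 0, 1, MEMORY))
```

Because the two traps hit accesses of different widths (16 bytes and 1 byte), a bad base address seems a more likely culprit than the access width, though the disassembly alone does not settle that.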
abrown edited Issue #1407:
What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?
wasmtime traps with an OOB memory access; Node does not. In Node the code runs as expected (though we have to run it from the JS wrapper):
    $ node --version
    v13.9.0
    $ node --experimental-wasm-simd emscripten-built-for-js.js
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    decode: vertex 16.32 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
    decode: vertex 16.33 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
    decode: vertex 16.57 ms (1.80 GB/sec), index 11.23 ms (1.99 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.35 ms (1.97 GB/sec)
    decode: vertex 16.18 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
    decode: vertex 16.12 ms (1.85 GB/sec), index 11.19 ms (2.00 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.15 ms (2.01 GB/sec)
    decode: vertex 16.16 ms (1.85 GB/sec), index 11.14 ms (2.01 GB/sec)
    decode: vertex 16.15 ms (1.85 GB/sec), index 11.17 ms (2.00 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
    pass 1: vertex data 18518204 bytes, index data 2001016 bytes
    decode: vertex 16.12 ms (1.85 GB/sec), index 11.07 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.08 ms (2.02 GB/sec)
    decode: vertex 16.11 ms (1.85 GB/sec), index 11.11 ms (2.01 GB/sec)
    decode: vertex 16.21 ms (1.84 GB/sec), index 11.09 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
    decode: vertex 16.07 ms (1.86 GB/sec), index 11.13 ms (2.01 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.06 ms (2.02 GB/sec)
    decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
    decode: vertex 16.04 ms (1.86 GB/sec), index 11.14 ms (2.01 GB/sec)
    decode: vertex 16.19 ms (1.84 GB/sec), index 11.07 ms (2.02 GB/sec)
    filters: oct8 data 4000000 bytes, oct12/quat12 data 8000000 bytes
    filter: oct8 2.12 ms (1.76 GB/sec), oct12 2.26 ms (3.29 GB/sec), quat12 2.84 ms (2.63 GB/sec)
    filter: oct8 2.11 ms (1.76 GB/sec), oct12 2.19 ms (3.40 GB/sec), quat12 2.79 ms (2.67 GB/sec)
    filter: oct8 2.11 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.79 ms (2.67 GB/sec)
    filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.25 ms (3.32 GB/sec), quat12 2.86 ms (2.61 GB/sec)
    filter: oct8 2.10 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.80 ms (2.66 GB/sec)
    filter: oct8 2.09 ms (1.78 GB/sec), oct12 2.16 ms (3.45 GB/sec), quat12 2.81 ms (2.65 GB/sec)
    filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.28 ms (3.26 GB/sec), quat12 2.82 ms (2.64 GB/sec)
    filter: oct8 2.23 ms (1.67 GB/sec), oct12 2.16 ms (3.44 GB/sec), quat12 2.81 ms (2.65 GB/sec)
    filter: oct8 2.10 ms (1.78 GB/sec), oct12 2.15 ms (3.47 GB/sec), quat12 2.83 ms (2.63 GB/sec)
    filter: oct8 2.14 ms (1.74 GB/sec), oct12 2.17 ms (3.44 GB/sec), quat12 2.80 ms (2.66 GB/sec)
In wasmtime (on branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift which implements needed instructions), I tried various versions of the same code built with different tools:
    $ ls ../oob/*.wasm | xargs -I{} sh -c "cargo run -- run --enable-simd --disable-cache {}"
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built-for-js.wasm`
    Error: failed to run main module `../oob/emscripten-built-for-js.wasm`
    Caused by:
        import module `a` was not found
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/emscripten-built.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @7d97
    wasm backtrace:
        0: <unknown>!<wasm function 74>
        1: <unknown>!<wasm function 37>
        2: <unknown>!<wasm function 75>
        3: <unknown>!<wasm function 28>
        4: <unknown>!<wasm function 67>
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built-extra-memory.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/wasi-sdk-built-extra-memory.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @22a5
    wasm backtrace:
        0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
        1: <unknown>!meshopt_decodeVertexBuffer
        2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
        3: <unknown>!__original_main
        4: <unknown>!_start
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
    Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built.wasm`
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `../oob/wasi-sdk-built.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: out of bounds memory access, source location: @22a4
    wasm backtrace:
        0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
        1: <unknown>!meshopt_decodeVertexBuffer
        2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
        3: <unknown>!__original_main
        4: <unknown>!_start
Which Wasmtime version / commit hash / branch are you using?
On branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift which implements needed instructions.
What are the steps to reproduce the issue?
See above. Also, here are steps for building the Wasm modules from https://github.com/zeux/meshoptimizer/tree/9047ac1936351d0508bb26b5b82ec1101f9735b4:
    # wasi-sdk-built.wasm (2^28 bytes of memory, 4096x64K pages)
    $ /opt/wasi-sdk/bin/clang++ --version
    clang version 11.0.0 (https://github.com/llvm/llvm-project 46bb6613a31fd43b6d4485ce7e71a387dc22cbc7)
    Target: wasm32-unknown-wasi
    Thread model: posix
    InstalledDir: /opt/wasi-sdk/bin
    $ make clean && make codecbench-simd.wasm
    /opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=268435456 -msimd128 -o codecbench-simd.wasm
    # wasi-sdk-built-extra-memory.wasm (2^30 bytes, 16384x64K pages)
    $ make clean && make codecbench-simd.wasm
    /opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=1073741824 -msimd128 -o codecbench-simd.wasm
    # emscripten-built.wasm
    $ emcc --version
    emcc (Emscripten gcc/clang-like replacement) 1.39.10 (commit 1bd7d547598f3fc74699c172f6c9c59a1e8484f1)
    $ make clean && make codecbench-simd.wasm
    emcc tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -s TOTAL_MEMORY=268435456 -msimd128 -o codecbench-simd.wasm
    # then generated wat and dump files with
    ls *.wasm | xargs -I{} sh -c "wasm2wat --enable-all {} > {}.wat"
    ls *.wasm | xargs -I{} sh -c "wasm-objdump -d {} > {}.dump"
emscripten-built.wasm.txt
emscripten-built.wasm.wat.txt
emscripten-built-for-js.js.txt
emscripten-built-for-js.wasm.dump.txt
emscripten-built-for-js.wasm.txt
emscripten-built-for-js.wasm.wat.txt
wasi-sdk-built.wasm.dump.txt
wasi-sdk-built.wasm.txt
wasi-sdk-built.wasm.wat.txt
[wasi-sdk-b
[message truncated]
alexcrichton commented on Issue #1407:
Are the wasm files emitted here intended to be run directly? I would imagine that the JS does some sort of setup/glue which might prepare the runtime and/or size memory appropriately. Without that it might be expected that the wasm faults if run directly? (mostly in that node is running more code than we are, so a difference in behavior may not mean something wrong is happening)
abrown commented on Issue #1407:
Perhaps there is some Emscripten/Node-specific setup that I'm not aware of (@zeux, what do you think?). In the minified JS I do see code like
var DYNAMIC_BASE=5249984,DYNAMICTOP_PTR=6944
that might be doing something special. But I wouldn't think that the wasi-sdk-built Wasm code should need any of that setup: the files compiled are normal-looking C++.
abrown commented on Issue #1407:
In answer to "Are the wasm files emitted here intended to be run directly?": I think, yes, the files that are not `*-for-js` should be runnable directly. Or at least, looking at the `*.wat` versions, I do not see why not.
zeux commented on Issue #1407:
I'm wondering if wasmtime is hitting one of the cases where the code might actually hit an OOB access in practice. There's a couple of TODO comments in the code around this, where the right thing to do is to use a load_splat, but load_splat isn't available in node/Chrome so I'm not using it...
Let me look closer at what these accesses are actually doing.
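One way a missing load_splat could lead to a genuine out-of-bounds read, assuming the workaround loads a full 16-byte vector where only one 32-bit lane is needed (the thread does not confirm that this is what the TODO comments refer to). The sketch below models linear memory with a toy bounds-checked class; `LinearMemory` and both helper functions are hypothetical names, not meshoptimizer or wasmtime APIs:

```python
import struct


class LinearMemory:
    """Toy bounds-checked memory; raises IndexError where wasm would trap."""

    def __init__(self, data: bytes):
        self.data = bytes(data)

    def load(self, addr: int, width: int) -> bytes:
        if addr < 0 or addr + width > len(self.data):
            raise IndexError(f"out of bounds: {width} bytes at {addr}")
        return self.data[addr:addr + width]


def splat_via_wide_load(mem: LinearMemory, addr: int) -> list:
    """Read a full 16-byte vector, then broadcast its first 32-bit lane."""
    lane0 = struct.unpack_from("<I", mem.load(addr, 16))[0]
    return [lane0] * 4


def load_splat32(mem: LinearMemory, addr: int) -> list:
    """Read only 4 bytes and broadcast them, like a 32-bit load_splat."""
    lane0 = struct.unpack("<I", mem.load(addr, 4))[0]
    return [lane0] * 4


mem = LinearMemory(bytes(range(20)))
tail = len(mem.data) - 4  # address of the last 4 readable bytes

print(load_splat32(mem, tail))  # touches only the last 4 bytes
try:
    splat_via_wide_load(mem, tail)
except IndexError as err:
    print("trap:", err)  # the 16-byte read runs 12 bytes past the end
```

In a real module such an over-read only traps when it crosses the end of linear memory, which is why it could stay hidden in one engine or memory layout and surface in another.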
abrown commented on Issue #1407:
This shouldn't theoretically be an issue, but Cranelift is lowering `load_splat` to `load + splat` at the moment (eventually optimized by #1175)... in case that matters.
zeux commented on Issue #1407:
I tried to reproduce this but (maybe because I'm using a later version of Emscripten) I'm hitting this:
    Error: failed to run main module `codecbench-simd.wasm`
    Caused by:
        0: WebAssembly failed to compile
        1: Compilation error: function u0:73(i64 vmctx, i64, i32, i32, i32, i32, i32) -> i32 system_v {
        ...
        @7bf3 v154 = iadd v152, v153
        ;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        ; error: inst149 (v154 = iadd.i32 v152, v153): arg 1 (v153) has type i8, expected i32
wasm-validate doesn't agree with this assessment, but maybe that's because it doesn't validate something properly? Attaching the .wasm file in question.
bjorn3 commented on Issue #1407:
Where is v153 defined?
zeux commented on Issue #1407:
abrown commented on Issue #1407:
@zeux, I was in a special branch that has some fixes and additional instructions that make it possible to get past those types of errors: https://github.com/abrown/wasmtime/tree/additional-i8x16-shift. I'm waiting on a review for #1377 and then I can try to merge #1409; then all of that functionality should be in master.
zeux commented on Issue #1407:
@abrown Ahh, git :( I did check out that branch initially but had issues with the submodule links, and forgot to switch back to it after re-cloning it recursively. After syncing to this branch properly I can indeed reproduce the error, thanks! Will update once I understand this more.
zeux commented on Issue #1407:
I strongly suspect this isn't an issue in the benchmark; the behavior seems highly dependent on the inlining here. Adding `noinline` to decodeBytesGroupSimd & decodeBytesSimd & decodeVertexBlockSimd fixes this. With only decodeBytesSimd marked as noinline, I get this instead:
    /mnt/c/work/meshoptimizer $ make -B codecbench-simd.wasm && ../wasmtime/target/debug/wasmtime --enable-simd codecbench-simd.wasm
    emcc tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/clusterizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -s TOTAL_MEMORY=268435456 -msimd128 -o codecbench-simd.wasm -g
    source: vertex data 32064032 bytes, index data 24000000 bytes
    pass 0: vertex data 18518385 bytes, index data 2332680 bytes
    Error: failed to run main module `codecbench-simd.wasm`
    Caused by:
        0: failed to invoke `_start`
        1: wasm trap: call stack exhausted, source location: @-
    wasm backtrace:
        0: <unknown>!meshopt::decodeBytesSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long)
        1: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
        2: <unknown>!meshopt_decodeVertexBuffer
        3: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
        4: <unknown>!__original_main
        5: <unknown>!_start
The expected call sequence here is meshopt_decodeVertexBuffer -> decodeVertexBlockSimd -> decodeBytesSimd -> decodeBytesGroupSimd, with no recursion. Unsure what "call stack exhausted" indicates here...
Unfortunately trying to add prints to this to understand the behavior fixes the issue as well, so the investigation here might be complicated. Is there a way to coerce wasmtime to generate a debuggable binary?
zeux edited a comment on Issue #1407:
I strongly suspect this isn't an issue in the benchmark; the behavior seems highly dependent on the inlining here. Adding noinline to decodeBytesGroupSimd & decodeBytesSimd & decodeVertexBlockSimd fixes this. With only decodeBytesSimd marked as noinline, I get this instead:
/mnt/c/work/meshoptimizer $ make -B codecbench-simd.wasm && ../wasmtime/target/debug/wasmtime --enable-simd codecbench-simd.wasm
emcc tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/clusterizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -s TOTAL_MEMORY=268435456 -msimd128 -o codecbench-simd.wasm -g
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
Error: failed to run main module `codecbench-simd.wasm`
Caused by:
    0: failed to invoke `_start`
    1: wasm trap: call stack exhausted, source location: @-
wasm backtrace:
    0: <unknown>!meshopt::decodeBytesSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long)
    1: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
    2: <unknown>!meshopt_decodeVertexBuffer
    3: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
    4: <unknown>!__original_main
    5: <unknown>!_start
The expected call sequence here is meshopt_decodeVertexBuffer -> decodeVertexBlockSimd -> decodeBytesSimd -> decodeBytesGroupSimd, with no recursion. Unsure what "call stack exhausted" indicates here...
Also worth noting is that expanding various buffers to accommodate possible overruns didn't help. In some inlining/codegen configurations I get neither a stack overflow nor an OOB trap, but meshopt_decodeVertexBuffer returns a non-0 result because it exits early during parsing, which suggests some issues with control flow here.
Unfortunately trying to add prints to this to understand the behavior fixes the issue as well, so the investigation here might be complicated. Is there a way to coerce wasmtime to generate a debuggable binary?
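For reference, a minimal sketch of the kind of annotation being described, assuming a Clang/GCC-style toolchain (Emscripten and wasi-sdk are both Clang-based). The macro EXAMPLE_NOINLINE and the functions decodeGroup and decodeAll are hypothetical stand-ins; they are not the meshoptimizer functions, whose signatures appear only in the backtrace above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: expands to the Clang/GCC noinline attribute when available.
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_NOINLINE __attribute__((noinline))
#else
#define EXAMPLE_NOINLINE
#endif

// Innermost step (placeholder logic): marked noinline so the compiler emits a
// real call instead of merging this body into the caller.
EXAMPLE_NOINLINE unsigned char decodeGroup(const unsigned char* data, std::size_t i)
{
    return static_cast<unsigned char>(data[i] ^ 0x5a);
}

// Caller: without the attribute on decodeGroup, an optimizing build would
// likely inline it into this loop and produce a different function shape.
EXAMPLE_NOINLINE std::size_t decodeAll(const unsigned char* data, unsigned char* out, std::size_t count)
{
    assert(data != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = decodeGroup(data, i);
    return count;
}
```

Toggling the attribute on one function at a time changes which bodies get merged, which is why it is a useful knob for narrowing down a suspected miscompilation.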
bjorn3 commented on Issue #1407:
Pass -g if your wasm file was built with debuginfo. I don't know if wasm2obj supports it though.
zeux commented on Issue #1407:
-g doesn't seem to work with Emscripten-generated debug info here:
Error: failed to emit debug sections
Caused by:
    The end offset of a location list entry must not be before the beginning.
Might be an Emscripten bug, not sure.
zeux commented on Issue #1407:
One other observation is that --opt-level 0 doesn't trigger this bug:
/mnt/c/work/meshoptimizer $ ../wasmtime/target/debug/wasmtime --enable-simd --disable-cache --opt-level 0 codecbench-simd.wasm
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
decode: vertex 28.46 ms (1.05 GB/sec), index 25.10 ms (0.89 GB/sec); rv 0 ri 0
...
/mnt/c/work/meshoptimizer $ ../wasmtime/target/debug/wasmtime --enable-simd --disable-cache --opt-level 1 codecbench-simd.wasm
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
Error: failed to run main module `codecbench-simd.wasm`
Caused by:
    0: failed to invoke `_start`
    1: wasm trap: out of bounds memory access, source location: @7ca4
wasm backtrace:
    0: <unknown>!<wasm function 75>
    1: <unknown>!<wasm function 37>
    2: <unknown>!<wasm function 76>
    3: <unknown>!<wasm function 28>
    4: <unknown>!<wasm function 67>
This is on a file without Emscripten-generated debug info.
codecbench-simd.zip
I don't think I have enough understanding here to provide further help, but it looks to me as if the .wasm file in question has control flow that is complicated enough to trigger some miscompilation if optimizations are enabled, and the out of bounds access is just an odd side effect.
abrown commented on Issue #1407:
Glad you were able to replicate; that --opt-level 0 difference is actually pretty interesting. There is a pass that converts a load with an offset that is the result of a sum to a complex load; I wonder if something weird is happening there.
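The thread never confirms a root cause, so the following is only an illustration of why an address-folding pass is a plausible suspect, not a description of what the code generator actually does. In wasm32, an i32.add wraps modulo 2^32 before its result is used as an address; if the same sum were instead computed at 64-bit width inside a host memory operand, the two addresses would differ whenever the 32-bit sum overflows. All function names below are hypothetical.

```cpp
#include <cstdint>

// wasm32 semantics: the i32.add result wraps modulo 2^32 before it is used
// as the dynamic address, so the address always fits in 32 bits.
constexpr std::uint64_t wasmAddress(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(a + b));
}

// Hypothetical folded form: both operands are zero-extended and summed at
// 64-bit width, the way a host "base + index" memory operand would; no wrap.
constexpr std::uint64_t foldedAddress(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b);
}

// True when folding the add into the memory operand would change the address.
constexpr bool foldChangesAddress(std::uint32_t a, std::uint32_t b)
{
    return wasmAddress(a, b) != foldedAddress(a, b);
}

// Small operands: both forms agree.
static_assert(!foldChangesAddress(0x1000u, 0x20u), "no overflow, same address");
// "pointer + negative offset" encoded as unsigned: the forms disagree.
static_assert(foldChangesAddress(0xFFFFFFF0u, 0x20u), "overflow, addresses differ");
```

Compiled code routinely relies on the wrapping form (for example, adding a negative offset encoded as a large unsigned value), so a fold that skips the wrap would show up as a spurious out-of-bounds access only at higher optimization levels.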
abrown commented on Issue #1407:
I just re-ran the Wasm files above and they ran without issue in the master branch of wasmtime (except for emscripten-built-for-js.wasm of course--that failure is expected). It's hard to say exactly what has changed that would have fixed this but I'm going to close it since I can't reproduce now (thankfully!).
abrown closed Issue #1407:
What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?
wasmtime traps with an OOB memory access; Node does not. In Node the code runs as expected (though we have to run it from the JS wrapper):
$ node --version
v13.9.0
$ node --experimental-wasm-simd emscripten-built-for-js.js
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
decode: vertex 16.32 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
decode: vertex 16.33 ms (1.83 GB/sec), index 11.15 ms (2.00 GB/sec)
decode: vertex 16.57 ms (1.80 GB/sec), index 11.23 ms (1.99 GB/sec)
decode: vertex 16.19 ms (1.84 GB/sec), index 11.35 ms (1.97 GB/sec)
decode: vertex 16.18 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
decode: vertex 16.12 ms (1.85 GB/sec), index 11.19 ms (2.00 GB/sec)
decode: vertex 16.19 ms (1.84 GB/sec), index 11.15 ms (2.01 GB/sec)
decode: vertex 16.16 ms (1.85 GB/sec), index 11.14 ms (2.01 GB/sec)
decode: vertex 16.15 ms (1.85 GB/sec), index 11.17 ms (2.00 GB/sec)
decode: vertex 16.17 ms (1.85 GB/sec), index 11.16 ms (2.00 GB/sec)
pass 1: vertex data 18518204 bytes, index data 2001016 bytes
decode: vertex 16.12 ms (1.85 GB/sec), index 11.07 ms (2.02 GB/sec)
decode: vertex 16.17 ms (1.85 GB/sec), index 11.08 ms (2.02 GB/sec)
decode: vertex 16.11 ms (1.85 GB/sec), index 11.11 ms (2.01 GB/sec)
decode: vertex 16.21 ms (1.84 GB/sec), index 11.09 ms (2.02 GB/sec)
decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
decode: vertex 16.07 ms (1.86 GB/sec), index 11.13 ms (2.01 GB/sec)
decode: vertex 16.19 ms (1.84 GB/sec), index 11.06 ms (2.02 GB/sec)
decode: vertex 16.17 ms (1.85 GB/sec), index 11.10 ms (2.01 GB/sec)
decode: vertex 16.04 ms (1.86 GB/sec), index 11.14 ms (2.01 GB/sec)
decode: vertex 16.19 ms (1.84 GB/sec), index 11.07 ms (2.02 GB/sec)
filters: oct8 data 4000000 bytes, oct12/quat12 data 8000000 bytes
filter: oct8 2.12 ms (1.76 GB/sec), oct12 2.26 ms (3.29 GB/sec), quat12 2.84 ms (2.63 GB/sec)
filter: oct8 2.11 ms (1.76 GB/sec), oct12 2.19 ms (3.40 GB/sec), quat12 2.79 ms (2.67 GB/sec)
filter: oct8 2.11 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.79 ms (2.67 GB/sec)
filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.25 ms (3.32 GB/sec), quat12 2.86 ms (2.61 GB/sec)
filter: oct8 2.10 ms (1.77 GB/sec), oct12 2.17 ms (3.43 GB/sec), quat12 2.80 ms (2.66 GB/sec)
filter: oct8 2.09 ms (1.78 GB/sec), oct12 2.16 ms (3.45 GB/sec), quat12 2.81 ms (2.65 GB/sec)
filter: oct8 2.13 ms (1.75 GB/sec), oct12 2.28 ms (3.26 GB/sec), quat12 2.82 ms (2.64 GB/sec)
filter: oct8 2.23 ms (1.67 GB/sec), oct12 2.16 ms (3.44 GB/sec), quat12 2.81 ms (2.65 GB/sec)
filter: oct8 2.10 ms (1.78 GB/sec), oct12 2.15 ms (3.47 GB/sec), quat12 2.83 ms (2.63 GB/sec)
filter: oct8 2.14 ms (1.74 GB/sec), oct12 2.17 ms (3.44 GB/sec), quat12 2.80 ms (2.66 GB/sec)
In wasmtime (on branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift, which implements the needed instructions), I tried various versions of the same code built with different tools:
$ ls ../oob/*.wasm | xargs -I{} sh -c "cargo run -- run --enable-simd --disable-cache {}"
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built-for-js.wasm`
Error: failed to run main module `../oob/emscripten-built-for-js.wasm`
Caused by:
    import module `a` was not found
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/emscripten-built.wasm`
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
Error: failed to run main module `../oob/emscripten-built.wasm`
Caused by:
    0: failed to invoke `_start`
    1: wasm trap: out of bounds memory access, source location: @7d97
wasm backtrace:
    0: <unknown>!<wasm function 74>
    1: <unknown>!<wasm function 37>
    2: <unknown>!<wasm function 75>
    3: <unknown>!<wasm function 28>
    4: <unknown>!<wasm function 67>
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built-extra-memory.wasm`
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
Error: failed to run main module `../oob/wasi-sdk-built-extra-memory.wasm`
Caused by:
    0: failed to invoke `_start`
    1: wasm trap: out of bounds memory access, source location: @22a5
wasm backtrace:
    0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
    1: <unknown>!meshopt_decodeVertexBuffer
    2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
    3: <unknown>!__original_main
    4: <unknown>!_start
Finished dev [unoptimized + debuginfo] target(s) in 0.07s
Running `target/debug/wasmtime run --enable-simd --disable-cache ../oob/wasi-sdk-built.wasm`
source: vertex data 32064032 bytes, index data 24000000 bytes
pass 0: vertex data 18518385 bytes, index data 2332680 bytes
Error: failed to run main module `../oob/wasi-sdk-built.wasm`
Caused by:
    0: failed to invoke `_start`
    1: wasm trap: out of bounds memory access, source location: @22a4
wasm backtrace:
    0: <unknown>!meshopt::decodeVertexBlockSimd(unsigned char const*, unsigned char const*, unsigned char*, unsigned long, unsigned long, unsigned char*)
    1: <unknown>!meshopt_decodeVertexBuffer
    2: <unknown>!benchCodecs(std::__2::vector<Vertex, std::__2::allocator<Vertex> > const&, std::__2::vector<unsigned int, std::__2::allocator<unsigned int> > const&)
    3: <unknown>!__original_main
    4: <unknown>!_start
Which Wasmtime version / commit hash / branch are you using?
On branch https://github.com/abrown/wasmtime/tree/additional-i8x16-shift which implements needed instructions.
What are the steps to reproduce the issue?
See above. Also, here are steps for building the Wasm modules from https://github.com/zeux/meshoptimizer/tree/9047ac1936351d0508bb26b5b82ec1101f9735b4:
# wasi-sdk-built.wasm (2^28 bytes of memory, 4096x64K pages)
$ /opt/wasi-sdk/bin/clang++ --version
clang version 11.0.0 (https://github.com/llvm/llvm-project 46bb6613a31fd43b6d4485ce7e71a387dc22cbc7)
Target: wasm32-unknown-wasi
Thread model: posix
InstalledDir: /opt/wasi-sdk/bin
$ make clean && make codecbench-simd.wasm
/opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=268435456 -msimd128 -o codecbench-simd.wasm
# wasi-sdk-built-extra-memory.wasm (2^30 bytes, 16384x64K pages)
$ make clean && make codecbench-simd.wasm
/opt/wasi-sdk/bin/clang++ tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -fno-exceptions -Wl,--initial-memory=1073741824 -msimd128 -o codecbench-simd.wasm
# emscripten-built.wasm
$ emcc --version
emcc (Emscripten gcc/clang-like replacement) 1.39.10 (commit 1bd7d547598f3fc74699c172f6c9c59a1e8484f1)
$ make clean && make codecbench-simd.wasm
emcc tools/codecbench.cpp src/vertexcodec.cpp src/vertexfilter.cpp src/overdrawanalyzer.cpp src/indexgenerator.cpp src/vcacheoptimizer.cpp src/indexcodec.cpp src/vfetchanalyzer.cpp src/spatialorder.cpp src/clusterizer.cpp src/allocator.cpp src/vcacheanalyzer.cpp src/vfetchoptimizer.cpp src/overdrawoptimizer.cpp src/simplifier.cpp src/stripifier.cpp -O3 -DNDEBUG -s TOTAL_MEMORY=268435456 -msimd128 -o codecbench-simd.wasm
# then generated wat and dump files with
ls *.wasm | xargs -I{} sh -c "wasm2wat --enable-all {} > {}.wat"
ls *.wasm | xargs -I{} sh -c "wasm-objdump -d {} > {}.dump"
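The page counts quoted in the build comments above follow from the 64 KiB WebAssembly page size. A small compile-time check of that arithmetic, with pagesFor and kWasmPageSize as illustrative names that do not come from the issue:

```cpp
#include <cstdint>

// Size of one WebAssembly linear-memory page: 64 KiB.
constexpr std::uint64_t kWasmPageSize = 64 * 1024;

// Number of whole pages covered by a byte count (hypothetical helper).
constexpr std::uint64_t pagesFor(std::uint64_t bytes)
{
    return bytes / kWasmPageSize;
}

// --initial-memory=268435456 is 2^28 bytes, i.e. 4096 pages.
static_assert(pagesFor(268435456) == 4096, "2^28 bytes is 4096 pages");
// --initial-memory=1073741824 is 2^30 bytes, i.e. 16384 pages.
static_assert(pagesFor(1073741824) == 16384, "2^30 bytes is 16384 pages");
// Both sizes are exact multiples of the page size.
static_assert(268435456 % kWasmPageSize == 0, "page-aligned");
static_assert(1073741824 % kWasmPageSize == 0, "page-aligned");
```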
emscripten-built.wasm.txt
emscripten-built.wasm.wat.txt
emscripten-built-for-js.js.txt
emscripten-built-for-js.wasm.dump.txt
emscripten-built-for-js.wasm.txt
emscripten-built-for-js.wasm.wat.txt
wasi-sdk-built.wasm.dump.txt
wasi-sdk-built.wasm.txt
wasi-sdk-built.wasm.wat.txt
[wasi-sdk-b
[message truncated]
Last updated: Nov 22 2024 at 16:03 UTC