wasmtime / issue #2531 v8x16.shuffle optimizations needed · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #2531 v8x16.shuffle optimizations needed

Wasmtime GitHub notifications bot (Jun 22 2023 at 15:03):

Can confirm that all these shuffles are now implemented, on aarch64 as well. All i8x16.shuffle instructions present in the above module are compiled to single-instruction lowerings on both x86_64 and aarch64. Given that, I'm going to close this.

Wasmtime GitHub notifications bot (Jun 22 2023 at 15:03):

alexcrichton closed issue #2531:

I translated the IDCT SSE code into Wasm. The algorithm uses many different punpckxxxx instructions, which in WebAssembly are expressed with v8x16.shuffle. V8 lowers these shuffles into their native SSE2 equivalents by matching the immediate argument. I cannot find whether we do this for any of the Cranelift backends.
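
To illustrate what "matching the immediate argument" could look like, here is a minimal sketch in Rust. It is not the V8 or Cranelift implementation; the function names unpack_mask and match_unpack are hypothetical. It assumes the usual shuffle convention that immediate indices 0-15 select bytes from the first operand and 16-31 from the second, and it only recognizes the eight SSE2 punpck* interleave patterns.

```rust
/// Build the 16-byte shuffle immediate that interleaves `lane_bytes`-wide
/// lanes taken from the low (or high) 8 bytes of two 128-bit vectors.
/// Indices 0..=15 refer to the first operand, 16..=31 to the second.
fn unpack_mask(lane_bytes: usize, high: bool) -> [u8; 16] {
    let base = if high { 8 } else { 0 };
    let mut mask = [0u8; 16];
    let mut out = 0;
    for lane in 0..(8 / lane_bytes) {
        // Alternate one lane from the first operand, one from the second.
        for src in [0usize, 16] {
            for byte in 0..lane_bytes {
                mask[out] = (src + base + lane * lane_bytes + byte) as u8;
                out += 1;
            }
        }
    }
    mask
}

/// Hypothetical matcher: return the name of the SSE2 unpack instruction
/// whose behavior equals the given shuffle immediate, if there is one.
fn match_unpack(mask: &[u8; 16]) -> Option<&'static str> {
    // (lane width in bytes, uses high half, instruction mnemonic)
    const CANDIDATES: [(usize, bool, &str); 8] = [
        (1, false, "punpcklbw"),
        (1, true, "punpckhbw"),
        (2, false, "punpcklwd"),
        (2, true, "punpckhwd"),
        (4, false, "punpckldq"),
        (4, true, "punpckhdq"),
        (8, false, "punpcklqdq"),
        (8, true, "punpckhqdq"),
    ];
    for (lane_bytes, high, name) in CANDIDATES {
        if unpack_mask(lane_bytes, high) == *mask {
            return Some(name);
        }
    }
    // No specialized match; a backend would fall back to a generic shuffle.
    None
}
```

With this sketch, the immediate [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23] maps to punpcklbw, while an arbitrary permutation returns None and would need a generic lowering.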

STR:

Use the test case at https://github.com/yurydelendik/zbar-wasm/raw/0083a9a48c8c06e5555424d85f71ce5a4b560145/zbar_jpeg/test.wasm

Run: time wasmtime run --enable-simd test.wasm --invoke test500

Observe the time; it is about 15 sec here. Node runs test.wasm (_initialize + test500) in about 11 sec here.

It is expected that wasmtime/cranelift will improve performance by 40-50% by using specialized SSE2 instructions.

Last updated: Apr 18 2025 at 04:04 UTC