alexcrichton commented on issue #2531:
Can confirm that all these shuffles are now implemented, even on aarch64 too. All i8x16.shuffle instructions present in the above module are compiled to single-instruction lowerings on both x86_64 and aarch64. In that case I'm going to close this.
alexcrichton closed issue #2531:
I translated the IDCT SSE code into Wasm. The algorithm uses many different punpckxxxx instructions, whereas WebAssembly has only v8x16.shuffle. V8 lowers these shuffles into their native SSE2 equivalents by matching the immediate argument. I cannot find whether we do this in any of the cranelift backends.
STR:
- Use test case at https://github.com/yurydelendik/zbar-wasm/raw/0083a9a48c8c06e5555424d85f71ce5a4b560145/zbar_jpeg/test.wasm
- Run: time wasmtime run --enable-simd test.wasm --invoke test500
Observe the time; it is about 15 sec here. Node runs test.wasm (_initialize+test500) in about 11 sec here.
Using specialized SSE2 instructions is expected to improve wasmtime/cranelift performance by 40-50%.
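For readers unfamiliar with the technique the issue asks for, here is a minimal sketch of matching a shuffle immediate against the SSE2 unpack patterns. It is not Cranelift's or V8's actual code: the function names `unpack_mask` and `match_unpack` are hypothetical, and it assumes the usual Wasm convention that mask indices 0-15 select bytes from the first operand and 16-31 from the second.

```rust
/// Builds the 16-byte shuffle mask that an SSE2 unpack instruction would
/// produce, for lanes of `width` bytes (1, 2, 4, or 8).
/// `high` selects the upper 8 bytes of each operand instead of the lower 8.
/// (Hypothetical helper, for illustration only.)
fn unpack_mask(width: usize, high: bool) -> [u8; 16] {
    let base = if high { 8 } else { 0 };
    let mut mask = [0u8; 16];
    for (i, slot) in mask.iter_mut().enumerate() {
        let group = i / width; // which lane-sized chunk of the output
        let operand = group % 2; // chunks alternate: first operand, second operand
        let lane = group / 2; // which lane of that operand's half
        *slot = (base + lane * width + i % width + 16 * operand) as u8;
    }
    mask
}

/// Returns the name of the SSE2 unpack instruction whose effect equals
/// `mask`, or `None` if the shuffle needs a more general lowering.
/// (Hypothetical helper, for illustration only.)
fn match_unpack(mask: &[u8; 16]) -> Option<&'static str> {
    const CANDIDATES: [(usize, bool, &str); 8] = [
        (1, false, "punpcklbw"),
        (1, true, "punpckhbw"),
        (2, false, "punpcklwd"),
        (2, true, "punpckhwd"),
        (4, false, "punpckldq"),
        (4, true, "punpckhdq"),
        (8, false, "punpcklqdq"),
        (8, true, "punpckhqdq"),
    ];
    for &(width, high, name) in CANDIDATES.iter() {
        if unpack_mask(width, high) == *mask {
            return Some(name);
        }
    }
    None
}
```

With this sketch, the mask 0, 16, 1, 17, ..., 7, 23 is recognized as punpcklbw, while a mask that matches none of the eight patterns falls through to `None`, where a real backend would emit a general-purpose shuffle sequence instead.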
Last updated: Nov 22 2024 at 16:03 UTC