yurydelendik opened Issue #2531:
I translated the IDCT SSE code into Wasm. The algorithm uses lots of various punpckxxxx instructions, though WebAssembly has v8x16.shuffle. V8 lowers these shuffles into native SSE2 equivalents by matching the immediate argument. I cannot find whether we do this for any of the Cranelift backends.

STR:
- Use test case at https://github.com/yurydelendik/zbar-wasm/raw/0083a9a48c8c06e5555424d85f71ce5a4b560145/zbar_jpeg/test.wasm
- Run
    time wasmtime run --enable-simd test.wasm --invoke test500

Observe the time; it is about 15 sec here. Node runs test.wasm (_initialize + test500) in about 11 sec here.

It is expected that wasmtime/cranelift will improve the performance by 40-50% by using specialized SSE2 instructions.
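To illustrate the kind of immediate matching described above, here is a minimal Rust sketch. It is not Cranelift's or V8's actual code: the names `Unpack`, `unpack_mask`, and `match_unpack` are hypothetical, and it only assumes the standard shuffle convention that byte indices 0-15 select from the first operand and 16-31 from the second. It builds the byte mask each punpck* variant would produce and compares it against the shuffle immediate.

```rust
/// Which half of each source vector the unpack instruction interleaves.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unpack {
    Low,
    High,
}

/// Build the 16-byte shuffle mask equivalent to an SSE2 unpack with lanes
/// of `lane_bytes` bytes (1, 2, 4, or 8). Indices 0-15 refer to the first
/// operand, 16-31 to the second.
fn unpack_mask(lane_bytes: usize, half: Unpack) -> [u8; 16] {
    let base = match half {
        Unpack::Low => 0,
        Unpack::High => 8,
    };
    let mut mask = [0u8; 16];
    let mut out = 0;
    // Each unpack consumes one 8-byte half of both operands and
    // interleaves them lane by lane: a0 b0 a1 b1 ...
    for lane in 0..(8 / lane_bytes) {
        for src in 0..2 {
            for b in 0..lane_bytes {
                mask[out] = (src * 16 + base + lane * lane_bytes + b) as u8;
                out += 1;
            }
        }
    }
    mask
}

/// Return the name of the SSE2 unpack instruction that implements this
/// shuffle immediate, or `None` if a generic shuffle lowering is needed.
fn match_unpack(mask: &[u8; 16]) -> Option<&'static str> {
    let table = [
        (1, Unpack::Low, "punpcklbw"),
        (1, Unpack::High, "punpckhbw"),
        (2, Unpack::Low, "punpcklwd"),
        (2, Unpack::High, "punpckhwd"),
        (4, Unpack::Low, "punpckldq"),
        (4, Unpack::High, "punpckhdq"),
        (8, Unpack::Low, "punpcklqdq"),
        (8, Unpack::High, "punpckhqdq"),
    ];
    for (lane_bytes, half, name) in table {
        if unpack_mask(lane_bytes, half) == *mask {
            return Some(name);
        }
    }
    None
}
```

A real backend would presumably also want to recognize masks with the two operands swapped and fall back to a generic byte shuffle when nothing matches; the sketch covers only the direct case.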
abrown commented on Issue #2531:
I agree; perhaps this benchmark could be adapted for sightglass?
yurydelendik commented on Issue #2531:
> perhaps this benchmark could be adapted for sightglass?

I am not familiar with sightglass. Can you sketch what needs to be done?
abrown commented on Issue #2531:
Here is a high-level document describing the basic idea and here is an example.
Last updated: Jan 24 2025 at 00:11 UTC