Stream: git-wasmtime

Topic: wasmtime / Issue #2531 v8x16.shuffle optimizations needed


Wasmtime GitHub notifications bot (Dec 24 2020 at 14:28):

yurydelendik opened Issue #2531:

I translated the IDCT SSE code into Wasm. The algorithm uses many different punpckxxxx instructions, which WebAssembly expresses through v8x16.shuffle. V8 lowers these shuffles to their native SSE2 equivalents by matching the immediate argument. I cannot find whether we do this in any of the Cranelift backends.
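To illustrate what "matching the immediate argument" could look like, here is a minimal Rust sketch. It is not taken from V8 or Cranelift; the names `Unpack`, `unpack_mask`, and `match_unpack` are hypothetical. It assumes the usual shuffle semantics (indices 0-15 select bytes of the first operand, 16-31 bytes of the second) and checks whether a 16-byte mask equals the interleave pattern of one of the SSE2 unpack instructions.

```rust
/// SSE2 unpack instructions a shuffle mask might be equivalent to.
/// (Hypothetical enum for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unpack {
    Punpcklbw,
    Punpckhbw,
    Punpcklwd,
    Punpckhwd,
    Punpckldq,
    Punpckhdq,
    Punpcklqdq,
    Punpckhqdq,
}

/// Build the shuffle mask that interleaves `lane_bytes`-wide lanes taken
/// from the low (or high) 8 bytes of operands `a` (indices 0..16) and
/// `b` (indices 16..32).
fn unpack_mask(lane_bytes: usize, high: bool) -> [u8; 16] {
    let base = if high { 8 } else { 0 };
    let mut mask = [0u8; 16];
    for (i, slot) in mask.iter_mut().enumerate() {
        let lane = i / lane_bytes; // output lane this byte belongs to
        let byte = i % lane_bytes; // byte offset inside that lane
        let src_lane = lane / 2; // lane index inside the source half
        let operand = if lane % 2 == 1 { 16 } else { 0 }; // odd lanes come from `b`
        *slot = (base + src_lane * lane_bytes + byte + operand) as u8;
    }
    mask
}

/// Return the unpack instruction whose interleave pattern equals `mask`,
/// or `None` if a generic shuffle lowering would be needed.
fn match_unpack(mask: &[u8; 16]) -> Option<Unpack> {
    const CANDIDATES: [(usize, bool, Unpack); 8] = [
        (1, false, Unpack::Punpcklbw),
        (1, true, Unpack::Punpckhbw),
        (2, false, Unpack::Punpcklwd),
        (2, true, Unpack::Punpckhwd),
        (4, false, Unpack::Punpckldq),
        (4, true, Unpack::Punpckhdq),
        (8, false, Unpack::Punpcklqdq),
        (8, true, Unpack::Punpckhqdq),
    ];
    for &(lane_bytes, high, op) in CANDIDATES.iter() {
        if unpack_mask(lane_bytes, high) == *mask {
            return Some(op);
        }
    }
    None
}

fn main() {
    // Interleave the low bytes of `a` and `b`.
    let mask = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23];
    println!("{:?}", match_unpack(&mask));
}
```

A real backend would also have to handle swapped operands and masks that read from a single input; the sketch covers only exact matches.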

STR:

  1. Use test case at https://github.com/yurydelendik/zbar-wasm/raw/0083a9a48c8c06e5555424d85f71ce5a4b560145/zbar_jpeg/test.wasm
  2. Run: time wasmtime run --enable-simd test.wasm --invoke test500

Observe the time; it is about 15 sec here. Node runs test.wasm (_initialize + test500) in about 11 sec here.

Wasmtime/Cranelift is expected to improve performance by 40-50% by using specialized SSE2 instructions.

Wasmtime GitHub notifications bot (Dec 28 2020 at 14:11):

yurydelendik edited Issue #2531.

Wasmtime GitHub notifications bot (Dec 28 2020 at 14:13):

yurydelendik labeled Issue #2531.

Wasmtime GitHub notifications bot (Dec 30 2020 at 00:13):

abrown commented on Issue #2531:

I agree; perhaps this benchmark could be adapted for sightglass?

Wasmtime GitHub notifications bot (Dec 30 2020 at 01:07):

yurydelendik commented on Issue #2531:

> perhaps this benchmark could be adapted for sightglass?

I am not familiar with sightglass. Can you sketch what needs to be done?

Wasmtime GitHub notifications bot (Dec 30 2020 at 22:53):

abrown commented on Issue #2531:

Here is a high-level document describing the basic idea and here is an example.


Last updated: Jan 24 2025 at 00:11 UTC