Stream: git-wasmtime

Topic: wasmtime / PR #9853 pulley: Implement integer vector comp...


Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton opened PR #9853 from alexcrichton:pulley-simd-compare to bytecodealliance:main:

More wast tests passing.

Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton requested cfallin for a review on PR #9853.

Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #9853.

Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton requested dicej for a review on PR #9853.

Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton requested wasmtime-core-reviewers for a review on PR #9853.

Wasmtime GitHub notifications bot (Dec 18 2024 at 18:41):

alexcrichton requested wasmtime-default-reviewers for a review on PR #9853.

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:09):

cfallin submitted PR review:

Thanks! Looks right to me; I checked over most of this for copy-pastos but am additionally trusting the runtests as a backstop. Item of stray curiosity below but nothing to block on.

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:09):

cfallin created PR review comment:

Stray curiosity: did you happen to check whether LLVM can autovectorize this? It sure would be neat to have vector op implementations bottom out in native vector instructions when Pulley runs on a SIMD-capable host...

(No worries if not, it's not the main goal, but if it inspires anything then all the better)

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:20):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:20):

alexcrichton created PR review comment:

Heh I've been double-checking this along the way for most of the simd opcodes. The good news is yes! LLVM does a pretty good job at auto-vectorizing all these methods.
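
To make that concrete, here is a minimal Rust sketch of the kind of lane-wise loop that LLVM is able to auto-vectorize. The function names mirror the `vaddi32x4` and `veq8x16` opcodes from this PR thread, but the signatures and bodies are assumptions written for illustration; they are not the Pulley interpreter's actual implementation.

```rust
// Illustrative stand-ins only: the names echo Pulley opcodes, but these
// signatures and bodies are assumptions, not code from the interpreter.

/// Lane-wise wrapping add over four 32-bit lanes.
pub fn vaddi32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        // Wasm integer SIMD adds wrap on overflow.
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// Lane-wise equality over sixteen 8-bit lanes.
/// A result lane is all ones (0xff) when the inputs match, else all zeros.
pub fn veq8x16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if a[i] == b[i] { 0xff } else { 0x00 };
    }
    out
}
```

Whether a given build turns loops like these into single vector instructions depends on the optimization level and the target features enabled, so treat the sketch as a shape to experiment with rather than a guarantee.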

For example vaddi32x4 looks like this:

0000000000000000 <_ZN97_$LT$pulley_interpreter..interp..Interpreter$u20$as$u20$pulley_interpreter..decode..OpVisitor$GT$9vaddi32x417h24bc21fe57c19519E>:
   0:   48 8b 07                mov    (%rdi),%rax
   3:   89 f1                   mov    %esi,%ecx
   5:   40 0f b6 d6             movzbl %sil,%edx
   9:   c1 ee 04                shr    $0x4,%esi
   c:   81 e6 f0 0f 00 00       and    $0xff0,%esi
  12:   c1 e9 0c                shr    $0xc,%ecx
  15:   81 e1 f0 0f 00 00       and    $0xff0,%ecx
  1b:   c1 e2 04                shl    $0x4,%edx
  1e:   c5 f9 6f 04 30          vmovdqa (%rax,%rsi,1),%xmm0
  23:   c5 f9 fe 04 08          vpaddd (%rax,%rcx,1),%xmm0,%xmm0
  28:   c5 f9 7f 04 10          vmovdqa %xmm0,(%rax,%rdx,1)
  2d:   31 c0                   xor    %eax,%eax
  2f:   c3                      ret

and the method here looks like this:

0000000000000000 <_ZN105_$LT$pulley_interpreter..interp..Interpreter$u20$as$u20$pulley_interpreter..decode..ExtendedOpVisitor$GT$7veq8x1617h73aa3ce30a2d51abE>:
   0:   48 8b 07                mov    (%rdi),%rax
   3:   89 f1                   mov    %esi,%ecx
   5:   40 0f b6 d6             movzbl %sil,%edx
   9:   c1 ee 04                shr    $0x4,%esi
   c:   81 e6 f0 0f 00 00       and    $0xff0,%esi
  12:   c1 e9 0c                shr    $0xc,%ecx
  15:   81 e1 f0 0f 00 00       and    $0xff0,%ecx
  1b:   c1 e2 04                shl    $0x4,%edx
  1e:   c5 f9 6f 04 30          vmovdqa (%rax,%rsi,1),%xmm0
  23:   c5 f9 74 04 08          vpcmpeqb (%rax,%rcx,1),%xmm0,%xmm0
  28:   c5 f9 7f 04 10          vmovdqa %xmm0,(%rax,%rdx,1)
  2d:   31 c0                   xor    %eax,%eax
  2f:   c3                      ret

Most of the complexity here is decoding BinaryOperands<VReg> where it's three 5-bit values packed into a 16-bit value, but otherwise it's pretty optimal in terms of lowering.
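
For readers unfamiliar with that encoding, the following is a minimal sketch of packing three 5-bit register indices into one 16-bit value and unpacking them again. The field order (destination in the low bits, then the two sources), the `Operands` struct, and the helper names are all assumptions made for illustration and are not confirmed by this thread. The `vreg_byte_offset` helper is likewise hypothetical; it only shows how a register index could be scaled to a 16-byte slot, which is consistent with the `shl $0x4` in the listings above.

```rust
// Assumed layout, for illustration only:
//   bits 0..5   = dst
//   bits 5..10  = src1
//   bits 10..15 = src2
//   bit 15      = unused

/// Hypothetical decoded form of a three-register operand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Operands {
    pub dst: u8,
    pub src1: u8,
    pub src2: u8,
}

/// Mask selecting one 5-bit field.
const FIELD_MASK: u16 = 0b1_1111;

/// Pack three register indices (each expected to be below 32) into 16 bits.
pub fn pack(ops: Operands) -> u16 {
    ((ops.dst as u16) & FIELD_MASK)
        | (((ops.src1 as u16) & FIELD_MASK) << 5)
        | (((ops.src2 as u16) & FIELD_MASK) << 10)
}

/// Recover the three register indices from the packed form.
pub fn unpack(bits: u16) -> Operands {
    Operands {
        dst: (bits & FIELD_MASK) as u8,
        src1: ((bits >> 5) & FIELD_MASK) as u8,
        src2: ((bits >> 10) & FIELD_MASK) as u8,
    }
}

/// Byte offset of a 16-byte vector register in a flat register file
/// (hypothetical helper: index * 16).
pub fn vreg_byte_offset(index: u8) -> usize {
    (index as usize) << 4
}
```

Keeping each field to 5 bits allows 32 registers per class while the whole operand still fits in two bytes of bytecode, at the cost of a few shifts and masks on decode.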

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:21):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:21):

cfallin created PR review comment:

Nice, that's great!

Wasmtime GitHub notifications bot (Dec 18 2024 at 19:28):

cfallin merged PR #9853.


Last updated: Dec 23 2024 at 12:05 UTC