Stream: git-wasmtime

Topic: wasmtime / PR #8313 NaN-canonicalization without branchin...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:53):

adambratschikaye opened PR #8313 from adambratschikaye:abk/no-branch-nan-canonicalization to bytecodealliance:main:

Modify the cranelift pass that performs NaN-canonicalization to avoid branches on x64. The current implementation uses two branches:

       8: be 00 00 c0 7f                mov esi, 0x7fc00000
       d: c5 f9 6e de                   vmovd   xmm3, esi
      11: 0f 2e c0                      ucomiss xmm0, xmm0
      14: 0f 8b 04 00 00 00             jnp 0x1e <wasm[0]::function[0]+0x1e>
      1a: f2 0f 10 c3                   movsd   xmm0, xmm3              # xmm0 = xmm3[0],xmm0[1]
      1e: 0f 84 04 00 00 00             je  0x28 <wasm[0]::function[0]+0x28>
      24: f2 0f 10 c3                   movsd   xmm0, xmm3              # xmm0 = xmm3[0],xmm0[1]

With these changes, NaN-canonicalization becomes:

       8: c5 e8 c2 da 03                vcmpunordps xmm3, xmm2, xmm2
       d: be 00 00 c0 7f                mov esi, 0x7fc00000
      12: c5 f9 6e e6                   vmovd   xmm4, esi
      16: c4 e3 69 4c c4 30             vpblendvb   xmm0, xmm2, xmm4, xmm3

Running both versions against an small image classification inference benchmark here resulted in a ~50% improvement:

image_classification/opt-level=0
                        time:   [728.16 ms 730.00 ms 732.05 ms]
                        change: [-44.476% -44.251% -44.029%] (p = 0.00 < 0.05)
                        Performance has improved.

image_classification/opt-level=1
                        time:   [593.90 ms 595.51 ms 597.34 ms]
                        change: [-51.561% -51.396% -51.211%] (p = 0.00 < 0.05)
                        Performance has improved.

As a side note, I didn't notice any sightglass benchmark that was performing mainly float arithmetic to test against. I'd be happy to add this image classification case if there's interest.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:54):

adambratschikaye updated PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:59):

adambratschikaye submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:59):

adambratschikaye submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:59):

adambratschikaye created PR review comment:

Without any of the other changes, just changing this comparison from NotEqual to Unordered removes one of the two jumps which is a significant improvement. With the other changes I don't think there's a difference between using NotEqual or Unordered, but Unordered seemed more precise.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 07:59):

adambratschikaye created PR review comment:

The original pattern was never triggered because fcmp will result in either an I32X4 or I64X2 which always needs to be bitcast back to F32X4 or F64X2 before it can be passed to bitselect.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 08:19):

adambratschikaye updated PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 08:26):

adambratschikaye edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:06):

adambratschikaye has marked PR #8313 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:06):

adambratschikaye requested wasmtime-compiler-reviewers for a review on PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:06):

adambratschikaye requested abrown for a review on PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:07):

adambratschikaye commented on PR #8313:

Also is there a way to enable NaN-canonicalization in a clif test to add a test for this?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:29):

afonso360 commented on PR #8313:

You should be able to add something along these lines, to test with nan canonicalization enabled:

test {run,compile,etc...}
set enable_nan_canonicalization=true
target x86_64

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:33):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 09:33):

afonso360 created PR review comment:

I'm not sure this works with the RISC-V backend, our vector support is gated on a feature flag which may not always be available.

One solution would be to keep the previous code around for RISC-V, but it feels wrong to select on targets at this level.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 14:48):

adambratschikaye updated PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 14:52):

adambratschikaye submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 14:52):

adambratschikaye created PR review comment:

Ah I see that now. What about choosing based on the targets and features (that is specifically riscv64 when has_v=false)? Maybe in the future there will be other target/feature combinations that don't support vectors?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 22:05):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2024 at 22:05):

afonso360 created PR review comment:

I think that is what we need right now. I'm not sure if eventually we should add something to TargetIsa, but It might be worth reviewing later on if this situation ends up happening more often.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 08:41):

adambratschikaye updated PR #8313.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 08:44):

adambratschikaye submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 08:44):

adambratschikaye created PR review comment:

Added the fall back to the old version for RiscV here.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 10:15):

afonso360 submitted PR review:

This LGTM! Thanks! I don't know if @abrown wants to review it as well.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 11:07):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 09 2024 at 11:34):

abrown merged PR #8313.


Last updated: Nov 22 2024 at 17:03 UTC