wasmtime / PR #3517 Fix for issue 3327. Updates Bnot to h... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3517 Fix for issue 3327. Updates Bnot to h...

Wasmtime GitHub notifications bot (Nov 12 2021 at 01:36):

jlb6740 opened PR #3517 from fix-issue-3327 to main:

Wasmtime GitHub notifications bot (Nov 12 2021 at 01:40):

jlb6740 requested alexcrichton for a review on PR #3517.

Wasmtime GitHub notifications bot (Nov 12 2021 at 01:40):

jlb6740 requested cfallin for a review on PR #3517.

Wasmtime GitHub notifications bot (Nov 12 2021 at 21:13):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Nov 12 2021 at 21:13):

abrown created PR review comment:

                ctx.emit(Inst::equals(types::I32X4, RegMem::from(tmp), tmp));
This change would avoid the extra instruction and might fix the NaN comparison issue.

Wasmtime GitHub notifications bot (Nov 15 2021 at 14:31):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 14:31):

alexcrichton created PR review comment:

I believe this would fix the issue, yes, because this would avoid doing a floating-point-compare which runs the risk of doing NaN comparisons. I don't know about the performance implications of this, though, and whether it's better to do or not.

Otherwise though I think that this xor is probably worth both a comment and a test, or otherwise somehow sharing code with the other spot in f32x4.abs which generates an all-ones mask.

Wasmtime GitHub notifications bot (Nov 15 2021 at 16:55):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 16:55):

jlb6740 created PR review comment:

@alexcrichton @abrown The xorps isn't a new idea. It's what we do in a couple of places. For example here is the sequence that prompted the bug:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

The instruction at address "c" does an xorps instead of an integer compare and we do this because of the types. Is it really a good idea to override the types used? Should we override types for the abs as well?

Wasmtime GitHub notifications bot (Nov 15 2021 at 17:01):

jlb6740 edited PR review comment.

Wasmtime GitHub notifications bot (Nov 15 2021 at 17:01):

jlb6740 edited PR review comment.

Wasmtime GitHub notifications bot (Nov 15 2021 at 18:47):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 18:47):

alexcrichton created PR review comment:

Ah yeah I don't know myself which is better. I agree that both should do the same thing, but whether we do an integer compare then move that to floating-point-typed instructions or instead to xor/cmp both with a floating point type I dunno.

FWIW for f32x4.abs LLVM generates this code for x86_64:
.LCPI0_0:
        .long   0x7fffffff                      # float NaN
        .long   0x7fffffff                      # float NaN
        .long   0x7fffffff                      # float NaN
        .long   0x7fffffff                      # float NaN
foo:                                    # @foo
        andps   xmm0, xmmword ptr [rip + .LCPI0_0]
        ret
and the rough v128.not equiavlent for LLVM seems to be:
foo:                                    # @foo
        pcmpeqd xmm1, xmm1
        pxor    xmm0, xmm1
        ret
Although this is in isolation where it's not doing other float ops around there so this isn't necessarily definitive I think.

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:16):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:20):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:20):

jlb6740 created PR review comment:

Thanks @alexcrichton So looks like LLVM is using integer compare too. I've made that update. I didn't touch abs because I think that is a separate patch but looks like that could be more efficient as well.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:03):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:03):

abrown created PR review comment:

I agree that from a Cranelift perspective it is a bit odd to use the types::I32X4 since it may have nothing to do with the types we are lowering here, but I think that is a refactoring problem. We actually don't want to be writing Inst::equals(...) here (and in other places)--we want to write construct_all_ones(...) since that is actually what we are trying to do. construct_all_ones does not exist and I can't remember if there was some other helper that could do this type of thing (and the inverse, construct all zeroes, as well).

Inside such a helper, I think the best x86 code to emit is probably PCMPEQD (the output of Inst::equals(types::I32X4, ...) since that fixes the NaN issue and avoids an extra instruction. I checked to see how much the extra instruction would affect things and my conclusion is that 1) in tight loops the extra instruction could be a factor and 2) the int-to-FP domain switching is not an issue.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:42):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:42):

jlb6740 created PR review comment:

I commented to @abrown offline .. but my concern was also about mixing integer and float instructions for SIMD. In real codes (not this fuzzing example) they'll likely be other floating point SIMD instructions operating on these registers and it's supposed to be a bad for performance to mix between integer and floating point operations. Not sure where in the manuals but it's mentioned here in this stack overflow on some older system: https://stackoverflow.com/questions/4996384/do-i-get-a-performance-penalty-when-mixing-sse-integer-float-simd-instructions . None-the-less I pushed the patch to hardcode the type.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:43):

jlb6740 edited PR review comment.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:48):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:48):

jlb6740 created PR review comment:

@abrown .. Yes construct all ones is exactly what we are inlining here.

Wasmtime GitHub notifications bot (Nov 15 2021 at 21:10):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 15 2021 at 23:22):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 15 2021 at 23:22):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 15 2021 at 23:55):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Nov 16 2021 at 01:04):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 16 2021 at 02:04):

jlb6740 updated PR #3517 from fix-issue-3327 to main.

Wasmtime GitHub notifications bot (Nov 16 2021 at 02:47):

jlb6740 merged PR #3517.

Last updated: Apr 18 2025 at 20:03 UTC