wasmtime / issue #13405 Compiler should detect equality/i... · git-wasmtime

Wasmtime GitHub notifications bot (May 18 2026 at 21:50):

AngelicosPhosphoros opened issue #13405:

Feature

WASM lacks a dedicated equality instruction for v128 vectors, though there are multiple ways to emulate one.

E.g. (referred to below as examples 1, 2 and 3):

    local.get 0
    local.get 1
    v128.andnot
    v128.any_true
    i32.eqz

    local.get 0
    local.get 1
    i8x16.eq
    i8x16.all_true

    local.get 0
    local.get 1
    i8x16.ne
    v128.any_true
    i32.eqz
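
To make it easier to compare what the three sequences compute, here is a minimal Python model of the Wasm SIMD operations involved. It is a sketch, not Wasmtime code: v128 values are modelled as 128-bit unsigned integers, the helper names are made up for illustration, and example 3 uses `v128.any_true` as in the module source further down. Under this model examples 2 and 3 return 1 exactly when the two vectors are identical, while example 1 returns 1 whenever every bit set in the first operand is also set in the second, so it agrees with the other two only for some inputs.

```python
MASK128 = (1 << 128) - 1  # assumption: a v128 is modelled as a 128-bit unsigned int


def lanes_i8x16(v):
    """Split a 128-bit value into 16 byte lanes, lane 0 being the low byte."""
    return [(v >> (8 * i)) & 0xFF for i in range(16)]


def from_lanes(lanes):
    """Pack 16 byte lanes back into one 128-bit value."""
    return sum(lane << (8 * i) for i, lane in enumerate(lanes))


def v128_andnot(a, b):
    return a & ~b & MASK128  # bitwise a AND (NOT b)


def v128_any_true(v):
    return int(v != 0)  # 1 if any bit is set


def i8x16_eq(a, b):
    pairs = zip(lanes_i8x16(a), lanes_i8x16(b))
    return from_lanes([0xFF if x == y else 0x00 for x, y in pairs])


def i8x16_ne(a, b):
    pairs = zip(lanes_i8x16(a), lanes_i8x16(b))
    return from_lanes([0xFF if x != y else 0x00 for x, y in pairs])


def i8x16_all_true(v):
    return int(all(lane != 0 for lane in lanes_i8x16(v)))  # 1 if every lane is non-zero


def i32_eqz(x):
    return int(x == 0)


def example_1(a, b):
    return i32_eqz(v128_any_true(v128_andnot(a, b)))


def example_2(a, b):
    return i8x16_all_true(i8x16_eq(a, b))


def example_3(a, b):
    return i32_eqz(v128_any_true(i8x16_ne(a, b)))


# Identical inputs: all three agree.
print(example_1(7, 7), example_2(7, 7), example_3(7, 7))  # 1 1 1
# a = 0, b = 1: the vectors differ, yet example 1 still reports 1.
print(example_1(0, 1), example_2(0, 1), example_3(0, 1))  # 1 0 0
```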

Ideally, they would compile to a single VPTEST instruction on x86-64, which sets the CF flag to 1 on equality.
Unfortunately, these sequences compile to less-than-ideal code:

The best case is example 1, which compiles into two vector instructions instead of one:

<wasm[0]::function[0]::exact_eq_128>:
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: c5 f1 df f0                   vpandn  %xmm0, %xmm1, %xmm6
       8: c4 e2 79 17 f6                vptest  %xmm6, %xmm6
       d: 40 0f 94 c7                   sete    %dil
      11: 40 0f b6 c7                   movzbl  %dil, %eax
      15: 48 89 ec                      movq    %rbp, %rsp
      18: 5d                            popq    %rbp
      19: c3                            retq
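
As a rough check of the single-VPTEST idea for this listing, the sketch below models the flag behaviour of VPTEST (ZF is set when `src AND dest` is zero, CF is set when `src AND NOT dest` is zero) and compares the emitted `vpandn` + `vptest` + `sete` sequence with one `vptest` whose CF result is read instead. The function names are illustrative, and the mapping of the first and second Wasm parameters to `%xmm0` and `%xmm1` is an assumption read off the listing.

```python
import itertools

MASK128 = (1 << 128) - 1  # assumption: an XMM register is modelled as a 128-bit int


def vptest(dest, src):
    """Model of VPTEST flags, Intel operand order: returns (ZF, CF)."""
    zf = int((src & dest) == 0)
    cf = int((src & ~dest & MASK128) == 0)
    return zf, cf


def vpandn(src1, src2):
    """Model of VPANDN, Intel operand order: (NOT src1) AND src2."""
    return ~src1 & src2 & MASK128


def emitted_sequence(a, b):
    """vpandn %xmm0, %xmm1, %xmm6 ; vptest %xmm6, %xmm6 ; sete (a = xmm0, b = xmm1)."""
    t = vpandn(b, a)        # AT&T order reversed: xmm6 = (NOT xmm1) AND xmm0
    zf, _cf = vptest(t, t)
    return zf               # sete copies ZF


def single_vptest(a, b):
    """Hypothetical shorter form: one vptest, then read CF (e.g. with setb)."""
    _zf, cf = vptest(b, a)  # CF = ((a AND NOT b) == 0)
    return cf


samples = [0, 1, 3, 0xFF, 1 << 127, MASK128]
agree = all(emitted_sequence(a, b) == single_vptest(a, b)
            for a, b in itertools.product(samples, repeat=2))
print(agree)  # True
```

In this model the two forms agree on every sampled pair, which is consistent with folding the `vpandn` into the test for example 1.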

Case 2 compiles into 4 instructions:

<wasm[0]::function[1]::exact_eq_128_2>:
      20: 55                            pushq   %rbp
      21: 48 89 e5                      movq    %rsp, %rbp
      24: c5 f9 74 c1                   vpcmpeqb    %xmm1, %xmm0, %xmm0
      28: c5 c9 ef ce                   vpxor   %xmm6, %xmm6, %xmm1
      2c: c5 f9 74 c1                   vpcmpeqb    %xmm1, %xmm0, %xmm0
      30: c4 e2 79 17 c0                vptest  %xmm0, %xmm0
      35: 0f 94 c0                      sete    %al
      38: 0f b6 c0                      movzbl  %al, %eax
      3b: 48 89 ec                      movq    %rbp, %rsp
      3e: 5d                            popq    %rbp
      3f: c3                            retq

Case 3 compiles into 4 instructions too:

<wasm[0]::function[2]::exact_eq_128_3>:
      40: 55                            pushq   %rbp
      41: 48 89 e5                      movq    %rsp, %rbp
      44: c5 f9 74 f1                   vpcmpeqb    %xmm1, %xmm0, %xmm6
      48: c5 f9 76 c0                   vpcmpeqd    %xmm0, %xmm0, %xmm0
      4c: c5 c9 ef c0                   vpxor   %xmm0, %xmm6, %xmm0
      50: c4 e2 79 17 c0                vptest  %xmm0, %xmm0
      55: 41 0f 94 c2                   sete    %r10b
      59: 41 0f b6 c2                   movzbl  %r10b, %eax
      5d: 48 89 ec                      movq    %rbp, %rsp
      60: 5d                            popq    %rbp
      61: c3                            retq
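
For cases 2 and 3, the sketch below models the emitted instruction sequences and compares them with one candidate shorter form, `vpxor` of the two inputs followed by `vptest` and `sete`. That candidate is an assumption of this sketch and is not taken from the thread; the function names are illustrative, and registers are modelled as 128-bit integers.

```python
import itertools

MASK128 = (1 << 128) - 1  # assumption: an XMM register is modelled as a 128-bit int


def vpcmpeqb(x, y):
    """Byte-wise compare: each lane becomes 0xFF if equal, 0x00 otherwise."""
    out = 0
    for i in range(16):
        shift = 8 * i
        if ((x >> shift) & 0xFF) == ((y >> shift) & 0xFF):
            out |= 0xFF << shift
    return out


def vptest_zf(x, y):
    """ZF result of VPTEST: 1 when (x AND y) is all zeros."""
    return int((x & y) == 0)


def case_2_listing(a, b):
    t = vpcmpeqb(a, b)       # lanes that are equal become 0xFF
    zero = 0                 # vpxor of a register with itself
    t = vpcmpeqb(t, zero)    # invert: lanes that differed become 0xFF
    return vptest_zf(t, t)   # sete


def case_3_listing(a, b):
    eq = vpcmpeqb(a, b)
    ones = MASK128           # vpcmpeqd of a register with itself
    t = eq ^ ones            # vpxor: lanes that differed become 0xFF
    return vptest_zf(t, t)   # sete


def xor_then_test(a, b):
    """Hypothetical shorter form: vpxor a, b ; vptest ; sete."""
    t = a ^ b
    return vptest_zf(t, t)


samples = [0, 1, 0xFF00, 1 << 64, (1 << 127) | 5, MASK128]
agree = all(
    case_2_listing(a, b) == case_3_listing(a, b) == xor_then_test(a, b) == int(a == b)
    for a, b in itertools.product(samples, repeat=2)
)
print(agree)  # True
```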

Benefit

This would reduce the amount of generated code and the number of instructions, which would make these sequences run faster and fit into the i-cache better.

It is hard to say what the exact performance impact would be.

Code and commands used for generating examples

(module $eq_v128_rs.wasm
  (type (func (param v128 v128) (result i32)))
  (func $exact_eq_128 (type 0) (param v128 v128) (result i32)
    local.get 0
    local.get 1
    v128.andnot
    v128.any_true
    i32.eqz
  )
  (func $exact_eq_128_2 (type 0) (param v128 v128) (result i32)
    local.get 0
    local.get 1
    i8x16.eq
    i8x16.all_true
  )

  (func $exact_eq_128_3 (type 0) (param v128 v128) (result i32)
    local.get 0
    local.get 1
    i8x16.ne
    v128.any_true
    i32.eqz
  )

  (export "exact_eq_128" (func $exact_eq_128))
  (export "exact_eq_128_2" (func $exact_eq_128_2))
  (export "exact_eq_128_3" (func $exact_eq_128_3))
)

Commands:

wasmtime compile eq_module.wat -o eq_module.cwasm -Ddebug-info=n -O opt-level=2
llvm-objdump -D .\eq_module.cwasm > eq_module.asm

Current wasmtime version:

wasmtime --version --verbose
wasmtime 44.0.1

Wasmtime GitHub notifications bot (May 18 2026 at 21:51):

AngelicosPhosphoros edited issue #13405:

Wasmtime GitHub notifications bot (May 18 2026 at 21:52):

cfallin commented on issue #13405:

This seems like it should be reasonable to write a left-hand side pattern for in our ISLE lowering patterns -- happy to review a PR if you'd like.
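
To illustrate what a left-hand side pattern does in general terms, here is a toy tree matcher in Python. It is not ISLE and does not use any Cranelift API: the tuple encoding of expressions, the `?name` variable convention, and the pseudo-instruction names in the output are all hypothetical. It only shows the idea of recognising the example 1 shape and emitting a shorter sequence for it.

```python
def match(pattern, expr, binds=None):
    """Match expr against pattern; names starting with '?' bind sub-expressions.

    Returns a dict of bindings on success and None on failure.
    """
    binds = {} if binds is None else binds
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in binds and binds[pattern] != expr:
            return None          # the same variable must bind the same thing
        binds[pattern] = expr
        return binds
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for sub_pattern, sub_expr in zip(pattern, expr):
            if match(sub_pattern, sub_expr, binds) is None:
                return None
        return binds
    return binds if pattern == expr else None


# Hypothetical left-hand side for the example 1 shape.
LHS = ("i32.eqz", ("v128.any_true", ("v128.andnot", "?a", "?b")))


def lower(expr):
    """Return a list of pseudo-instructions if the pattern applies, else None."""
    binds = match(LHS, expr)
    if binds is None:
        return None
    # Hypothetical output: one test instruction plus a flag read.
    return [("vptest", binds["?b"], binds["?a"]), ("setb",)]


expr = ("i32.eqz", ("v128.any_true", ("v128.andnot", "local0", "local1")))
print(lower(expr))
print(lower(("i32.eqz", ("v128.any_true", "local0"))))  # None: shape does not match
```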

Wasmtime GitHub notifications bot (May 21 2026 at 16:09):

fitzgen added the cranelift:goal:optimize-speed label to Issue #13405.

Wasmtime GitHub notifications bot (May 21 2026 at 16:09):

fitzgen added the wasm-proposal:simd label to Issue #13405.

Wasmtime GitHub notifications bot (May 26 2026 at 22:47):

alexcrichton closed issue #13405:

Last updated: Jul 29 2026 at 05:03 UTC