AngelicosPhosphoros opened issue #13405:
Feature
WASM lacks a clear equivalence instruction for v128 vectors. There are multiple ways it can be simulated, though, e.g. (referred to below as examples 1, 2 and 3):

    local.get 0
    local.get 1
    v128.andnot
    v128.any_true
    i32.eqz

    local.get 0
    local.get 1
    i8x16.eq
    i8x16.all_true

    local.get 0
    local.get 1
    i8x16.ne
    v128.any_true
    i32.eqz

Ideally, they would compile to a single VPTEST instruction on x86-64, which sets the CF flag to 1 on equality.
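To make the semantics of the three sequences concrete, here is a minimal scalar sketch in Rust that models each Wasm instruction on a plain u128 instead of a real v128. All function names are hypothetical and exist only for this illustration, and the any_true step of example 3 is modelled as v128.any_true. One thing the model makes visible: since v128.andnot computes a & !b, example 1 as written returns 1 whenever a & !b is zero, which includes a == b but is not limited to it, whereas examples 2 and 3 return 1 only for identical inputs.

```rust
// Scalar model of the three sequences, with u128 standing in for v128.
// All function names are illustrative only, not part of any real API.

/// v128.andnot: bits of `a` that are not set in `b`.
fn v128_andnot(a: u128, b: u128) -> u128 {
    a & !b
}

/// v128.any_true: 1 if any bit is set, else 0.
fn v128_any_true(v: u128) -> i32 {
    (v != 0) as i32
}

/// i8x16.eq (want_eq = true) or i8x16.ne (want_eq = false):
/// each byte lane becomes 0xFF where the comparison holds, else 0x00.
fn i8x16_cmp(a: u128, b: u128, want_eq: bool) -> u128 {
    let (a, b) = (a.to_le_bytes(), b.to_le_bytes());
    let mut out = [0u8; 16];
    for i in 0..16 {
        if (a[i] == b[i]) == want_eq {
            out[i] = 0xFF;
        }
    }
    u128::from_le_bytes(out)
}

/// i8x16.all_true: 1 if every byte lane is non-zero, else 0.
fn i8x16_all_true(v: u128) -> i32 {
    v.to_le_bytes().iter().all(|&lane| lane != 0) as i32
}

/// i32.eqz: 1 if the input is zero, else 0.
fn i32_eqz(x: i32) -> i32 {
    (x == 0) as i32
}

fn example_1(a: u128, b: u128) -> i32 {
    i32_eqz(v128_any_true(v128_andnot(a, b)))
}

fn example_2(a: u128, b: u128) -> i32 {
    i8x16_all_true(i8x16_cmp(a, b, true))
}

fn example_3(a: u128, b: u128) -> i32 {
    i32_eqz(v128_any_true(i8x16_cmp(a, b, false)))
}

fn main() {
    let a: u128 = 0x0123_4567_89ab_cdef_0011_2233_4455_6677;
    let b = a & !1; // `a` with its lowest bit cleared, so b != a

    // Identical inputs: all three sequences agree.
    assert_eq!(example_1(a, a), 1);
    assert_eq!(example_2(a, a), 1);
    assert_eq!(example_3(a, a), 1);

    // Different inputs: examples 2 and 3 report inequality.
    assert_eq!(example_2(a, b), 0);
    assert_eq!(example_3(a, b), 0);

    // Example 1 depends on operand order: it tests `a & !b == 0`.
    assert_eq!(example_1(a, b), 0);
    assert_eq!(example_1(b, a), 1);
}
```

Under this model, examples 2 and 3 are interchangeable, while example 1 only matches them when neither operand has a bit the other lacks.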
Unfortunately, those sequences compile to less-than-ideal code. The best case is example 1, which compiles into two vector instructions instead of one:

    <wasm[0]::function[0]::exact_eq_128>:
       0: 55                pushq    %rbp
       1: 48 89 e5          movq     %rsp, %rbp
       4: c5 f1 df f0       vpandn   %xmm0, %xmm1, %xmm6
       8: c4 e2 79 17 f6    vptest   %xmm6, %xmm6
       d: 40 0f 94 c7       sete     %dil
      11: 40 0f b6 c7       movzbl   %dil, %eax
      15: 48 89 ec          movq     %rbp, %rsp
      18: 5d                popq     %rbp
      19: c3                retq

Example 2 compiles into four vector instructions:

    <wasm[0]::function[1]::exact_eq_128_2>:
      20: 55                pushq    %rbp
      21: 48 89 e5          movq     %rsp, %rbp
      24: c5 f9 74 c1       vpcmpeqb %xmm1, %xmm0, %xmm0
      28: c5 c9 ef ce       vpxor    %xmm6, %xmm6, %xmm1
      2c: c5 f9 74 c1       vpcmpeqb %xmm1, %xmm0, %xmm0
      30: c4 e2 79 17 c0    vptest   %xmm0, %xmm0
      35: 0f 94 c0          sete     %al
      38: 0f b6 c0          movzbl   %al, %eax
      3b: 48 89 ec          movq     %rbp, %rsp
      3e: 5d                popq     %rbp
      3f: c3                retq

Example 3 compiles into four vector instructions too:

    <wasm[0]::function[2]::exact_eq_128_3>:
      40: 55                pushq    %rbp
      41: 48 89 e5          movq     %rsp, %rbp
      44: c5 f9 74 f1       vpcmpeqb %xmm1, %xmm0, %xmm6
      48: c5 f9 76 c0       vpcmpeqd %xmm0, %xmm0, %xmm0
      4c: c5 c9 ef c0       vpxor    %xmm0, %xmm6, %xmm0
      50: c4 e2 79 17 c0    vptest   %xmm0, %xmm0
      55: 41 0f 94 c2       sete     %r10b
      59: 41 0f b6 c2       movzbl   %r10b, %eax
      5d: 48 89 ec          movq     %rbp, %rsp
      60: 5d                popq     %rbp
      61: c3                retq

Benefit
This would reduce the amount of generated code and the number of instructions, which would make these comparisons run faster and fit into the i-cache better.
It is hard to say what the exact performance impact would be.
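As a sanity check on the single-instruction form, the following Rust sketch models the flag results of (V)PTEST on 128-bit values, following the Intel manual's description (ZF is set when the AND of both operands is zero, CF is set when the second operand AND NOT the first is zero). It then compares the sequence currently emitted for example 1 with a hypothetical fused form that reads CF from a single VPTEST. The type and function names are made up for this illustration, and the fused form is only an assumption about what a new lowering rule could emit, not Cranelift's actual output.

```rust
// Model of (V)PTEST flag results on 128-bit operands.
// `first` is the first operand in Intel syntax, `second` the second one.
#[derive(Debug, PartialEq)]
struct Flags {
    zf: bool, // set when (first AND second) is all zeros
    cf: bool, // set when (second AND NOT first) is all zeros
}

fn vptest(first: u128, second: u128) -> Flags {
    Flags {
        zf: first & second == 0,
        cf: second & !first == 0,
    }
}

/// Sequence currently emitted for example 1:
/// vpandn computes t = a & !b, vptest t, t sets ZF when t is zero, sete reads ZF.
fn current_lowering(a: u128, b: u128) -> bool {
    let t = a & !b;
    vptest(t, t).zf
}

/// Hypothetical fused form: one vptest with `b` as the first operand and `a`
/// as the second, followed by a read of CF (e.g. setb).
fn fused_lowering(a: u128, b: u128) -> bool {
    vptest(b, a).cf
}

fn main() {
    let samples: [u128; 5] = [
        0,
        1,
        u128::MAX,
        0x0123_4567_89ab_cdef_0011_2233_4455_6677,
        0x0123_4567_89ab_cdef_0011_2233_4455_6676,
    ];

    // Both forms agree on every ordered pair of samples.
    for &a in &samples {
        for &b in &samples {
            assert_eq!(current_lowering(a, b), fused_lowering(a, b));
        }
    }
}
```

In this model the fused form saves the vpandn and the temporary register. A true bitwise equality test, as in examples 2 and 3, would need a different shape, for instance an XOR of the operands followed by a VPTEST of the result against itself.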
Code and commands used for generating examples
<details>

    (module $eq_v128_rs.wasm
      (type (func (param v128 v128) (result i32)))
      (func $exact_eq_128 (type 0) (param v128 v128) (result i32)
        local.get 0
        local.get 1
        v128.andnot
        v128.any_true
        i32.eqz
      )
      (func $exact_eq_128_2 (type 0) (param v128 v128) (result i32)
        local.get 0
        local.get 1
        i8x16.eq
        i8x16.all_true
      )
      (func $exact_eq_128_3 (type 0) (param v128 v128) (result i32)
        local.get 0
        local.get 1
        i8x16.ne
        v128.any_true
        i32.eqz
      )
      (export "exact_eq_128" (func $exact_eq_128))
      (export "exact_eq_128_2" (func $exact_eq_128_2))
      (export "exact_eq_128_3" (func $exact_eq_128_3))
    )

Commands:

    wasmtime compile eq_module.wat -o eq_module.cwasm -Ddebug-info=n -O opt-level=2
    llvm-objdump -D .\eq_module.cwasm > eq_module.asm

</details>
Current wasmtime version:

    wasmtime --version --verbose
    wasmtime 44.0.1

AngelicosPhosphoros edited issue #13405:
cfallin commented on issue #13405:
This seems like it should be reasonable to write a left-hand side pattern for in our ISLE lowering patterns -- happy to review a PR if you'd like.
fitzgen added the cranelift:goal:optimize-speed label to Issue #13405.
fitzgen added the wasm-proposal:simd label to Issue #13405.
alexcrichton closed issue #13405:
Last updated: Jun 01 2026 at 09:49 UTC