wasmtime / PR #6399 x64: Add non-SSE4.1 fallback for blen... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6399 x64: Add non-SSE4.1 fallback for blen...

Wasmtime GitHub notifications bot (May 17 2023 at 17:28):

alexcrichton opened PR #6399 from alexcrichton:x64-sse2-blend to bytecodealliance:main:

This commit avoids the usage of blend-related instructions when the SSE4.1 feature is not available. This means that bitselect uses the less optimal lowering and additionally the x86_blendv instruction is disabled for relaxed simd lowerings based on whether SSE4.1 is available or not (falling back to the deterministic lowering which is supported if it's not available).

Additionally the iabs implementation for i64x2 was updated to have a non-blend-based lowering, modeled after LLVM.

Wasmtime GitHub notifications bot (May 17 2023 at 17:28):

alexcrichton requested abrown for a review on PR #6399.

Wasmtime GitHub notifications bot (May 17 2023 at 17:29):

alexcrichton has marked PR #6399 as ready for review.

Wasmtime GitHub notifications bot (May 17 2023 at 17:29):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6399.

Wasmtime GitHub notifications bot (May 17 2023 at 17:29):

alexcrichton requested jameysharp for a review on PR #6399.

Wasmtime GitHub notifications bot (May 17 2023 at 17:29):

alexcrichton requested wasmtime-core-reviewers for a review on PR #6399.

Wasmtime GitHub notifications bot (May 18 2023 at 21:34):

alexcrichton updated PR #6399.

Wasmtime GitHub notifications bot (May 22 2023 at 23:03):

abrown submitted PR review:

LGTM; not really sure what to do about my comment about x86_blendv so I'll leave it up to you.

Wasmtime GitHub notifications bot (May 22 2023 at 23:03):

abrown submitted PR review:

LGTM; not really sure what to do about my comment about x86_blendv so I'll leave it up to you.

Wasmtime GitHub notifications bot (May 22 2023 at 23:03):

abrown created PR review comment:

I'm not sure about this: it seems like we are propagating to more places the x86-specificness of all of this, but maybe I'm missing why this refactoring is needed.

Wasmtime GitHub notifications bot (May 23 2023 at 13:31):

alexcrichton created PR review comment:

The purpose here was to conditionally use the cranelift x86_blendv instruction based on what the backend implements. If SSE4.1 is disabled then x86_blendv shouldn't be used because it's not available and the fallback bitselect implementation is more appropriate I think. Given that need I figured some extra accessors were necessary to plumb this around to allow backends themselves to report whether they implement this instruction or not, although this is only ever going to return true for the x64 backend.

I could perhaps change this to something else like environ.use_x86_for_relaxed_simd(RelaxedSimd::LaneSelect, ty) or something like that, though. There's a few more instructions coming down the pike for SSSE3 which are gated like this where new methods are added to test whether the instruction is available.

As for plumbing around the x86-specific-ness, that's sort of inherent in the relaxed-simd proposal where the semantics are either x86 or the standard simd semantics. I've for now chosen to implement both for Wasmtime to have speedier implementations on x86 (while still preserving determinism in some cases), which is also the reason for leaking x86-specific-ness.

Wasmtime GitHub notifications bot (May 26 2023 at 14:57):

cfallin created PR review comment:

Just to second @alexcrichton 's points above and add context: I think something like this is close to the best we can do; maybe the refactor above is a good idea, I don't have strong opinions on it in particular. When we talked about how to integrate the initial relaxed-SIMD implementation there was a tension between the determinism of CLIF, where every opcode has a strict machine-independent spec of its bitwise output, and the core idea of relaxed-SIMD, which is that some Wasm ops... don't have that. We figured it was best to push the "semantic switch" upward to the translator, above CLIF: so CLIF has different opcodes that encapsulate different kinds of e.g. SIMD blends, every opcode has bit-exact semantics, but x86 may implement only those it has hardware for and machine X implements others and the translator uses only opcodes that are available on the target. (This is like "old Cranelift" with machine-specific opcodes to some degree, the difference being how we think about it -- the opcode is "x86-like semantics" vs. "specifically for x86"). It's not perfect but it seems better than e.g. having a config option that controls what the semantics of opcodes are, or leaving it underspecified.

Wasmtime GitHub notifications bot (May 26 2023 at 15:47):

alexcrichton merged PR #6399.

Last updated: Apr 18 2025 at 21:03 UTC