alexcrichton opened PR #6399 from alexcrichton:x64-sse2-blend
to bytecodealliance:main
:
This commit avoids the usage of blend-related instructions when the SSE4.1 feature is not available. This means that
bitselect
uses the less optimal lowering and additionally thex86_blendv
instruction is disabled for relaxed simd lowerings based on whether SSE4.1 is available or not (falling back to the deterministic lowering which is supported if it's not available).Additionally the
iabs
implementation for i64x2 was updated to have a non-blend-based lowering, modeled after LLVM.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
alexcrichton requested abrown for a review on PR #6399.
alexcrichton has marked PR #6399 as ready for review.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6399.
alexcrichton requested jameysharp for a review on PR #6399.
alexcrichton requested wasmtime-core-reviewers for a review on PR #6399.
alexcrichton updated PR #6399.
abrown submitted PR review:
LGTM; not really sure what to do about my comment about
x86_blendv
so I'll leave it up to you.
abrown submitted PR review:
LGTM; not really sure what to do about my comment about
x86_blendv
so I'll leave it up to you.
abrown created PR review comment:
I'm not sure about this: it seems like we are propagating to more places the x86-specificness of all of this, but maybe I'm missing why this refactoring is needed.
alexcrichton created PR review comment:
The purpose here was to conditionally use the cranelift
x86_blendv
instruction based on what the backend implements. If SSE4.1 is disabled thenx86_blendv
shouldn't be used because it's not available and the fallbackbitselect
implementation is more appropriate I think. Given that need I figured some extra accessors were necessary to plumb this around to allow backends themselves to report whether they implement this instruction or not, although this is only ever going to return true for the x64 backend.I could perhaps change this to something else like
environ.use_x86_for_relaxed_simd(RelaxedSimd::LaneSelect, ty)
or something like that, though. There's a few more instructions coming down the pike for SSSE3 which are gated like this where new methods are added to test whether the instruction is available.As for plumbing around the x86-specific-ness, that's sort of inherent in the relaxed-simd proposal where the semantics are either x86 or the standard simd semantics. I've for now chosen to implement both for Wasmtime to have speedier implementations on x86 (while still preserving determinism in some cases), which is also the reason for leaking x86-specific-ness.
cfallin created PR review comment:
Just to second @alexcrichton 's points above and add context: I think something like this is close to the best we can do; maybe the refactor above is a good idea, I don't have strong opinions on it in particular. When we talked about how to integrate the initial relaxed-SIMD implementation there was a tension between the determinism of CLIF, where every opcode has a strict machine-independent spec of its bitwise output, and the core idea of relaxed-SIMD, which is that some Wasm ops... don't have that. We figured it was best to push the "semantic switch" upward to the translator, above CLIF: so CLIF has different opcodes that encapsulate different kinds of e.g. SIMD blends, every opcode has bit-exact semantics, but x86 may implement only those it has hardware for and machine X implements others and the translator uses only opcodes that are available on the target. (This is like "old Cranelift" with machine-specific opcodes to some degree, the difference being how we think about it -- the opcode is "x86-like semantics" vs. "specifically for x86"). It's not perfect but it seems better than e.g. having a config option that controls what the semantics of opcodes are, or leaving it underspecified.
alexcrichton merged PR #6399.
Last updated: Jan 24 2025 at 00:11 UTC