alexcrichton requested abrown for a review on PR #6323.
alexcrichton opened PR #6323 from `alexcrichton:x64-insertlane-sse2` to `bytecodealliance:main`:
This commit avoids the use of `pinsr*` when SSE 4.1 is not enabled by using alternative means of inserting values into vectors.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6323.
abrown submitted PR review:
LGTM, but see comments... (Sorry about the delay; I had my comments pending and forgot to submit them.)
abrown created PR review comment:
Before I look at the rest of the PR, I just wanted to recall something that bit me long ago: if moving from another XMM register, the bottom lane is merged with the rest; but if loading from memory, the upper lanes are zeroed. We just need to make sure Cranelift doesn't load-coalesce into this instruction.
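To make the hazard concrete, here is a minimal Rust sketch of the two `movss` forms described in the comment above. It is only a model of the instruction's documented behavior, not Cranelift code: the `Xmm` type and the function names `movss_reg` and `movss_load` are hypothetical, chosen for illustration.

```rust
/// A 128-bit XMM register modeled as four 32-bit lanes, lane 0 being the lowest.
/// (Hypothetical type, for illustration only.)
type Xmm = [u32; 4];

/// Models the register-to-register form of `movss`: only the low 32-bit lane
/// of `dst` is replaced by the low lane of `src`; the upper three lanes of
/// `dst` are preserved.
fn movss_reg(dst: Xmm, src: Xmm) -> Xmm {
    [src[0], dst[1], dst[2], dst[3]]
}

/// Models the load form of `movss`: the loaded 32-bit value lands in the low
/// lane and the upper three lanes are zeroed, regardless of what the
/// destination register held before.
fn movss_load(mem: u32) -> Xmm {
    [mem, 0, 0, 0]
}
```

With `dst = [1, 2, 3, 4]`, calling `movss_reg(dst, [9, 8, 7, 6])` keeps lanes 1 to 3 of `dst`, while `movss_load(9)` discards them. If a load were folded into the instruction, an insert into lane 0 would silently clear the other lanes, which is why the register-only form matters here.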
abrown created PR review comment:
The way this reads, we shuffle the lane: `n >> 2`? Why? The shuffle immediates seem to line up with the original `n`, not `n >> 2`.
abrown created PR review comment:
So no load-coalescing after all, right?
alexcrichton updated PR #6323.
alexcrichton created PR review comment:
Indeed! The `x64_movss_regmove` helper has a doc block above it for `x64_movsd_regmove` covering why load coalescing is disallowed there.
alexcrichton created PR review comment:
I've updated the doc block above this to hopefully help clear things up, but the low 2 bits of `n` are where in a 32-bit lane the value goes and the upper 2 bits are which 32-bit lane is chosen for the final `pshufd` (which the documentation was definitely not at all clearly saying).
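The split of `n` described above can be sketched in Rust as follows. This assumes `n` is a byte-lane index in the range 0 to 15 (as for an `i8x16` insert); the thread does not state the vector type, and the function names `split_byte_lane` and `insert_byte` are hypothetical. The sketch shows the index arithmetic only, not the actual instruction sequence the PR emits.

```rust
/// Splits a byte-lane index `n` (assumed to be in 0..=15) into:
///   - the 32-bit lane that holds the byte (upper 2 bits, `n >> 2`)
///   - the byte position inside that 32-bit lane (low 2 bits, `n & 3`)
fn split_byte_lane(n: u8) -> (u8, u8) {
    assert!(n < 16, "byte-lane index must be in 0..=15");
    let dword_lane = n >> 2;
    let byte_in_dword = n & 0b11;
    (dword_lane, byte_in_dword)
}

/// Inserts `val` at byte lane `n` by addressing the vector through the
/// (32-bit lane, byte-in-lane) pair. The result is the same as writing
/// `vec[n]` directly; the point is to show how the two halves of `n` are used.
fn insert_byte(vec: [u8; 16], n: u8, val: u8) -> [u8; 16] {
    let (dword_lane, byte_in_dword) = split_byte_lane(n);
    let mut out = vec;
    out[dword_lane as usize * 4 + byte_in_dword as usize] = val;
    out
}
```

For example, `n = 13` splits into 32-bit lane 3 and byte 1 within that lane, so `n >> 2` names the lane that a final `pshufd`-style shuffle would target.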
alexcrichton has enabled auto merge for PR #6323.
alexcrichton merged PR #6323.
Last updated: Jan 24 2025 at 00:11 UTC