alexcrichton opened PR #5931 from x64-more-avx
to main
:
One aspect of AVX that I have just recently become aware of is that there's apparently a performance penalty associated with mixing AVX and SSE intructions. One reason for this is that AVX instructions which operate over 128-bit values always zero the higher-than-128-bits of each register operate on. SSE instructions, however, don't do this. This means that false dependencies can be created between instructions because SSE instructions look like they're intentionally preserving higher bits where AVX instructions intentionally zero them. According to this stackoverflow question the processor also tracks whether an instruction has been executed and there's a "scary red line" for mixing AVX/SSE.
On the local meshoptimizer benchmark this PR doesn't actually have any effect on the generate code's performance, or not one that I can measure. In that sense this is more of a hygiene thing than anything else.
Specifically the changes here were to refactor many ISLE helpers that were generating instructions with
SseOpcode.XXX
manually to instead using the instruction helpers which will use the AVX variant if enabled. Additionally more AVX instructions were added for moving data to/from memory and such.I don't think this 100% handles all the SSE instructions cranelift can generate when AVX is enabled, but it at least raises the bar further and removes a bunch of cases of SSE-generated instructions when AVX is enabled.
alexcrichton requested abrown for a review on PR #5931.
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
Maybe a rename?
(xmm_movrm_sse (SseOpcode.Movdqu) addr data))
Back when I was adding things like this, I tried to keep a convention of
x64_<instruction>
but it's been a while.
abrown created PR review comment:
A rename might help here:
(rule 1 (x64_xor_vector $F32 x y) (x64_xorps x y))
Not sure about the
x64_
part (see my other comment), but I don't think this is exclusively SSE anymore and so we could go withxor_vector
to distinguish from the scalar version.
alexcrichton updated PR #5931 from x64-more-avx
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I opted to drop the
_sse
suffix since the otherxmm_*
helpers don't have that, although it might be good to go back and renameXmm*
toSse
andXmm*Vex
toAvx*
perhaps.
alexcrichton has enabled auto merge for PR #5931.
alexcrichton updated PR #5931 from x64-more-avx
to main
.
alexcrichton has disabled auto merge for PR #5931.
alexcrichton has enabled auto merge for PR #5931.
alexcrichton merged PR #5931.
Last updated: Jan 24 2025 at 00:11 UTC