alexcrichton opened PR #5841 from avx-unalign
to main
:
This commit unlocks the ability to fold loads into operands of SSE and AVX instructions. This is beneficial for both function size when it happens in addition to being able to reduce register pressure. Previously this was not done because most SSE instructions require memory to be aligned. AVX instructions, however, do not have alignment requirements.
The solution implemented here is one recommended by Chris which is to add a new
XmmMemAligned
newtype wrapper aroundXmmMem
. All SSE instructions are now annotated as requiring anXmmMemAligned
operand except for a new new instruction styles used specifically for instructions that don't require alignment (e.g.movdqu
,*sd
, and*ss
instructions). All existing instruction helpers continue to takeXmmMem
, however. This way if an AVX lowering is chosen it can be used as-is. If an SSE lowering is chosen, however, then an automatic conversion fromXmmMem
toXmmMemAligned
kicks in. This automatic conversion only fails for unaligned addresses in which case a load instruction is emitted and the operand becomes a temporary register instead. A number of priorXmm
arguments have now been converted toXmmMem
as well.One change from this commit is that loading an unaligned operand for an SSE instruction previously would use the "correct type" of load, e.g.
movups
for f32x4 ormovup
for f64x2, but now the loading happens in a context without type information so themovdqu
instruction is generated. According to [this stack overflow question][question] it looks like modern processors won't penalize this "wrong" choice of type when the operand is then used for f32 or f64 oriented instructions.Finally this commit improves some reuse of logic in the
put_in_*_mem*
helper to share code withsinkable_load
and avoid duplication. With this in place some various ISLE rules have been updated as well.In the tests it can be seen that AVX-instructions are now automatically load-coalesced and use memory operands in a few cases.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
alexcrichton updated PR #5841 from avx-unalign
to main
.
alexcrichton updated PR #5841 from avx-unalign
to main
.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Noting here also a comment I left below where this is designated an implicit conversion: it definitely looks like something else at first sight (my comment on my first pass through this PR here was "can we call this
assert_xmm_mem_aligned
to make it apparent what this conversion is doing?"). It implicitly emits a load if the arg is actually in memory, which is why it can trivially guarantee alignment of any returned memory op (because it never returns a memory op). We should leave a comment about this I think.
cfallin created PR review comment:
Here we should have a comment I think -- it looks wrong at first sight, as in general we shouldn't be able to take an unaligned thing and coerce it to aligned. The key is that the constructor actually emits a load and returns a register-kind value instead, so we (trivially) avoid unalignment by just never carrying through a memory operand.
cfallin created PR review comment:
No
aligned:true
on theimm
variant?
alexcrichton updated PR #5841 from avx-unalign
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I... don't know how the macro previously compiled. It refers to the
$aligned
variable which was in an entirely different repetition clause and somehow worked? Anyway added!
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Good point! I've updated the comment here and on the
(convert ...)
items added too.
cfallin submitted PR review.
cfallin has enabled auto merge for PR #5841.
cfallin merged PR #5841.
Last updated: Jan 24 2025 at 00:11 UTC