abrown opened PR #2699 from i64x2-abs to main:
This instruction has a single-instruction lowering in AVX512F/VL and a three-instruction lowering in AVX, but neither is currently supported in the x64 backend. To implement this, we instead subtract the vector from 0 and use a blending instruction to pick the lanes containing the absolute value.
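The subtract-then-blend approach described above can be sketched as a scalar model in plain Rust. This is only an illustration of the lane-wise semantics, not the code from the PR: the function names `blend_by_sign` and `i64x2_abs` are hypothetical, and the blend is assumed to select on the most significant bit of each mask lane, as a sign-controlled blend such as `BLENDVPD` does.

```rust
/// Scalar model of a sign-controlled blend: for each 64-bit lane, take the
/// lane from `b` when the mask lane's most significant bit is set, otherwise
/// take the lane from `a`. (Hypothetical helper, for illustration only.)
fn blend_by_sign(a: [i64; 2], b: [i64; 2], mask: [i64; 2]) -> [i64; 2] {
    let mut out = [0i64; 2];
    for i in 0..2 {
        // A set sign bit is equivalent to the lane being negative as an i64.
        out[i] = if mask[i] < 0 { b[i] } else { a[i] };
    }
    out
}

/// Lane-wise absolute value built from "0 - x" plus a blend.
/// (Hypothetical name; models the idea in the PR description.)
fn i64x2_abs(src: [i64; 2]) -> [i64; 2] {
    // Subtract each lane from 0; wrapping matches packed integer subtraction.
    let negated = [0i64.wrapping_sub(src[0]), 0i64.wrapping_sub(src[1])];
    // Use the original lanes as the mask: negative lanes pick the negated value.
    blend_by_sign(src, negated, src)
}
```

Calling `i64x2_abs([-5, 7])` in this model yields `[5, 7]`. Because the subtraction wraps, a lane holding `i64::MIN` is returned unchanged, which is the usual behavior for a two's-complement absolute value.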
abrown submitted PR Review.
abrown created PR Review Comment:
@cfallin, I am suspicious about the `src.clone()` above as well as the `Writable::from_reg(regs::xmm0())`... am I breaking anything?
abrown requested cfallin for a review on PR #2699.
abrown updated PR #2699 from i64x2-abs to main.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I think this should be fine -- cloning the source is harmless (its live range will extend until its last use), and the explicit `Writable` coercion is fine with a fixed reg (we don't have a helper like `writable_xmm0()`). And no issues with the fixed reg use vs. src -- its live range overlaps with src, so the register allocator will know to move src elsewhere if originally in xmm0.
cfallin submitted PR Review.
abrown merged PR #2699.
Last updated: Dec 23 2024 at 12:05 UTC