afonso360 opened PR #4520 from vec-shift
to main
:
:wave: Hey,
In #4424 @uweigand reported that we have different semantics for scalar and SIMD shifts, which is unusual in our instruction set.
This PR corrects the AArch64 and x86 backends to mask the shift amount before performing SIMD shifts. The s390x should already follow this behavior.
It also reorganizes the
ishl
/ushl
/sshl
tests into separate files. Since we had a bunch of duplicate tests in hard to find files that were somewhat unrelated.I tried to keep changes contained in separate commits so that it's easier to review.
Fixes #4424
CC: @cfallin @abrown @uweigand @akirilov-arm
afonso360 edited PR #4520 from vec-shift
to main
:
:wave: Hey,
In #4424 @uweigand reported that we have different semantics for scalar and SIMD shifts, which is unusual in our instruction set.
This PR corrects the AArch64 and x86 backends to mask the shift amount before performing SIMD shifts. The s390x backend should already follow this behavior.
It also reorganizes the
ishl
/ushl
/sshl
tests into separate files. Since we had a bunch of duplicate tests in hard to find files that were somewhat unrelated.I tried to keep changes contained in separate commits so that it's easier to review.
Fixes #4424
CC: @cfallin @abrown @uweigand @akirilov-arm
jameysharp submitted PR review.
akirilov-arm created PR review comment:
Just a quick comment - I am surprised that passing
$I32
tosub
works forI64X2
vectors, since the shift amount should end up being a large positive number (or 0) instead of a small negative number.
akirilov-arm submitted PR review.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Ditto.
akirilov-arm created PR review comment:
PR #4521 should remove these unnecessary instructions.
afonso360 updated PR #4520 from vec-shift
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
I'm guessing
dup
will sign extend? That's the only way I can see this working, but I can't find any mention of sign extending in the docs.I've switched to a 64 bit sub.
afonso360 submitted PR review.
afonso360 created PR review comment:
:+1: Will rebase once that lands
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Ah, now I remember why the original code works and is in fact correct, but I still like the usage of
$I64
- I forgot that the vector shifts used only the least significant byte of the shift amount, so there is in fact masking, just not the kind we need.
afonso360 updated PR #4520 from vec-shift
to main
.
akirilov-arm submitted PR review.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Could we add a
debug_assert!()
thatty.lane_bits()
is a power of 2?
akirilov-arm submitted PR review.
afonso360 updated PR #4520 from vec-shift
to main
.
akirilov-arm submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Likewise here, let's update this comment
cfallin created PR review comment:
There is a
.is_power_of_two()
method on the integer types that makes this clearer, IMHO (internally I suspect it uses exactly this idiom).
cfallin created PR review comment:
This comment is slightly out of date now ("can each use a single instruction, once the shift amount is masked")
cfallin submitted PR review.
afonso360 updated PR #4520 from vec-shift
to main
.
jameysharp has enabled auto merge for PR #4520.
jameysharp merged PR #4520.
Last updated: Dec 23 2024 at 12:05 UTC