cfallin opened PR #2058 from aarch64-fix-bool
to main
:
It seems that this is actually the correct behavior for bool types wider
thanb1
; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. Forb8
..b64
, this results in an extra negation after a
cset
when a bool is produced by anicmp
/fcmp
, but the most common
case (b1
) is unaffected, because an all-ones one-bit value is just
1
.An example of this assumption can be seen here:
Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested bnjbvr and julian-seward1 for a review on PR #2058.
cfallin requested bnjbvr and julian-seward1 for a review on PR #2058.
cfallin edited PR #2058 from aarch64-fix-bool
to main
:
It seems that this is actually the correct behavior for bool types wider
thanb1
; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. Forb8
..b64
, this results in an extra negation after a
cset
when a bool is produced by anicmp
/fcmp
, but the most common
case (b1
) is unaffected, because an all-ones one-bit value is just
1
.An example of this assumption can be seen here:
Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.This PR also makes an unrelated cleanup: it appears that we had two ways of saying
cset
, namelyInst::CSet
andInst::CondSet
. Probably an artifact of our
implementation sprint with multiple people adding instructions at once. Removed
Inst::CondSet
and settled onInst::CSet
.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: for future proofness, maybe assert
to_bits == 32
here?
bnjbvr created PR Review Comment:
This entire commit should go in the last branch below, if I understand correctly.
bnjbvr created PR Review Comment:
Is the AND actually necessary, that is, are B1 inputs only one bit in practice?
bnjbvr created PR Review Comment:
nit: maybe use packed assignment and destructuring here:
let (imm_ty, alu_op) = if output_bits > 32 { (I64, ALUOp::And64) } else ...
cfallin updated PR #2058 from aarch64-fix-bool
to main
:
It seems that this is actually the correct behavior for bool types wider
thanb1
; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. Forb8
..b64
, this results in an extra negation after a
cset
when a bool is produced by anicmp
/fcmp
, but the most common
case (b1
) is unaffected, because an all-ones one-bit value is just
1
.An example of this assumption can be seen here:
Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.This PR also makes an unrelated cleanup: it appears that we had two ways of saying
cset
, namelyInst::CSet
andInst::CondSet
. Probably an artifact of our
implementation sprint with multiple people adding instructions at once. Removed
Inst::CondSet
and settled onInst::CSet
.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
cfallin created PR Review Comment:
Clarified, thanks -- the first case is also semantically a sign extension, but generated specially; noted both cases in this comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done! (Could also be smaller than 32, so assert is
<=
.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Right now it's the case that every
b1
we generate is0
or1
up to the full width of the register, I think, but I'd prefer to make use of that in a followup, as it needs some careful thought and auditing of the other ops. I suppose we could say that bits 1..7 are always clear (so ab1
is stored as a0
or1
in the lowest byte), then this would allow us to use the register-extending subtract format to do a zero-extend and negation at once; I'll think more about this. Thanks for bringing this up!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin merged PR #2058.
Last updated: Nov 22 2024 at 16:03 UTC