jlb6740 edited PR #2581 from fix-simd-flt-nan-bug-for-2432
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 updated PR #2581 from fix-simd-flt-nan-bug-for-2432
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
jlb6740 merged PR #2581.
abrown submitted PR Review.
abrown created PR Review Comment:
Looks like this should have been
FloatCC::UnorderedOrGreaterThanOrEqual
which usesFcmpImm::UnorderedOrGreaterThanOrEqual
to encode to the0x05
immediate. Using that might avoid the extraXORPS
.
abrown created PR Review Comment:
In the old backend we were using
PCMPEQB
to generate the "all ones vector" so this wasn't a problem, but I think in this new backend we should check around and make sure no more of the "all ones vectors" we create will have this same issue. Before Christmas I was working on a PR to simplify constant emission so that we only have one place to check... looks like I should have finished that.
abrown submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, my understanding (after some more reading) is that the i8x16 (pcmpeqb) variant is more canonical; would prefer that over reasoning about FP comparison edge cases. +1 to centralizing this logic!
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
UnorderedOrGreaterThanOrEqual
Yes, I suppose any unordered instruction that also checks for equality should work. Just depends on what that actually lowers to and the supported CPU feature. That encoding to 0x05 ( NLT_US) looks good except it signals. Not really sure what that means for cranelift .. maybe @cfallin @abrown you have comment. Perhaps instead we'd want to encode as 0x15 or NLT_UQ which does not signal.
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
abrown submitted PR Review.
abrown created PR Review Comment:
That does look better :+1:
Last updated: Jan 24 2025 at 00:11 UTC