jlb6740 opened PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 requested cfallin for a review on PR #2602.
jlb6740 assigned PR #2602 to abrown.
jlb6740 assigned PR #2602 to abrown.
jlb6740 unassigned PR #2602.
jlb6740 unassigned PR #2602.
jlb6740 requested cfallin, abrown and bnjbvr for a review on PR #2602.
jlb6740 requested cfallin, abrown and bnjbvr for a review on PR #2602.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Can we avoid considering this to be a "modified" register, when the instruction can properly handle different source and destination registers? This would require changing the
x64_get_regs
, in particular the match case for this particular instruction, but that seems fine for now. And then we could useput_input_in_reg_or_mem
for thesrc
operand, and avoid the move before the instruction.
bnjbvr created PR Review Comment:
We could break the
match
into two: one for the opcode, one for the type; that would avoid some duplication here, since the two matched things are independent.
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
Hi @bnjbvr .. The match here is already using two parameters, an opcode and type so I am missing what you mean here. Do you mean I could include the vector types in the same match statement so that I am checking for both types::F32 and types::F32X4, etc. That's how I will interpret it since I think that would be cleaner logic that would avoid duplication at the source level.
jlb6740 updated PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
@bnjbvr latest push maybe does what is suggested here but let me know.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Ah, that's nice! At least it'll make the error for vector opcodes very explicit by panicking.
Sorry for being unclear. I was actually suggesting to break the
match (op, ty) { ... }
into two: first, amatch op
that'll give us theRoundImm
, then amatch ty
that will give us theSseOpcode
. We can do this to make the code more readable, since determining the SSE opcode is an independent task from determining the rounding immediate. Does it make sense?
jlb6740 updated PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 has marked PR #2602 as ready for review.
jlb6740 created PR Review Comment:
Hi .. yes, I think I did this now. Two things confused me here .. but one was I thought you wanted me to create a put_input_in_reg_or_mem function as it didn't exist but I think you were just suggesting to use input_to_reg_mem which does exist. The move is avoided now.
jlb6740 submitted PR Review.
jlb6740 updated PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
@bnjbvr .. Hi I got you .. sorry about misunderstanding. I think I've made that improvement now this latest update. May be good to go now.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nit: not
type
, butopcode
here
bnjbvr created PR Review Comment:
Can you add in the comment that this will cause a panic for SIMD on non-SSE4.1 machines?
bnjbvr created PR Review Comment:
(uber nit) Can we use
use_sse41
here? In theory, there's a difference between having access to a given CPU feature (has_sse41
), and wanting to use it in practice (the user could decide to set it off).
bnjbvr created PR Review Comment:
Maybe this comment is a bit redundant with the code below? I'll let you decide, as you prefer.
bnjbvr created PR Review Comment:
and conversely here
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
I've removed it.
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
@bnjbvr .. What do you mean here? Why would there be a panic if SSE4.1 is not supported?
jlb6740 submitted PR Review.
jlb6740 updated PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Before, the code was conceptually checking
if vector type { use SSE4.1 instructions } else { VM calls }
. Now we've changed the code intoif use SSE4.1 { use SSE4.1 instructions for both scalars and vectors } else { VM calls }
. So non-SSE4.1 vector types aren't covered anymore.Note this is still an improvement compared to the previous situation where we could generate SSE4.1 code on non-SSE4.1 platforms.
The right thing to do would be to implement something equivalent to the ROUND instructions, either as a VM call taking vector types as inputs, or as sequences of other SSE2 opcodes emulating the behavior. Or... we could say that wasm SIMD is only available for SSE4.1+ capable x86 machines :)
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
Sorry yes, I see!
jlb6740 updated PR #2602 from x64_lower_with_sse_level_support
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 merged PR #2602.
Last updated: Jan 24 2025 at 00:11 UTC