jlb6740 opened PR #3031 from extend-add-pairwise-x64
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
An alternative approach is to introduce a binary pairwise addition IR operation, e.g.
iadd_pairwise
, which sounds like a more useful primitive operation - it could be used both by other existing Wasm instructions likei32x4.dot_i16x8_s
and possible future additions (some of which have already been discussed - see here for an example); in that caseextended_pairwise_add_signed
would becomeiadd_pairwise(swiden_low, swiden_high)
. We need to do some pattern-matching for optimal AArch64 code generation (which is a single instruction), but I am not sure about other architectures - any thoughts from the people working on the backends?cc @abrown @cfallin @uweigand
akirilov-arm edited PR review comment.
abrown submitted PR review.
abrown created PR review comment:
A general comment: in the past we were implicitly following the "make a new CLIF instruction for each Wasm SIMD instruction" paradigm so a lot of existing CLIF instructions could be split up in the same way you describe above. I think this is a thing that should be done (it just seems to make sense to simplify CLIF even at the risk of a few extra cycles during lowering) but I don't feel very strongly about it so I would be fine with either:
- we do the pattern matching now AND we start a review of the existing SIMD instructions in a separate issue to clean up CLIF
- we just start the review of existing SIMD instructions now and eventually remove the unnecessary CLIF instructions and switch to more pattern matching; this would mean following the "make a new CLIF instruction for each Wasm SIMD instruction" for the last few outstanding instructions
In summary, I don't mind if we do
iadd_pairwise
now or later, but I do think we need to commit to doing the same to the rest of SIMD CLIF.
akirilov-arm created PR review comment:
I forgot to link to the discussion in #2982 for context.
@abrown While I don't have any strong feelings either, in #3044 I have already started with the first of your suggested approaches. In fact, the only Wasm SIMD operations that are not already covered by merged code or pull requests are the family of
i*.extmul_*_i*
operations, for which the specification text itself provides a simple way to express them in terms of other basic operations.I agree that opening an issue for CLIF clean-up is definitely one of the steps forward, no matter what we decide to do.
akirilov-arm submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
Opened https://github.com/bytecodealliance/wasmtime/issues/3045 for this.
cfallin submitted PR review.
cfallin created PR review comment:
Just commented over in #3045 with thoughts on the general question; for this specific case I think it does seem reasonable to split into the pairwise add with extends on the inputs. But let's see what we come to over there!
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
@akirilov-arm @cfallin Hi .. so the suggestion here is to remove the new IR instructions extended_pairwise_add_signed and extended_pairwise_add_unsigned and instead create one called iadd_pairwise. When lowering .. we would keep the lowering code that is there now, but pattern match for the instruction and type during lowering in order to get the input, or is the suggestion to scrape that lowering code too, actually lower swiden_low and swiden_high where we then are using the output of those particular instructions to be input to iadd_pairwise which needs to then be a new implementation from what is there now, essentially creating a new patch? I look at the change made in #2982 and it really is just avoiding creating another IR instruction while maybe adding a little logic of doing the matching, but keeps the original sequence. Here I am not sure if the suggestion is to go even further?
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
IMHO the handling should be equivalent to the one for the
fcvt_from_uint
operation in #2982 - that is, there should be a code path that handlesiadd_pairwise
irrespective of its inputs, to serve as a fallback. Then right before that path there should be a check if the inputs match the pattern; if they do, execution should proceed through your current lowering code forextended_pairwise_add_*
.BTW @sparker-arm already has the common
iadd_pairwise
implementation and the AArch64-specific bits, but he is waiting for this pull request to get merged, so that you do not step on each other's toes. In the meantime, he may be able to offer some advice.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Yeah, I don't mind posting this up if people are interested.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
Hi @spaker-arm .. can you post here?@akirilov-arm I still am not sure I understand what changes are in mind. Currently in code_translators there is this new instruction extended_pairwise_add_signed called here:
Operator::I16x8ExtAddPairwiseI8x16S => { let a = pop1_with_bitcast(state, I8X16, builder); state.push1(builder.ins().extended_pairwise_add_signed(a)) }
and new instruction extended_pairwise_add_unsigned called here:
Operator::I16x8ExtAddPairwiseI8x16U => { let a = pop1_with_bitcast(state, I8X16, builder); state.push1(builder.ins().extended_pairwise_add_unsigned(a)) }
You are saying to instead just call one instruction called iadd_pairwise and then lower from there? I think this is what you are saying and I'll go ahead and repush this.
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
jlb6740 edited PR review comment.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Sure, I'll post the whole thing up later today. But just to point out that we'll know the signedness because of the [u|s]widen inputs, for example:
Operator::I16x8ExtAddPairwiseI8x16S => { let a = pop1_with_bitcast(state, I8X16, builder); let widen_low = builder.ins().swiden_low(a); let widen_high = builder.ins().swiden_high(a); state.push1(builder.ins().iadd_pairwise(widen_low, widen_high)); }
This creates a bit more work in the AArch64 backend, but it does also mean that we can fallback to an ADDP when we don't successfully match the extending inputs. As a side queston, unless I'm mistaken wasm doesn't have a horizontal add - does anyone know why?
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 requested akirilov-arm for a review on PR #3031.
jlb6740 requested cfallin for a review on PR #3031.
sparker-arm submitted PR review.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
I think it would be good to have a clearer definition of the semantics, this currently doesn't tell me which elements make a pair.
sparker-arm created PR review comment:
Sorry, I don't know anything about the ISA... but since you're using 'src' here, shouldn't you be checking that the input to the swiden_low and swiden_high is the same?
cfallin submitted PR review.
cfallin created PR review comment:
These nested conditionals could be written together in one level, I think; this would also avoid the subtle asymmetry (possibly leading to a bug with later additions) that the
else if
attaches only to the outerif
. Something like:
if let (Some(swiden_low), Some(swiden_high)) = (matches_input(...), matches_input(...)) {
cfallin submitted PR review.
cfallin created PR review comment:
+1, this needs to be part of the condition above (where we check input opcodes) such that we fall back to generic codegen otherwise, as it's perfectly legal to pairwise-add extends of two different values.
cfallin created PR review comment:
At this point we can now remove
x64_should_panic
entirely, since it always returns false, right?
cfallin created PR review comment:
Are these still the old opcodes?
cfallin created PR review comment:
I think we need an
else
branch here to handle a pairwise add without the extends, no? Or at least anunimplemented!()
if we don't support that opcode by itself.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
Yes .. I was not sure if we wanted to leave it in for future instructions .. relaxed SIMD maybe? But at the very least I was going to remove the match here. I can just remove the function and find the call and remove that to.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
@sparker-arm @cfallin .. This is exactly what I was thinking about. This is the tension between treating a clif as a 1:1 wasm instruction mapping versus the decomposition approach. I didn't add the check because as noted, it is perfect legal for swiden_low and swiden_high to have to not be the same per the definition of iadd_pairwise. Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise. I was concerned that some exercise or fuzz testing or something lowering just this cliff instruction could be invalidated incorrectly. I can add the check as I want this to merge, but I do think we need to think about or even audit the consistency of similar checks and whether they map to the clif instruction that we are actually lowering or the wasm instruction that we are intending to lower.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
Glad you caught this. Yes, was going to do one more once through but may have missed this. Surprised it doesn't warn?
jlb6740 edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Here, we are not lowering ext_addpairwise which would require the condition that the input be the same ... we are lowering iadd_pairwise
Well, that's true in the outer scope, but the instructions that this case is emitting are specifically for the same-input case, no?
IMHO it's fine for now to fall back to
unimplemented!()
if we know for sure that we won't generate aniadd_pairwise
with other (arbitrary) inputs, but at some point we should fill in that case too, for completeness.should audit ...
I agree! I have lots of thoughts on verification that this text box is too small to contain :-) IMHO this is the next big effort after getting current projects (including some sort of isel DSL) landed.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yeah, I think it's fine to remove the whole helper and put it back later if we need it.
cfallin submitted PR review.
cfallin created PR review comment:
Me too; @abrown I guess we're not building the CLIF interpreter by default in any CI target?
abrown submitted PR review.
abrown created PR review comment:
I thought we were: there are
test interpret
filetest that should use it and @afonso360 has a fuzz target that should use it.
abrown edited PR review comment.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 requested cfallin for a review on PR #3031.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I don't think this is testing what we want to test -- this is just checking that we grabbed input 0 on each of the swiden ops (which is always true by construction, since we built
swiden_input
just above).What we really want is to
put_input_in_reg(ctx, swiden_input[1])
, and then assert that we have the same register, I think.Also, if the inputs do not match, we should fail more gracefully than with an assert, I think; asserts imply a compiler bug, whereas the different-inputs case is really just unimplemented behavior, right? So perhaps we could do (after getting
src0
andsrc1
regs)if src0 != src1 { unimplemented!("iadd_pairwise not implemented for general case with different inputs"); }
or something like that?
cfallin created PR review comment:
s/adjcent/adjacent/
and I think s/lines/lanes/
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
Sorry .. Yes. When I first did this I was not sure if that's what we wanted here and forgot to highlight that concern. Good catch. Also updated for the unimplemented vs assert comment.
jlb6740 updated PR #3031 from extend-add-pairwise-x64
to main
.
jlb6740 requested cfallin for a review on PR #3031.
jlb6740 requested abrown for a review on PR #3031.
cfallin submitted PR review.
cfallin merged PR #3031.
cfallin submitted PR review.
Last updated: Jan 24 2025 at 00:11 UTC