alexcrichton opened PR #5918 from remove-vselect
to main
:
This instruction is documented as selecting lanes based on the "truthy" value of the condition lane, but the current status of the implementation of this instruction is:
x64 - uses the high bit for
f32x4
andf64x2
and otherwise uses the high bit of each byte doing a byte-wise lane select rather than whatever the controlling type is.AArch64 - this is the same as
bitselect
which is a bit-wise selection rather than a lane-wise selection.s390x - this is the same as AArch64, a bit-wise selection rather than lane-wise.
interpreter - the interpreter implements the documented semantics of selecting based on "truthy" values.
Coupled with the status of the implementation is the fact that this instruction is not used by WebAssembly SIMD today either. The only use of this instruction in Cranelift is the nan-canonicalization pass. By moving nan-canonicalization to
bitselect
, since that has the desired semantics, there's no longer any need forvselect
.Given this situation this commit subsqeuently removes
vselect
and all usage of it throughout Cranelift.Closes #5917
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
afonso360 submitted PR review.
afonso360 submitted PR review.
afonso360 created PR review comment:
Same here?
afonso360 created PR review comment:
I think these were removed by mistake right?
alexcrichton updated PR #5918 from remove-vselect
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh dear now I'm not sure how these leaked in...
alexcrichton updated PR #5918 from remove-vselect
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Maybe some additional documentation along these lines:
This instruction selects whole values. Use `bitselect` to choose each bit according to a mask.
jameysharp created PR review comment:
Might as well fix the rest of the comment while we're here I guess.
;; we can use x64_blend.
jameysharp created PR review comment:
I think all of these
vselect
/icmp
rewrites are still valid if we match onbitselect
instead, right?as_bool
on a typeset containing vectors of integers produces the same typeset, so the result oficmp
should be the same as its arguments on vectors, I think.
jameysharp created PR review comment:
I'm hoping we can replace
vselect
withbitselect
in this file (and rename it) and have all these tests still pass.
jameysharp created PR review comment:
I think replacing
vselect
withbitselect
almost works forfcmp
as well, except that typesets containing vectors of floats turn into vectors of ints of the same size, butbitselect
requires all the types to be the same.Maybe we should extend
bitselect
to accept int vectors of the right width even for selecting between float vectors? I think that would remove the need for this PR's changes to NaN canonicalization, and probably be more useful in general.
jameysharp created PR review comment:
I see the remaining uses of this
vselect
helper are always fed by the result oficmp
, to implement the min and max instructions. It'd be nice to clean that up. Opening a "good first issue" as a followup for this PR would be fine though.
alexcrichton updated PR #5918 from remove-vselect
to main
.
alexcrichton updated PR #5918 from remove-vselect
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Would you be ok with a follow-up issue for this?
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I'll feel a little better about adding this
bitcast
if it's only done right next to thebitselect
call that it feeds into. I hope that'll make it easier to spot for somebody cleaning up the types ofbitselect
later. Also it reduces the diff in this file to pretty much just this closure.let vector_select = |pos: &mut FuncCursor, canon_nan: Value| { let is_nan = pos.ins().bitcast(val_type, MemFlags::new(), is_nan);
jameysharp created PR review comment:
Since
scalar_select
is only called with theis_nan
value from the outer scope, can you revert adding this argument?
jameysharp created PR review comment:
Sure, assuming the changes I've just suggested to NaN canonicalization seem reasonable to you, we can think about
bitselect
semantics in more detail later. If we see a lot more cases offcmp
->bitcast
->bitselect
I might prioritize it higher.For the record, I'm not concerned about losing these two specific optimization rules, since they weren't motivated by seeing this pattern in any benchmarks; we just added them because we could. But it'd be nice if you'd leave a comment behind mentioning something like "TODO: this optimization also applies to vectors but the pattern to match is more complicated".
alexcrichton updated PR #5918 from remove-vselect
to main
.
alexcrichton updated PR #5918 from remove-vselect
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok sounds good. I mostly want to not pile up too much on this PR and I'm not too familiar with cranelift's instruction format builde, so I'd prefer to set that aside for another time rather than doing it here-and-now. I do think it makes sense, though that
bitselect
always takes an integer mask. Most of these optimizations probably won't fire from wasm code though since wasmv128.bitselect
forces everything throughi8x16
which loses most of the type information. That being said these optimizations aren't all that important for wasm probably since if you're using SIMD you're probably using LLVM which would already have done optimizations like this ideally.
alexcrichton has enabled auto merge for PR #5918.
alexcrichton merged PR #5918.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I've opened https://github.com/bytecodealliance/wasmtime/issues/5962 for this
Last updated: Dec 23 2024 at 13:07 UTC