jlb6740 opened PR #2887 from x64_implement_packed_popcnt
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
You can use
[0x0F; 16]
.
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
abrown submitted PR review.
abrown created PR review comment:
Why don't we just use
popcnt
? Ifpopcnt
andvpopcnt
have the same semantics then we can just usepopcnt
with a vector type?
jlb6740 submitted PR review.
jlb6740 created PR review comment:
It's been a while since I made that decision but it does indeed look like I've created identical lowerings. Perhaps I was duplicating an earlier effort where two instruction conversions were needed due to the input args?? Not sure. In any case, good question and I'll attempt to combine the two and hopefully it will cause not trouble. Thanks!
jlb6740 edited PR review comment.
jlb6740 created PR review comment:
Ahh .. now I see why we had to do that. It has something to do with needing to pop1_with_bitcast in the code_translator. Ultimately the verifier did not like the vector sharing the definition when compiling even though the translations in instructions.rs are identical. Perhaps you know or @cfallin knows but I remember trying this now and consciously implementing a separate translation/lowering because of an issue along these lines:
`Caused by:
0: WebAssembly failed to compile
1: Compilation error: function u0:0(i64 vmctx, i64, i8x16) -> i8x16 wasmtime_system_v {
gv0 = vmctx
gv1 = load.i64 notrap aligned readonly gv0
gv2 = load.i64 notrap aligned gv1
stack_limit = gv2block0(v0: i64, v1: i64, v2: i8x16): @004f v4 = popcnt v2 ;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ; error: inst0 (v4 = popcnt.i8x16 v2): has an invalid controlling type i8x16 @0051 jump block1(v4) block1(v3: i8x16): @0051 return v3 } ; 1 verifier error detected (see above). Compilation aborted. ', /home/jlbirch/wasmtime_jlb6740/target/debug/build/wasmtime-cli-89f47d7a9050963a/out/wast_testsuite_tests.rs:3645:18
note: run with
RUST_BACKTRACE=1
environment variable to display a backtrace`
jlb6740 submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
That's a good find; I think the issue is that the types allowed for
popcnt
are too restrictive.popcnt
uses theiB
type, which only includes scalar integers, but I think we should make it use theInt
type, which allows both scalar and vector integers.
abrown edited PR review comment.
jlb6740 submitted PR review.
jlb6740 created PR review comment:
Ok .. Yes, makes sense. Do we want to just merged this here and refactor that in a separate patch or do it here?? My only thought is fixing a patch that currently works (should work) .. creating some unintended consequence not realizing the restriction to 1b was there for some non-obvious reason. In any case I do suddenly have failures below to check on so I'll fix that first. Will likely continue to be very slow to respond to updates this coming week unfortunately but will get back to normal soon.
jlb6740 edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
I think it's probably better to avoid adding
vpopcnt
in the first place rather than cleaning up in a subsequent PR -- hopefully the changes are minor on the lowering side (basically it should be just a conditional in theOpcode::Popcnt
case, on the type, like the arithmetic operators have). Thanks!
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
abrown created PR review comment:
let ty = ty.unwrap(); if !ty.is_vector() {
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
Rename
ty
toinput_ty
? Or is this necessary?
abrown submitted PR review.
abrown created PR review comment:
@bnjbvr, what does
ty
mean here?
abrown submitted PR review.
abrown edited PR review comment.
abrown deleted PR review comment.
abrown submitted PR review.
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
jlb6740 updated PR #2887 from x64_implement_packed_popcnt
to main
.
jlb6740 merged PR #2887.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Yeah, it was something along those lines, from peaking at the previous version of this source code, when the input type is narrower than 32-bits, we'd just do the 32-bits version of the instruction, zero-extending the input prior to this.
Last updated: Jan 24 2025 at 00:11 UTC