itsrainy opened PR #6531 from bytecodealliance:rainy/winch-popcnt
to bytecodealliance:main
.
fitzgen requested saulecabrera for a review on PR #6531.
saulecabrera submitted PR review:
Thanks for putting this together! The current changes look good to me. I have one thought/comment though: are you planning on providing a fallback implementation for
popcnt
if thehas_popcnt
flag is not enabled, similar to what Cranelift provides? If it's too burdensome to do as part of this PR a TODO is also totally fine I think, but we might want to consider updating the fuzzing configuration so that it always enables thehas_popcnt
flag to avoid failures when fuzzing winch. Even though we only enable the fuzzer locally, it's still useful for verifying our changes.Also, you might also want to add
<I32|I64>Popcnt
here https://github.com/bytecodealliance/wasmtime/blob/main/fuzz/fuzz_targets/differential.rs#L335
itsrainy updated PR #6531.
saulecabrera created PR review comment:
If I'm not wrong, in the case of a fallback we'd need extra temporary register because the scratch is clobbered by all the calls to
load_constant
right?
saulecabrera submitted PR review:
I think that it'd be nice to save a couple of instructions if possible, so I'm leaning towards the second fallback. This is also how SpiderMonkey handles this case https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h#110
I left a couple of extra comments, but feel free to ignore if those are things that you were already thinking about!
saulecabrera submitted PR review:
I think that it'd be nice to save a couple of instructions if possible, so I'm leaning towards the second fallback. This is also how SpiderMonkey handles this case https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h#110
I left a couple of extra comments, but feel free to ignore if those are things that you were already thinking about!
saulecabrera created PR review comment:
Could we replace the
emit
calls in this function with the emit functions in the assembler? That has the added benefit that it already handles constant loading based on the operand size.
saulecabrera created PR review comment:
If that's the case, perhaps we can pass in a mutable reference to the
CodeGenContext
toMacroAssembler::popcnt
and do the dispatching at that level and if we need to emit the fallback, we can request an extra temporary register viaCodeGenContext::any_gpr
? Similar to to the implementation ofclz
https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/isa/x64/masm.rs#L343
itsrainy created PR review comment:
scratch is clobbered by all the calls to load_constant right?
Oh this is good to know! I didn't know that. I am still learning the relationships between the different reg types and how their different abstraction layers are used, so if it seems like I'm doing something weird, please let me know :)
I'll take the approach in your second comment!
itsrainy created PR review comment:
I think so. So, for instance I could replace this:
self.emit(Inst::AluRmiR { size: size.into(), op: AluRmiROpcode::Sub, src1: reg.into(), src2: masked1.into(), dst: reg.into(), });
with something like this:
self.sub_rr(reg, masked, size);
right? I noticed some of the instructions don't have corresponding functions on the assembler (
shiftr
,and
), should I add functions for those?
saulecabrera created PR review comment:
Yeah exactly.
For
shift
we haveshift_ir
andshift_rr
in the assembler and forand
we haveand_rr
andand_ir
, I think that should cover all the cases for the lowering here. But if it doesn't feel free to make any additions to the assembler!
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy has marked PR #6531 as ready for review.
itsrainy requested jameysharp for a review on PR #6531.
itsrainy requested wasmtime-fuzz-reviewers for a review on PR #6531.
itsrainy requested wasmtime-compiler-reviewers for a review on PR #6531.
Last updated: Dec 23 2024 at 12:05 UTC