itsrainy edited PR #6531.
jameysharp submitted PR review:
Seems right to me! I agree that tests are a good idea before merging but assuming those go okay then I think this is good.
jameysharp created PR review comment:
Can this and the other
movbelow be written like this? I'm not sure it makes any difference but it seems a little more clear to me if we can avoid using.into()so much. I'm also curious if there are other methods named perhapssub_rrorand_irfor the other instructions in this sequence.self.asm.mov_rr(tmp, dst, size);
itsrainy created PR review comment:
I'm also curious if there are other methods named perhaps sub_rr or and_ir for the other instructions in this sequence.
there are! I'll change those
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy updated PR #6531.
saulecabrera submitted PR review:
Left one minor comment, but overall this looks great to me, thanks!
saulecabrera submitted PR review:
Left one minor comment, but overall this looks great to me, thanks!
saulecabrera created PR review comment:
We currently have
impl From<Reg> for Gprandimpl From<Reg> for WritableGprso you could use those here if you wanted to reduce the boilerplate.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Would you add a comment that the fallback is based on
MacroAssembler::popcnt32in https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h?
jameysharp created PR review comment:
I was pretty confused about why you didn't use
and_ir, until I dug into the implementation enough to understand that x86 only supports 32-bit immediates forand, so doing it this way lets the underlying assembler decide whether to useload_constantinto the scratch register or to emit the immediate operand inline.I don't know what to suggest but maybe there's a comment that could go here. Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:
Also, it's a little unfortunate that in the 64-bit case,
masks[1]gets loaded into the scratch register twice. It would be nice to avoid that, but it would clutter the implementation here and maybe that's not worth doing.
jameysharp created PR review comment:
This is perfectly reasonable as-is but it keeps bothering me that the constant masks are duplicated in this way. One alternative might be:
let masks = [ 0x5555555555555555, // m1 0x3333333333333333, // m2 0x0f0f0f0f0f0f0f0f, // m4 0x0101010101010101, // h01 ]; let (mask, shift_amt) = match size { OperandSize::S64 => (u64::MAX as i64, 56u8), OperandSize::S32 => (u32::MAX as i64, 24u8), };Then use e.g.
masks[0] & mask. (Maybe renamemaskthough.)Another approach is to generate the constants using bit-twiddling tricks. I don't think this is a _good_ idea but I spent enough time figuring out how it would work that I'm going to write it down anyway. You can divide
u64::MAXoru32::MAXby certain constants to get these repeating patterns: specifically,[0x3, 0x5, 0x11, 0xff]to produce[0x55..., 0x33..., 0x0f..., 0x01...]respectively.
saulecabrera created PR review comment:
Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:
The scratch register has caused confusion, I agree. This is something that I'd like to improve, so I'm considering adding guards, which will hopefully make its usage traceable and more explicit.
saulecabrera created PR review comment:
I just haven't gotten to it :big_smile:
itsrainy created PR review comment:
Yeah I did have the thought about 0x333 getting loaded in twice, so I had briefly replaced the second
RegImm::imm(masks[0]).into()withregs::scratch().into(), but that wouldn't work in the 32-bit case and then trying to work around that seemed like added complexity for not a lot of gain
itsrainy edited PR review comment.
jameysharp created PR review comment:
I don't think it would be terrible to unconditionally call
load_constanthere to put the0x333...mask in the scratch register, and useand_rrexplicitly. It's one extra instruction in the 32-bit case, but it's only a move-immediate, and the generated code is shorter due to only emitting the constant once.
itsrainy created PR review comment:
I thiiiink I'm inclined to keep it as is, though I don't feel too strongly about that. I initially had the 32-bit and 64-bit code as completely separate branches (this is what spidermonkey and cranelift do) and wasn't sure if I should even combine them in the first place. I kind of like the different constants being explicitly in the code, but I also see that using one to generate the other is tempting. Happy to go either way here
jameysharp created PR review comment:
Yup, I don't feel strongly about it either, so I'm good with leaving it as is. :+1:
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy merged PR #6531.
Last updated: Dec 13 2025 at 19:03 UTC