itsrainy edited PR #6531.
jameysharp submitted PR review:
Seems right to me! I agree that tests are a good idea before merging but assuming those go okay then I think this is good.
jameysharp created PR review comment:
Can this and the other
mov
below be written like this? I'm not sure it makes any difference but it seems a little more clear to me if we can avoid using.into()
so much. I'm also curious if there are other methods named perhapssub_rr
orand_ir
for the other instructions in this sequence.self.asm.mov_rr(tmp, dst, size);
itsrainy created PR review comment:
I'm also curious if there are other methods named perhaps sub_rr or and_ir for the other instructions in this sequence.
there are! I'll change those
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy updated PR #6531.
saulecabrera submitted PR review:
Left one minor comment, but overall this looks great to me, thanks!
saulecabrera submitted PR review:
Left one minor comment, but overall this looks great to me, thanks!
saulecabrera created PR review comment:
We currently have
impl From<Reg> for Gpr
andimpl From<Reg> for WritableGpr
so you could use those here if you wanted to reduce the boilerplate.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Would you add a comment that the fallback is based on
MacroAssembler::popcnt32
in https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h?
jameysharp created PR review comment:
I was pretty confused about why you didn't use
and_ir
, until I dug into the implementation enough to understand that x86 only supports 32-bit immediates forand
, so doing it this way lets the underlying assembler decide whether to useload_constant
into the scratch register or to emit the immediate operand inline.I don't know what to suggest but maybe there's a comment that could go here. Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:
Also, it's a little unfortunate that in the 64-bit case,
masks[1]
gets loaded into the scratch register twice. It would be nice to avoid that, but it would clutter the implementation here and maybe that's not worth doing.
jameysharp created PR review comment:
This is perfectly reasonable as-is but it keeps bothering me that the constant masks are duplicated in this way. One alternative might be:
let masks = [ 0x5555555555555555, // m1 0x3333333333333333, // m2 0x0f0f0f0f0f0f0f0f, // m4 0x0101010101010101, // h01 ]; let (mask, shift_amt) = match size { OperandSize::S64 => (u64::MAX as i64, 56u8), OperandSize::S32 => (u32::MAX as i64, 24u8), };
Then use e.g.
masks[0] & mask
. (Maybe renamemask
though.)Another approach is to generate the constants using bit-twiddling tricks. I don't think this is a _good_ idea but I spent enough time figuring out how it would work that I'm going to write it down anyway. You can divide
u64::MAX
oru32::MAX
by certain constants to get these repeating patterns: specifically,[0x3, 0x5, 0x11, 0xff]
to produce[0x55..., 0x33..., 0x0f..., 0x01...]
respectively.
saulecabrera created PR review comment:
Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:
The scratch register has caused confusion, I agree. This is something that I'd like to improve, so I'm considering adding guards, which will hopefully make its usage traceable and more explicit.
saulecabrera created PR review comment:
I just haven't gotten to it :big_smile:
itsrainy created PR review comment:
Yeah I did have the thought about 0x333 getting loaded in twice, so I had briefly replaced the second
RegImm::imm(masks[0]).into()
withregs::scratch().into()
, but that wouldn't work in the 32-bit case and then trying to work around that seemed like added complexity for not a lot of gain
itsrainy edited PR review comment.
jameysharp created PR review comment:
I don't think it would be terrible to unconditionally call
load_constant
here to put the0x333...
mask in the scratch register, and useand_rr
explicitly. It's one extra instruction in the 32-bit case, but it's only a move-immediate, and the generated code is shorter due to only emitting the constant once.
itsrainy created PR review comment:
I thiiiink I'm inclined to keep it as is, though I don't feel too strongly about that. I initially had the 32-bit and 64-bit code as completely separate branches (this is what spidermonkey and cranelift do) and wasn't sure if I should even combine them in the first place. I kind of like the different constants being explicitly in the code, but I also see that using one to generate the other is tempting. Happy to go either way here
jameysharp created PR review comment:
Yup, I don't feel strongly about it either, so I'm good with leaving it as is. :+1:
itsrainy updated PR #6531.
itsrainy updated PR #6531.
itsrainy merged PR #6531.
Last updated: Dec 23 2024 at 13:07 UTC