Stream: git-wasmtime

Topic: wasmtime / PR #6531 Add i32.popcnt and i64.popcnt to winch


view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2023 at 22:18):

itsrainy edited PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2023 at 22:42):

jameysharp submitted PR review:

Seems right to me! I agree that tests are a good idea before merging but assuming those go okay then I think this is good.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2023 at 22:42):

jameysharp created PR review comment:

Can this and the other mov below be written like this? I'm not sure it makes any difference but it seems a little more clear to me if we can avoid using .into() so much. I'm also curious if there are other methods named perhaps sub_rr or and_ir for the other instructions in this sequence.

            self.asm.mov_rr(tmp, dst, size);

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 00:57):

itsrainy created PR review comment:

I'm also curious if there are other methods named perhaps sub_rr or and_ir for the other instructions in this sequence.

there are! I'll change those

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 00:58):

itsrainy updated PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 15:22):

itsrainy updated PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 18:48):

itsrainy updated PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:16):

saulecabrera submitted PR review:

Left one minor comment, but overall this looks great to me, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:16):

saulecabrera submitted PR review:

Left one minor comment, but overall this looks great to me, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:16):

saulecabrera created PR review comment:

We currently have impl From<Reg> for Gpr and impl From<Reg> for WritableGpr so you could use those here if you wanted to reduce the boilerplate.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:29):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:29):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:29):

jameysharp created PR review comment:

Would you add a comment that the fallback is based on MacroAssembler::popcnt32 in https://searchfox.org/mozilla-central/source/js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:29):

jameysharp created PR review comment:

I was pretty confused about why you didn't use and_ir, until I dug into the implementation enough to understand that x86 only supports 32-bit immediates for and, so doing it this way lets the underlying assembler decide whether to use load_constant into the scratch register or to emit the immediate operand inline.

I don't know what to suggest but maybe there's a comment that could go here. Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:

Also, it's a little unfortunate that in the 64-bit case, masks[1] gets loaded into the scratch register twice. It would be nice to avoid that, but it would clutter the implementation here and maybe that's not worth doing.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:29):

jameysharp created PR review comment:

This is perfectly reasonable as-is but it keeps bothering me that the constant masks are duplicated in this way. One alternative might be:

            let masks = [
                0x5555555555555555, // m1
                0x3333333333333333, // m2
                0x0f0f0f0f0f0f0f0f, // m4
                0x0101010101010101, // h01
            ];
            let (mask, shift_amt) = match size {
                OperandSize::S64 => (u64::MAX as i64, 56u8),
                OperandSize::S32 => (u32::MAX as i64, 24u8),
            };

Then use e.g. masks[0] & mask. (Maybe rename mask though.)

Another approach is to generate the constants using bit-twiddling tricks. I don't think this is a _good_ idea but I spent enough time figuring out how it would work that I'm going to write it down anyway. You can divide u64::MAX or u32::MAX by certain constants to get these repeating patterns: specifically, [0x3, 0x5, 0x11, 0xff] to produce [0x55..., 0x33..., 0x0f..., 0x01...] respectively.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:41):

saulecabrera created PR review comment:

Or maybe we should just assume that people reading this are more familiar with Winch idioms than I am. :laughing:

The scratch register has caused confusion, I agree. This is something that I'd like to improve, so I'm considering adding guards, which will hopefully make its usage traceable and more explicit.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:42):

saulecabrera created PR review comment:

I just haven't gotten to it :big_smile:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:42):

itsrainy created PR review comment:

Yeah I did have the thought about 0x333 getting loaded in twice, so I had briefly replaced the second RegImm::imm(masks[0]).into() with regs::scratch().into(), but that wouldn't work in the 32-bit case and then trying to work around that seemed like added complexity for not a lot of gain

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:43):

itsrainy edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:48):

jameysharp created PR review comment:

I don't think it would be terrible to unconditionally call load_constant here to put the 0x333... mask in the scratch register, and use and_rr explicitly. It's one extra instruction in the 32-bit case, but it's only a move-immediate, and the generated code is shorter due to only emitting the constant once.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:49):

itsrainy created PR review comment:

I thiiiink I'm inclined to keep it as is, though I don't feel too strongly about that. I initially had the 32-bit and 64-bit code as completely separate branches (this is what spidermonkey and cranelift do) and wasn't sure if I should even combine them in the first place. I kind of like the different constants being explicitly in the code, but I also see that using one to generate the other is tempting. Happy to go either way here

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 19:55):

jameysharp created PR review comment:

Yup, I don't feel strongly about it either, so I'm good with leaving it as is. :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 20:02):

itsrainy updated PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 20:03):

itsrainy updated PR #6531.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 14 2023 at 22:20):

itsrainy merged PR #6531.


Last updated: Jan 24 2025 at 00:11 UTC