fitzgen commented on issue #4496:
Would be really interested in sightglass benchmark results for this work when you get it in a good spot!
https://github.com/bytecodealliance/sightglass#comparing-a-feature-branch-to-main
elliottt commented on issue #4496:
Would be really interested in sightglass benchmark results for this work when you get it in a good spot!
https://github.com/bytecodealliance/sightglass#comparing-a-feature-branch-to-main
Currently it shows a slowdown when running on the spidermonkey benchmark, but my guess is that it's from boxing
SyntheticAmode
. I'm experimenting with packingSyntheticAmode
into 16 bytes to see if that leads to a similar space savings without the indirection :)
jameysharp commented on issue #4496:
Huh, looking at
PackedAmode
now, I'm pretty sure you can get the compiler to produce code at least this good using a regular Rustenum
. When we talked about this before I hadn't realized every field was aligned to some multiple of 8 bits.The key thing is to make
SyntheticAmode
copy all the cases ofAmode
and just add a couple more. Converting between them requires a slightly tediousmatch
, but this ensures that all five cases are in a single one-byte tag, instead of needing to pack a second tag somewhere. In addition, if I'm not mistaken,Amode
is 4-byte aligned so embedding it in one of the cases ofSyntheticAmode
wastes three bytes in padding.So the result is something like the following. Happily, converting from
Amode
toSyntheticAmode
can't fail.enum SyntheticAmode { ImmReg { simm32: u32, base: Reg, flags: MemFlags }, ImmRegRegShift { simm32: u32, base: Gpr, index: Gpr, shift: u8, flags: MemFlags }, RipRelative { target: MachLabel }, NominalSPOffset { simm32: u32 }, ConstantOffset(VCodeConstant), } impl From<Amode> for SyntheticAmode { fn from(src: Amode) -> SyntheticAmode { match src { Amode::ImmReg { simm32, base, flags } => SyntheticAmode::ImmReg { simm32, base, flags }, ... } } }
One nice thing here is that by default Rust will reorder struct fields to pack them with as little padding as possible. Also, it picks the smallest tag representation that is big enough to distinguish all the enum cases. So I think the representation it'll pick for the above
SyntheticAmode
is actually exactly what you did by hand. Convenient, right?I'd hope that LLVM is clever enough to notice that the above implementation of
from
doesn't change the memory representation, too.
cfallin commented on issue #4496:
One thing @elliottt and I have discussed is that it's likely possible to get this down to 12 bytes, actually, by taking advantage of the fact that VReg indices are limited to 21 bits (because regalloc2 also plays bitpacking tricks...). Then the worst case is 2*vreg (42 bits) + simm32 (32 bits) + tag (3 bits) + flags (5 bits now, up to 8 bits) --> 82..85 bits. To do that we do need custom bitpacking, unfortunately (though if
rustc
supported C-style bitfields in structs... no wait, please don't throw me out the window...).
jameysharp commented on issue #4496:
Can you define a type equivalent to
(VReg, VReg)
that's six bytes, and avoid doing magic bit-packing for anything else?
jameysharp edited a comment on issue #4496:
Can you define a type equivalent to
(VReg, VReg)
that's six bytes, and avoid doing magic bit-packing for anything else?Edit: It'd need to be 2-byte aligned at most, so
[u16; 3]
under the hood, I imagine. And would it make sense to also use that type inAmode
to shave four bytes off there too?
elliottt commented on issue #4496:
Can you define a type equivalent to
(VReg, VReg)
that's six bytes, and avoid doing magic bit-packing for anything else?Edit: It'd need to be 2-byte aligned at most, so
[u16; 3]
under the hood, I imagine. And would it make sense to also use that type inAmode
to shave four bytes off there too?I'd love it if we could find a way to make that work! My experiment on godbolt suggests that the tag bumps the enum back up to 16 bytes: https://rust.godbolt.org/z/1E8s3rns8.
elliottt edited a comment on issue #4496:
Can you define a type equivalent to
(VReg, VReg)
that's six bytes, and avoid doing magic bit-packing for anything else?Edit: It'd need to be 2-byte aligned at most, so
[u16; 3]
under the hood, I imagine. And would it make sense to also use that type inAmode
to shave four bytes off there too?I'd love it if we could find a way to make that work -- the bit shifting implementation is a little less readable :)
My experiment on godbolt suggests that the tag bumps the enum back up to 16 bytes: https://rust.godbolt.org/z/1E8s3rns8.
jameysharp commented on issue #4496:
I miscounted, but my new idea is to pack
base
,index
, and alsoshift
into 6 bytes. I believe x86 shifts are limited to something like two or three bits, right? So six bits should be plenty....I just really want to use a normal enum okay :laughing:
cfallin commented on issue #4496:
@jameysharp if you can come up with a working Godbolt demo with a 12-byte size that'd be great; but I think the main focus right now is just to get something working and evaluate the performance gains of a smaller
Inst
. Then after that we can incrementally refine :-)
jameysharp commented on issue #4496:
I mean, it's easy enough as long as it's safe to use only six bits for the shift: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=114c56c3b49d2465ce46456ef752f68d
So, _is_ it safe to use six or fewer bits for the shift?
cfallin commented on issue #4496:
Yep, the shift amount is always one of 1, 2, 4, 8; so that representation should work. Thanks!
cfallin commented on issue #4496:
(Or rather, the scale is 1, 2, 4, 8; shift is 0, 1, 2, 3, so two bits.)
MaxGraey commented on issue #4496:
Can you define a type equivalent to
(VReg, VReg)
that's six bytes, and avoid doing magic bit-packing for anything else?
Edit: It'd need to be 2-byte aligned at most, so[u16; 3]
under the hood, I imagine. And would it make sense to also use that type inAmode
to shave four bytes off there too?I'd love it if we could find a way to make that work -- the bit shifting implementation is a little less readable :)
My experiment on godbolt suggests that the tag bumps the enum back up to 16 bytes: https://rust.godbolt.org/z/1E8s3rns8.
But you can switch to
union
with packed repr forImmRegRegShift
. It will turn union layout to12
bytes without bits magic:
https://rust.godbolt.org/z/aWW939b7b
jameysharp commented on issue #4496:
But you can switch to
union
with packed repr forImmRegRegShift
. It will turn union layout to12
bytes without bits magic: https://rust.godbolt.org/z/aWW939b7bThere still has to be a tag somewhere. With that
union
we'd have no idea which variant was being used. We'd also have to useunsafe
everywhere we wanted to read from these values, for the same reason.Fortunately, just packing the base, index, and shift fields together, in the one case that uses all three, gets us down to 12 bytes anyway.
elliottt commented on issue #4496:
Closing this PR in favor of #4514.
Last updated: Jan 24 2025 at 00:11 UTC