cfallin edited issue #10719:
See the following existing compilation test:
A constant from the post-function pool is being loaded into `%xmm1`, but the instruction that would use it, `paddusb`, then re-loads it as it writes into `%xmm1`. Now, the `movdqu` seems correct in one sense: memory addresses to packed instructions must be 128-bit aligned so, when converting an `XmmMemAligned` to an `XmmMem`, Cranelift inserts an unaligned load. But the direct use by `paddusb` might also be correct: we may be aligning the start of the constant pool and so this may be just fine.

Either way, something is not quite right, though:
- if the constant pool is _not_ aligned or this can't be communicated to the instruction, `paddusb` should not be using the address directly
- if the constant pool _is_ aligned and communicated properly, then the `movdqu` is redundant and should be removed

Which is it? I'm leaning towards the latter because of sequences like the following:
Those instructions have no trouble accessing the pool by aligned address... or are they incorrect?
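
To make the alignment question concrete, here is a minimal Rust sketch of how a post-function constant pool could be laid out so that every 16-byte constant starts on a 16-byte boundary. The names `align_up`, `is_aligned`, and `layout_pool` are hypothetical and are not Cranelift's API; the sketch also assumes the function's own base address is 16-byte aligned, since otherwise aligned offsets would not imply aligned addresses.

```rust
/// Round `offset` up to the next multiple of `align`.
/// `align` must be a power of two.
fn align_up(offset: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// True if `addr` is a multiple of `align` (a power of two).
fn is_aligned(addr: u64, align: u64) -> bool {
    addr & (align - 1) == 0
}

/// Hypothetical pool layout: place each constant after the end of the
/// code, starting every entry on an `align`-byte boundary.
/// Returns the offset chosen for each constant, in order.
fn layout_pool(code_end: u64, sizes: &[u64], align: u64) -> Vec<u64> {
    let mut next = align_up(code_end, align);
    let mut offsets = Vec::with_capacity(sizes.len());
    for &size in sizes {
        offsets.push(next);
        next = align_up(next + size, align);
    }
    offsets
}
```

Under a layout like this, an instruction that requires an aligned memory operand (such as `paddusb`) could read a pool entry directly, while `movdqu` would work with or without the alignment.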
abrown commented on issue #10719:
Ok, I was getting confused by all of these constants and RIP-relative offsets. In the example I gave in the issue description, the disassembly corresponds to:
| CLIF | Disassembly |
|---|---|
| `v0 = vconst.i8x16 ...` | `movdqu 0x24(%rip), %xmm0` |
| `v1 = vconst.i8x16 ...` | `movdqu 0x1c(%rip), %xmm1` |
| `v2 = swizzle v0, v1` | `paddusb 0x24(%rip), %xmm1` |
| '' | `pshufb %xmm1, %xmm0` |

That lowering of `swizzle` corresponds to:

So there are three different constants being referred to here and `paddusb` is reading the constant we've already read into `%xmm1` and adding to it the `0x7070...` constant. All is well.

I was worried about something more dire, so I feel we should close the issue, but you're right that we could deduplicate the constants in `v0` and `v1`.
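
As a rough illustration of why adding `0x7070...` before `pshufb` gives `swizzle` its semantics, here is a small software model in Rust. The function names are hypothetical and only model the lane-wise behavior of the two instructions; they are not Cranelift code. It assumes the usual `swizzle` rule that any index of 16 or more selects zero.

```rust
/// Model of `paddusb`: lane-wise unsigned saturating add.
fn paddusb(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Model of `pshufb`: a lane whose index has the high bit set becomes 0;
/// otherwise the low 4 bits of the index select a byte from `table`.
fn pshufb(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if idx[i] & 0x80 == 0 {
            out[i] = table[(idx[i] & 0x0f) as usize];
        }
    }
    out
}

/// The lowering discussed above: bias the indices by 0x70, then shuffle.
/// Indices 0..=15 map to 0x70..=0x7f (high bit clear, low bits unchanged);
/// indices >= 16 map to 0x80..=0xff (high bit set), so they produce 0.
fn swizzle_lowered(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    pshufb(table, paddusb(idx, [0x70; 16]))
}

/// Reference semantics: out-of-range indices select zero.
fn swizzle_reference(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if idx[i] < 16 {
            out[i] = table[idx[i] as usize];
        }
    }
    out
}
```

The saturation matters: with a wrapping add, an index such as `0x90` would wrap around to `0x00` and wrongly select lane 0 instead of producing zero.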
cfallin closed issue #10719.
cfallin commented on issue #10719:
Ah, great. I think also I missed previously that constants are deduplicated at the CLIF level; writing a little test by hand with two `vconst.i8x16 ABCD...` instances shows that they reduce to two `vconst.i8x16 const0` instructions. So we're doing all we can in the compiler already, and I agree we can close this.
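
The deduplication described here can be pictured as interning constants by their bytes, so that identical values share one handle. The sketch below is a hypothetical illustration of that idea in Rust; `ConstPool` and `ConstHandle` are made-up names and do not reflect Cranelift's actual data structures.

```rust
use std::collections::HashMap;

/// Hypothetical handle to a pooled constant (think `const0`, `const1`, ...).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ConstHandle(u32);

/// Hypothetical constant pool that interns constants by their byte contents.
#[derive(Default)]
struct ConstPool {
    by_bytes: HashMap<Vec<u8>, ConstHandle>,
    data: Vec<Vec<u8>>,
}

impl ConstPool {
    /// Return the existing handle for `bytes` if it was inserted before;
    /// otherwise store the bytes and hand out a fresh handle.
    fn insert(&mut self, bytes: &[u8]) -> ConstHandle {
        if let Some(&handle) = self.by_bytes.get(bytes) {
            return handle;
        }
        let handle = ConstHandle(self.data.len() as u32);
        self.data.push(bytes.to_vec());
        self.by_bytes.insert(bytes.to_vec(), handle);
        handle
    }

    /// Number of distinct constants stored.
    fn len(&self) -> usize {
        self.data.len()
    }
}
```

With a pool like this, two `vconst` instructions carrying the same bytes would both refer to the same handle, and only one copy would need to be emitted after the function.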
Last updated: Dec 06 2025 at 06:05 UTC