Stream: git-wasmtime

Topic: wasmtime / issue #10719 Cranelift: deduplicate 128-bit (a...


Wasmtime GitHub notifications bot (May 02 2025 at 23:57):

cfallin edited issue #10719:

See the following existing compilation test:

https://github.com/bytecodealliance/wasmtime/blob/e0431281326eb13b45af0d6dd7829304e18581d4/cranelift/filetests/filetests/isa/x64/simd-lane-access-compile.clif#L134-L137

A constant from the post-function pool is being loaded into %xmm1, but the instruction that would use it, paddusb, then re-loads it as it writes into %xmm1. Now, the movdqu seems correct in one sense: memory operands of packed SSE instructions must be 128-bit aligned, so when converting an XmmMem to an XmmMemAligned, Cranelift inserts an unaligned load. But the direct use by paddusb might also be correct: we may be aligning the start of the constant pool, in which case this is just fine.

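Either way, something is not quite right: if the constant pool is 16-byte aligned, the extra movdqu is unnecessary; if it is not, the direct paddusb access is incorrect.

Which is it? I'm leaning towards the pool being aligned because of sequences like the following: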

https://github.com/bytecodealliance/wasmtime/blob/e0431281326eb13b45af0d6dd7829304e18581d4/cranelift/filetests/filetests/isa/x64/simd-pairwise-add.clif#L154-L156

Those instructions have no trouble accessing the pool by aligned address... or are they incorrect?
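The operand rule discussed in the issue can be modeled in a few lines. Packed SSE instructions may take a memory operand only if it is known to be 16-byte aligned. Any other memory operand is first loaded into a register with an unaligned movdqu. This is a simplified sketch, not Cranelift's code. Reg, Mem, known_aligned, to_aligned_operand, emit_paddusb and the fixed scratch register are hypothetical. "Known aligned" stands in for whatever the compiler can prove about an address, such as a slot in an aligned constant pool.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Mem:
    label: str           # hypothetical symbolic address, e.g. "const(1)"
    known_aligned: bool  # statically known to be 16-byte aligned?


Operand = Union[Reg, Mem]


def fmt(op: Operand) -> str:
    return op.name if isinstance(op, Reg) else op.label


def to_aligned_operand(op: Operand, out: List[str], tmp: str) -> Operand:
    """Registers and known-aligned memory pass through unchanged; any other
    memory operand is copied into the scratch register `tmp` with an
    unaligned load (movdqu), which is recorded in `out`."""
    if isinstance(op, Reg) or op.known_aligned:
        return op
    out.append(f"movdqu {op.label}, {tmp}")
    return Reg(tmp)


def emit_paddusb(src: Operand, dst: str) -> List[str]:
    """Emit `paddusb src, dst`, inserting a load first if `src` needs one.
    %xmm2 is a hypothetical fixed scratch register for illustration."""
    out: List[str] = []
    src = to_aligned_operand(src, out, "%xmm2")
    out.append(f"paddusb {fmt(src)}, {dst}")
    return out
```

With this model, an aligned constant-pool slot folds straight into paddusb. An arbitrary pointer gets the extra movdqu, which is the pattern the issue suspected was happening twice.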

Wasmtime GitHub notifications bot (May 05 2025 at 23:56):

abrown commented on issue #10719:

Ok, I was getting confused by all of these constants and RIP-relative offsets. In the example I gave in the issue description, the disassembly corresponds to:

CLIF                    | Disassembly
v0 = vconst.i8x16 ...   | movdqu 0x24(%rip), %xmm0
v1 = vconst.i8x16 ...   | movdqu 0x1c(%rip), %xmm1
v2 = swizzle v0, v1     | paddusb 0x24(%rip), %xmm1
(v2, continued)         | pshufb %xmm1, %xmm0

That lowering of swizzle corresponds to:

https://github.com/bytecodealliance/wasmtime/blob/1761bc3340438897fd9b8ce0676ab811912347d2/cranelift/codegen/src/isa/x64/lower.isle#L4644-L4646

So there are three different constants being referred to here, and paddusb is not re-reading the constant already loaded into %xmm1: it adds the 0x7070... constant from the pool to it. All is well.
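A small lane-by-lane model shows why the lowering adds a 0x70 constant before pshufb. This sketch assumes the linked rule computes pshufb(x, paddusb(mask, splat(0x70))), as the disassembly above suggests. It also assumes swizzle follows Wasm i8x16.swizzle semantics, where an index of 16 or more produces 0. The function names are illustrative. An unsigned saturating add of 0x70 maps indices 0 to 15 onto 0x70..0x7F, which leaves the top bit clear and the low nibble unchanged. It maps every index of 16 or more to 0x80 or above, and that set top bit makes pshufb write zero.

```python
from typing import List


def paddusb(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise unsigned saturating byte add, modeling x86 PADDUSB."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


def pshufb(src: List[int], mask: List[int]) -> List[int]:
    """128-bit PSHUFB: a lane is zeroed if its mask byte has the top bit set;
    otherwise it takes src[mask & 0x0F]."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def swizzle_reference(x: List[int], s: List[int]) -> List[int]:
    """Wasm-style swizzle: lane i is x[s[i]], or 0 if s[i] >= 16."""
    return [x[i] if i < 16 else 0 for i in s]


def swizzle_lowered(x: List[int], s: List[int]) -> List[int]:
    """Lowering shape: pshufb(x, paddusb(s, splat(0x70)))."""
    return pshufb(x, paddusb(s, [0x70] * 16))


# Check every possible index byte against the reference semantics.
x = list(range(0xA0, 0xB0))
for k in range(256):
    s = [k] * 16
    assert swizzle_lowered(x, s) == swizzle_reference(x, s)
```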

I was worried about something more dire; since that isn't the case, I think we should close the issue, though you're right that we could deduplicate the constants in v0 and v1.
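The RIP-relative offsets are easy to misread, and a little arithmetic resolves them. A %rip-relative operand is relative to the address of the next instruction, so the same displacement in two different instructions names two different addresses. This sketch assumes the three memory-referencing instructions in the table are emitted back to back. It also assumes each occupies 8 bytes: the legacy-SSE encodings F3 0F 6F /r (movdqu) or 66 0F DC /r (paddusb), plus a ModRM byte and a 32-bit displacement, with no REX prefix needed for %xmm0 or %xmm1. The start offset is arbitrary, since only the differences matter.

```python
from typing import List, Tuple

# (disassembly, assumed encoded length in bytes, %rip-relative displacement)
INSNS: List[Tuple[str, int, int]] = [
    ("movdqu 0x24(%rip), %xmm0", 8, 0x24),
    ("movdqu 0x1c(%rip), %xmm1", 8, 0x1C),
    ("paddusb 0x24(%rip), %xmm1", 8, 0x24),
]


def rip_targets(start: int, insns: List[Tuple[str, int, int]]) -> List[int]:
    """Resolve each %rip-relative operand to an offset from `start`.

    %rip holds the address of the *next* instruction, so the target is
    insn_start + insn_length + displacement.
    """
    targets = []
    pc = start
    for _text, length, disp in insns:
        targets.append(pc + length + disp)
        pc += length
    return targets


for (text, _, _), target in zip(INSNS, rip_targets(0, INSNS)):
    print(f"{text:<28} -> offset {target:#x}")
```

If those assumptions hold, both movdqu loads resolve to the same pool offset, and paddusb reads the entry 16 bytes after it. The repeated 0x24 displacement therefore does not mean the same constant is read twice.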

Wasmtime GitHub notifications bot (May 06 2025 at 00:15):

cfallin closed issue #10719.

Wasmtime GitHub notifications bot (May 06 2025 at 00:15):

cfallin commented on issue #10719:

Ah, great. I think I also missed previously that constants are deduplicated at the CLIF level: writing a little test by hand with two vconst.i8x16 ABCD... instances shows that they reduce to two vconst.i8x16 const0 instructions. So we're doing all we can in the compiler already, and I agree we can close this.
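The CLIF-level deduplication behaves like a map from constant bytes to handles: inserting identical bytes twice yields the same handle, so both vconst instructions end up naming const0. The following is a minimal model, not Cranelift's actual constant pool. The class, its insert method, and the placeholder 16-byte value (standing in for the elided ABCD... constant) are hypothetical.

```python
from typing import Dict, List


class ConstantPool:
    """Minimal deduplicating constant pool: equal byte strings share one
    handle, and handles are numbered const0, const1, ... in insertion order."""

    def __init__(self) -> None:
        self._handles: Dict[bytes, str] = {}  # constant bytes -> handle
        self._values: List[bytes] = []        # distinct constants, in order

    def insert(self, data: bytes) -> str:
        handle = self._handles.get(data)
        if handle is None:
            handle = f"const{len(self._values)}"
            self._handles[data] = handle
            self._values.append(data)
        return handle

    def __len__(self) -> int:
        return len(self._values)


pool = ConstantPool()
placeholder = bytes(range(16))  # hypothetical stand-in for "ABCD..."
v0 = pool.insert(placeholder)
v1 = pool.insert(placeholder)
```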


Last updated: Dec 06 2025 at 06:05 UTC