abrown opened Issue #2256:
As suggested by @bnjbvr in https://github.com/bytecodealliance/wasmtime/pull/2248#issuecomment-702627995, we should benchmark whether clearing a register with `PXOR` before emitting the sequence for `splat` will cause a slowdown on x64. Currently, #2248 adds a weird meta-instruction, `XmmUninitializedValue`, that tells the register allocator that the `dst` register is a `def`, not a `mod`, because the sequence of instructions emitted for `splat` will overwrite all lanes of `dst`. `XmmUninitializedValue` is dangerous, though, because we must be very careful to ensure the "overwrite all lanes" invariant holds--it would be preferable to remove it. One way to do so would be to initially emit a `PXOR dst, dst`, which the new backend recognizes as a `def`. I avoided this in #2248 because of increased code size, potential slowdown, and the fact that the old backend did not have it, but if we find that its emission causes no slowdown, we should add it and remove `XmmUninitializedValue`.
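To make the `def` versus `mod` distinction concrete, here is a minimal toy model in Rust. It is only a sketch: the `Access` and `Inst` types, the helper functions, and the three-instruction `splat` sequence are all hypothetical and are not Cranelift's actual API or necessarily the sequence #2248 emits. It shows that a leading `def` of `dst` (either a real `PXOR dst, dst` or a pseudo-instruction such as `XmmUninitializedValue`) means the old value of `dst` does not have to be treated as live, and that only the `PXOR` variant adds an emitted instruction.

```rust
// Toy model only: these types are hypothetical, not Cranelift's API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Access {
    Def, // writes the whole register; the old value is irrelevant
    Mod, // reads the old value, then writes a new one
}

#[derive(Clone, Copy, Debug)]
struct Inst {
    name: &'static str,
    dst: Access,      // how this instruction touches `dst`
    emits_code: bool, // false for pseudo-instructions that emit no machine code
}

// An illustrative splat-like sequence (assumed, not taken from #2248):
// every instruction only modifies `dst`.
fn splat_without_clear() -> Vec<Inst> {
    vec![
        Inst { name: "pinsrw dst, src, 0", dst: Access::Mod, emits_code: true },
        Inst { name: "pinsrw dst, src, 1", dst: Access::Mod, emits_code: true },
        Inst { name: "pshufd dst, dst, 0", dst: Access::Mod, emits_code: true },
    ]
}

fn with_prefix(prefix: Inst) -> Vec<Inst> {
    let mut seq = vec![prefix];
    seq.extend(splat_without_clear());
    seq
}

// Option A: clear the register first with a real instruction.
fn splat_with_pxor() -> Vec<Inst> {
    with_prefix(Inst { name: "pxor dst, dst", dst: Access::Def, emits_code: true })
}

// Option B: a marker that only informs the register allocator.
fn splat_with_uninit_marker() -> Vec<Inst> {
    with_prefix(Inst { name: "XmmUninitializedValue dst", dst: Access::Def, emits_code: false })
}

// `dst` is live on entry if the first instruction reads its old value.
fn dst_live_in(seq: &[Inst]) -> bool {
    seq.first().map_or(false, |inst| inst.dst == Access::Mod)
}

// Number of instructions that actually end up in the machine code.
fn emitted_len(seq: &[Inst]) -> usize {
    seq.iter().filter(|inst| inst.emits_code).count()
}
```

In this model both options make `dst` dead on entry to the sequence. The difference is that the marker relies on the later instructions really overwriting every lane, while the `PXOR` variant is safe by construction at the cost of one extra emitted instruction, which is the cost the proposed benchmark would measure.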
abrown labeled Issue #2256.
Last updated: Oct 23 2024 at 20:03 UTC