cfallin opened PR #8253 from cfallin:faster-i64x2-vector-construction to bytecodealliance:main:
Sometimes, when in the course of silly optimizations to make the most of one's registers, one might want to pack two `i64`s into one `v128`, and one might want to do it without any loads or stores.

In clang targeting Wasm at least, building an `i64x2` (with `wasm_i64x2_make(a, b)` from `<wasm_simd128.h>`) will generate (i) an `i64x2.splat` to create a new v128 with lane 0's value in both lanes, then (ii) an `i64x2.replace_lane` to put lane 1's value in place. Or, in the case that one of the lanes is zero, it will generate a `v128.const 0`, then insert the other lane.

Cranelift's lowerings for both of these patterns on x64 are slightly less optimal than they could be.
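To make the two Wasm-level patterns concrete, here is a small Rust sketch of their lane semantics, treating an `i64x2` as a pair of `i64` values (lane 0 first). It is an illustration only: the type alias and function names are hypothetical stand-ins for the Wasm instructions `i64x2.splat`, `i64x2.replace_lane`, and `v128.const 0`, not Wasmtime or Cranelift APIs. It shows that splat-then-replace and zero-then-insert both yield the vector that `wasm_i64x2_make(a, b)` is meant to build.

```rust
/// Lane-level model of a Wasm `v128` viewed as `i64x2`: `[lane0, lane1]`.
/// Hypothetical helper type for illustration; not a Wasmtime/Cranelift type.
type I64x2 = [i64; 2];

/// Models `i64x2.splat`: the scalar is copied into both lanes.
fn i64x2_splat(x: i64) -> I64x2 {
    [x, x]
}

/// Models `i64x2.replace_lane`: returns `v` with lane `idx` set to `x`.
fn i64x2_replace_lane(mut v: I64x2, idx: usize, x: i64) -> I64x2 {
    assert!(idx < 2, "i64x2 only has lanes 0 and 1");
    v[idx] = x;
    v
}

/// Models `v128.const 0`, viewed as two `i64` lanes.
fn v128_const_zero() -> I64x2 {
    [0, 0]
}

/// General case of `wasm_i64x2_make(a, b)`: splat lane 0's value,
/// then overwrite lane 1 with `b`.
fn make_via_splat(a: i64, b: i64) -> I64x2 {
    i64x2_replace_lane(i64x2_splat(a), 1, b)
}

/// Case where one lane is zero: start from an all-zero vector and
/// insert the non-zero value into lane `idx`.
fn make_from_zero(idx: usize, x: i64) -> I64x2 {
    i64x2_replace_lane(v128_const_zero(), idx, x)
}
```

In the first pattern, the copy of `a` that `splat` writes into lane 1 is never observed, because `replace_lane` overwrites it; that is the property the x64 lowering change relies on.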
For the former (replace-lane of splat), the 64-bit value is moved over to the XMM register, then the rest of the `splat` semantics are implemented by a `pshufd` (shuffle), even though we're just about to overwrite the only other lane. We could omit that shuffle, and everything would still work fine.

This optimization is specific to `i64x2` (that is, only two lanes): we need to know that the only other lane that the `splat` is splatting into is overwritten. We could in theory match a chain of replace-lane operators for higher-lane-count types, but let's save that for when we actually need it.

For the latter (replace-lane of constant zero), the load of a constant zero from the constant pool is the part that bothers me most. While I like zeroed memory as much as the next person, there is a vector XOR instruction right there under our noses, and we'd be silly not to use it. This applies to any
`vconst 0`, not just ones that occur as a source to replace-lane.
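Both x64 observations can be checked with a similar sketch that models an XMM register as two 64-bit lanes. The specific instructions are assumptions chosen for illustration, not taken from Cranelift's backend: `movq` to move the 64-bit value from a general-purpose register (zeroing the upper lane), `pshufd` with immediate `0x44` to duplicate the low quadword, SSE4.1 `pinsrq` for the lane insert, and `pxor` of a register with itself for zeroing. All function names are hypothetical.

```rust
/// Model of an XMM register as two 64-bit lanes: `[low quadword, high quadword]`.
/// Instruction choices are illustrative assumptions, not Cranelift's lowering.
type Xmm = [u64; 2];

/// Assumed `movq xmm, r64`: value goes to the low lane, high lane is zeroed.
fn movq_from_gpr(x: u64) -> Xmm {
    [x, 0]
}

/// `pshufd xmm, xmm, imm`: destination dword i is source dword `(imm >> 2*i) & 3`.
/// With `imm = 0x44` (dwords 0, 1, 0, 1) this duplicates the low quadword.
fn pshufd(src: Xmm, imm: u8) -> Xmm {
    let d = [
        src[0] as u32,
        (src[0] >> 32) as u32,
        src[1] as u32,
        (src[1] >> 32) as u32,
    ];
    let pick = |i: u8| d[((imm >> (2 * i)) & 0b11) as usize] as u64;
    [pick(0) | (pick(1) << 32), pick(2) | (pick(3) << 32)]
}

/// Assumed SSE4.1 `pinsrq xmm, r64, lane`: overwrite one 64-bit lane.
fn pinsrq(mut dst: Xmm, x: u64, lane: usize) -> Xmm {
    dst[lane] = x;
    dst
}

/// `pxor`: lane-wise XOR of two registers.
fn pxor(a: Xmm, b: Xmm) -> Xmm {
    [a[0] ^ b[0], a[1] ^ b[1]]
}

/// Replace-lane of splat, lowered with the splat's shuffle still present.
fn splat_then_insert_with_shuffle(a: u64, b: u64) -> Xmm {
    pinsrq(pshufd(movq_from_gpr(a), 0x44), b, 1)
}

/// Same operation without the shuffle: lane 1 is overwritten anyway.
fn splat_then_insert_no_shuffle(a: u64, b: u64) -> Xmm {
    pinsrq(movq_from_gpr(a), b, 1)
}

/// Zero vector by XOR-ing a register with itself: no constant-pool load,
/// and the result does not depend on what the register held before.
fn zero_via_xor(stale: Xmm) -> Xmm {
    pxor(stale, stale)
}
```

The shuffle leaves lane 0 unchanged and only rewrites lane 1, which `pinsrq` then overwrites, so both sequences agree for every input. Likewise `x ^ x == 0` for any register contents, which is why `pxor` can stand in for loading a zero constant.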
cfallin requested wasmtime-compiler-reviewers for a review on PR #8253.
cfallin requested abrown for a review on PR #8253.
fitzgen submitted PR review:
LGTM!
fitzgen merged PR #8253.
Last updated: Nov 22 2024 at 16:03 UTC