cfallin requested alexcrichton for a review on PR #8113.
cfallin opened PR #8113 from cfallin:fix-xmm-select-load
to bytecodealliance:main
:
…usion.
We have an instruction lowering rule on x86-64 that allows a
select
operation to perform load fusion: when presented with any xmm-register-typed values (f32
,f64
, orv128
), an argument to the select can become a load. Unfortunately, this lowering behavior is incorrect in the case of narrower-than-128-bit values: the cmove is converted into an if-else diamond with two 128-bit moves and so the load becomes a full 128-bit-width load.The fix is to disallow load fusion of selects of XMM-typed values. We could make the rules more fine-grained and keep
v128
-typed load fusion, but we're opting for the simpler and more conservative fix first here.Fixes #8112.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin requested elliottt for a review on PR #8113.
cfallin requested wasmtime-compiler-reviewers for a review on PR #8113.
cfallin edited PR #8113.
cfallin edited PR #8113:
We have an instruction lowering rule on x86-64 that allows a
select
operation to perform load fusion: when presented with any xmm-register-typed values (f32
,f64
, orv128
), an argument to the select can become a load. Unfortunately, this lowering behavior is incorrect in the case of narrower-than-128-bit values: the cmove is converted into an if-else diamond with two 128-bit moves and so the load becomes a full 128-bit-width load.The fix is to disallow load fusion of selects of XMM-typed values. We could make the rules more fine-grained and keep
v128
-typed load fusion, but we're opting for the simpler and more conservative fix first here.Fixes #8112.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin commented on PR #8113:
We could make the rules more fine-grained and keep v128-typed load fusion
And actually, for posterity, I just realized we can't do this because it makes the load conditional as well (which it can't be, per Wasm trap semantics). What fun! Given that, I think this conservative fix is the most reasonable one and there's no further refinement to do.
elliottt submitted PR review.
cfallin merged PR #8113.
Last updated: Jan 24 2025 at 00:11 UTC