alexcrichton opened PR #7098 from alexcrichton:x64-int-to-float
to bytecodealliance:main
:
This commit is a result of the investigation on #7085. The int-to-float conversion instructions used right now on the x64 backend will implicitly source the upper bits of the result from a different register. This implicitly creates a dependency on further consumers using the conversion result on whatever previously defined the upper bits, even though they aren't used. This false dependency is the primary reason for the slowdown witnessed in #7085.
The fix chosen in this commit is to model the int-to-float instructions with a new shape of instruction instead of the previous
GprToXmm{,Vex}
. This previous shape was modeled as single-input and single-output, but this does not reflect the actual nature of thecvtsi2s{s,d}
instructions. Instead these now useCvtIntToFloat{,Vex}
which have two source operands and one destination operand, modeling how the upper bits of a different register are used. In lowerings using this instruction the upper bits to preserver are always sourced from a zero'd out register to force breaking dependencies between instructions.Closes #7085
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
alexcrichton requested cfallin for a review on PR #7098.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #7098.
alexcrichton updated PR #7098.
fitzgen submitted PR review:
This was a fun one!
alexcrichton updated PR #7098.
alexcrichton has enabled auto merge for PR #7098.
cfallin has disabled auto merge for PR #7098.
cfallin submitted PR review:
LGTM with changes; I think some of the regalloc metadata (or uses of it) is out-of-order.
cfallin submitted PR review:
LGTM with changes; I think some of the regalloc metadata (or uses of it) is out-of-order.
cfallin created PR review comment:
I think these might be flipped -- the order in the regalloc metadata method below is src1, dst, src2 (since dst is a reuse-def)?
cfallin created PR review comment:
I think these are flipped -- it won't matter here since they must be equal but we should match the order in the metadata method regardless.
cfallin created PR review comment:
We could note here that a mid-end rule could rewrite
fcvt_from_sint
and friends to an x86-specific opcode that uses a zero constant, then that would be subject to normal LICM and whatnot.
cfallin created PR review comment:
I guess the reason we can't use an existing format above is that the sources are Xmm and GprMem rather than, say, Xmm/XmmMem or Gpr/GprMem? Could we have a comment here noting that?
alexcrichton updated PR #7098.
alexcrichton has enabled auto merge for PR #7098.
alexcrichton merged PR #7098.
Last updated: Jan 24 2025 at 00:11 UTC