alexcrichton opened PR #7818 from alexcrichton:fix-false-dependencies to bytecodealliance:main:
This commit takes a stab at #7816 without diving a whole lot into it. I noticed that the loop started with `vcvtss2sd`, which is along the same lines as the false dependencies fixed in earlier PRs such as #7098. I had forgotten these instructions at the time and meant to go back and touch them up, and #7731 has provided sufficient motivation to do so!

Locally this takes that test case from 1.6s to 0.4s for me.
alexcrichton requested abrown for a review on PR #7818.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #7818.
alexcrichton updated PR #7818.
alexcrichton updated PR #7818.
github-actions[bot] commented on PR #7818:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "winch"

Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen submitted PR review:
Nice!! Out of curiosity, how did you end up root causing that perf bug to this false dependency?
fitzgen merged PR #7818.
alexcrichton commented on PR #7818:
Ah it was mostly from previous experience. In the back of my mind I knew there was a set of instructions for which we still did the "fake the output register as the input" trick for AVX (e.g. the instructions modified here), and when I ran `perf` over the program the first very hot instruction in a loop was `vcvtss2sd`, which I remembered was one of those. To test this out I split the dependencies, and the performance improved, so I assumed it was the cause.
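For readers unfamiliar with false dependencies, the effect described above can be sketched with a toy model. This is not Cranelift code: the `Inst` type, the `critical_path` and `loop_body` helpers, the register names, and the latencies (4 and 5 cycles) are all hypothetical, chosen only for illustration. The model assumes an idealized out-of-order core with unlimited execution units, where an instruction starts as soon as all of its source registers are ready. It compares a conversion whose merge operand is the destination register against one whose merge operand is the freshly produced source.

```rust
use std::collections::HashMap;

/// One toy instruction: writes `dst` after reading every register in `srcs`.
/// (Hypothetical type for illustration; not a Cranelift data structure.)
struct Inst {
    dst: &'static str,
    srcs: Vec<&'static str>,
    latency: u64,
}

/// Cycle at which the last result becomes ready when `body` runs `iters`
/// times, assuming an instruction starts once all of its sources are ready.
fn critical_path(body: &[Inst], iters: u64) -> u64 {
    let mut ready: HashMap<&'static str, u64> = HashMap::new();
    let mut finish = 0;
    for _ in 0..iters {
        for inst in body {
            // Registers never written before are treated as ready at cycle 0.
            let start = inst
                .srcs
                .iter()
                .map(|r| ready.get(r).copied().unwrap_or(0))
                .max()
                .unwrap_or(0);
            let done = start + inst.latency;
            ready.insert(inst.dst, done);
            finish = finish.max(done);
        }
    }
    finish
}

/// Loop body: produce a fresh f32 in xmm1, then convert it into xmm0.
/// `merge_src` stands in for the extra register operand of the AVX form,
/// which supplies the bits of the result that the conversion does not write.
fn loop_body(merge_src: &'static str) -> Vec<Inst> {
    vec![
        // Stand-in for a load with no register inputs (assumed latency).
        Inst { dst: "xmm1", srcs: vec![], latency: 4 },
        // Stand-in for the conversion (assumed latency).
        Inst { dst: "xmm0", srcs: vec![merge_src, "xmm1"], latency: 5 },
    ]
}

fn main() {
    // Merging from the destination chains every iteration onto the last one.
    let chained = critical_path(&loop_body("xmm0"), 1000);
    // Merging from the source leaves the iterations independent.
    let split = critical_path(&loop_body("xmm1"), 1000);
    println!("merge operand = destination: {chained} cycles");
    println!("merge operand = source:      {split} cycles");
}
```

In this model the first variant grows linearly with the iteration count while the second stays constant, which is the shape of the slowdown a false dependency produces. Real hardware has finite resources, so the measured gap (1.6s versus 0.4s in the PR description) is much smaller than the toy model suggests.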
Last updated: Dec 23 2024 at 12:05 UTC