alexcrichton commented on issue #5986:
Oh I should also mention that on the `meshoptimizer` program this improves performance by 5% for me on one of the benchmarks.
github-actions[bot] commented on issue #5986:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"
Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
</details>
alexcrichton commented on issue #5986:
Before I merge this, one thing I'd like to explore is to try a strategy of "always use `lea`" but then during emission switch to using `add` if the src/dst registers are the same. Something like an `AddOrLea` instruction which, depending on the results of regalloc, emits one of the two instructions. I'm interested to compare that to this performance, and it more easily leaves this open in the future to a tweak in regalloc's constraints where something could be specified like "please try to make these two the same".
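A minimal sketch of the emission-time choice described above, in Rust. Every name here (`Reg`, `Emitted`, the fields of `AddOrLea`) is hypothetical and chosen for illustration; none of it is Cranelift's actual instruction definition or emission API. The sketch also assumes that `add` can be used when the destination matches either source, since addition is commutative, which goes slightly beyond what the comment states.

```rust
// Hypothetical stand-ins for illustration only; not Cranelift's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u8);

// The concrete instruction chosen once register allocation is known.
#[derive(Debug, PartialEq, Eq)]
enum Emitted {
    // Two-operand form: dst = dst + src.
    Add { dst: Reg, src: Reg },
    // Three-operand form: dst = base + index.
    Lea { dst: Reg, base: Reg, index: Reg },
}

// Pseudo-instruction holding the registers that regalloc assigned.
struct AddOrLea {
    dst: Reg,
    src1: Reg,
    src2: Reg,
}

impl AddOrLea {
    // Decide after regalloc which instruction to emit.
    fn emit(&self) -> Emitted {
        if self.dst == self.src1 {
            // dst already holds the first operand, so `add` suffices.
            Emitted::Add { dst: self.dst, src: self.src2 }
        } else if self.dst == self.src2 {
            // Assumption: operands may be swapped because addition commutes.
            Emitted::Add { dst: self.dst, src: self.src1 }
        } else {
            // dst differs from both sources, so use the three-operand `lea`.
            Emitted::Lea { dst: self.dst, base: self.src1, index: self.src2 }
        }
    }
}

fn main() {
    let same = AddOrLea { dst: Reg(0), src1: Reg(0), src2: Reg(1) };
    let different = AddOrLea { dst: Reg(2), src1: Reg(0), src2: Reg(1) };
    println!("{:?}", same.emit());
    println!("{:?}", different.emit());
}
```

The point of deferring the choice is that the lowering rules never need to guess what the register allocator will do; a real implementation would also have to account for details this sketch ignores, such as `add` writing the flags register while `lea` does not.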
alexcrichton commented on issue #5986:
Ok I've pushed up a different way of doing this, namely deciding after regalloc whether to use `add` or `lea`. The local `meshoptimizer` benchmark shows no loss on the number that went down if `lea` is always used, and the number that went up due to using `lea` still went up. Sightglass locally says:

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  lea.so is 1.05x to 1.06x faster than main.so!
execution :: cycles :: benchmarks/bz2/benchmark.wasm
  lea.so is 1.02x to 1.03x faster than main.so!
execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  lea.so is 1.01x to 1.02x faster than main.so!
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  lea.so is 1.00x to 1.01x faster than main.so!

with all others as "no difference"
Given that I'm tempted to go with this strategy instead and leave it open to, in the future, add a form of regalloc constraint or heuristic that attempts to keep the src/dst here in the same register.
Last updated: Dec 23 2024 at 12:05 UTC