Stream: git-wasmtime

Topic: wasmtime / issue #5986 x64: Add lea-based lowering for iadd


view this post on Zulip Wasmtime GitHub notifications bot (Mar 10 2023 at 21:23):

alexcrichton commented on issue #5986:

Oh I should also mention that on the meshoptimizer program this improves performance by 5% for me on one of the benchmarks.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 10 2023 at 21:45):

github-actions[bot] commented on issue #5986:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Mar 13 2023 at 14:21):

alexcrichton commented on issue #5986:

Before I merge this, one thing I'd like to explore is a strategy of "always use lea" that then switches to add during emission if the src/dst registers are the same. Something like an AddOrLea instruction which, depending on the results of regalloc, emits one of the two instructions. I'm interested in comparing that against the performance of this PR, and it more easily leaves the door open to a future tweak in regalloc's constraints where something could be specified like "please try to make these two the same".
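As a rough illustration of the AddOrLea idea described above, here is a minimal sketch of choosing between the two encodings once register allocation has assigned physical registers. Every name in it (`Reg`, `AddOrLea`, `Emitted`, `emit`) is hypothetical and is not taken from Cranelift's actual code; the extra case that swaps the operands when the destination matches the second source is also an assumption, relying only on addition being commutative.

```rust
/// Hypothetical physical register id (not Cranelift's real `Reg` type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u8);

/// Hypothetical pseudo-instruction computing `dst = src1 + src2`.
/// Its final encoding is only decided after register allocation.
#[derive(Clone, Copy, Debug)]
struct AddOrLea {
    dst: Reg,
    src1: Reg,
    src2: Reg,
}

/// The concrete instruction chosen at emission time.
#[derive(Debug, PartialEq, Eq)]
enum Emitted {
    /// Two-operand `add dst, src`: `dst` is both an input and the output.
    Add { dst: Reg, src: Reg },
    /// Three-operand `lea dst, [base + index]`: `dst` may differ from both inputs.
    Lea { dst: Reg, base: Reg, index: Reg },
}

/// Pick the encoding based on what regalloc assigned.
fn emit(inst: AddOrLea) -> Emitted {
    if inst.dst == inst.src1 {
        // Destination already holds the first operand: a plain add works.
        Emitted::Add { dst: inst.dst, src: inst.src2 }
    } else if inst.dst == inst.src2 {
        // Assumption of this sketch: addition is commutative, so swap operands.
        Emitted::Add { dst: inst.dst, src: inst.src1 }
    } else {
        // Destination differs from both sources: lea avoids an extra move.
        Emitted::Lea { dst: inst.dst, base: inst.src1, index: inst.src2 }
    }
}
```

With this shape, instruction selection can always produce the pseudo-instruction, and the add-versus-lea choice falls out of whatever registers the allocator happened to pick.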

view this post on Zulip Wasmtime GitHub notifications bot (Mar 15 2023 at 02:10):

alexcrichton commented on issue #5986:

Ok, I've pushed up a different way of doing this, namely deciding after regalloc whether to use add or lea. On the local meshoptimizer benchmark, the number that went down when lea is always used now shows no loss, and the number that went up due to using lea still went up. Sightglass locally says:

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  lea.so is 1.05x to 1.06x faster than main.so!
execution :: cycles :: benchmarks/bz2/benchmark.wasm
  lea.so is 1.02x to 1.03x faster than main.so!
execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  lea.so is 1.01x to 1.02x faster than main.so!
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  lea.so is 1.00x to 1.01x faster than main.so!

with all others as "no difference"

Given that, I'm tempted to go with this strategy instead and leave the door open to adding, in the future, a form of regalloc constraint or heuristic that attempts to keep the src/dst here in the same register.


Last updated: Dec 23 2024 at 12:05 UTC