Stream: git-wasmtime

Topic: wasmtime / PR #8227 cranelift: Delete redundant DCE optim...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2024 at 01:04):

jameysharp requested cfallin for a review on PR #8227.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2024 at 01:04):

jameysharp requested wasmtime-compiler-reviewers for a review on PR #8227.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2024 at 01:04):

jameysharp opened PR #8227 from jameysharp:dedup-dce to bytecodealliance:main:

The egraph pass and the dead-code elimination pass both remove instructions whose results are unused. If the optimization level is "none", neither pass runs, and if it's anything else both passes run. I don't think we should do this work twice.

Note that the DCE pass is different than the "eliminate unreachable code" pass, which removes entire blocks that are unreachable from the entry block. That pass might still be necessary.

@cfallin, what do you think? There's some advantage to not calling simplify on instructions that we can quickly prove are unused, but is it worth keeping a redundant pass around for that?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2024 at 03:13):

cfallin commented on PR #8227:

IIRC, the "legacy" passes we call before egraph opt are the ones that are important for compile time, or at least were when egraph opt was introduced. Basically as you said: better not to invoke ISLE at all if code is unused (and egraph's natural DCE comes only during elaboration after that). Would you mind measuring the compile-time impact of this? It's always possible things are better now of course!

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2024 at 00:34):

jameysharp commented on PR #8227:

I finally got back to this. Sightglass says there's no impact on the smaller benchmarks (bz2, pulldown-cmark) in any phase or by any perf counter. On the Spidermonkey benchmark, there's no impact on instructions retired, but when measuring CPU cycles there's a slight but statistically significant slowdown in compilation, and weirdly, a slight speedup in execution.

I'm inclined to think this is pretty much still in the noise and we should go ahead and merge this so we have less code to maintain. What do you think?

compilation :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 2481348.89 ± 2266894.53 (confidence = 99%)

  no-dedup-dce-b545cc508.so is 1.00x to 1.02x faster than dedup-dce-8e9f6f135.so!

  [291935280 305568325.04 334203152] dedup-dce-8e9f6f135.so
  [261612097 303086976.15 312831291] no-dedup-dce-b545cc508.so

execution :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 7533948.79 ± 2750726.94 (confidence = 99%)

  dedup-dce-8e9f6f135.so is 1.00x to 1.01x faster than no-dedup-dce-b545cc508.so!

  [965718291 973510333.10 1006394505] dedup-dce-8e9f6f135.so
  [965415521 981044281.89 1015728504] no-dedup-dce-b545cc508.so

view this post on Zulip Wasmtime GitHub notifications bot (May 13 2024 at 16:03):

cfallin submitted PR review:

Ah, that delta is smaller than I had seen when we initially added the egraph optimizer: I recall compile-time slowdowns on the order of 5% then. It's entirely possible that by optimizing other things or making other changes we've shifted the tradeoffs.

I still think in principle it could be useful to do a separate DCE without the rewriting overhead first, since there might be some other frontend that produces a lot of junk code, but honestly that's a much lower-priority problem IMHO.

Given all this, I'm happy to see this change go in!

view this post on Zulip Wasmtime GitHub notifications bot (May 13 2024 at 16:34):

jameysharp merged PR #8227.


Last updated: Dec 23 2024 at 12:05 UTC