Stream: git-wasmtime

Topic: wasmtime / PR #1664 Add a transformation pass which remov...


view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 07:17):

julian-seward1 opened PR #1664 from remove-constant-phis to master:

…onstrate

that only one value ever flows. Has been observed to improve generated code
run times by up to 8%. Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 07:17):

julian-seward1 requested cfallin for a review on PR #1664.

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

tiny, tiny nit: s/arm64/AArch64/ (we did a bulk find/replace during merge and have tried to standardize on the official name as far as we can)

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

since AbstractValue is Copy, the params can be self, other: AbstractValue and the dereferences below can be removed.

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

We should be able to just add PartialEq, Eq to the #[derive(...)] and get equality automatically, no?

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

It sounds like in practice this converges quickly, but I wonder if it might be worth using a worklist-of-blocks-with-changed-input-state algorithm instead (see e.g. the dataflow analysis in the regalloc checker) to avoid a potential worst case where a small subgraph takes a while to converge within a large function body?

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

Might be worth a comment here that multi-dest transfers (i.e., branch tables) don't carry parameters in our IR, so we only have to care about SingleDest here. (At least my intuition from most analyses otherwise is that one has to consider every CFG edge, which of course isn't the case here.)

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

for absval in state.absvals.values()?

view this post on Zulip Wasmtime GitHub notifications bot (May 06 2020 at 16:30):

cfallin created PR Review Comment:

for (formal, actual) in dst_formals.iter().zip(src_actuals.iter())?

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 04:54):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 04:54):

julian-seward1 created PR Review Comment:

I was being paranoid here .. I wanted to be certain that nothing would mess up join and equality operations on the abstract values. Could use derived Eq I suppose.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 05:12):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 05:12):

julian-seward1 created PR Review Comment:

For the liveness analysis in the register allocator, I used a worklist initialised in reverse postorder, plus a mechanism to avoid duplicates in the worklist (== all bells and whistles I could think of). That achieves around 1.8-2.8 evaluations per block on average, for large functions. From those experiments I saw that the by far the most important factor was to initialise the worklist in reverse postorder, and operate it as a queue thereafter.

This one does something simpler, which is to repeatedly visit the blocks in reverse postorder, until there are no changes. Many inputs require 3 passes (3.0 evaluations per block), and the worst I saw, for joey_big.clif, required 4 passes. For the following reasons, I concluded that using a worklist wasn't worth the extra complexity:

One simple thing I could do is to filter out summaries for blocks that have neither formal parameters nor actual parameters in their jumps, before iterating. From what I've seen there are many such blocks and they can't possibly contribute anything to the overall result.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:34):

julian-seward1 created PR Review Comment:

One simple thing I could do is to filter out summaries for blocks that have neither formal parameters nor actual parameters in their jumps, before iterating.

I tried that. For a couple of large tests, it roughly halved the number of summaries (meaning, from around 1/2 the number of blocks to around 1/4 the number of blocks) but reduced the overall insn count for the pass only by around 3%. And the fixpointing is done over the summaries. So the fixpointing cost must be pretty minimal already.

I'll leave this refinement in the respin patch.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:34):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:45):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:45):

julian-seward1 created PR Review Comment:

Good point. Added.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:51):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 09:59):

julian-seward1 updated PR #1664 from remove-constant-phis to master:

…onstrate

that only one value ever flows. Has been observed to improve generated code
run times by up to 8%. Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 10:48):

julian-seward1 requested cfallin for a review on PR #1664.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 16:50):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 16:50):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 16:50):

cfallin created PR Review Comment:

Sounds good -- thanks for the detailed analysis!

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2020 at 16:50):

cfallin created PR Review Comment:

(detached from other thread because this line changed -- related to Eq discussion)

Yup, I'd prefer to stay idiomatic here and remove fn equals, using the derived Eq instead -- if the derived code isn't doing the right thing then we have bigger problems (and others who maintain the code in the future might expect to use == anyway).

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2020 at 07:08):

julian-seward1 updated PR #1664 from remove-constant-phis to master:

…onstrate

that only one value ever flows. Has been observed to improve generated code
run times by up to 8%. Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2020 at 07:41):

julian-seward1 merged PR #1664.


Last updated: Jan 24 2025 at 00:11 UTC