fitzgen requested wasmtime-compiler-reviewers for a review on PR #9071.
fitzgen requested elliottt for a review on PR #9071.
fitzgen opened PR #9071 from fitzgen:stack-maps-and-liveness-and-loops
to bytecodealliance:main
:
Previously, we were not properly handling back edges. This manifested in values incorrectly being considered not-live inside loop bodies where they definitely were live. Consider the following example:
block0: v0 = needs stack map block1: call foo(v0) call foo(v0) jump block1
We were previously considering
v0
live only for the firstcall foo(v0)
but not the second, because we mistakenly concluded thatv0
would not be used again after that secondcall
. While it won't be used again in this iteration of the loop, it will be used again in the next iteration of the loop.Trevor and I initially tried implementing a clever trick suggested by Chris where, if we know the minimum post-order index of all of a block's transitive predecessors, we can continue to compute liveness in a single pass over the IR. We believed we could compute the minimum predecessor post-order index via dynamic programming. It turns out, however, that approach doesn't provide correct answers out of the box for certain kinds of irreducible control flow, only nearly correct answers, and would require an additional clever fix-up pass afterwards. We deemed this cleverness on cleverness unacceptable.
Instead, Trevor and I opted to implement a worklist algorithm where we process blocks to a fixed-point. This has the advantages of being obviously correct and producing more-precise results. It has the disadvantage of requiring multiple passes over the IR in the face of loops and back edges. Because this analysis is only used when needs-stack-map values are present (i.e. when the function uses GC values) and is otherwise skipped, this additional compile-time overhead is tolerable.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen updated PR #9071.
fitzgen requested cfallin for a review on PR #9071.
cfallin submitted PR review:
Looks great, thanks! Some minor requests for clarification in comments but otherwise good to go.
cfallin created PR review comment:
This is pretty clever and it took me a second to understand why we don't simply avoid pushing duplicates; it's to do with the innermost-first heuristic mentioned above (so we want to process at the last duplicate pushed, not the first), is that right?
It might be worth a comment here explicitly noting that; we push multiple times because in effect we're "hoisting" the point at which we consider a block sooner into the future, and it would be too expensive to actually remove it from deeper in the stack.
(I usually write worklist algorithms with "if already in set, don't add to queue" logic and I've seen it that way more commonly so it's worth noting this trick here!)
cfallin submitted PR review:
Looks great, thanks! Some minor requests for clarification in comments but otherwise good to go.
cfallin created PR review comment:
extra word (successors) at end of sentence?
cfallin created PR review comment:
Do we ever iterate over the live-set and produce code based on its order? It'd be good to confirm (with a comment noting here) that we don't, in order to retain deterministic compiler output.
fitzgen submitted PR review.
fitzgen created PR review comment:
The one place we do iterate over them, it is only after we've copied them into a smallvec and sorted it. But I added a note to the doc comment as well.
fitzgen submitted PR review.
fitzgen created PR review comment:
This is pretty clever and it took me a second to understand why we don't simply avoid pushing duplicates; it's to do with the innermost-first heuristic mentioned above (so we want to process at the _last_ duplicate pushed, not the first), is that right?
Correct, filtering on
push
means we process the least-recently-pushed duplicate entry. Filtering onpop
means we process the most-recently-pushed duplicate entry, which gives us that inner-loops-first behavior that is desirable for minimizing the number of iterations until we reach the fixed-point.Added some more comments; like your "hoisting" wording.
fitzgen updated PR #9071.
fitzgen has enabled auto merge for PR #9071.
fitzgen merged PR #9071.
Last updated: Dec 23 2024 at 12:05 UTC