github-actions[bot] commented on Issue #1656:
Subscribe to Label Action
cc @bnjbvr
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64"Thus the following users have been cc'd because of the following labels:
- bnjbvr: cranelift
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
julian-seward1 commented on Issue #1656:
Nice, and looks useful. This is definitely something we should have. There are a couple of areas of concern tho:
Please add a top level description of perhaps a paragraph, of roughly how the transformation works:
what the three passes do
what the running state is (including the relationship between
remove
,remove_list
andinsert
)what the big-O costs are
It wasn't obvious to me without some code reading, that there are three separate passes: one to collect up all constants, one to park them at their use points, and one to delete the sourcing instructions. As a consequence I spent some minutes trying to figure out whether the effects of the module were dependent on the order in which the blocks are visited. Which isn't the case, but that took some reading/inferring.
Another area of not exactly concern, but interest, is how this interacts with my constant-phi-removal pass. I can't guess; we'll have to try it out.
There's also the issue that this might be a bit indiscriminate. It would be trivial to not inline "expensive" constants (for whatever definition of "expensive") simply by not putting them into
insert
. I don't think we want to do that right now, but you add a 1-liner comment to that effect, to make it easy for anyone in future who wants to try that?
bjorn3 commented on Issue #1656:
and one to delete the sourcing instructions
The MachInst backends just skip lowering of side-effect free instructions without any use of their result, right?
julian-seward1 commented on Issue #1656:
@cfallin I see the following (from
joey_big.clif
):
Before:block31: jump block22(v236, v3, v3, v3, v3, v3, v3, v3, /* circa 290 more params, most of which are v3, which is "iconst.i32 0" */)
After:
block31: v53727 = iconst.i32 2 v53728 = iconst.i32 0 v53729 = iconst.i32 0 v53730 = iconst.i32 0 v53731 = iconst.i32 0 (290 more such lines) jump block22(v53727, v53728, v53729, v53730, v53731, v53732, v53733, v53734, ..)
The effect is to generate 290+ moves of zero into a new virtual reg (I checked). We then jump to the phi translation, where they are all copied into block22's argument vregs. The post-allocation effect is naturally enough a tidal wave of spills in
block31
followed by a similar number of reloads and then re-spills in the phi translation.Would it be possible to ensure that the same constant is emitted only once per insn? I think that would remove two out of the three tidal waves here. Even special-casing zero would probably help; I suspect that the upstream LLVM (that produces the wasm) does SRA on a bunch of structs, and then is obliged to pass around some value even for initially uninitialised struct fields, because the wasm semantics require everything to be initialised. And so it chooses zero.
Since arbitrary de-duping is O(N^2), we could possibly get most of the win just by tracking the last (eg) 10 values emitted for the current insn, and emit a new
iconst
if it's not in those 10.
julian-seward1 commented on Issue #1656:
Let me just clarify: this transformation is clearly good. I'm just saying, peering at the output and comparing it with the untransformed version, that it could be even more betterer without much effort.
julian-seward1 commented on Issue #1656:
I also tried to assess how this transformation interacts with constant-phi removal. Here are some numbers. Bear in mind these are static counts. What I take from it is that the two transformations partly but not entirely overlap. And the best sequence is: constant-phi removal first, then const sinking.
joey_small.clif none out: insns: 1996 total, 44 spills, 111 reloads, 20 nopzs phi only out: insns: 1776 total, 6 spills, 77 reloads, 0 nopzs sink only out: insns: 1881 total, 0 spills, 0 reloads, 0 nopzs phi,sink out: insns: 1733 total, 0 spills, 0 reloads, 0 nopzs sink,phi out: insns: 1733 total, 0 spills, 0 reloads, 0 nopzs joey_medium.clif none out: insns: 7061 total, 211 spills, 923 reloads, 302 nopzs phi only out: insns: 5390 total, 66 spills, 439 reloads, 49 nopzs sink only out: insns: 6911 total, 157 spills, 566 reloads, 214 nopzs phi,sink out: insns: 5336 total, 57 spills, 133 reloads, 39 nopzs sink,phi out: insns: 5336 total, 57 spills, 133 reloads, 39 nopzs joey_big.clif none out: insns: 126895 total, 13158 spills, 15417 reloads, 72954 nopzs phi only out: insns: 111105 total, 10067 spills, 12242 reloads, 67247 nopzs sink only out: insns: 124471 total, 12009 spills, 10458 reloads, 77428 nopzs phi,sink out: insns: 107553 total, 8364 spills, 8940 reloads, 70251 nopzs sink,phi out: insns: 118704 total, 10579 spills, 9029 reloads, 76691 nopzs
cfallin commented on Issue #1656:
Closing this as the problem is solved now in a better way by #1718.
Last updated: Dec 23 2024 at 12:05 UTC