elliottt opened PR #6245 from elliottt:trevor/avoid-sret-reg-move
to bytecodealliance:main
:
RA2-0.7.0 removes support for processing program moves, and as such we need to burn down the list of moves present in VCode prior to it being handed of to RA2.
The handling of struct return currently will allocate a fresh VReg in
Lower::new
that will be copied into duringLower::gen_arg_setup
. That new VReg will be used when a return is processed, to ensure that thesret
param is threaded out correctly.If we discover the sret param in
Lower::new
instead, we can avoid allocating the temporary and instead use it directly when processing a return.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt edited PR #6245:
RA2-0.7.0 removes support for processing program moves, and as such we need to burn down the list of moves present in VCode prior to it being handed of to RA2.
The handling of struct return currently will allocate a fresh VReg in
Lower::new
that will be copied into duringLower::gen_arg_setup
. That new VReg will be used when a return is processed, to ensure that thesret
param is threaded out correctly.If we cache the
sret
param inLower::new
instead, we can avoid allocating the temporary and instead use it directly when processing a return. This has a mixed effect on the filetests: you can see some cases where eagerly introducing the temporary register avoids a move later, and some where avoiding the temporary allocation also avoids a move.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt updated PR #6245.
elliottt has marked PR #6245 as ready for review.
elliottt requested abrown for a review on PR #6245.
elliottt requested wasmtime-compiler-reviewers for a review on PR #6245.
jameysharp submitted PR review:
I think this is a big improvement! I just have some questions about the details.
jameysharp submitted PR review:
I think this is a big improvement! I just have some questions about the details.
jameysharp created PR review comment:
I don't think the
returns
array needs to be cloned now. I think after the changes in this PR, the loop body only reads fromvcode
.
jameysharp created PR review comment:
Have you identified what's causing this duplicate move (
mov x8, x23
happens twice, with only theblr
in between)?It would also be nice if it weren't just the reverse of an earlier move (
mov x23, x8
), and therefore a no-op, but that was happening before too. So I'm really just wondering about the new duplicate move at the moment.
jameysharp created PR review comment:
I'm inclined to suggest
unwrap
here. I think if we're trying to lower a function that has no entry block, something has gone very wrong.
jameysharp created PR review comment:
This assert is in the middle of an if statement which only runs when the assert is true, I think. So you can delete the assert.
jameysharp created PR review comment:
I can see that this is a pretty direct transplant of the existing logic, but would you mind making this a
zip
instead ofenumerate
plus slice indexing?
elliottt created PR review comment:
Thanks for catching this! Originally I had left it out, but felt paranoid about removing asserts :)
elliottt created PR review comment:
I agree. The other place we use if-let for the entry block is
gen_arg_setup
, perhaps that one should change as well?
elliottt updated PR #6245.
jameysharp created PR review comment:
On further thought, maybe there's an equivalent assert we should add here, that
sret_reg.is_some()
after looping over all the params?
jameysharp created PR review comment:
I was hoping to find that
gen_arg_setup
is dominated by something that panics if there's no entry block. I got as far ascranelift/codegen/src/dominator_tree.rs
wherecompute_postorder
just silently returns without doing anything if there's no entry block. So in that case it looks like we just generate an empty block lowering order and proceed without complaint.Since I can't quickly prove that lowering isn't expected to work when there's no entry block, I'm going to appeal to authority and ask @cfallin if there's some reason we should be supporting that case.
cfallin created PR review comment:
I think that the verifier should verify that there is an entry block? If not, it seems like a reasonable thing to add. I don't think we should expend any unnecessary energy doing special things for empty function bodies; they have no meaning/semantics anyway.
elliottt created PR review comment:
I like that! I'll add that assert in.
elliottt updated PR #6245.
elliottt created PR review comment:
This was an interesting one! The call will clobber the value in
x8
, but we require that value to be live to the end of the function so that we can return it, so RA2 rewrites the input constraint on the call to anany
constraint and moves intox8
right before the call. This ensures that we have a value (x23
) to restorex8
after the call, but does yield a slightly funny result.The move from
x23
intox8
on line 910 could probably be eliminated by the redundant move eliminator, but we could also look into resolving the need for the value duplication on the in-progress overlap support PR...
jameysharp created PR review comment:
That does fully explain this, doesn't it? Yeah, I guess I'm not too worried about an extra move that is:
- before a call
- from a function with an sret parameter
- to another function with an sret parameter
Especially since this sounds like exactly the sort of situation that the RA2 overlapping live ranges work should fix.
elliottt created PR review comment:
(Just for posterity, here's the location in RA2 where this rewrite happens)
https://github.com/bytecodealliance/regalloc2/blob/244ede83638762701831453a43e6011a7dbe0e94/src/ion/liveranges.rs#L557-L616
jameysharp submitted PR review:
Right then, this seems just fine. I think any remaining cleanup can safely wait until more RA2 work is ready, even if that ends up taking a while.
elliottt merged PR #6245.
Last updated: Dec 23 2024 at 13:07 UTC