elliottt opened PR #8327 from elliottt:trevor/eagerly-reserve-tail-call-arg-space
to bytecodealliance:main
:
Rebased on #8319
Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt edited PR #8327:
Rebased on #8319
Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
jameysharp submitted PR review.
jameysharp created PR review comment:
If we switch
IncomingArg
to holding offsets from the highest address in the incoming argument area, instead of the lowest address, then we don't need to also store the size of the incoming argument area with it. Finalizing such an amode then amounts to finding the highest address in the stack frame (e.g.rbp
plus thesetup_area_size
andstack_args_size
from the frame layout), then subtracting the offset that's inIncomingArg
.
elliottt updated PR #8327.
elliottt updated PR #8327.
elliottt updated PR #8327.
elliottt updated PR #8327.
elliottt updated PR #8327.
jameysharp has marked PR #8327 as ready for review.
jameysharp requested wasmtime-compiler-reviewers for a review on PR #8327.
jameysharp requested abrown for a review on PR #8327.
elliottt edited PR #8327:
Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt requested fitzgen for a review on PR #8327.
elliottt edited PR #8327:
Instead of explicitly growing and shrinking the incoming stack argument area for a tail call, resize it to the maximum needed in the function prologue. This ensures that we have enough space for any tail call present within the function body, however if the tail-callee requires fewer stack arguments we shrink the incoming argument area before jumping to it.
This change is currently only implemented for the x64 backend, but does greatly simplify the generated code as we no longer need to move the contents of the frame when resizing the incoming argument area. Accordingly, we've deleted the
ShrinkArgumentArea
andGrowArgumentArea
pseudo-instructions, as they're no longer necessary.Co-authored-by: Jamey Sharp <jsharp@fastly.com>
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt updated PR #8327.
elliottt edited PR #8327:
return_call
instructions reuse the incoming argument area of the caller's frame. As such if the caller's incoming argument area is not exactly the right size for the callee, some resizing will need to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten. The current solution on the main branch for the x64 backend is to explicitly resize the frame viaShrinkArgumentArea
orGrowArgumentArea
right before thereturn_call
arguments are written to the stack, ensuring that there is sufficient space. While this does work, it does make areturn_call
more expensive when the resizing is necessary.To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any
return_call
instruction in the function. We then shrink back down when necessary before an individualreturn_call
. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address before we jump to the tail-callee.Co-authored-by: Jamey Sharp <jsharp@fastly.com>
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt updated PR #8327.
elliottt submitted PR review.
elliottt created PR review comment:
We may need to add this to
offset_upward_to_caller_sp
in the unwind info below.
fitzgen submitted PR review:
Nice!
fitzgen submitted PR review:
Nice!
fitzgen created PR review comment:
:tada:
fitzgen created PR review comment:
Definitely not a blocker for landing this PR, but it would be nice if we could get this to be a single
sub $rsp, M+N
followed bymov temp, [$rsp + OLD_RA_OFFSET]; mov [$rsp + NEW_RA_OFFSET]
and similar for the saved FP, instead of an initial sub to establish the frame followed by a second sub for tail call arguments space.
elliottt submitted PR review.
elliottt created PR review comment:
We're experimenting with that on a different branch based on this PR, as a follow-up. Moving the stack check before the frame setup means that we can cut out even more instructions, as there's no need to move the frame pointer as well.
elliottt updated PR #8327.
elliottt updated PR #8327.
elliottt updated PR #8327.
elliottt merged PR #8327.
Last updated: Jan 24 2025 at 00:11 UTC