Stream: git-wasmtime

Topic: wasmtime / PR #8327 Eagerly reserve tail call stack arg s...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 00:08):

elliottt opened PR #8327 from elliottt:trevor/eagerly-reserve-tail-call-arg-space to bytecodealliance:main:

Rebased on #8319

Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 00:08):

elliottt edited PR #8327:

Rebased on #8319

Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:14):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:14):

jameysharp created PR review comment:

If we switch IncomingArg to holding offsets from the highest address in the incoming argument area, instead of the lowest address, then we don't need to also store the size of the incoming argument area with it. Finalizing such an amode then amounts to finding the highest address in the stack frame (e.g. rbp plus the setup_area_size and stack_args_size from the frame layout), then subtracting the offset that's in IncomingArg.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:30):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:37):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:43):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 18:57):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 19:59):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 20:50):

jameysharp has marked PR #8327 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 20:50):

jameysharp requested wasmtime-compiler-reviewers for a review on PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 20:50):

jameysharp requested abrown for a review on PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 20:52):

elliottt edited PR #8327:

Instead of explicitly growing and shrinking the stack argument area for a tail call, resize it to the maximum needed in the function prologue, and shrink when a tail call happens.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 21:32):

elliottt requested fitzgen for a review on PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 21:36):

elliottt edited PR #8327:

Instead of explicitly growing and shrinking the incoming stack argument area for a tail call, resize it to the maximum needed in the function prologue. This ensures that we have enough space for any tail call present within the function body, however if the tail-callee requires fewer stack arguments we shrink the incoming argument area before jumping to it.

This change is currently only implemented for the x64 backend, but does greatly simplify the generated code as we no longer need to move the contents of the frame when resizing the incoming argument area. Accordingly, we've deleted the ShrinkArgumentArea and GrowArgumentArea pseudo-instructions, as they're no longer necessary.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 21:46):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 21:46):

elliottt edited PR #8327:

return_call instructions reuse the incoming argument area of the caller's frame. As such if the caller's incoming argument area is not exactly the right size for the callee, some resizing will need to take place to ensure that the return address, frame pointer, clobbers, and stack slots don't get overwritten. The current solution on the main branch for the x64 backend is to explicitly resize the frame via ShrinkArgumentArea or GrowArgumentArea right before the return_call arguments are written to the stack, ensuring that there is sufficient space. While this does work, it does make a return_call more expensive when the resizing is necessary.

To simplify this, we instead resize the incoming argument area in the function prologue to accommodate the largest possible argument area of any return_call instruction in the function. We then shrink back down when necessary before an individual return_call. This simplifies the implementation of tail calls on x86_64, as we no longer need to move the entire frame, just the return address before we jump to the tail-callee.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 21:55):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 22:58):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 22:58):

elliottt created PR review comment:

We may need to add this to offset_upward_to_caller_sp in the unwind info below.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:03):

fitzgen submitted PR review:

Nice!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:03):

fitzgen submitted PR review:

Nice!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:03):

fitzgen created PR review comment:

:tada:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:03):

fitzgen created PR review comment:

Definitely not a blocker for landing this PR, but it would be nice if we could get this to be a single sub $rsp, M+N followed by mov temp, [$rsp + OLD_RA_OFFSET]; mov [$rsp + NEW_RA_OFFSET] and similar for the saved FP, instead of an initial sub to establish the frame followed by a second sub for tail call arguments space.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:06):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2024 at 23:06):

elliottt created PR review comment:

We're experimenting with that on a different branch based on this PR, as a follow-up. Moving the stack check before the frame setup means that we can cut out even more instructions, as there's no need to move the frame pointer as well.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 15 2024 at 18:45):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 15 2024 at 18:47):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 15 2024 at 18:49):

elliottt updated PR #8327.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 15 2024 at 20:25):

elliottt merged PR #8327.


Last updated: Dec 23 2024 at 12:05 UTC