wasmtime / PR #6970 Refactor prolog/epilog generation code · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6970 Refactor prolog/epilog generation code

Wasmtime GitHub notifications bot (Sep 06 2023 at 16:36):

uweigand opened PR #6970 from uweigand:refactor-prolog-epilog to bytecodealliance:main:

This patch refactors all of the ISA/ABI specific prolog/epilog generation code around the following two ideas:

Separate planning of the function's frame layout from the actual implementation within prolog / epilog code.

No longer overload different purposes (middle-end register tracking, platform-specific details like authorization modes, and pop-stack-on-return) into a single return instruction.

As to 1., the new approach is based around a FrameLayout data structure, which collects all information needed to emit prolog and epilog code, specifically the list of clobbered registers, and the sizes of all areas of the function's stack frame.

This data structure is now computed once, before any code is emitted, and stored in the Callee data structure. ABIs need to implement this via a new compute_frame_layout callback, which gets all data from common code needed to make all decisions around stack layout in one place.

The FrameLayout is then used going forward to answer all questions about frame sizes, and it is passed to all ABI routines involved in prolog / epilog code generation. [ This removes a lot of duplicated calculation, e.g. the list of clobbered registers is now only computed once and re-used everywhere. ]

This in turn allows to reduce the number of distinct callbacks ABIs need to implement, and simplifies common code logic around how and when to call them. In particular, we now only have the following four routines, which are always called in this order:

gen_prologue_frame_setup
gen_clobber_save

gen_clobber_restore
gen_epilogue_frame_restore

The main differences to before are:

frame_setup/restore are now called unconditionally (the target ABI can look in the FrameLayout to detect the case where no frame setup is required and skip whatever it thinks appropriate in that case)

there is no separate gen_prologue_start; if the target needs to do anything here, it can now just do it instead in gen_prologue_frame_setup

common code no longer attempts to emit a return instruction; instead the target can do whatever is necessary/optimal in gen_epilogue_frame_restore

[ In principle we could also just have a single gen_prologue and gen_epilogue callback - I didn't implement this because then all the stack checking / probing logic would have to be moved to target code as well. ]

As to 2., currently targets are required to implement a single
"Ret" return instruction. This is initially used during
register allocation to hold a list of return preg/vreg pairs.
During epilog emission, this is replaced by another copy of
the same "Ret" instruction that now carries various platform
specific data (e.g. authorization modes on aarch64), and is
also overloaded to handle the case where the ABI requires
that a number of bytes are popped during return.

This is a bit unfortunate in that it blows up the size of the instruction data, and also forces targets (that do not have a "ret N" instruction like Intel) into duplicated and possible sub-optimal implementations of stack adjustment during low-level emission of the return instruction.

The new approach separates these concerns. Initially, common code emits a new "Rets" instruction that is completely parallel to the existing "Args", and is used only during register allocation holding the preg/vreg pairs. That instruction -like now- is replaced during epilog emission - but unlike now the replacement is now completely up to the target, which can do whatever it needs in gen_epilogue_frame_restore.

This would typically emit some platform-specific low-level "Ret" instruction instead of the regalloc "Rets". It also allows non-Intel targets to just create a normal (or even optimized) stack adjustment sequence before its low-level "Ret".

[ In particular, on riscv64 pop-stack-before-return currently emits two distinct stack adjustment instructions immediately after one another. These could now be easily merged, but that's not yet done in this patch. ]

No functional change intended on any target.

Wasmtime GitHub notifications bot (Sep 06 2023 at 16:36):

uweigand requested wasmtime-compiler-reviewers for a review on PR #6970.

Wasmtime GitHub notifications bot (Sep 06 2023 at 16:36):

uweigand requested abrown for a review on PR #6970.

Wasmtime GitHub notifications bot (Sep 12 2023 at 16:17):

cfallin requested cfallin for a review on PR #6970.

Wasmtime GitHub notifications bot (Sep 12 2023 at 17:19):

cfallin submitted PR review:

This is excellent -- thank you very much for the simplification!

I was curious initially why we couldn't just have one "pre" hook and one "post" hook -- i.e., why we need to separate the clobber-save/restore step -- but at least for the prologue, I guess the answer is that we insert the stack-check beforehand. For the epilogue the ABI code really does just invoke one hook then the other; but for symmetry it's kind of nice to keep the hooks paired. Does that match your thinking?

In any case, this looks good to merge!

Wasmtime GitHub notifications bot (Sep 12 2023 at 18:21):

uweigand updated PR #6970.

Wasmtime GitHub notifications bot (Sep 12 2023 at 18:38):

cfallin has enabled auto merge for PR #6970.

Wasmtime GitHub notifications bot (Sep 12 2023 at 19:48):

cfallin merged PR #6970.

Last updated: Apr 16 2025 at 16:03 UTC