pnodet opened PR #11581 from pnodet:pnodet-2 to bytecodealliance:main:
ensures that Cranelift's frame pointer optimization and stack checks are applied correctly based on the actual presence of call instructions in the generated machine code, rather than potentially inaccurate heuristics based on IR-level information.
- Scanning actual VCode instructions for calls
- Detecting all call types (direct, indirect, tail calls, libcalls)
- Using this for frame pointer optimization decisions
- Properly handling stack check insertion<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
pnodet edited PR #11581:
ensures that Cranelift's frame pointer optimization and stack checks are applied correctly based on the actual presence of call instructions in the generated machine code, rather than potentially inaccurate heuristics based on IR-level information.
- Scanning actual VCode instructions for calls
- Detecting all call types (direct, indirect, tail calls, libcalls)
- Using this for frame pointer optimization decisions
- Properly handling stack check insertionFix #11573
pnodet updated PR #11581.
pnodet has marked PR #11581 as ready for review.
pnodet requested wasmtime-compiler-reviewers for a review on PR #11581.
pnodet requested alexcrichton for a review on PR #11581.
bjorn3 created PR review comment:
Does this function even still need to exist?
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
self.insts.iter().any(|inst| inst.is_call())?
pnodet submitted PR review.
pnodet created PR review comment:
Indeed, looks cleaner, will change.
pnodet updated PR #11581.
pnodet submitted PR review.
pnodet created PR review comment:
I let @alexcrichton weigh in on this
pnodet edited PR review comment.
cfallin submitted PR review:
Thanks for working on this! Leaving a few thoughts below:
cfallin created PR review comment:
Rather than use a mutable setter pattern with action-at-a-distance plumbing, let's compute the property directly from the ABI code. We already make a pass to find clobbers and so we should be able to detect this at the same time.
cfallin created PR review comment:
A few things:
- We want to compute this during the clobber scan in ABI code, so let's not do a separate scan;
- Providing an accessor method like this is (IMHO) slightly un-idiomatic, in that it hides a potentially expensive scan behind something that looks like a simple feature/flag test;
- If we do add a helper, (tiny minor nit but) putting it right at the top of the VCode methods, before
fn new, is probably not the place to do it to make the file read nicely.
cfallin created PR review comment:
Thanks for adding these tests, but (i) they all appear to be testing
is_leaf, which we want to remove, per above; (ii) they don't actually test the ABI behavior, which is visible only in the emitted code.Could you add some compile filetests instead, showing disassemblies with and without frame-pointer setup? In particular I'd expect the case where I commented and asked for #11573 to be filed to revert back to no-frame-pointer.
cfallin created PR review comment:
(I'm not Alex, but I noted this issue originally and suggested that he file #11573 here)
IMHO: we should remove this method -- firstly, we don't need it internally, and secondly, it's nearly impossible to use it for anything useful externally because of the caveats we have found. An API that looks "almost useful" but provides misleading or inaccurate results with both false positives and false negatives is a footgun shouldn't exist.
cfallin edited PR review comment.
pnodet commented on PR #11581:
Thanks for the suggestions @cfallin
If it's okay with you all I'll wait for a decision on wetheris_leafshould be removed or improved before implementing more changes
cfallin commented on PR #11581:
If it's okay with you all I'll wait for a decision on wether is_leaf should be removed or improved before implementing more changes
As noted in my code review, please remove it, thanks!
cfallin requested cfallin for a review on PR #11581.
cfallin edited a comment on PR #11581:
If it's okay with you all I'll wait for a decision on wether is_leaf should be removed or improved before implementing more changes
As noted in my code review, please remove it, thanks!
(In more detail: I am a core reviewer on Cranelift too and originally asked for the issue to be filed, so I'm happy to take the review; there isn't any other "decision" process, but you're welcome to raise any other concerns or, in the limit, take it to the Cranelift weekly meeting)
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet updated PR #11581.
pnodet edited PR #11581:
## Problem
The previous leaf function detection in
FunctionStencil::is_leaf()was overly conservative. It would incorrectly mark functions as non-leaf if they:- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)This conservative approach prevented valid leaf functions from benefiting from frame pointer optimizations, particularly affecting performance when
preserve_frame_pointers=false.## Solution
This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:
1. Adds
is_call()method toMachInsttrait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)2. Detects calls during register allocation - The
compute_clobbers()method in VCode now tracks whether any actual call instructions exist3. Removes the conservative
is_leaf()method - The old method that made assumptions based on IR-level information is replaced4. Provides comprehensive test coverage - Tests verify various scenarios including:
- True leaf functions (correctly identified as leaf) - Functions with unused signatures (correctly identified as leaf) - Functions with actual calls (correctly identified as non-leaf) - Functions taking function addresses (correctly identified as leaf)Fix #11573
pnodet has marked PR #11581 as ready for review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Do we actually need to set up a stack frame if the only call we would do is a tail call? Fair chance the tail call handling doesn't currently handle not having a stack frame, so this doesn't have to be changed in this PR. Just noting a potential optimization opportunity.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
How hard would it be to return this value instead of mutating the
Callee?
pnodet updated PR #11581.
pnodet submitted PR review.
pnodet created PR review comment:
@bjorn3 I don't think this would be too much of a hassle, I could see something like
- let clobbers = self.compute_clobbers(regalloc); - self.abi.compute_frame_layout(&self.sigs, regalloc.num_spillslots, clobbers); + let (clobbers, is_leaf) = self.compute_clobbers(regalloc); + self.abi.set_is_leaf(is_leaf); + self.abi.compute_frame_layout(&self.sigs, regalloc.num_spillslots, clobbers);
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This needs to be considered a call instruction too. As should the
blr x3 ; reloc_external Aarch64TlsDescCall u1:0 0on arm64 and if it doesn't already happen whichever instruction is used for__tls_get_addrcalls in thetls_valuelowering on other platforms.
pnodet submitted PR review.
pnodet created PR review comment:
@bjorn3 We could have the vcode analysis distinguish
- Leaf: No calls at all
- Tail-call-only: Only tail calls
- Non-leaf: Has regular calls (with or without tail calls)Something like
fn call_type(&self) -> CallType { match self { Inst::Call { .. } | Inst::CallInd { .. } => CallType::Regular, Inst::ReturnCall { .. } | Inst::ReturnCallInd { .. } => CallType::TailOnly, _ => CallType::None, } }so that tail call only functions skip prologue and epilogue?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
I meant removing
set_is_leafand theis_leaffield ofCalleeentirely and instead passingis_leafaround to functions that need it. I don't know how much of a hassle that would be.
pnodet updated PR #11581.
pnodet submitted PR review.
pnodet created PR review comment:
I'm wondering if in
Should we have something like
- let setup_area_size = 16; // RBP, return address + let setup_area_size = is_leaf { + 0 + } else { + 16 // RBP, return address + };
pnodet submitted PR review.
pnodet created PR review comment:
And also checking flags.preserve_frame_pointers
bjorn3 submitted PR review.
bjorn3 created PR review comment:
You mean for tail calls? I think those are best left to a separate PR.
pnodet updated PR #11581.
pnodet requested bjorn3 for a review on PR #11581.
pnodet submitted PR review.
pnodet created PR review comment:
I made another pass on this
pnodet updated PR #11581.
pnodet edited PR #11581:
## Problem
The previous leaf function detection in
FunctionStencil::is_leaf()was overly conservative. It would incorrectly mark functions as non-leaf if they:- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)## Solution
This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:
1. Adds
is_call()method toMachInsttrait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)2. Detects calls during register allocation - The
compute_clobbers()method in VCode now tracks whether any actual call instructions exist3. Removes the conservative
is_leaf()method - The old method that made assumptions based on IR-level information is replacedFix #11573
bjorn3 submitted PR review.
bjorn3 created PR review comment:
| Inst::MachOTlsGetAddr { .. } => true,PE/COFF unlike other object file formats directly emits a
%fsrelative access rather than by emitting a call that may later be relaxed by the linker.
bjorn3 submitted PR review.
pnodet updated PR #11581.
pnodet commented on PR #11581:
@cfallin I've updated this PR thanks to the given insights and to reflect the changes discussed above, would mind giving it another look?
cfallin submitted PR review:
Thanks for the updates! Looks generally good now. Just a few small nits below; happy to merge once those are changed.
cfallin created PR review comment:
Tiny nit, but our style is usually to end comments with a
..
cfallin created PR review comment:
Can we update the header comment here to note that we're including the test for completeness with respect to other ISAs, but that x64 does not yet implement the leaf-function optimization, so frame setup is still present below?
cfallin created PR review comment:
Let's call this
compute_clobbers_and_leaffor clarity, now that it returns both.
cfallin submitted PR review.
cfallin created PR review comment:
x64 actually doesn't implement the leaf-function optimization at all (it is a bit different than the other ISAs because the call still pushes the return address; that created some complications that I'm forgetting at the moment; in any case, indeed, that is territory for a separate PR).
cfallin submitted PR review.
cfallin created PR review comment:
Let's leave tail call support for a separate PR as well -- it would be nice to have, but the tail-call implementation itself is quite finicky and specific to our frame layout, and would need updating as well; in other words, it's not just a matter of omitting the prologue.
pnodet updated PR #11581.
pnodet commented on PR #11581:
Thanks for the review. I made the changes you pointed out. I'll let you merge at your own convenience!
pnodet updated PR #11581.
cfallin submitted PR review:
Thanks!
cfallin has enabled auto merge for PR #11581.
cfallin commented on PR #11581:
@pnodet it looks like this needs some test expectation updates (use the "bless" environment variable to do this then commit the deltas in a separate commit) -- happy to merge once CI is green.
pnodet updated PR #11581.
cfallin has disabled auto merge for PR #11581.
pnodet commented on PR #11581:
@cfallin Indeed, this was from newly added tests and I hadn't rebased the main branch since
cfallin commented on PR #11581:
@pnodet there is a failure when testing this PR on the merge queue because
mainnow has new filetests whose outputs shift with this PR. Could you merge (not rebase)mainand then in a separate commit, update filetest outputs?
pnodet updated PR #11581.
pnodet edited PR #11581:
## Problem
The previous leaf function detection in
FunctionStencil::is_leaf()was overly conservative. It would incorrectly mark functions as non-leaf if they:- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)## Solution
This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:
1. Adds
is_call()method toMachInsttrait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)2. Detects calls during register allocation - The
compute_clobbers()method in VCode now tracks whether any actual call instructions exist3. Removes the conservative
is_leaf()method - The old method that made assumptions based on IR-level information is replacedFixes #11573
cfallin merged PR #11581.
pnodet commented on PR #11581:
@cfallin @bjorn3 Thanks!
Last updated: Jan 10 2026 at 02:36 UTC