wasmtime / PR #11581 fix: accurate leaf detection · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #11581 fix: accurate leaf detection

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:26):

pnodet opened PR #11581 from pnodet:pnodet-2 to bytecodealliance:main:

ensures that Cranelift's frame pointer optimization and stack checks are applied correctly based on the actual presence of call instructions in the generated machine code, rather than potentially inaccurate heuristics based on IR-level information.

- Scanning actual VCode instructions for calls
- Detecting all call types (direct, indirect, tail calls, libcalls)
- Using this for frame pointer optimization decisions
- Properly handling stack check insertion

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:27):

pnodet edited PR #11581:

ensures that Cranelift's frame pointer optimization and stack checks are applied correctly based on the actual presence of call instructions in the generated machine code, rather than potentially inaccurate heuristics based on IR-level information.

- Scanning actual VCode instructions for calls
- Detecting all call types (direct, indirect, tail calls, libcalls)
- Using this for frame pointer optimization decisions
- Properly handling stack check insertion

Fix #11573

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:27):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:27):

pnodet has marked PR #11581 as ready for review.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:27):

pnodet requested wasmtime-compiler-reviewers for a review on PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:28):

pnodet requested alexcrichton for a review on PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:31):

bjorn3 created PR review comment:

Does this function even still need to exist?

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:31):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:35):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:35):

bjorn3 created PR review comment:

self.insts.iter().any(|inst| inst.is_call())?

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:50):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:50):

pnodet created PR review comment:

Indeed, looks cleaner, will change.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:50):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:51):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:51):

pnodet created PR review comment:

I let @alexcrichton weigh in on this

Wasmtime GitHub notifications bot (Sep 01 2025 at 11:52):

pnodet edited PR review comment.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:47):

cfallin submitted PR review:

Thanks for working on this! Leaving a few thoughts below:

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:47):

cfallin created PR review comment:

Rather than use a mutable setter pattern with action-at-a-distance plumbing, let's compute the property directly from the ABI code. We already make a pass to find clobbers and so we should be able to detect this at the same time.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:47):

cfallin created PR review comment:

A few things:

We want to compute this during the clobber scan in ABI code, so let's not do a separate scan;

Providing an accessor method like this is (IMHO) slightly un-idiomatic, in that it hides a potentially expensive scan behind something that looks like a simple feature/flag test;

If we do add a helper, (tiny minor nit but) putting it right at the top of the VCode methods, before fn new, is probably not the place to do it to make the file read nicely.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:47):

cfallin created PR review comment:

Thanks for adding these tests, but (i) they all appear to be testing is_leaf, which we want to remove, per above; (ii) they don't actually test the ABI behavior, which is visible only in the emitted code.

Could you add some compile filetests instead, showing disassemblies with and without frame-pointer setup? In particular I'd expect the case where I commented and asked for #11573 to be filed to revert back to no-frame-pointer.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:47):

cfallin created PR review comment:

(I'm not Alex, but I noted this issue originally and suggested that he file #11573 here)

IMHO: we should remove this method -- firstly, we don't need it internally, and secondly, it's nearly impossible to use it for anything useful externally because of the caveats we have found. An API that looks "almost useful" but provides misleading or inaccurate results with both false positives and false negatives is a footgun shouldn't exist.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:48):

cfallin edited PR review comment.

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:53):

pnodet commented on PR #11581:

Thanks for the suggestions @cfallin
If it's okay with you all I'll wait for a decision on wether is_leaf should be removed or improved before implementing more changes

Wasmtime GitHub notifications bot (Sep 01 2025 at 21:58):

cfallin commented on PR #11581:

If it's okay with you all I'll wait for a decision on wether is_leaf should be removed or improved before implementing more changes

As noted in my code review, please remove it, thanks!

Wasmtime GitHub notifications bot (Sep 01 2025 at 22:04):

cfallin requested cfallin for a review on PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 22:05):

cfallin edited a comment on PR #11581:

If it's okay with you all I'll wait for a decision on wether is_leaf should be removed or improved before implementing more changes

As noted in my code review, please remove it, thanks!

(In more detail: I am a core reviewer on Cranelift too and originally asked for the issue to be filed, so I'm happy to take the review; there isn't any other "decision" process, but you're welcome to raise any other concerns or, in the limit, take it to the Cranelift weekly meeting)

Wasmtime GitHub notifications bot (Sep 01 2025 at 22:26):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 22:45):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 22:53):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 01 2025 at 23:01):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 08:17):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 08:19):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 08:43):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 08:46):

pnodet edited PR #11581:

## Problem

The previous leaf function detection in FunctionStencil::is_leaf() was overly conservative. It would incorrectly mark functions as non-leaf if they:

- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)

This conservative approach prevented valid leaf functions from benefiting from frame pointer optimizations, particularly affecting performance when preserve_frame_pointers=false.

## Solution

This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:

1. Adds is_call() method to MachInst trait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)

2. Detects calls during register allocation - The compute_clobbers() method in VCode now tracks whether any actual call instructions exist

3. Removes the conservative is_leaf() method - The old method that made assumptions based on IR-level information is replaced

4. Provides comprehensive test coverage - Tests verify various scenarios including:
 - True leaf functions (correctly identified as leaf)
 - Functions with unused signatures (correctly identified as leaf)
 - Functions with actual calls (correctly identified as non-leaf)
 - Functions taking function addresses (correctly identified as leaf)
Fix #11573

Wasmtime GitHub notifications bot (Sep 02 2025 at 08:46):

pnodet has marked PR #11581 as ready for review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:02):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:02):

bjorn3 created PR review comment:

Do we actually need to set up a stack frame if the only call we would do is a tail call? Fair chance the tail call handling doesn't currently handle not having a stack frame, so this doesn't have to be changed in this PR. Just noting a potential optimization opportunity.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:04):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:04):

bjorn3 created PR review comment:

How hard would it be to return this value instead of mutating the Callee?

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:06):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:12):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:12):

pnodet created PR review comment:

@bjorn3 I don't think this would be too much of a hassle, I could see something like

-  let clobbers = self.compute_clobbers(regalloc);
-  self.abi.compute_frame_layout(&self.sigs, regalloc.num_spillslots, clobbers);

+  let (clobbers, is_leaf) = self.compute_clobbers(regalloc);
+  self.abi.set_is_leaf(is_leaf);
+  self.abi.compute_frame_layout(&self.sigs, regalloc.num_spillslots, clobbers);

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:19):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:19):

bjorn3 created PR review comment:

This needs to be considered a call instruction too. As should the blr x3 ; reloc_external Aarch64TlsDescCall u1:0 0 on arm64 and if it doesn't already happen whichever instruction is used for __tls_get_addr calls in the tls_value lowering on other platforms.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:22):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:22):

pnodet created PR review comment:

@bjorn3 We could have the vcode analysis distinguish

- Leaf: No calls at all
- Tail-call-only: Only tail calls
- Non-leaf: Has regular calls (with or without tail calls)

Something like
 fn call_type(&self) -> CallType {
      match self {
          Inst::Call { .. }
          | Inst::CallInd { .. } => CallType::Regular,

          Inst::ReturnCall { .. }
          | Inst::ReturnCallInd { .. } => CallType::TailOnly,

          _ => CallType::None,
      }
  }
so that tail call only functions skip prologue and epilogue?

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:25):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:25):

bjorn3 created PR review comment:

I meant removing set_is_leaf and the is_leaf field of Callee entirely and instead passing is_leaf around to functions that need it. I don't know how much of a hassle that would be.

Wasmtime GitHub notifications bot (Sep 02 2025 at 09:28):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:08):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:08):

pnodet created PR review comment:

I'm wondering if in

https://github.com/bytecodealliance/wasmtime/blob/10d2cbc59d35b14c3027a8542f636c591b2ec96b/cranelift/codegen/src/isa/x64/abi.rs#L937

Should we have something like
-          let setup_area_size = 16; // RBP, return address
+          let setup_area_size = is_leaf {
+              0
+          } else {
+              16 // RBP, return address
+          };

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:08):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:08):

pnodet created PR review comment:

And also checking flags.preserve_frame_pointers

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:21):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 11:21):

bjorn3 created PR review comment:

You mean for tail calls? I think those are best left to a separate PR.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:25):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:25):

pnodet requested bjorn3 for a review on PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:25):

pnodet submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:25):

pnodet created PR review comment:

I made another pass on this

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:27):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:29):

pnodet edited PR #11581:

## Problem

The previous leaf function detection in FunctionStencil::is_leaf() was overly conservative. It would incorrectly mark functions as non-leaf if they:

- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)

## Solution

This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:

1. Adds is_call() method to MachInst trait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)

2. Detects calls during register allocation - The compute_clobbers() method in VCode now tracks whether any actual call instructions exist

3. Removes the conservative is_leaf() method - The old method that made assumptions based on IR-level information is replaced

Fix #11573

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:34):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:34):

bjorn3 created PR review comment:

            | Inst::MachOTlsGetAddr { .. } => true,
PE/COFF unlike other object file formats directly emits a %fs relative access rather than by emitting a call that may later be relaxed by the linker.

Wasmtime GitHub notifications bot (Sep 02 2025 at 12:36):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 14:16):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 14:17):

pnodet commented on PR #11581:

@cfallin I've updated this PR thanks to the given insights and to reflect the changes discussed above, would mind giving it another look?

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:18):

cfallin submitted PR review:

Thanks for the updates! Looks generally good now. Just a few small nits below; happy to merge once those are changed.

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:18):

cfallin created PR review comment:

Tiny nit, but our style is usually to end comments with a ..

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:18):

cfallin created PR review comment:

Can we update the header comment here to note that we're including the test for completeness with respect to other ISAs, but that x64 does not yet implement the leaf-function optimization, so frame setup is still present below?

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:18):

cfallin created PR review comment:

Let's call this compute_clobbers_and_leaf for clarity, now that it returns both.

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:22):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:22):

cfallin created PR review comment:

x64 actually doesn't implement the leaf-function optimization at all (it is a bit different than the other ISAs because the call still pushes the return address; that created some complications that I'm forgetting at the moment; in any case, indeed, that is territory for a separate PR).

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:23):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Sep 02 2025 at 16:23):

cfallin created PR review comment:

Let's leave tail call support for a separate PR as well -- it would be nice to have, but the tail-call implementation itself is quite finicky and specific to our frame layout, and would need updating as well; in other words, it's not just a matter of omitting the prologue.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:04):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:06):

pnodet commented on PR #11581:

Thanks for the review. I made the changes you pointed out. I'll let you merge at your own convenience!

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:06):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:11):

cfallin submitted PR review:

Thanks!

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:11):

cfallin has enabled auto merge for PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:28):

cfallin commented on PR #11581:

@pnodet it looks like this needs some test expectation updates (use the "bless" environment variable to do this then commit the deltas in a separate commit) -- happy to merge once CI is green.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:51):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:51):

cfallin has disabled auto merge for PR #11581.

Wasmtime GitHub notifications bot (Sep 02 2025 at 22:52):

pnodet commented on PR #11581:

@cfallin Indeed, this was from newly added tests and I hadn't rebased the main branch since

Wasmtime GitHub notifications bot (Sep 03 2025 at 04:37):

cfallin commented on PR #11581:

@pnodet there is a failure when testing this PR on the merge queue because main now has new filetests whose outputs shift with this PR. Could you merge (not rebase) main and then in a separate commit, update filetest outputs?

Wasmtime GitHub notifications bot (Sep 03 2025 at 06:42):

pnodet updated PR #11581.

Wasmtime GitHub notifications bot (Sep 03 2025 at 11:44):

pnodet edited PR #11581:

## Problem

The previous leaf function detection in FunctionStencil::is_leaf() was overly conservative. It would incorrectly mark functions as non-leaf if they:

- Had any function signatures defined (even if unused)
- Referenced any TLS global values (even without calling)

## Solution

This PR implements accurate leaf function detection by analyzing the actual generated machine code during the VCode register allocation phase. The new approach:

1. Adds is_call() method to MachInst trait - Each architecture implementation now identifies its call instructions (including indirect calls and tail calls)

2. Detects calls during register allocation - The compute_clobbers() method in VCode now tracks whether any actual call instructions exist

3. Removes the conservative is_leaf() method - The old method that made assumptions based on IR-level information is replaced

Fixes #11573

Wasmtime GitHub notifications bot (Sep 03 2025 at 16:20):

cfallin merged PR #11581.

Wasmtime GitHub notifications bot (Sep 03 2025 at 16:35):

pnodet commented on PR #11581:

@cfallin @bjorn3 Thanks!

Last updated: Feb 24 2026 at 06:21 UTC