Stream: git-wasmtime

Topic: wasmtime / PR #11608 feat: optimize frame layout for tail...


Wasmtime GitHub notifications bot (Sep 04 2025 at 22:08):

pnodet opened PR #11608 from pnodet:pnodet-11 to bytecodealliance:main:

Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:11):

pnodet commented on PR #11608:

@cfallin What do you think of something like this? I have only looked into aarch64 for the moment, since other ISAs such as x64 and s390x look quite different and more complex to implement.

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:11):

cfallin commented on PR #11608:

Unfortunately I don't think this is going to work: the stack pointer has to be 16-byte aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP.

Furthermore, the savings I would expect are not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots or clobbers) and no outgoing argument space.
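
To make the alignment objection concrete, here is a minimal Rust sketch of the arithmetic. It is not Cranelift code: `SP_ALIGN`, `is_sp_aligned`, and `round_up_to_sp_align` are hypothetical names introduced only for illustration, and the addresses used with them are made up.

```rust
/// Required stack pointer alignment on aarch64, in bytes (per the comment above).
const SP_ALIGN: u64 = 16;

/// Returns true if `sp` is a multiple of the required alignment.
/// (Hypothetical helper, not part of Cranelift.)
fn is_sp_aligned(sp: u64) -> bool {
    sp % SP_ALIGN == 0
}

/// Rounds a frame size up to the next multiple of the required alignment.
/// An 8-byte frame therefore still occupies 16 bytes; a 0-byte frame stays 0.
/// (Hypothetical helper, not part of Cranelift.)
fn round_up_to_sp_align(size: u64) -> u64 {
    (size + SP_ALIGN - 1) & !(SP_ALIGN - 1)
}
```

Starting from a 16-byte-aligned SP, subtracting 8 leaves it misaligned, while subtracting 0 or 16 keeps it aligned. This is why the only useful saving is dropping the frame entirely, not shrinking it from 16 to 8 bytes.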

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:26):

pnodet commented on PR #11608:

Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding?

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:32):

bjorn3 commented on PR #11608:

Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, only falling back to frame pointers when .eh_frame is not available.

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:37):

cfallin commented on PR #11608:

Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization.

In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's cg_clif.

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:45):

pnodet commented on PR #11608:

Then could it be safe to have something like this?

        // Compute linkage frame size.
        let setup_area_size = if flags.preserve_frame_pointers()
            // The function arguments that are passed on the stack are addressed
            // relative to the Frame Pointer.
            || flags.unwind_info()
            || incoming_args_size > 0
            || clobber_size > 0
            || fixed_frame_storage_size > 0
        {
            16 // FP, LR
        } else {
            match function_calls {
                FunctionCalls::Regular => 16,
                FunctionCalls::None => 0,
-               FunctionCalls::TailOnly => 8,
+               FunctionCalls::TailOnly => 0,
            }
        };

Wasmtime GitHub notifications bot (Sep 04 2025 at 22:59):

cfallin commented on PR #11608:

I think you'll want to check the tail args and outgoing args size as well (the other parameters to compute_frame_layout) -- basically, if any part of the frame needs to exist, then we need to do the FP setup even if we only have tail calls.
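
As a rough illustration of that suggestion, the following standalone Rust sketch folds the tail-args and outgoing-args sizes into the check. It is an assumption-laden model, not the actual Cranelift implementation: the `FrameSizes` struct and the `setup_area_size` function are hypothetical, the field names `tail_args_size` and `outgoing_args_size` are guesses based on the parameters mentioned above, and the two boolean arguments stand in for the `flags.preserve_frame_pointers()` and `flags.unwind_info()` calls in the quoted snippet.

```rust
/// Mirrors the three variants shown in the snippet above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the size inputs discussed in this thread.
#[derive(Clone, Copy, Debug, Default)]
struct FrameSizes {
    incoming_args_size: u32,
    tail_args_size: u32,          // assumed name
    clobber_size: u32,
    fixed_frame_storage_size: u32,
    outgoing_args_size: u32,      // assumed name
}

/// Returns 16 (FP + LR) if any part of the frame must exist, else 0
/// for functions that make no calls or only tail calls.
fn setup_area_size(
    preserve_frame_pointers: bool,
    unwind_info: bool,
    function_calls: FunctionCalls,
    sizes: &FrameSizes,
) -> u32 {
    // Any non-zero component means the frame is not empty.
    let frame_needed = sizes.incoming_args_size > 0
        || sizes.tail_args_size > 0
        || sizes.clobber_size > 0
        || sizes.fixed_frame_storage_size > 0
        || sizes.outgoing_args_size > 0;

    if preserve_frame_pointers || unwind_info || frame_needed {
        return 16; // FP, LR
    }
    match function_calls {
        FunctionCalls::Regular => 16,
        FunctionCalls::None | FunctionCalls::TailOnly => 0,
    }
}
```

Under these assumptions, a tail-only function with all sizes at zero gets a setup area of 0, while the same function with, say, 16 bytes of outgoing argument space still gets the full 16-byte FP/LR setup.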


Last updated: Dec 06 2025 at 07:03 UTC