saulecabrera edited issue #9732:
The Wasm Tail Call proposal is considered Tier 1 according to Wasmtime's Tiers of support.
Winch currently doesn't support this proposal.
- [ ]
return_call
- [ ]
return_call_indirect
saulecabrera assigned saulecabrera to issue #9732.
MarinPostma commented on issue #9732:
@saulecabrera you can assign me to that :+1:
saulecabrera edited issue #9732:
The Wasm Tail Call proposal is considered Tier 1 according to Wasmtime's Tiers of support.
Winch currently doesn't support this proposal.
- [ ]
return_call
@MarinPostma- [ ]
return_call_indirect
@MarinPostma
MarinPostma commented on issue #9732:
@saulecabrera
Did you have a sketch of a plan on how to implement that already?
I did some homework this weekend, trying to find a path toward tail calls. I think we could do something like described in this post about v8 baseline compiler, or as initially implemented in cranelift, basically:
- set up the frame as if we were about to perform a normal call
memcpy
the frame down to override the caller frame- jump to the callee entry-point
However this is over simplistic. There are many things getting in our way. Here's my understanding of the situation, thus far:
1) The callee and the caller may not require the same amount of stack space for their arguments. In the Winch calling convention, it is the caller that cleans up the stack space for the arguments. This is an issue for tail calls, because the caller doesn't know ahead of time how much stack space will need to be cleaned up when the callee returns. cranelift
TailCall
convention solves that by requiring that the callee cleans-up it's stack space. I suppose there are alternatives to let the caller know dynamically the amount of stack space to pop on cleanup, but in any case, it seems to me that we need a calling convention that supports that.
2) Theret_area
is inherited from the caller, rather and allocated in the caller stack. This is easy to do: we can get it from the caller's initial control frame. (see there). This works, because the callee must have the same return as the caller.
3) We need to emit a jump to the callee, rather than a call, because we must not push a return address, since the state is already set up during the return_call preparation. Winch relies of cranelift to emit relocs for the callee address. The story around converting a function ref to aMachLabel
that can be used for a jump is not entirely clear to me yet. Alternatively, we could pop the FP back into it's register and letcall
push it back (this is what v8 baseline does).
4) Cranelift hasInst::CallReturn*
macro-instruction, but it's not clear to me if we can use it, since it emits code specific to craneliftTail
calling convention.I am glancing over a lot of things here, obviously. My principal concern is with
1)
right now; I don't see a way forward without addressing it.
saulecabrera commented on issue #9732:
Some preliminary context:
- Winch has a single, internal calling convention, which is mostly used between Wasm-to-Wasm calls.
- Non Wasm-to-Wasm calls (e.g., calls to host-defined functions), go through
trampolines, which are currently emitted
through Cranelift. These trampolines convert between different calling conventions.Winch supports other calling conventions (e.g., SysV) used function calls
which don't go through trampolines (calls to built-in function like
f32.abs
)--
Regarding the implementation of tail-calls:
I don't have a one-size-fits-all answer unfortunately, but we did discuss some
ideas in the past, unfortunately they are not officially documented anywhere; in
the next couple of sentences I'll try to do a summary, taking into account your
points above:I suppose there are alternatives to let the caller know dynamically the amount
of stack space to pop on cleanup, but in any case, it seems to me that we need
a calling convention that supports that.Indeed, there have been attempts in other production ready Wasm compilers to avoid having
the callee clean up the stack space used for arguments. Given that (i) for tail
calls it's assumed that we're using the current frame to resolve calls and (ii)
even though that we don't know the exact space needed for arguments, there are
some expectations around the stack pointer position before/after the call, which
could be helpful to derive the correct stack pointer position in the presence of
tail calls and stack arguments. I was hoping to explore this idea, through
a prototype. The main benefit that I see here is minimizing ABI changes.The alternative, as you note, is to have the callee clean up any stack space
allocated for stack arguments. Even though, this seems like the obvious
approach, I would not be comfortable introducing this change without a proper
exploration of the other alternatives; a change like this represents
a fundamental shift in the following fronts:
- ABI
- Frame handling
- Multi-value
In either case, I don't think we need to introduce a different calling
convention, my expectation is that modifying Winch's default calling convention
should suffice.The ret_area is inherited from the caller, rather and allocated in the caller
stack. This is easy to do: we can get it from the caller's initial control
frame. (see there). This works, because the callee must have the same return
as the caller.In principle nothing fundamental should change for multiple return values in
the presence of tail calls. Note that if the callee cleans the space for stack
arguments, some updates will be needed to handle the stack return area.We need to emit a jump to the callee, rather than a call, because we must not
push a return address, since the state is already set up during the
return_call preparation. Winch relies of cranelift to emit relocs for the
callee address. The story around converting a function ref to a MachLabel that
can be used for a jump is not entirely clear to me yet. Alternatively, we
could pop the FP back into it's register and let call push it back (this is
what v8 baseline does).There are many assumptions in this statement, however, if I'm understanding this
statement correctly, I think that relocations and FP/return address handling are
two, largely orthogonal issues that we need to think about regardless.--
In general, I agree that resolving how to handle stack arguments is a
pre-requisite for supporting tail-calls. My expectation is to take an informed
decision based on prototyping, given that there are many trade-offs to consider,
not only around the complexity of the potential ABI changes, but also the
impact of any new changes on:
- Compilation performance
- Runtime performance
- Other non-x86-64 backends
After spending some time thinking a bit more about tail calls, and in
particular, with the last point above (non-x86-64 backends), I'd also recommend
waiting until we can fully run spec tests for aarch64. As I've been working on
trying to get them all passing I've found myself fixing some ABI-related bugs [1]
to ensure that our generated code meets all the aarc64 ABI requirements.
I suspect that if we introduce any new compiler-wide ABI changes, before the
aarch64 backend is fully complete, there's a risk of conflicts down the road,
more concreely around the stack pointer handling and relocations side of things.[1]: https://github.com/bytecodealliance/wasmtime/pull/10146
saulecabrera edited issue #9732:
The Wasm Tail Call proposal is considered Tier 1 according to Wasmtime's Tiers of support.
Winch currently doesn't support this proposal.
- [ ]
return_call
- [ ]
return_call_indirect
Last updated: Feb 28 2025 at 03:10 UTC