Is there any work done on indirect branches or tail calls?
I don't think so.
but is it planned to have support for indirect branches and branch target patching support in the future?
For tail calls there is an issue, but for indirect branches there isn't yet as far as I know. Branch target patching may be hard to implement due to interaction with regalloc I think.
seems quite dead issue: https://github.com/bytecodealliance/wasmtime/issues/1065
In general, we intend to support all of Wasm, so standards track proposals will be implemented eventually (at least by the time they are fully accepted) but there is the question of priority and (again, in general) we are focusing on proposals that move us towards the nanoprocess model
+1 on fitzgen's note re: prioritization, though I'll add that if someone wants to look at adding tail calls at the CLIF level, for use by other Cranelift frontends before the Wasm story comes together, I'm happy to point at the relevant ABI code and details, and review PRs
I guess I could try to work on tail calls on CLIF level. I've worked on my own tiny IR with tail calls support and it worked very well
Cool! The most relevant code is here (ABI code shared by x86-64 and aarch64 and specialized as needed): https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/machinst/abi_impl.rs
we'd need new tail-call instructions, I think, and then lower them into some combination of the return epilogue (stackframe cleanup) and call code; I haven't thought through too many details yet
let me know if you've got specific questions or thoughts after poking around...
I have question that is where to look for adding new instruction? I think tail call could be just treated as return_
except with jmp
instruction instead of ret
most of the time.
Instructions are defined in cranelift/codegen/meta/src/shared/instructions.rs
that's more or less the lowering, yes, though there are a lot of details wrt stack frames :-) And actually one thing I didn't mention above is that for call signatures that imply on-stack args or return values, things get a bit messy if we want to maintain constant stack usage with tail-call recursion
And if I understand correctly if we will treat tail call as return then function we're calling should have same return types as current function. I'm also not sure wha to do when functions returns more than two parameters or some parameters is passed through stack? For example windows abi can return only one value through register while SystemV allows returning two values in two registers
yes, so the return-area size will match, so we can just reuse the space that the caller allocated
the args are a bit trickier though: the tail-call's on-stack args space needs to be the same size or smaller than ours
if not, then we are forced to make the tail call with an SP below its location when we were called, which voids the constant stack usage guarantee otherwise usually expected of tail call implementations
I'm not sure how other compilers handle this; needs some thought/research
Hmm -- so gcc seems to just not do TCO when the stack arg space of the callee is larger than the caller's: https://godbolt.org/z/nbGTve (func2 and func3 both tail-call func1; func3 has a large enough stack arg space to reuse while func2 does not)
interestingly, clang seems to not do TCO for func3
I'm gonna try Zig. It has possibility to force tail call
for C/C++ compilers this is a "nice-to-have" optimization, so no problem if we fail, but yeah, for cases where the IR semantics imply we must do TCO, it's an open question
Hmm Zig seems to just throw error when it can't do tail call
What if preallocate stack space for arguments if needed on caller side? This way callee does not have to do any stack manipulation and function that callee tail calls should use this memory?
But this seems to limit how we can call functions. Indirect tail calls is quite impossible this way I believe
That's a bit tricky to require, since if a standard ABI (like System V) is specified, calls may come from code that is outside our control
Another way is to do tail calls platform dependent and make tail_call
return true or false if it emits or not tail call. There should also be way to pass these parameters on heap but I'm entirely not sure how this works
I just took a look at the Wasm proposal (to look ahead a bit to what we will eventually need to satisfy) and I didn't see any requirements wrt argument type or number of arguments of caller and callee. I remember something about an ABI change needed in SpiderMonkey to support fully general tail calls. So if we will need to do something specific for Wasm anyway, where TCO is required by the newly proposed instructions, I think the best thing to do in this case (with a standard ABI like SysV) is to just fall back silently to a non-tail call, as gcc and clang do
Perhaps we could add an attribute on the instruction that requires it to TCO or emit a compile error, but that makes the CLIF validity dependent on ABI details / stack layout, which I am not too fond of
@Dan Gohman perhaps you have more guidance / thoughts on the above? ^^
but that makes the CLIF validity dependent on ABI details / stack layout, which I am not too fond of
Yeah I don't like that too. I would like to use Cranelift in my tracing jit compiler and tail calls is the only way to implement side traces effectively.
if you control the signatures (as you generate the traces) a workaround would be to just ensure that you always have the same number of args, I guess (I dunno how you're mapping traces to functions -- args are live registers/values?)
anyway, yeah, I would say let's start by assuming silent fallback, and we can change if we have a better idea later :-)
Yeah I guess this should work. We also could return boolean from tail_call
function which will say if tail call is sucessful or not.
Going to sleep. Will try to start work on this tomorrow. :)
I'm going in the right direction? I can't get it to built cranelift: <https://github.com/bytecodealliance/wasmtime/compare/main...playXE:cranelift-tailcalls>
thread 'main' panicked at 'Inst return and recipe Op1tail_call must have the same format!', cranelift/codegen/meta/src/cdsl/encodings.rs:154:9
stack backtrace:
It is complaining that the arguments to tail_call
and the required return values of functions can't be the same. You are implementing it for the old x86 backend that will likely be removed somewhere this year. The new backend can be found at codegen/src/isa/x64. This backend uses a completely different framework.
By the way, there is another option for tail calls that wasn't discussed above: you can generate a shim for the current function to call one with a bigger stack frame, so as to get enough space to fit a tail call. This usually isn't worth the effort, but it is important if you have a mutual tail call situation. You can also optimize any calls that you have visibility into to skip the shim, but anything where you have to adhere to the ABI has to call the shim. (I had to implement a compiler with TCO for a class and this is what I did to ensure that mutual tail calls always work.)
Last updated: Dec 23 2024 at 12:05 UTC