Stream: general

Topic: Indirect branches?


view this post on Zulip Adel Prokurov (Jan 22 2021 at 13:07):

Is there any work done on indirect branches or tail calls?

view this post on Zulip bjorn3 (Jan 22 2021 at 13:07):

I don't think so.

view this post on Zulip Adel Prokurov (Jan 22 2021 at 14:24):

but is it planned to have support for indirect branches and branch target patching support in the future?

view this post on Zulip bjorn3 (Jan 22 2021 at 16:16):

For tail calls there is an issue, but for indirect branches there isn't yet as far as I know. Branch target patching may be hard to implement due to interaction with regalloc I think.

view this post on Zulip Adel Prokurov (Jan 22 2021 at 18:37):

seems quite dead issue: https://github.com/bytecodealliance/wasmtime/issues/1065

To write a functional language compiler using this IR, tail call eliminations would be desirable. Are there any plans to support this? I couldn't find any details in the docs.

view this post on Zulip fitzgen (he/him) (Jan 22 2021 at 19:22):

In general, we intend to support all of Wasm, so standards track proposals will be implemented eventually (at least by the time they are fully accepted) but there is the question of priority and (again, in general) we are focusing on proposals that move us towards the nanoprocess model

Lin Clark introduces the Bytecode Alliance, and uses Code Cartoon illustrations to share their vision of a WebAssembly ecosystem that is secure by default, fixing cracks in today’s software foundations. ...

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:27):

+1 on fitzgen's note re: prioritization, though I'll add that if someone wants to look at adding tail calls at the CLIF level, for use by other Cranelift frontends before the Wasm story comes together, I'm happy to point at the relevant ABI code and details, and review PRs

view this post on Zulip Adel Prokurov (Jan 22 2021 at 19:31):

I guess I could try to work on tail calls on CLIF level. I've worked on my own tiny IR with tail calls support and it worked very well

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:35):

Cool! The most relevant code is here (ABI code shared by x86-64 and aarch64 and specialized as needed): https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/machinst/abi_impl.rs

Standalone JIT-style runtime for WebAssembly, using Cranelift - bytecodealliance/wasmtime

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:36):

we'd need new tail-call instructions, I think, and then lower them into some combination of the return epilogue (stackframe cleanup) and call code; I haven't thought through too many details yet

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:36):

let me know if you've got specific questions or thoughts after poking around...

view this post on Zulip Adel Prokurov (Jan 22 2021 at 19:44):

I have question that is where to look for adding new instruction? I think tail call could be just treated as return_ except with jmp instruction instead of ret most of the time.

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:45):

Instructions are defined in cranelift/codegen/meta/src/shared/instructions.rs

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:46):

that's more or less the lowering, yes, though there are a lot of details wrt stack frames :-) And actually one thing I didn't mention above is that for call signatures that imply on-stack args or return values, things get a bit messy if we want to maintain constant stack usage with tail-call recursion

view this post on Zulip Adel Prokurov (Jan 22 2021 at 19:47):

And if I understand correctly if we will treat tail call as return then function we're calling should have same return types as current function. I'm also not sure wha to do when functions returns more than two parameters or some parameters is passed through stack? For example windows abi can return only one value through register while SystemV allows returning two values in two registers

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:47):

yes, so the return-area size will match, so we can just reuse the space that the caller allocated

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:48):

the args are a bit trickier though: the tail-call's on-stack args space needs to be the same size or smaller than ours

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:48):

if not, then we are forced to make the tail call with an SP below its location when we were called, which voids the constant stack usage guarantee otherwise usually expected of tail call implementations

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:49):

I'm not sure how other compilers handle this; needs some thought/research

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:57):

Hmm -- so gcc seems to just not do TCO when the stack arg space of the callee is larger than the caller's: https://godbolt.org/z/nbGTve (func2 and func3 both tail-call func1; func3 has a large enough stack arg space to reuse while func2 does not)

__attribute__((noinline)) int func1(int a, int b, int c, int d, int e, int f, int g, int h) { return a + b + c + d + e + f + g + h; } int func2(int a, int b, int c, int d, int e, int f, int g) { return func1(a, b, c, d, e, f, g + 1, g + 1); } int func3(int a, int b, int c, int d, int e, int f, int g, int h) { return func1(a, b, c, d, e, f, g + 1, h + 1); }

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:57):

interestingly, clang seems to not do TCO for func3

view this post on Zulip Adel Prokurov (Jan 22 2021 at 19:57):

I'm gonna try Zig. It has possibility to force tail call

view this post on Zulip Chris Fallin (Jan 22 2021 at 19:58):

for C/C++ compilers this is a "nice-to-have" optimization, so no problem if we fail, but yeah, for cases where the IR semantics imply we must do TCO, it's an open question

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:04):

Hmm Zig seems to just throw error when it can't do tail call

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:15):

What if preallocate stack space for arguments if needed on caller side? This way callee does not have to do any stack manipulation and function that callee tail calls should use this memory?

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:16):

But this seems to limit how we can call functions. Indirect tail calls is quite impossible this way I believe

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:19):

That's a bit tricky to require, since if a standard ABI (like System V) is specified, calls may come from code that is outside our control

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:20):

Another way is to do tail calls platform dependent and make tail_call return true or false if it emits or not tail call. There should also be way to pass these parameters on heap but I'm entirely not sure how this works

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:21):

I just took a look at the Wasm proposal (to look ahead a bit to what we will eventually need to satisfy) and I didn't see any requirements wrt argument type or number of arguments of caller and callee. I remember something about an ABI change needed in SpiderMonkey to support fully general tail calls. So if we will need to do something specific for Wasm anyway, where TCO is required by the newly proposed instructions, I think the best thing to do in this case (with a standard ABI like SysV) is to just fall back silently to a non-tail call, as gcc and clang do

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:22):

Perhaps we could add an attribute on the instruction that requires it to TCO or emit a compile error, but that makes the CLIF validity dependent on ABI details / stack layout, which I am not too fond of

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:23):

@Dan Gohman perhaps you have more guidance / thoughts on the above? ^^

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:25):

but that makes the CLIF validity dependent on ABI details / stack layout, which I am not too fond of

Yeah I don't like that too. I would like to use Cranelift in my tracing jit compiler and tail calls is the only way to implement side traces effectively.

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:30):

if you control the signatures (as you generate the traces) a workaround would be to just ensure that you always have the same number of args, I guess (I dunno how you're mapping traces to functions -- args are live registers/values?)

view this post on Zulip Chris Fallin (Jan 22 2021 at 20:30):

anyway, yeah, I would say let's start by assuming silent fallback, and we can change if we have a better idea later :-)

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:34):

Yeah I guess this should work. We also could return boolean from tail_call function which will say if tail call is sucessful or not.

view this post on Zulip Adel Prokurov (Jan 22 2021 at 20:35):

Going to sleep. Will try to start work on this tomorrow. :)

view this post on Zulip Adel Prokurov (Jan 23 2021 at 00:44):

I'm going in the right direction? I can't get it to built cranelift: <https://github.com/bytecodealliance/wasmtime/compare/main...playXE:cranelift-tailcalls>

  thread 'main' panicked at 'Inst return and recipe Op1tail_call must have the same format!', cranelift/codegen/meta/src/cdsl/encodings.rs:154:9
  stack backtrace:
Standalone JIT-style runtime for WebAssembly, using Cranelift - bytecodealliance/wasmtime

view this post on Zulip bjorn3 (Jan 23 2021 at 06:46):

It is complaining that the arguments to tail_call and the required return values of functions can't be the same. You are implementing it for the old x86 backend that will likely be removed somewhere this year. The new backend can be found at codegen/src/isa/x64. This backend uses a completely different framework.

view this post on Zulip Mario Carneiro (Apr 14 2021 at 09:38):

By the way, there is another option for tail calls that wasn't discussed above: you can generate a shim for the current function to call one with a bigger stack frame, so as to get enough space to fit a tail call. This usually isn't worth the effort, but it is important if you have a mutual tail call situation. You can also optimize any calls that you have visibility into to skip the shim, but anything where you have to adhere to the ABI has to call the shim. (I had to implement a compiler with TCO for a class and this is what I did to ensure that mutual tail calls always work.)


Last updated: Jan 24 2025 at 00:11 UTC