Stream: cranelift

Topic: abi calculation cleanups


view this post on Zulip bjorn3 (Sep 16 2024 at 17:27):

While trying to fix https://github.com/bytecodealliance/wasmtime/issues/9250, I've found that the abi calculation code is fairly messy with multiple call conv rules being bolted on top in hacky ways. I've done some cleanups in https://github.com/bytecodealliance/wasmtime/pull/9253, but I've got a question:

Expected Results All return values are returned using a single return pointer. Actual Results Return values are put into registers where possible and on the stack for the rest. Versions and Environ...
This also removes the WasmtimeSystemV call conv as it is no longer used anywhere.

view this post on Zulip Chris Fallin (Sep 16 2024 at 17:35):

I'd be happy to see StructArgument go away -- we don't use it in Wasmtime either, so with no major users, there's no reason to keep it IMHO

view this post on Zulip bjorn3 (Sep 16 2024 at 17:52):

Upon closer inspection of the code, it turns out that StructArgument on s390x actually behaves the same as on other architectures, yet the abi document clearly states that pointers should be used for passing large struct arguments. In other words any use of StructArgument is actually incorrect already on s390x.

view this post on Zulip Ulrich Weigand (Sep 16 2024 at 17:59):

Back when I implemented this, it was certainly required to get cg_clif to work correctly, or else I wouldn't have done it. I had written up my finding back then in this issue: https://github.com/bytecodealliance/wasmtime/issues/4565

In implementing support for struct-args and i128 to enable rustc_codegen_cranelift on s390x, I ran into two features that are not currently available with the common abi_impl implementation. Struct...

view this post on Zulip bjorn3 (Sep 16 2024 at 18:02):

In rustc I'm only finding make_indirect() for s390x, not make_indirect_byval(). The former lowers to a pointer argument, while the latter lowers to StructArgument.

view this post on Zulip bjorn3 (Sep 16 2024 at 18:03):

Notice how in https://rust.godbolt.org/z/7rc6To46s there is no byval([1024 x i8]) for the argument, unlike on x86_64.

struct Foo([u8; 1024]); #[no_mangle] extern "C" fn foo(a: Foo) {}

view this post on Zulip bjorn3 (Sep 16 2024 at 18:09):

As far as I can tell the only archs that need StructArgument are x86, x86_64, m68k and xtensa: https://github.com/search?q=repo%3Arust-lang%2Frust%20make_indirect_byval&type=code

view this post on Zulip bjorn3 (Sep 16 2024 at 18:12):

This also means that the arm64 and riscv64 backends are incorrect in the exact same way.

view this post on Zulip Ulrich Weigand (Sep 16 2024 at 18:15):

Yes, we never use the LLVM IR byval attribute with the s390x ABI. If that is indeed the only place where StructArgument is used, we don't need to support that in the backend. (I'm still pretty sure something must have been broken back in 2022 when I implemented StructArgument; this all was part of an effort to get cg_clif working back then - but I don't now recall what exactly that was. Maybe something else changed in between ...)

view this post on Zulip bjorn3 (Sep 16 2024 at 19:09):

bjorn3 said:

Upon closer inspection of the code, it turns out that StructArgument on s390x actually behaves the same as on other architectures, yet the abi document clearly states that pointers should be used for passing large struct arguments. In other words any use of StructArgument is actually incorrect already on s390x.

Never mind, the pointer field on ABIArg::StructArg changes the semantics to make it pass the value using a pointer argument. Still the arm64 and riscv64 backends don't do this. Opened https://github.com/bytecodealliance/wasmtime/pull/9258

Unlike x86_64 and some other architectures where large structs are passed at fixed stack offsets, necessitating special Cranelift support, arm64, riscv64 and s390x pass them using regular pointer a...

view this post on Zulip bjorn3 (Sep 17 2024 at 17:20):

https://github.com/bytecodealliance/wasmtime/pull/3785 moved s390x to use ISLE for emitting calls rather than using CallSite like all other backends. Why was this done? It makes it harder for me to make changes to all backends at once.

This adds infrastructure to allow implementing call and return instructions in ISLE, and migrates the s390x back-end. Not intended to be committed as-is, this will not even compile as it depends on...

view this post on Zulip Chris Fallin (Sep 17 2024 at 17:54):

if I remember right, s390x was the first backend to move completely to ISLE, and it was mostly an artifact of our early experimentation -- we weren't sure which parts we should encode as term rewriting and which we should leave as helper code in Rust

view this post on Zulip Chris Fallin (Sep 17 2024 at 17:55):

there are some nice ergonomic bits around zero/sign extend pattern matching that are done in ISLE, but that's not essential

view this post on Zulip bjorn3 (Sep 17 2024 at 17:55):

I see. Would a move back to CallSite be ok?

view this post on Zulip Chris Fallin (Sep 17 2024 at 17:55):

at this point, it remains that way because it would be a lot of work to transition it to use the common ABI code, but as far as I'm concerned it's not impossible to switch back; @Ulrich Weigand could say more probably

view this post on Zulip bjorn3 (Sep 17 2024 at 19:16):

Opened another PR https://github.com/bytecodealliance/wasmtime/pull/9267 which includes a partial fix for the struct return ABI.

The first commit adds a couple of assertions against misuse of StructReturn. The next two commits are minor cleanups. And the last commit is a partial ABI fix for multi-value returns. Helps with #9250

view this post on Zulip bjorn3 (Sep 19 2024 at 12:58):

Does wasmtime depend on the exact abi of the tail call conv or am I free to change it in Cranelift? And for the winch call conv do I only need to change Cranelift and Winch?

view this post on Zulip fitzgen (he/him) (Sep 19 2024 at 15:06):

Exact details of tail: not really, but it is nice to somewhat "look like" the system ABI in general just to make it so that system tools like perf tend to work okay by default

Winch: I believe so, since the trampolines between the host and wasm are all implemented by Cranelift, so Rust code isn't relying on any particulars here

view this post on Zulip bjorn3 (Sep 19 2024 at 15:09):

Exact details of tail: not really, but it is nice to somewhat "look like" the system ABI in general just to make it so that system tools like perf tend to work okay by default

Great! We currently botch up the abi for multi val returns when they don't all fit in registers. Return values are supposed to either all fit in registers or all be passed using a return area pointer. Currently Cranelift mixes the two. Making it behave differently on the tail/winch call conv vs the system call conv would be awkward. Great to know that I can unify the behavior.

view this post on Zulip bjorn3 (Sep 19 2024 at 19:45):

PR up for just the tail and winch return area ptr register change: https://github.com/bytecodealliance/wasmtime/pull/9287 Still working on the rest of the changes for another PR.

We read every piece of feedback, and take your input very seriously.

Last updated: Dec 23 2024 at 12:05 UTC