Stream: git-wasmtime

Topic: wasmtime / PR #6478 Cranelift: Add the ability to pop sta...


view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:19):

fitzgen opened PR #6478 from fitzgen:stack-to-pop-in-ret to bytecodealliance:main:

This is necessary for implementing callee-pops calling conventions, as is required for tail calls. This is just a small part of tail calls, and doesn't implement everything, but is a good piece to land on its own so that eventual PR isn't so huge.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:19):

fitzgen requested cfallin for a review on PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:19):

fitzgen requested wasmtime-compiler-reviewers for a review on PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:19):

fitzgen edited PR #6478:

This is necessary for implementing callee-pops calling conventions, as is required for tail calls. This is just a small part of tail calls, and doesn't implement everything, but is a good piece to land on its own so that eventual PR isn't so huge. Just trying to shrink the yak stack.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:30):

cfallin submitted PR review:

Looks great; thanks for splitting this out!

One thought on the pseudo-assembly printing in VCode (for aarch64 but similar thoughts on riscv64); and one request for a bit more in a doc-comment; but otherwise LGTM.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:30):

cfallin submitted PR review:

Looks great; thanks for splitting this out!

One thought on the pseudo-assembly printing in VCode (for aarch64 but similar thoughts on riscv64); and one request for a bit more in a doc-comment; but otherwise LGTM.

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:30):

cfallin created PR review comment:

Rather than print a pseudo-inst here, could we print e.g. `add sp, sp, #imm ; ret" or similar?

view this post on Zulip Wasmtime GitHub notifications bot (May 30 2023 at 22:30):

cfallin created PR review comment:

"At the same time as returning" is slightly ambiguous: does the pop happen logically just before popping the return address, or just after? (I think after?) A diagram of the stack before-and-after might be helpful actually, if it's not too much trouble to draw one quickly!

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 00:52):

jameysharp created PR review comment:

The exact sequence of events is target-specific. The return address may not be on the stack at all by the time we get to the return instruction; x86 is the only target we support where ret pops an address off the stack for you. So yes, on x86 this emits an instruction which first pops the return address, and then removes the arguments from the caller's frame. Everywhere else, the return address is in a register by that point. So we can emit separate instructions to adjust the stack pointer and then branch through the appropriate register.

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 16:25):

fitzgen updated PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 16:48):

fitzgen updated PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 16:48):

fitzgen has enabled auto merge for PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 16:50):

cfallin created PR review comment:

Unresolving -- this could still use a diagram or better description in the comment, I think.

The sequencing question is really about the order of items on the stack: are the additionally-popped items before (below) the return address or after (above)? I think the latter, but I have to think about it to come to that conclusion by reasoning through it; I'd prefer for it to be clear in the comment here.

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 16:51):

cfallin created PR review comment:

Ah, I see the just-pushed commit does resolve the ambiguity, so I'm happy now!

view this post on Zulip Wasmtime GitHub notifications bot (May 31 2023 at 18:50):

fitzgen updated PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 15:30):

fitzgen updated PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:17):

bjorn3 created PR review comment:

Less commonly used instructions from the 16bit x86 era can be slower than writing multiple simpler instructions. Does this also apply to retn or is it just as fast as add rsp, n + ret?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:27):

cfallin created PR review comment:

According to Agner Fog's instruction tables, and picking a random semi-recent microarchitecture (Skylake), ret and ret imm are both 2 µops; the latter has a throughput of 0.5 per cycle, the former 1 per cycle. So it looks like there's a tiny penalty but it's no worse than rearranging the stack manually. (And keep in mind that the popped data is above the return address, so one would need to load that, store it higher up, move rsp, and ret; the store alone in that sequence would kill the perf advantage.)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:28):

fitzgen created PR review comment:

add rsp, n; ret isn't equivalent because the return address must be on top of the stack in both cases. That is, ret n pops the RA and then pops n bytes.

So I don't know if it is actually slower or not, but it is not semantically equivalent, so the point is moot to some degree.

The actual alternative would be something like

mov tmp, [rsp]
add rsp, n
mov [rsp], tmp
ret

I kind of doubt that is any better than ret n, but it is certainly possible. Would be interesting to see what LLVM does here.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:29):

cfallin edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:39):

fitzgen merged PR #6478.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 16:41):

cfallin created PR review comment:

I did a fun little experiment with a ret imm implementation, a "move the return address and ret" implementation, and a "load the return address, manually pop, and jump-indirect" implementation.

On my system (Zen 2 core, Ryzen 3900X) the first two have identical performance, and the last is 25% faster. Huh!

I sort of suspect what's happening here is:

Anyway it's sort of getting into the weeds and ret imm isn't obviously worse so I think we should keep this until/unless we discover it becoming a bottleneck.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 01 2023 at 21:13):

bjorn3 created PR review comment:

Makes sense. I didn't know that ret imm would move the stack pointer after reading the return address.


Last updated: Dec 23 2024 at 12:05 UTC