wasmtime / PR #1607 Rework aarch64 stack frame implementa... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #1607 Rework aarch64 stack frame implementa...

Wasmtime GitHub notifications bot (Apr 27 2020 at 21:29):

cfallin opened PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (Apr 27 2020 at 21:29):

cfallin requested bnjbvr for a review on PR #1607.

Wasmtime GitHub notifications bot (Apr 27 2020 at 21:29):

cfallin requested bnjbvr and julian-seward1 for a review on PR #1607.

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:16):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:16):

bnjbvr created PR Review Comment:

nit: between

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:16):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:16):

bnjbvr created PR Review Comment:

nit: something is off with this comment?

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:16):

bnjbvr created PR Review Comment:

nit: this is using x17 here

Wasmtime GitHub notifications bot (Apr 29 2020 at 15:17):

bnjbvr requested bnjbvr and julian-seward1 for a review on PR #1607.

Wasmtime GitHub notifications bot (Apr 29 2020 at 16:05):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (Apr 30 2020 at 23:21):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

stylistic nit: can you invert the order of the operands here, please? it's really an additional offset to the fp_to_arg_offset, and would read less Yoda-like.

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

ditto

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

Was it intentional to keep the debug! statements here? I imagine they were quite useful during development but would be spamming the console otherwise. (ditto a few times below)

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

Can you add a doc-comment link to MemArg::NominalSPOffset, which ought to be the central place for documenting what a nominal-SP is?

ditto twice below

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

nit: remove ---- before and after text

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

Since this comment will be largely read by everyone who runs into this kind of MemArg, it should be more precise. Can you explain what's above and below (closer to FP and further away from FP, to avoid the possible double interpretation of above/below)? Maybe draw some ASCII art to show a stack example and draw where the nominal SP is?

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

ditto

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

As said below, I think this comment should be placed closer to MemArg::NominalSPOffset, with a drawing of what it should look like.

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

nit: maybe remove this comment? the MemArg enum variant below + above comment make it pretty clear it's an offset from nominal-sp.

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

nit: can you name this sp_adjustment and also rename amt into amount?

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

preexisting: while you're around, can you make this a doc comment please?

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

This comment is less precise than the previous one, can you tweak it to tell what this number includes?

Wasmtime GitHub notifications bot (May 05 2020 at 09:35):

bnjbvr created PR Review Comment:

nit: can you put this before the emit function which uses it, to make reading of this code more linear?

Wasmtime GitHub notifications bot (May 05 2020 at 09:40):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (May 05 2020 at 09:40):

bjorn3 created PR Review Comment:

It can become a trace!.

Wasmtime GitHub notifications bot (May 05 2020 at 17:58):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 05 2020 at 17:58):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (May 05 2020 at 17:58):

cfallin created PR Review Comment:

Done! Added a diagram in the main module comment for abi.rs; hopefully things are clearer now.

Wasmtime GitHub notifications bot (May 05 2020 at 18:23):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 05 2020 at 20:26):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 05 2020 at 20:51):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 06 2020 at 13:12):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (May 06 2020 at 13:12):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (May 06 2020 at 13:12):

bnjbvr created PR Review Comment:

This is actually unused in this PR, so it could be removed? (unless you plan to use it soon later)

(pub(crate) would show you a warning if it's effectively unused)

Wasmtime GitHub notifications bot (May 06 2020 at 16:24):

cfallin updated PR #1607 from aarch64-stack-frame to master:

This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into true offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.

As part of this PR, I also adjust how temporaries are used: I realized
(thanks to a comment in @alexcrichton's stack-limit PR) that x16 and x17
are the "interprocedural veneer temporaries", free to use within a
function body. These are now are spilltmp and "tmp2".

Wasmtime GitHub notifications bot (May 06 2020 at 16:34):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (May 06 2020 at 16:34):

cfallin created PR Review Comment:

Removed, thanks!

Wasmtime GitHub notifications bot (May 06 2020 at 17:29):

cfallin merged PR #1607.

Last updated: Apr 16 2025 at 22:03 UTC