wasmtime / PR #11727 Cranelift: use SP-offset amodes for ... · git-wasmtime

Wasmtime GitHub notifications bot (Sep 21 2025 at 05:12):

cfallin opened PR #11727 from cfallin:direct-stack-loads-stores to bytecodealliance:main:

We provide stack_load / stack_store / stack_addr instructions in Cranelift to operate on stack slots, and the first two are legalized to a stack_addr plus an ordinary load or store instruction.

We currently have lowerings for stack_addr that materialize an SP-relative address into a register: for example, leaq 8(%rsp), %rax on x86-64 or add x0, sp, #8 on aarch64.

Taken together, we see sequences like the following (aarch64 / x86-64):
    add x0, sp, #8       /   leaq 8(%rsp), %rax
    str x1, [x0]         /   movq %rdx, (%rax)
when using stack_store. In particular, we do not use the direct SP-relative form, which would look like:
    str x1, [sp, #8]     /   movq %rdx, 8(%rsp)
and which we can already generate in other cases, e.g. spillslot moves (spills/reloads) and clobber saves/restores.
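The difference between the two forms can be illustrated with a toy peephole pass. This is only a sketch: the `Inst` enum and `fold_stack_stores` are hypothetical names invented for illustration, not Cranelift's real instruction types or ISLE lowering rules, and the sketch assumes the address register has no other uses.

```rust
// Toy model only: these types are hypothetical, not Cranelift's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// dst = sp + offset (what a standalone stack_addr lowers to).
    AddrOfSlot { dst: u8, offset: u32 },
    /// Store `src` to the address held in register `base`.
    StoreReg { src: u8, base: u8 },
    /// Store `src` directly to [sp + offset].
    StoreSp { src: u8, offset: u32 },
}

/// Fold an `AddrOfSlot` immediately followed by a `StoreReg` through the
/// same register into one SP-relative store. Assumes (for simplicity) that
/// the address register is not used anywhere else.
fn fold_stack_stores(insts: &[Inst]) -> Vec<Inst> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < insts.len() {
        match (&insts[i], insts.get(i + 1)) {
            (Inst::AddrOfSlot { dst, offset }, Some(Inst::StoreReg { src, base }))
                if dst == base =>
            {
                out.push(Inst::StoreSp { src: *src, offset: *offset });
                i += 2; // consumed both instructions
            }
            (inst, _) => {
                out.push(inst.clone());
                i += 1;
            }
        }
    }
    out
}
```

Feeding it the two-instruction sequence (address materialization, then a store through that register) yields a single `StoreSp`; a store through an unrelated register is left untouched.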

This inefficiency is undesirable whenever the embedder uses stackslots, and especially when we expect high memory traffic to stack slots: I am seeing this now while implementing debug instrumentation in Wasmtime, and user stack map instrumentation for GC will also benefit.

This PR adds new lowerings that reuse the synthetic address mode we already use for spillslots to emit loads and stores to stackslots directly when possible. The PR does this for x86-64 and aarch64; other backends could be updated later.
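As a rough illustration of why a synthetic address mode is useful here: the lowering can refer to a slot by its offset within the slot area, and the concrete SP offset is filled in only once the frame layout is known. The sketch below is a toy model under stated assumptions; `Amode`, `FrameLayout`, `slot_area_base`, and `slot_offsets` are hypothetical names, the 8-byte slot alignment is an assumption, and none of this is Cranelift's actual API.

```rust
// Toy model only: hypothetical names, not Cranelift's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Amode {
    /// Offset relative to the start of the stack-slot area; not yet final.
    SlotOffset(u32),
    /// Concrete offset from SP, ready to be encoded.
    SpOffset(u32),
}

struct FrameLayout {
    /// Bytes between SP and the start of the slot area
    /// (for example, space reserved for outgoing arguments).
    slot_area_base: u32,
}

impl FrameLayout {
    /// Resolve a synthetic slot-relative amode into a concrete SP offset.
    fn finalize(&self, amode: Amode) -> Amode {
        match amode {
            Amode::SlotOffset(off) => Amode::SpOffset(self.slot_area_base + off),
            concrete => concrete,
        }
    }
}

/// Assign each slot an offset in declaration order, rounding each size up
/// to 8 bytes (the alignment is an assumption of this sketch).
fn slot_offsets(sizes: &[u32]) -> Vec<u32> {
    let mut next = 0;
    sizes
        .iter()
        .map(|size| {
            let off = next;
            next += (size + 7) & !7;
            off
        })
        .collect()
}
```

With slots of 4, 16, and 8 bytes, the second slot starts at offset 8 within the slot area; if the slot area begins 16 bytes above SP, a store to that slot resolves to `SpOffset(24)`, which corresponds to a single `[sp, #24]` or `24(%rsp)` operand.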

Wasmtime GitHub notifications bot (Sep 21 2025 at 05:12):

cfallin requested abrown for a review on PR #11727.

Wasmtime GitHub notifications bot (Sep 21 2025 at 05:12):

cfallin requested wasmtime-compiler-reviewers for a review on PR #11727.

Wasmtime GitHub notifications bot (Sep 21 2025 at 05:12):

cfallin requested pchickey for a review on PR #11727.

Wasmtime GitHub notifications bot (Sep 21 2025 at 05:12):

cfallin requested wasmtime-core-reviewers for a review on PR #11727.

Wasmtime GitHub notifications bot (Sep 21 2025 at 07:44):

github-actions[bot] commented on PR #11727:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle

fitzgen: isle

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

Wasmtime GitHub notifications bot (Sep 22 2025 at 13:07):

bjorn3 commented on PR #11727:

This is a much cleaner implementation than what I did for https://bytecodealliance.zulipchat.com/#narrow/channel/217117-cranelift/topic/stack_addr.20.2B.20load.2Fstore.20merging/with/540466352, while still having the exact same performance on x86_64 (i.e., cg_clif produces faster executables than llvm -O0) and also working on arm64. This passes the full cg_clif test suite on x86_64.

On arm64, however, I'm getting a test failure in JIT mode. There is a call to printf with 0x10000e73c18d0 as the address, but the expected string can be found at 0xffffe73c18d0 on the stack (the stack is from 0xfffffffdf000 to 0x1000000000000).

Wasmtime GitHub notifications bot (Sep 23 2025 at 08:35):

bjorn3 edited a comment on PR #11727:

This is a much cleaner implementation than what I did for https://bytecodealliance.zulipchat.com/#narrow/channel/217117-cranelift/topic/stack_addr.20.2B.20load.2Fstore.20merging/with/540466352, while still having the exact same performance on x86_64 (i.e., cg_clif produces faster executables than llvm -O0) and also working on arm64. This passes the full cg_clif test suite on x86_64.

On arm64, however, I'm getting a test failure in JIT mode. There is a call to printf with 0x10000e73c18d0 as the address, but the expected string can be found at 0xffffe73c18d0 on the stack (the stack is from 0xfffffffdf000 to 0x1000000000000). You can reproduce this by running ./test.sh after patching the Cargo.toml of cg_clif to use the Cranelift from this PR.
Edit: Never mind. The test failure is unrelated to this PR.
Edit2: https://github.com/bytecodealliance/wasmtime/pull/11734 has the fix.

Wasmtime GitHub notifications bot (Sep 23 2025 at 18:59):

cfallin edited PR #11727:

We provide stack_load / stack_store / stack_addr instructions in Cranelift to operate on stack slots, and the first two are legalized to a stack_addr plus an ordinary load or store instruction.

We currently have lowerings for stack_addr that materialize an SP-relative address into a register: for example, leaq 8(%rsp), %rax on x86-64 or add x0, sp, #8 on aarch64.

Taken together, we see sequences like the following (aarch64 / x86-64):
    add x0, sp, #8       /   leaq 8(%rsp), %rax
    str x1, [x0]         /   movq %rdx, (%rax)
when using stack_store. In particular, we do not use the direct SP-relative form, which would look like:
    str x1, [sp, #8]     /   movq %rdx, 8(%rsp)
and which we can already generate in other cases, e.g. spillslot moves (spills/reloads) and clobber saves/restores.

This inefficiency is undesirable whenever the embedder uses stackslots, and especially when we expect high memory traffic to stack slots: I am seeing this now while implementing debug instrumentation in Wasmtime, and user stack map instrumentation for GC will also benefit.

This PR adds new lowerings that reuse the synthetic address mode we already use for spillslots to emit loads and stores to stackslots directly when possible. The PR does this for x86-64 and aarch64; other backends could be updated later.

Fixes #1064.

Wasmtime GitHub notifications bot (Sep 24 2025 at 18:45):

cfallin commented on PR #11727:

(In case others didn't see email updates from the edits to bjorn3's comment above: the failure came from another regression that cg_clif picked up when upgrading Cranelift. It is unrelated to this PR, which remains ready for review.)

Wasmtime GitHub notifications bot (Sep 25 2025 at 20:45):

abrown submitted PR review:

Makes sense!

Wasmtime GitHub notifications bot (Sep 25 2025 at 21:32):

cfallin merged PR #11727.

Last updated: Feb 24 2026 at 05:28 UTC