wasmtime / Issue #1925 Collapse `load; add; store` into a... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / Issue #1925 Collapse `load; add; store` into a...

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen opened Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:28):

github-actions[bot] commented on Issue #1925:

Subscribe to Label Action

cc @bnjbvr

<details>
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:29):

fitzgen commented on Issue #1925:

(Using --set opt_level=speed)

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:50):

abrown commented on Issue #1925:

Which backend? I don't think the old backend can do this (due to limited addressing modes in the recipes) but I was planning to add this optimization to the new backend eventually since that has support for different addressing modes.

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:51):

abrown commented on Issue #1925:

(and by "I was planning to add this" I mean feel free to do this since it will take me a bit to get there).

Wasmtime GitHub notifications bot (Jun 25 2020 at 21:56):

fitzgen commented on Issue #1925:

The current backend was all I tested.

Wasmtime GitHub notifications bot (Jun 25 2020 at 22:04):

bjorn3 commented on Issue #1925:

iadd_mem, isub_mem, etc (any suggestions fir a better name) instructions would also be necessary to implement atomic increment.

Wasmtime GitHub notifications bot (Jun 26 2020 at 04:43):

julian-seward1 commented on Issue #1925:

In the new backend, this would require a new Inst variant, Alu_R_RM or perhaps Alu_RI_RM. That bit is easy. More complex is that we'd have to add structure to the new instruction selection framework to match the multiple CLIR insns and produce a single x64 insn. Currently it can do that for pure-value trees rooted at some node in the CLIR insn dag, but this doesn't fit into that scenario. We'd have to confer with @cfallin.

Wasmtime GitHub notifications bot (Jun 26 2020 at 09:05):

bnjbvr commented on Issue #1925:

This is entirely doable in the new backend indeed, with less effort than would be required for the old one (viz., no need to add an extra specific CLIF instruction). I would strongly recommend to not implement this in the old backend at this point (more to come about this in the next few hours/days!).

Wasmtime GitHub notifications bot (Jun 26 2020 at 09:14):

bjorn3 commented on Issue #1925:

viz., no need to add an extra specific CLIF instruction

What I meant is that a specific CLIF instruction is needed anyway for atomic increment, so it could just as well have a non-atomic variant. I did imagine that atomicness would be a MemFlag instead of duplicating every instruction that operates on memory.

Wasmtime GitHub notifications bot (Jun 26 2020 at 16:13):

fitzgen commented on Issue #1925:

FWIW, having a specific instruction for this would be nice too. I'd like to use it in wasmtime's gc barriers for reference types.

Wasmtime GitHub notifications bot (Nov 23 2020 at 21:55):

fitzgen commented on Issue #1925:

The infrastructure for the new backend to do this collapsing landed in https://github.com/bytecodealliance/wasmtime/pull/2366 and some follow up work for merging ALU ops and memory loads in https://github.com/bytecodealliance/wasmtime/pull/2389

I guess we should decide whether we ever want to support an iadd_imm_mem instruction in clif itself, or if we want this to only be something that the backend optimizes risc-y instructions into. If we don't want to add such an instruction, then I think we can close this issue.

Wasmtime GitHub notifications bot (Nov 23 2020 at 23:19):

cfallin commented on Issue #1925:

I think that the latter is probably the better option; in general we want to try to reduce the instruction count going forward, and do better with pattern-matching.

This isn't too much more work on top of what we already have (load+op); we can either keep this issue open to track that work (the title seems to match what we want to do still?) or open a new one, up to you!

Wasmtime GitHub notifications bot (Nov 23 2020 at 23:22):

fitzgen commented on Issue #1925:

Sounds good. Let's keep this open to track combining load+op+store into a single instruction.

Wasmtime GitHub notifications bot (Nov 23 2020 at 23:23):

fitzgen commented on Issue #1925:

And we should probably double check that the GC barriers do end up using a single add in practice, too.

Last updated: Apr 20 2025 at 01:31 UTC