Stream: git-wasmtime

Topic: wasmtime / Issue #1925 Collapse `load; add; store` into a...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen opened Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:27):

fitzgen labeled Issue #1925:

Right now, this

function %add(i64) {
    block0(v0: i64):
        v1 = load.i32 aligned notrap v0
        v2 = iadd_imm v1, 1
        store aligned notrap v2, v1
        return
}

compiles down into

Disassembly of 19 bytes:
   0:   40 55                   push    rbp
   2:   48 89 e5                mov rbp, rsp
   5:   40 8b 07                mov eax, dword ptr [rdi]
   8:   89 c1                   mov ecx, eax
   a:   83 c1 01                add ecx, 1
   d:   40 89 08                mov dword ptr [rax], ecx
  10:   40 5d                   pop rbp
  12:   c3                      ret

on x86-64.

I would expect it to instead generate

push rbp
mov  rbp, rsp
add  dword ptr [rdi], 1
pop  rbp
ret

but I don't think we any IR instruction handles this addressing mode right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:28):

github-actions[bot] commented on Issue #1925:

Subscribe to Label Action

cc @bnjbvr

<details>
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:29):

fitzgen commented on Issue #1925:

(Using --set opt_level=speed)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:50):

abrown commented on Issue #1925:

Which backend? I don't think the old backend can do this (due to limited addressing modes in the recipes) but I was planning to add this optimization to the new backend eventually since that has support for different addressing modes.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:51):

abrown commented on Issue #1925:

(and by "I was planning to add this" I mean feel free to do this since it will take me a bit to get there).

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 21:56):

fitzgen commented on Issue #1925:

The current backend was all I tested.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 25 2020 at 22:04):

bjorn3 commented on Issue #1925:

iadd_mem, isub_mem, etc (any suggestions fir a better name) instructions would also be necessary to implement atomic increment.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2020 at 04:43):

julian-seward1 commented on Issue #1925:

In the new backend, this would require a new Inst variant, Alu_R_RM or perhaps Alu_RI_RM. That bit is easy. More complex is that we'd have to add structure to the new instruction selection framework to match the multiple CLIR insns and produce a single x64 insn. Currently it can do that for pure-value trees rooted at some node in the CLIR insn dag, but this doesn't fit into that scenario. We'd have to confer with @cfallin.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2020 at 09:05):

bnjbvr commented on Issue #1925:

This is entirely doable in the new backend indeed, with less effort than would be required for the old one (viz., no need to add an extra specific CLIF instruction). I would strongly recommend to not implement this in the old backend at this point (more to come about this in the next few hours/days!).

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2020 at 09:14):

bjorn3 commented on Issue #1925:

viz., no need to add an extra specific CLIF instruction

What I meant is that a specific CLIF instruction is needed anyway for atomic increment, so it could just as well have a non-atomic variant. I did imagine that atomicness would be a MemFlag instead of duplicating every instruction that operates on memory.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2020 at 16:13):

fitzgen commented on Issue #1925:

FWIW, having a specific instruction for this would be nice too. I'd like to use it in wasmtime's gc barriers for reference types.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 21:55):

fitzgen commented on Issue #1925:

The infrastructure for the new backend to do this collapsing landed in https://github.com/bytecodealliance/wasmtime/pull/2366 and some follow up work for merging ALU ops and memory loads in https://github.com/bytecodealliance/wasmtime/pull/2389

I guess we should decide whether we ever want to support an iadd_imm_mem instruction in clif itself, or if we want this to only be something that the backend optimizes risc-y instructions into. If we don't want to add such an instruction, then I think we can close this issue.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 23:19):

cfallin commented on Issue #1925:

I think that the latter is probably the better option; in general we want to try to reduce the instruction count going forward, and do better with pattern-matching.

This isn't too much more work on top of what we already have (load+op); we can either keep this issue open to track that work (the title seems to match what we want to do still?) or open a new one, up to you!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 23:22):

fitzgen commented on Issue #1925:

Sounds good. Let's keep this open to track combining load+op+store into a single instruction.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 23 2020 at 23:23):

fitzgen commented on Issue #1925:

And we should probably double check that the GC barriers do end up using a single add in practice, too.


Last updated: Dec 23 2024 at 12:05 UTC