fitzgen opened Issue #1925:
Right now, this
function %add(i64) {
block0(v0: i64):
    v1 = load.i32 aligned notrap v0
    v2 = iadd_imm v1, 1
    store aligned notrap v2, v1
    return
}
compiles down into
Disassembly of 19 bytes:
   0: 40 55                   push rbp
   2: 48 89 e5                mov rbp, rsp
   5: 40 8b 07                mov eax, dword ptr [rdi]
   8: 89 c1                   mov ecx, eax
   a: 83 c1 01                add ecx, 1
   d: 40 89 08                mov dword ptr [rax], ecx
  10: 40 5d                   pop rbp
  12: c3                      ret
on x86-64.
I would expect it to instead generate
push rbp
mov rbp, rsp
add dword ptr [rdi], 1
pop rbp
ret
but I don't think we have any IR instruction that handles this addressing mode right now.
fitzgen commented on Issue #1925:
(Using --set opt_level=speed.)
abrown commented on Issue #1925:
Which backend? I don't think the old backend can do this (due to limited addressing modes in the recipes) but I was planning to add this optimization to the new backend eventually since that has support for different addressing modes.
abrown commented on Issue #1925:
(and by "I was planning to add this" I mean feel free to do this since it will take me a bit to get there).
fitzgen commented on Issue #1925:
The current backend was all I tested.
bjorn3 commented on Issue #1925:
iadd_mem, isub_mem, etc. (any suggestions for a better name?) instructions would also be necessary to implement atomic increment.
julian-seward1 commented on Issue #1925:
In the new backend, this would require a new Inst variant, Alu_R_RM or perhaps Alu_RI_RM. That bit is easy. More complex is that we'd have to add structure to the new instruction-selection framework to match the multiple CLIR insns and produce a single x64 insn. Currently it can do that for pure-value trees rooted at some node in the CLIR insn dag, but this doesn't fit into that scenario. We'd have to confer with @cfallin.
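[Editor's note: as a rough illustration of the variant shape described above, here is a minimal, self-contained Rust sketch. Every name in it (AluOp, Amode, Inst::AluRmDestImm, the register numbering) is a hypothetical stand-in invented for this issue, not the new x64 backend's actual definitions.]

    // Hypothetical sketch only: these are NOT the real Cranelift x64 types.
    // The point is just the shape of an "ALU op with a memory destination" variant.

    /// Which ALU operation to perform.
    #[derive(Debug, Clone, Copy)]
    enum AluOp {
        Add,
        Sub,
    }

    /// A (very) simplified addressing mode: base register plus displacement.
    #[derive(Debug, Clone, Copy)]
    struct Amode {
        base: u8, // register number, e.g. 7 for rdi in this toy encoding
        disp: i32,
    }

    /// A simplified machine-instruction enum with the proposed variant.
    #[derive(Debug)]
    enum Inst {
        /// `op dword ptr [base + disp], imm`
        /// i.e. a read-modify-write where the memory operand is both a source and the destination.
        AluRmDestImm { op: AluOp, addr: Amode, imm: i32 },
    }

    fn main() {
        // `add dword ptr [rdi], 1` from the issue, expressed with the sketch above.
        let inst = Inst::AluRmDestImm {
            op: AluOp::Add,
            addr: Amode { base: 7, disp: 0 },
            imm: 1,
        };
        println!("{:?}", inst);
    }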
bnjbvr commented on Issue #1925:
This is entirely doable in the new backend indeed, with less effort than would be required for the old one (viz., no need to add an extra specific CLIF instruction). I would strongly recommend not implementing this in the old backend at this point (more to come about this in the next few hours/days!).
bjorn3 commented on Issue #1925:
viz., no need to add an extra specific CLIF instruction
What I meant is that a specific CLIF instruction is needed anyway for atomic increment, so it could just as well have a non-atomic variant. I did imagine that atomicity would be a MemFlag instead of duplicating every instruction that operates on memory.
fitzgen commented on Issue #1925:
FWIW, having a specific instruction for this would be nice too. I'd like to use it in wasmtime's gc barriers for reference types.
fitzgen commented on Issue #1925:
The infrastructure for the new backend to do this collapsing landed in https://github.com/bytecodealliance/wasmtime/pull/2366, and some follow-up work for merging ALU ops and memory loads landed in https://github.com/bytecodealliance/wasmtime/pull/2389.
I guess we should decide whether we ever want to support an iadd_imm_mem instruction in CLIF itself, or if we want this to only be something that the backend optimizes RISC-y instructions into. If we don't want to add such an instruction, then I think we can close this issue.
cfallin commented on Issue #1925:
I think that the latter is probably the better option; in general we want to try to reduce the instruction count going forward, and do better with pattern-matching.
This isn't too much more work on top of what we already have (load+op); we can either keep this issue open to track that work (the title seems to match what we want to do still?) or open a new one, up to you!
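[Editor's note: to make the pattern-matching direction concrete, here is a toy, self-contained Rust sketch of fusing store(addr, iadd_imm(load(addr), k)) into a single read-modify-write instruction. The mini-IR (Node), the lower_store helper, and MachInst are all invented for illustration and are not the Cranelift lowering API, which among other things must also check that the load has no other uses and that the memory flags permit the merge.]

    // Toy illustration only: `Node`, `lower_store`, and `MachInst` are made up
    // for this sketch and are not Cranelift's actual lowering types.

    /// A tiny value tree standing in for the relevant CLIF nodes.
    #[derive(Debug, Clone, PartialEq)]
    enum Node {
        /// A value that is an address (e.g. a function parameter).
        Addr(&'static str),
        /// `load.i32 addr`
        Load(Box<Node>),
        /// `iadd_imm x, k`
        AddImm(Box<Node>, i32),
    }

    /// The two lowerings we might emit for a 32-bit store.
    #[derive(Debug, PartialEq)]
    enum MachInst {
        /// `add dword ptr [addr], k` -- the fused read-modify-write form.
        AddMemImm { addr: String, imm: i32 },
        /// Fallback: separate load / add / store (elided in this toy).
        Unfused,
    }

    /// Lower `store value, addr`, fusing when the value is `load(addr) + k`
    /// and the load and store address match (uniqueness of the load's use is
    /// simply assumed here).
    fn lower_store(value: &Node, addr: &Node) -> MachInst {
        if let Node::AddImm(inner, k) = value {
            if let Node::Load(loaded_from) = inner.as_ref() {
                if loaded_from.as_ref() == addr {
                    if let Node::Addr(name) = addr {
                        return MachInst::AddMemImm { addr: (*name).to_string(), imm: *k };
                    }
                }
            }
        }
        MachInst::Unfused
    }

    fn main() {
        // store (iadd_imm (load v0), 1), v0  ==>  add dword ptr [v0], 1
        let addr = Node::Addr("v0");
        let value = Node::AddImm(Box::new(Node::Load(Box::new(addr.clone()))), 1);
        assert_eq!(
            lower_store(&value, &addr),
            MachInst::AddMemImm { addr: "v0".to_string(), imm: 1 }
        );
        println!("{:?}", lower_store(&value, &addr));
    }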
fitzgen commented on Issue #1925:
Sounds good. Let's keep this open to track combining load+op+store into a single instruction.
fitzgen commented on Issue #1925:
And we should probably double-check that the GC barriers do end up using a single add in practice, too.