bjorn3 commented on Issue #1085:
This is fixed with the new x64 backend, right?
cfallin commented on Issue #1085:
Yes, the new backend chooses caller-save (volatile) registers first, to avoid prologue/epilogue stores/loads, unless a callsite's clobbers force values into caller-saved regs.
cfallin closed Issue #1085:
Just something I've noticed while analyzing a program: in this example, on x86 64, the register allocator chooses to use RBX (non-volatile) while e.g. R9 (volatile) is available, adding a spurious push/pop sequence to the function's body. I think it should prefer volatile registers, as long as we have some available.
Cranelift line to run the program:
cargo run wasm -dDp --target x86_64 a.wat
WebAssembly text (a.wat):
<details>(module (memory (export "mem") 100) (func $dummy) (func $loop (export "dot") (param $len i32) (result i32) (local $k1 i32) (local $k2 i32) (local $k3 i32) (local $k4 i32) (local $k5 i32) (local $k6 i32) (loop $AGAIN (if (local.get $len) (block (local.set $len (i32.sub (local.get $len) (i32.const 1))) (local.set $k1 (i32.add (local.get $k1) (i32.const 1))) (local.set $k2 (i32.add (local.get $k2) (i32.const 2))) (local.set $k3 (i32.add (local.get $k3) (i32.const 3))) (local.set $k4 (i32.add (local.get $k4) (i32.const 4))) (local.set $k5 (i32.add (local.get $k5) (i32.const 5))) (local.set $k6 (i32.add (local.get $k6) (i32.const 6))) (br_if $AGAIN (local.get $len))))) (i32.add (i32.add (i32.add (i32.add (i32.add (local.get $k1) (local.get $k2)) (local.get $k3)) (local.get $k4)) (local.get $k5)) (local.get $k6))))
</details>
Generated code:
<details>0: 40 55 push rbp 2: 48 89 e5 mov rbp, rsp 5: 40 53 push rbx 7: 48 83 ec 08 sub rsp, 8 b: 40 b8 00 00 00 00 mov eax, 0 11: 40 89 c1 mov ecx, eax 14: 40 89 c2 mov edx, eax 17: 40 89 c3 mov ebx, eax 1a: 40 89 c6 mov esi, eax 1d: 41 89 c0 mov r8d, eax 20: 40 85 ff test edi, edi 23: 74 21 je 0x46 25: 40 83 c7 ff add edi, -1 29: 41 83 c0 01 add r8d, 1 2d: 40 83 c6 02 add esi, 2 31: 40 83 c3 03 add ebx, 3 35: 40 83 c2 04 add edx, 4 39: 40 83 c1 05 add ecx, 5 3d: 40 83 c0 06 add eax, 6 41: 40 85 ff test edi, edi 44: 75 da jne 0x20 46: 41 01 f0 add r8d, esi 49: 41 01 d8 add r8d, ebx 4c: 41 01 d0 add r8d, edx 4f: 41 01 c8 add r8d, ecx 52: 41 01 c0 add r8d, eax 55: 44 89 c0 mov eax, r8d 58: 48 83 c4 08 add rsp, 8 5c: 40 5b pop rbx 5e: 40 5d pop rbp 60: c3 ret
</details>
bnjbvr commented on Issue #1085:
Yes indeed! It's a bit implicit, so for history: the order of registers in the real register universe's array determines which are picked first by regalloc algorithms; since we're pushing (in codegen/src/isa/x64/inst/regs.rs) the caller-saved first (aka volatile, aka call-clobbered), these are preferred over the callee-saved (aka non-volatile).
Last updated: Nov 22 2024 at 16:03 UTC