Stream: git-wasmtime

Topic: wasmtime / Issue #1085 Regalloc should prefer non-volatil...


Wasmtime GitHub notifications bot (Feb 03 2021 at 19:57):

bjorn3 commented on Issue #1085:

This is fixed with the new x64 backend, right?

Wasmtime GitHub notifications bot (Feb 03 2021 at 20:10):

cfallin commented on Issue #1085:

Yes, the new backend chooses caller-saved (volatile) registers first, to avoid prologue/epilogue stores/loads, unless a callsite's clobbers force values into callee-saved regs.
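To make the exception concrete, here is a minimal sketch of the decision described above: a value that is defined before a call and still needed after it cannot stay in a register the call clobbers. All names here (`Pref`, `LiveRange`, `preferred_class`) are hypothetical and are not part of Cranelift or regalloc.rs; a real allocator may also choose to spill such a value instead.

```rust
/// Hypothetical model for illustration; not Cranelift's or regalloc.rs's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pref {
    CallerSaved, // volatile: free to use, but clobbered by calls
    CalleeSaved, // non-volatile: survives calls, costs a save/restore
}

/// Instruction indices of a value's definition and its last use.
struct LiveRange {
    def: u32,
    last_use: u32,
}

/// True if some call sits strictly between the definition and the last use,
/// i.e. the value must survive that call.
fn live_across_call(range: &LiveRange, call_sites: &[u32]) -> bool {
    call_sites
        .iter()
        .any(|&call| range.def < call && call < range.last_use)
}

/// Prefer caller-saved registers unless a call would clobber the value.
fn preferred_class(range: &LiveRange, call_sites: &[u32]) -> Pref {
    if live_across_call(range, call_sites) {
        Pref::CalleeSaved
    } else {
        Pref::CallerSaved
    }
}
```

In the function from this issue there are no calls at all, so under this model every value would prefer a caller-saved register.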

Wasmtime GitHub notifications bot (Feb 03 2021 at 20:10):

cfallin closed Issue #1085:

Just something I've noticed while analyzing a program: in this example, on x86_64, the register allocator chooses to use RBX (non-volatile) while e.g. R9 (volatile) is available, adding a spurious push/pop sequence to the function's prologue and epilogue. I think it should prefer volatile registers, as long as we have some available.

Cranelift line to run the program: cargo run wasm -dDp --target x86_64 a.wat

WebAssembly text (a.wat):
<details>

(module
  (memory (export "mem") 100)
  (func $dummy)
  (func $loop (export "dot") (param $len i32) (result i32)
    (local $k1 i32)
    (local $k2 i32)
    (local $k3 i32)
    (local $k4 i32)
    (local $k5 i32)
    (local $k6 i32)
    (loop $AGAIN
      (if (local.get $len)
          (block
            (local.set $len (i32.sub (local.get $len) (i32.const 1)))
            (local.set $k1 (i32.add (local.get $k1) (i32.const 1)))
            (local.set $k2 (i32.add (local.get $k2) (i32.const 2)))
            (local.set $k3 (i32.add (local.get $k3) (i32.const 3)))
            (local.set $k4 (i32.add (local.get $k4) (i32.const 4)))
            (local.set $k5 (i32.add (local.get $k5) (i32.const 5)))
            (local.set $k6 (i32.add (local.get $k6) (i32.const 6)))
            (br_if $AGAIN (local.get $len)))))
      (i32.add (i32.add (i32.add (i32.add (i32.add (local.get $k1) (local.get $k2))
                                          (local.get $k3))
                                 (local.get $k4))
                        (local.get $k5))
               (local.get $k6))))

</details>

Generated code:
<details>

   0:   40 55                   push    rbp
   2:   48 89 e5                mov     rbp, rsp
   5:   40 53                   push    rbx
   7:   48 83 ec 08             sub     rsp, 8
   b:   40 b8 00 00 00 00       mov     eax, 0
  11:   40 89 c1                mov     ecx, eax
  14:   40 89 c2                mov     edx, eax
  17:   40 89 c3                mov     ebx, eax
  1a:   40 89 c6                mov     esi, eax
  1d:   41 89 c0                mov     r8d, eax
  20:   40 85 ff                test    edi, edi
  23:   74 21                   je      0x46
  25:   40 83 c7 ff             add     edi, -1
  29:   41 83 c0 01             add     r8d, 1
  2d:   40 83 c6 02             add     esi, 2
  31:   40 83 c3 03             add     ebx, 3
  35:   40 83 c2 04             add     edx, 4
  39:   40 83 c1 05             add     ecx, 5
  3d:   40 83 c0 06             add     eax, 6
  41:   40 85 ff                test    edi, edi
  44:   75 da                   jne     0x20
  46:   41 01 f0                add     r8d, esi
  49:   41 01 d8                add     r8d, ebx
  4c:   41 01 d0                add     r8d, edx
  4f:   41 01 c8                add     r8d, ecx
  52:   41 01 c0                add     r8d, eax
  55:   44 89 c0                mov     eax, r8d
  58:   48 83 c4 08             add     rsp, 8
  5c:   40 5b                   pop     rbx
  5e:   40 5d                   pop     rbp
  60:   c3                      ret

</details>
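As a rough illustration of why the choice of RBX costs a push/pop pair, the sketch below computes which callee-saved registers a prologue would have to save from the set of registers a function uses. It assumes the System V AMD64 convention and treats RBP as handled separately by the frame setup; the function name `regs_to_save` and the register lists are hypothetical, not Cranelift code.

```rust
/// Callee-saved general-purpose registers under the System V AMD64 ABI,
/// excluding rbp and rsp, which the frame setup handles separately.
/// (Assumption for this sketch; other ABIs differ.)
const CALLEE_SAVED: [&str; 5] = ["rbx", "r12", "r13", "r14", "r15"];

/// Return the callee-saved registers that appear in `used`; each one
/// needs a save in the prologue and a restore in the epilogue.
fn regs_to_save(used: &[&str]) -> Vec<&'static str> {
    CALLEE_SAVED
        .iter()
        .copied()
        .filter(|r| used.contains(r))
        .collect()
}
```

With the registers from the listing above (`rax`, `rcx`, `rdx`, `rbx`, `rsi`, `r8`, `rdi`), this yields `["rbx"]`; swapping `rbx` for `r9` yields an empty list, so no extra push/pop would be needed.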

Wasmtime GitHub notifications bot (Feb 03 2021 at 20:10):

bnjbvr commented on Issue #1085:

Yes indeed! It's a bit implicit, so for the record: the order of registers in the real register universe's array determines which are picked first by the regalloc algorithms; since we're pushing (in codegen/src/isa/x64/inst/regs.rs) the caller-saved registers first (aka volatile, aka call-clobbered), these are preferred over the callee-saved ones (aka non-volatile).
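The ordering effect described above can be sketched with a toy model: if the allocator scans an ordered list and takes the first free register, then listing caller-saved registers first makes them the default choice. Everything here (`SaveKind`, `Reg`, `build_universe`, `pick_first_free`, and the exact order within each group) is a hypothetical simplification, not the contents of `regs.rs` or the regalloc.rs API.

```rust
/// Hypothetical, simplified model of a register universe.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SaveKind {
    CallerSaved, // volatile / call-clobbered
    CalleeSaved, // non-volatile
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Reg {
    name: &'static str,
    kind: SaveKind,
}

/// Build the universe with caller-saved registers pushed first.
/// The order inside each group is illustrative only.
fn build_universe() -> Vec<Reg> {
    const CALLER_SAVED: [&str; 9] =
        ["rsi", "rdi", "rax", "rcx", "rdx", "r8", "r9", "r10", "r11"];
    const CALLEE_SAVED: [&str; 5] = ["rbx", "r12", "r13", "r14", "r15"];

    let mut universe = Vec::new();
    for &name in CALLER_SAVED.iter() {
        universe.push(Reg { name, kind: SaveKind::CallerSaved });
    }
    for &name in CALLEE_SAVED.iter() {
        universe.push(Reg { name, kind: SaveKind::CalleeSaved });
    }
    universe
}

/// Pick the first register in universe order that is not already in use.
fn pick_first_free(universe: &[Reg], in_use: &[&str]) -> Option<Reg> {
    universe
        .iter()
        .copied()
        .find(|r| !in_use.contains(&r.name))
}
```

Under this model, with `rdi`, `rax`, `rcx`, `rdx`, `rsi`, and `r8` already taken, the next pick is `r9` rather than `rbx`; a callee-saved register is only chosen once all caller-saved ones are in use.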


Last updated: Oct 23 2024 at 20:03 UTC