Stream: git-wasmtime

Topic: wasmtime / issue #3653 cranelift: Port `bitselect` over t...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 02:25):

github-actions[bot] commented on issue #3653:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 17:45):

fitzgen commented on issue #3653:

we are generating some unnecessary movdqas now

It looks like the initial movdqa is being inserted by move mitosis, but that the regalloc isn't coalescing and eliding the move:

 TRACE cranelift_codegen::isa::x64::inst       > mov_mitosis(pand    %v1V, %v5V)
 TRACE cranelift_codegen::isa::x64::inst       >   -> movdqa  %v0V, %v5V
 TRACE cranelift_codegen::isa::x64::inst       >   -> pand    %v1V, %v5V
 ```

@cfallin any idea why regalloc might not be able to coalesce/elide this move? (See OP for full code)
~~~

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 17:51):

fitzgen commented on issue #3653:

It looks like regalloc correctly determines that these virtual registers (v0V and v5V) are in the same e-class / are connected by moves (and therefore could be coalesced?) but isn't actually coalescing them:

 TRACE cranelift_codegen::machinst::compile    > vcode from lowering:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 12)
  Inst 0:   movdqa  %xmm0, %v0V
  Inst 1:   movdqa  %xmm1, %v1V
  Inst 2:   movdqa  %xmm2, %v2V
  Inst 3:   movdqa  %v0V, %v5V
  Inst 4:   pand    %v1V, %v5V
  Inst 5:   movdqa  %v0V, %v6V
  Inst 6:   pandn   %v2V, %v6V
  Inst 7:   movdqa  %v5V, %v3V
  Inst 8:   por     %v6V, %v3V
  Inst 9:   movdqa  %v3V, %v4V
  Inst 10:   movdqa  %v4V, %xmm0
  Inst 11:   ret
}}

...

 TRACE regalloc::bt_coalescing_analysis        >
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: begin
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i0 v0V <- r0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i1 v1V <- r1V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i2 v2V <- r2V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i3 v5V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i5 v6V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr0 and vr6
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i7 v3V <- v5V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr5 and vr3
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i9 v4V <- v3V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr3 and vr4
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i10 r0V <- v4V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > Revised VLRs:
 TRACE regalloc::bt_coalescing_analysis        > vr0   (VR: v0V, sz=6, tc=2, sc=0.333, [(RF: i0.d-i5.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr1   (VR: v1V, sz=4, tc=2, sc=0.500, [(RF: i1.d-i4.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr2   (VR: v2V, sz=5, tc=2, sc=0.400, [(RF: i2.d-i6.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr3   (VR: v3V, sz=3, tc=2, sc=0.667, [(RF: i7.d-i9.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr4   (VR: v4V, sz=2, tc=1, sc=0.500, [(RF: i9.d-i10.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr5   (VR: v5V, sz=5, tc=3, sc=0.600, [(RF: i3.d-i7.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr6   (VR: v6V, sz=4, tc=3, sc=0.750, [(RF: i5.d-i8.u)])
 TRACE regalloc::bt_coalescing_analysis        > Coalescing hints:
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr0 = (Exactly %xmm0, weight=1) (SameAs vr6, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr1 = (Exactly %xmm1, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr2 = (Exactly %xmm2, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr3 = (SameAs vr5, weight=1) (SameAs vr4, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr4 = (SameAs vr3, weight=1) (Exactly %xmm0, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr5 = (SameAs vr3, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr6 = (SameAs vr0, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr0 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr1 = [vr1]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr2 = [vr2]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr3 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr4 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr5 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr6 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i5
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i7
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i9
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: end
 TRACE regalloc::bt_coalescing_analysis        >

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:01):

fitzgen edited a comment on issue #3653:

It looks like regalloc correctly determines that these virtual registers (v0V and v5V) are in the same e-class / are connected by moves (and therefore could be coalesced?) but isn't actually coalescing them:

 TRACE cranelift_codegen::machinst::compile    > vcode from lowering:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 12)
  Inst 0:   load_const VCodeConstant(0), %v0V
  Inst 1:   load_const VCodeConstant(0), %v1V
  Inst 2:   load_const VCodeConstant(0), %v2V
  Inst 3:   movdqa  %v0V, %v5V
  Inst 4:   pand    %v1V, %v5V
  Inst 5:   movdqa  %v0V, %v6V
  Inst 6:   pandn   %v2V, %v6V
  Inst 7:   movdqa  %v5V, %v3V
  Inst 8:   por     %v6V, %v3V
  Inst 9:   movdqa  %v3V, %v4V
  Inst 10:   movdqa  %v4V, %xmm0
  Inst 11:   ret
}}

...

 TRACE regalloc::bt_coalescing_analysis        >
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: begin
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i3 v5V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i5 v6V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr0 and vr6
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i7 v3V <- v5V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr5 and vr3
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i9 v4V <- v3V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr3 and vr4
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i10 r0V <- v4V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > Revised VLRs:
 TRACE regalloc::bt_coalescing_analysis        > vr0   (VR: v0V, sz=6, tc=2, sc=0.333, [(RF: i0.d-i5.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr1   (VR: v1V, sz=4, tc=2, sc=0.500, [(RF: i1.d-i4.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr2   (VR: v2V, sz=5, tc=2, sc=0.400, [(RF: i2.d-i6.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr3   (VR: v3V, sz=3, tc=2, sc=0.667, [(RF: i7.d-i9.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr4   (VR: v4V, sz=2, tc=1, sc=0.500, [(RF: i9.d-i10.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr5   (VR: v5V, sz=5, tc=3, sc=0.600, [(RF: i3.d-i7.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr6   (VR: v6V, sz=4, tc=3, sc=0.750, [(RF: i5.d-i8.u)])
 TRACE regalloc::bt_coalescing_analysis        > Coalescing hints:
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr0 = (SameAs vr6, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr1 =
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr2 =
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr3 = (SameAs vr5, weight=1) (SameAs vr4, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr4 = (SameAs vr3, weight=1) (Exactly %xmm0, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr5 = (SameAs vr3, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr6 = (SameAs vr0, weight=1)
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr0 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr1 = [vr1]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr2 = [vr2]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr3 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr4 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr5 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr6 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i5
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i7
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i9
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: end
 TRACE regalloc::bt_coalescing_analysis        >

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:23):

cfallin commented on issue #3653:

Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where there is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (loop(x, y): br loop(y, x)). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:23):

cfallin edited a comment on issue #3653:

Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). The former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where there is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (loop(x, y): br loop(y, x)). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:24):

cfallin edited a comment on issue #3653:

Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). The former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where this is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (loop(x, y): br loop(y, x)). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:33):

fitzgen commented on issue #3653:

Swapped the operands to the two commutative instructions (which I noticed was the only difference between the pre-regalloc vcode on main vs this branch) and the moves went away.

Yay :grimacing:

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 18:34):

fitzgen commented on issue #3653:

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

But yeah, agreed on this point :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Jan 06 2022 at 19:18):

github-actions[bot] commented on issue #3653:

Subscribe to Label Action

cc @peterhuene

<details>
This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>


Last updated: Oct 23 2024 at 20:03 UTC