github-actions[bot] commented on issue #3653:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen commented on issue #3653:
we are generating some unnecessary
movdqa
s nowIt looks like the initial
movdqa
is being inserted by move mitosis, but that the regalloc isn't coalescing and eliding the move:TRACE cranelift_codegen::isa::x64::inst > mov_mitosis(pand %v1V, %v5V) TRACE cranelift_codegen::isa::x64::inst > -> movdqa %v0V, %v5V TRACE cranelift_codegen::isa::x64::inst > -> pand %v1V, %v5V ``` @cfallin any idea why regalloc might not be able to coalesce/elide this move? (See OP for full code) ~~~
fitzgen commented on issue #3653:
It looks like regalloc correctly determines that these virtual registers (
v0V
andv5V
) are in the same e-class / are connected by moves (and therefore could be coalesced?) but isn't actually coalescing them:TRACE cranelift_codegen::machinst::compile > vcode from lowering: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 12) Inst 0: movdqa %xmm0, %v0V Inst 1: movdqa %xmm1, %v1V Inst 2: movdqa %xmm2, %v2V Inst 3: movdqa %v0V, %v5V Inst 4: pand %v1V, %v5V Inst 5: movdqa %v0V, %v6V Inst 6: pandn %v2V, %v6V Inst 7: movdqa %v5V, %v3V Inst 8: por %v6V, %v3V Inst 9: movdqa %v3V, %v4V Inst 10: movdqa %v4V, %xmm0 Inst 11: ret }} ... TRACE regalloc::bt_coalescing_analysis > TRACE regalloc::bt_coalescing_analysis > do_coalescing_analysis: begin TRACE regalloc::bt_coalescing_analysis > connected by moves: i0 v0V <- r0V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > connected by moves: i1 v1V <- r1V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > connected by moves: i2 v2V <- r2V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > connected by moves: i3 v5V <- v0V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > connected by moves: i5 v6V <- v0V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr0 and vr6 TRACE regalloc::bt_coalescing_analysis > connected by moves: i7 v3V <- v5V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr5 and vr3 TRACE regalloc::bt_coalescing_analysis > connected by moves: i9 v4V <- v3V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr3 and vr4 TRACE regalloc::bt_coalescing_analysis > connected by moves: i10 r0V <- v4V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > Revised VLRs: TRACE regalloc::bt_coalescing_analysis > vr0 (VR: v0V, sz=6, tc=2, sc=0.333, [(RF: i0.d-i5.u)]) TRACE regalloc::bt_coalescing_analysis > vr1 (VR: v1V, sz=4, tc=2, sc=0.500, [(RF: i1.d-i4.u)]) TRACE regalloc::bt_coalescing_analysis > vr2 (VR: v2V, sz=5, tc=2, sc=0.400, [(RF: i2.d-i6.u)]) TRACE regalloc::bt_coalescing_analysis > vr3 (VR: v3V, sz=3, tc=2, sc=0.667, [(RF: i7.d-i9.u)]) TRACE regalloc::bt_coalescing_analysis > vr4 (VR: v4V, sz=2, tc=1, sc=0.500, [(RF: i9.d-i10.u)]) TRACE regalloc::bt_coalescing_analysis > vr5 (VR: v5V, sz=5, tc=3, sc=0.600, [(RF: i3.d-i7.u)]) TRACE regalloc::bt_coalescing_analysis > vr6 (VR: v6V, sz=4, tc=3, sc=0.750, [(RF: i5.d-i8.u)]) TRACE regalloc::bt_coalescing_analysis > Coalescing hints: TRACE regalloc::bt_coalescing_analysis > hintsfor vr0 = (Exactly %xmm0, weight=1) (SameAs vr6, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr1 = (Exactly %xmm1, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr2 = (Exactly %xmm2, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr3 = (SameAs vr5, weight=1) (SameAs vr4, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr4 = (SameAs vr3, weight=1) (Exactly %xmm0, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr5 = (SameAs vr3, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr6 = (SameAs vr0, weight=1) TRACE regalloc::bt_coalescing_analysis > eclassof vr0 = [vr6, vr0] TRACE regalloc::bt_coalescing_analysis > eclassof vr1 = [vr1] TRACE regalloc::bt_coalescing_analysis > eclassof vr2 = [vr2] TRACE regalloc::bt_coalescing_analysis > eclassof vr3 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr4 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr5 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr6 = [vr6, vr0] TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i5 TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i7 TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i9 TRACE regalloc::bt_coalescing_analysis > do_coalescing_analysis: end TRACE regalloc::bt_coalescing_analysis >
fitzgen edited a comment on issue #3653:
It looks like regalloc correctly determines that these virtual registers (
v0V
andv5V
) are in the same e-class / are connected by moves (and therefore could be coalesced?) but isn't actually coalescing them:TRACE cranelift_codegen::machinst::compile > vcode from lowering: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 12) Inst 0: load_const VCodeConstant(0), %v0V Inst 1: load_const VCodeConstant(0), %v1V Inst 2: load_const VCodeConstant(0), %v2V Inst 3: movdqa %v0V, %v5V Inst 4: pand %v1V, %v5V Inst 5: movdqa %v0V, %v6V Inst 6: pandn %v2V, %v6V Inst 7: movdqa %v5V, %v3V Inst 8: por %v6V, %v3V Inst 9: movdqa %v3V, %v4V Inst 10: movdqa %v4V, %xmm0 Inst 11: ret }} ... TRACE regalloc::bt_coalescing_analysis > TRACE regalloc::bt_coalescing_analysis > do_coalescing_analysis: begin TRACE regalloc::bt_coalescing_analysis > connected by moves: i3 v5V <- v0V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > connected by moves: i5 v6V <- v0V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr0 and vr6 TRACE regalloc::bt_coalescing_analysis > connected by moves: i7 v3V <- v5V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr5 and vr3 TRACE regalloc::bt_coalescing_analysis > connected by moves: i9 v4V <- v3V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > reduce cost of vr3 and vr4 TRACE regalloc::bt_coalescing_analysis > connected by moves: i10 r0V <- v4V (est_freq 1) TRACE regalloc::bt_coalescing_analysis > Revised VLRs: TRACE regalloc::bt_coalescing_analysis > vr0 (VR: v0V, sz=6, tc=2, sc=0.333, [(RF: i0.d-i5.u)]) TRACE regalloc::bt_coalescing_analysis > vr1 (VR: v1V, sz=4, tc=2, sc=0.500, [(RF: i1.d-i4.u)]) TRACE regalloc::bt_coalescing_analysis > vr2 (VR: v2V, sz=5, tc=2, sc=0.400, [(RF: i2.d-i6.u)]) TRACE regalloc::bt_coalescing_analysis > vr3 (VR: v3V, sz=3, tc=2, sc=0.667, [(RF: i7.d-i9.u)]) TRACE regalloc::bt_coalescing_analysis > vr4 (VR: v4V, sz=2, tc=1, sc=0.500, [(RF: i9.d-i10.u)]) TRACE regalloc::bt_coalescing_analysis > vr5 (VR: v5V, sz=5, tc=3, sc=0.600, [(RF: i3.d-i7.u)]) TRACE regalloc::bt_coalescing_analysis > vr6 (VR: v6V, sz=4, tc=3, sc=0.750, [(RF: i5.d-i8.u)]) TRACE regalloc::bt_coalescing_analysis > Coalescing hints: TRACE regalloc::bt_coalescing_analysis > hintsfor vr0 = (SameAs vr6, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr1 = TRACE regalloc::bt_coalescing_analysis > hintsfor vr2 = TRACE regalloc::bt_coalescing_analysis > hintsfor vr3 = (SameAs vr5, weight=1) (SameAs vr4, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr4 = (SameAs vr3, weight=1) (Exactly %xmm0, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr5 = (SameAs vr3, weight=1) TRACE regalloc::bt_coalescing_analysis > hintsfor vr6 = (SameAs vr0, weight=1) TRACE regalloc::bt_coalescing_analysis > eclassof vr0 = [vr6, vr0] TRACE regalloc::bt_coalescing_analysis > eclassof vr1 = [vr1] TRACE regalloc::bt_coalescing_analysis > eclassof vr2 = [vr2] TRACE regalloc::bt_coalescing_analysis > eclassof vr3 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr4 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr5 = [vr4, vr5, vr3] TRACE regalloc::bt_coalescing_analysis > eclassof vr6 = [vr6, vr0] TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i5 TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i7 TRACE regalloc::bt_coalescing_analysis > vv_boundary_move at i9 TRACE regalloc::bt_coalescing_analysis > do_coalescing_analysis: end TRACE regalloc::bt_coalescing_analysis >
cfallin commented on issue #3653:
Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where there is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (
loop(x, y): br loop(y, x)
). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.
cfallin edited a comment on issue #3653:
Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). The former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where there is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (
loop(x, y): br loop(y, x)
). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.
cfallin edited a comment on issue #3653:
Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). The former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where this is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (
loop(x, y): br loop(y, x)
). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.
fitzgen commented on issue #3653:
Swapped the operands to the two commutative instructions (which I noticed was the only difference between the pre-regalloc vcode on
main
vs this branch) and the moves went away.Yay :grimacing:
fitzgen commented on issue #3653:
Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.
But yeah, agreed on this point :+1:
github-actions[bot] commented on issue #3653:
Subscribe to Label Action
cc @peterhuene
<details>
This issue or pull request has been labeled: "wasmtime:api"Thus the following users have been cc'd because of the following labels:
- peterhuene: wasmtime:api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
Last updated: Nov 22 2024 at 16:03 UTC