elliottt commented on issue #5870:
Thank you, I was just trying to decide what to do about this for #5850.
Have you double-checked the precise-output tests for these backends to make sure that each of these cases is covered? I suspect they are all covered but it'd be nice to make sure while we're thinking about it.
I had a quick look through the tests I was removing, and @cfallin is right about adding generic cold-block tests. I'll add one to each backend before merging this :+1:
jameysharp commented on issue #5870:
Huh. On all backends, this new test produces the same lowered block order, with or without the
cold
annotation.In fact as best I can tell, the only difference in generated code between the two cases is that if the
cold
annotation is used, all backends insert an unreachable "jump" instruction after the "return" and before the cold block. That seems like a bug?But I think it would also help to have a test which actually produces a different lowering order with the annotation than without.
elliottt commented on issue #5870:
Huh. On all backends, this new test produces the same lowered block order, with or without the
cold
annotation.In fact as best I can tell, the only difference in generated code between the two cases is that if the
cold
annotation is used, all backends insert an unreachable "jump" instruction after the "return" and before the cold block. That seems like a bug?But I think it would also help to have a test which actually produces a different lowering order with the annotation than without.
Good call. I've replaced the one I ported from the x64 test with one that is simpler, and easier to show the block order changing with.
elliottt commented on issue #5870:
In fact as best I can tell, the only difference in generated code between the two cases is that if the
cold
annotation is used, all backends insert an unreachable "jump" instruction after the "return" and before the cold block. That seems like a bug?It's unreachable code, that you can see really well when comparing the VCode and disassembly:
; VCode: ; Disassembled: ; pushq %rbp ; block0: ; offset 0x0 ; movq %rsp, %rbp ; pushq %rbp ; block0: ; movq %rsp, %rbp ; movq %rdi, %r11 ; block1: ; offset 0x4 ; addl %r11d, $4660, %r11d ; movq %rdi, %r11 ; testl %r11d, %r11d ; addl $0x1234, %r11d ; jnz label1; j label2 ; testl %r11d, %r11d ; block1: ; je 0x39 ; movq %r11, %r8 ; block2: ; offset 0x17 ; jmp label3 ; movq %r11, %r8 ; block3: ; block3: ; offset 0x1a ; movq %r11, %rax ; movq %r11, %rax ; subl %eax, $4660, %eax ; subl $0x1234, %eax ; addl %eax, %r8d, %eax ; addl %r8d, %eax ; testl %r11d, %r11d ; testl %r11d, %r11d ; jnz label4; j label5 ; jne 0x39 ; block5: ; block4: ; offset 0x2f ; movq %rbp, %rsp ; movq %rbp, %rsp ; popq %rbp ; popq %rbp ; ret ; retq ; block8: ; block5: ; offset 0x34 ; jmp label3 ; jmp 0x1a ; block2: ; block6: ; offset 0x39 ; jmp label6 ; movq %r11, %r8 ; block4: ; addl $0x1234, %r8d ; jmp label6 ; testl %r8d, %r8d ; block6: ; je 0x1a ; movq %r11, %r8 ; block7: ; offset 0x4c ; addl %r8d, $4660, %r8d ; jmp 0x39 ; testl %r8d, %r8d ; jnz label7; j label8 ; block7: ; jmp label6
I think we could fix this pretty easily in the machinst buffer, but we might need to track a little bit of additional state to know when blocks are unreachable.
elliottt edited a comment on issue #5870:
In fact as best I can tell, the only difference in generated code between the two cases is that if the
cold
annotation is used, all backends insert an unreachable "jump" instruction after the "return" and before the cold block. That seems like a bug?It's unreachable code, that you can see really well when comparing the VCode and disassembly. VCode block8 is disassembled block5, and the unconditional jump in block6 in the VCode to block8 got inlined, the original condition got inverted, and the final jump went back to
block6
again.; VCode: ; Disassembled: ; pushq %rbp ; block0: ; offset 0x0 ; movq %rsp, %rbp ; pushq %rbp ; block0: ; movq %rsp, %rbp ; movq %rdi, %r11 ; block1: ; offset 0x4 ; addl %r11d, $4660, %r11d ; movq %rdi, %r11 ; testl %r11d, %r11d ; addl $0x1234, %r11d ; jnz label1; j label2 ; testl %r11d, %r11d ; block1: ; je 0x39 ; movq %r11, %r8 ; block2: ; offset 0x17 ; jmp label3 ; movq %r11, %r8 ; block3: ; block3: ; offset 0x1a ; movq %r11, %rax ; movq %r11, %rax ; subl %eax, $4660, %eax ; subl $0x1234, %eax ; addl %eax, %r8d, %eax ; addl %r8d, %eax ; testl %r11d, %r11d ; testl %r11d, %r11d ; jnz label4; j label5 ; jne 0x39 ; block5: ; block4: ; offset 0x2f ; movq %rbp, %rsp ; movq %rbp, %rsp ; popq %rbp ; popq %rbp ; ret ; retq ; block8: ; block5: ; offset 0x34 ; jmp label3 ; jmp 0x1a ; block2: ; block6: ; offset 0x39 ; jmp label6 ; movq %r11, %r8 ; block4: ; addl $0x1234, %r8d ; jmp label6 ; testl %r8d, %r8d ; block6: ; je 0x1a ; movq %r11, %r8 ; block7: ; offset 0x4c ; addl %r8d, $4660, %r8d ; jmp 0x39 ; testl %r8d, %r8d ; jnz label7; j label8 ; block7: ; jmp label6
I think we could fix this pretty easily in the machinst buffer, but we might need to track a little bit of additional state to know when blocks are unreachable.
jameysharp commented on issue #5870:
Yes, this version of the cold-block tests looks great! I think "if-then triangle" CFG is perfect for this.
cfallin commented on issue #5870:
I think we could fix this pretty easily in the machinst buffer, but we might need to track a little bit of additional state to know when blocks are unreachable.
To add a little more detail to that: removing the dead block at the MachBuffer level would require the MachBuffer to know about instructions other than branches (here, a return instruction that makes the following block unreachable); in the original MachBuffer design that was an explicit anti-goal, for simplicity.
Now that we're at more of a "polish the mature code" stage of development, I think it'd be totally reasonable to give the MachBuffer a notion of explicit "informed unreachability": the emitter could invoke a method like
.mark_unreachable()
after a return, similar in spirit to how it informs the MachBuffer about branches.The tricky part is weaving that into the existing mechanism, and updating the correctness proof accordingly. It probably implies a new kind of record in the "latest branch" list. Unreachability is currently implicitly recognized as a property that follows an unconditional branch instead.
All that said, I don't think this is the highest-priority change: these blocks could impact icache pressure by increasing code size a little, but are dead, i.e. not actually executed. Still, the above is the place to start if someone wants to tackle it!
Last updated: Nov 22 2024 at 16:03 UTC