cfallin opened Issue #2340:
In
MachInst
-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do
add rax, [rbx]
, which is a combination load and add (in that order).Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)
Consider the following test-case as an interesting example:
block1(v0: i32, v1: i32): v2 = sload.i32 v0 v3 = sload.i32 v1 v4 = iadd.i32 v2, v0 v5 = iadd.i32 v3, v1
We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see
v5
and perhaps reason that we can sink the loadv3
because it does not move across any other memory ops. We would also like to do this forv2
intov4
, and in this example we can, but only after we sinkv3
; the load producingv2
moves across wherev3
used to be. Hence, a naïve coloring scheme won't necessarily work.A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.
This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.
(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)
cfallin commented on Issue #2340:
cc @akirilov-arm and @abrown: this should enable us to properly handle the load-merging that
LoadSplat
addresses in #2278.
cfallin assigned Issue #2340:
In
MachInst
-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do
add rax, [rbx]
, which is a combination load and add (in that order).Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)
Consider the following test-case as an interesting example:
block1(v0: i32, v1: i32): v2 = sload.i32 v0 v3 = sload.i32 v1 v4 = iadd.i32 v2, v0 v5 = iadd.i32 v3, v1
We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see
v5
and perhaps reason that we can sink the loadv3
because it does not move across any other memory ops. We would also like to do this forv2
intov4
, and in this example we can, but only after we sinkv3
; the load producingv2
moves across wherev3
used to be. Hence, a naïve coloring scheme won't necessarily work.A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.
This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.
(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)
cfallin labeled Issue #2340 (assigned to cfallin):
In
MachInst
-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do
add rax, [rbx]
, which is a combination load and add (in that order).Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)
Consider the following test-case as an interesting example:
block1(v0: i32, v1: i32): v2 = sload.i32 v0 v3 = sload.i32 v1 v4 = iadd.i32 v2, v0 v5 = iadd.i32 v3, v1
We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see
v5
and perhaps reason that we can sink the loadv3
because it does not move across any other memory ops. We would also like to do this forv2
intov4
, and in this example we can, but only after we sinkv3
; the load producingv2
moves across wherev3
used to be. Hence, a naïve coloring scheme won't necessarily work.A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.
This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.
(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)
cfallin closed Issue #2340 (assigned to cfallin):
In
MachInst
-based backends, instruction selection works by pattern-matching on a root opcode and potentially the opcodes that produce its inputs (the "operand tree"). Sometimes, a particular machine backend is able to pattern-match several opcodes in a certain arrangement and generate one machine instruction that covers them all (e.g.: multiply-add).Some machine architectures have the ability to load one operand of an operation from memory. For example, on x86, we can do
add rax, [rbx]
, which is a combination load and add (in that order).Unfortunately, because of a (sound but conservative) design in our pattern-matching system, we cannot match patterns where the load is an operand rather than the root, because we do not allow operations that have side-effects or touch memory to move in the program. This is because reasoning about when it is safe to do so is complex. (Instruction coloring is the framework that does this reasoning, but because the color changes after the load or side-effecting op, its own color differs from that of every program point below it, so it will never merge.)
Consider the following test-case as an interesting example:
block1(v0: i32, v1: i32): v2 = sload.i32 v0 v3 = sload.i32 v1 v4 = iadd.i32 v2, v0 v5 = iadd.i32 v3, v1
We want this to lower to two add-from-memory instructions on x86. Working bottom-up (as VCode lowering does), we first see
v5
and perhaps reason that we can sink the loadv3
because it does not move across any other memory ops. We would also like to do this forv2
intov4
, and in this example we can, but only after we sinkv3
; the load producingv2
moves across wherev3
used to be. Hence, a naïve coloring scheme won't necessarily work.A solution that @julian-seward1 and I have converged on involves numbering only side-effecting ops with colors (or sequence numbers), and while scanning backward through a basic block, tracking the "current sequence number", or in other words, the next number whose op we expect to see. We permit a merge of any load with this sequence number if (i) it's in the same BB and (ii) this use is its only use; then we decrement the sequence number.
This implements the very simple rule that a load or store never moves over another load or store; but as we sink one, we create space for the one above it to sink, as well.
(Later we might want to extend this with multiple sequence numbers, one per disjoint unit of state, sort of in the spirit of a vector clock; but details there are TBD.)
Last updated: Nov 22 2024 at 17:03 UTC