cfallin opened PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested julian-seward1 for a review on PR #2366.
cfallin edited PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2366
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin edited PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested fitzgen and julian-seward1 for a review on PR #2366.
fitzgen submitted PR Review.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
I feel like this could use a new name now:
NonRegInput
?
fitzgen created PR Review Comment:
nitpick: it seems worth pulling this out into an
is_inst_sunk
helper method to clarify the meaning of this code without being forced to look at the details of the data structures and representation
fitzgen created PR Review Comment:
doc nitpick for clarity:
/// Put the `idx`th input into a register and return the assigned register.
fitzgen created PR Review Comment:
Can
inst
andconstant
ever both beSome
? If not, then I think it makes more sense to define this as anenum NonRegInput
and then methods that create this would returnOption<NonRegInput>
and useNone
instead of returning a struct that has empty components.
fitzgen created PR Review Comment:
ditto for this bit. maybe we could have a
is_result_needed
helper
fitzgen created PR Review Comment:
Returning
Option
would also make it so the naming could be a little more natural here, since the "maybe" would become redundant, and this could be namedfn get_input_as_non_reg(...) -> Option<NonRegInput> {...}
which I think reads a lot better.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The
inst
andconstant
fields can actually both beSome
, and in fact some of the matching code relies on this -- it matches on the opcode implied byinst
where one of the options isIconst
. In other words when we know a constant value for the input it's because we know its source instruction, and that source instruction is iconst/fconst/etc. Otherwise I agree this would have been a good simplification!
cfallin submitted PR Review.
cfallin created PR Review Comment:
I decided to remove
maybe_
as the returned struct is composed ofOption
s, so it's semantically pretty clear what's going on in any case.
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin created PR Review Comment:
Done, thanks!
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I like that name better, agreed.
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin updated PR #2366 from load-isel
to main
:
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) into other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.The first insight is that given the right basic conditions, i.e. that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
one less than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming out of the first (effectful def) instruction and in to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.The second insight is that we can keep state as we scan, tracking the
"current color", and update this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively re-color every
instruction it moved over; this may allow further sinks.Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1
Scanning from bottom to top, we first see the add producing
v3
and we
can sink the load producingv1
into it, producing a load + ALU-op
machine instruction. This is legal becausev1
moves over onlyv2
,
which is a pure instruction. Consider, though,v2
: under a simple
scheme that has no other context,v0
could not sink tov2
because it
would move overv1
, another load. But because we already sunkv1
down tov3
, we are free to sinkv0
tov2
; the update of the
"current color" during the scan allows this.This PR also cleans up the
LowerCtx
interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly fromLowerCtx::get_input()
, it now returns
zero or more of (constant, mergable inst) from
LowerCtx::maybe_get_input_as_source_or_const()
, and returns the
register only fromLowerCtx::put_input_in_reg()
. This removes the need
to explicitly denote uses of the register, so it's a little safer.Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize thex64
backend by using direct-memory operands.Testing: existing filetests ensure the DCE-while-lowering and existing
merging pattern-matches continue to work; merging of effectful ops
will be verified in a subsequent PR with load+op pattern-matching.Fixes #2340.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin merged PR #2366.
Last updated: Jan 24 2025 at 00:11 UTC