cfallin opened PR #2473 from fix-lowering
to main
:
This fixes a subtle corner case exposed during fuzzing. If we have a bit
of CLIF like:v0 = load.i64 ... v1 = iadd.i64 v0, ... v2 = do_other_thing v1 v3 = load.i64 v1
and if this is lowered using a machine backend that can merge loads into
ALU ops, and that has an addressing mode that can look through add
ops, then the following can happen:
We lower the load at
v3
. This looks backward at the address
operand tree and finds thatv1
isv0
plus other things; it has an
addressing mode that can addv0
's register and the other things
directly; so it callsput_value_in_reg(v0)
and uses its register in
the amode. At this point, the add producingv1
has no references,
so it will not (yet) be codegen'd.We lower
do_other_thing
, which putsv1
in a register and uses it.
theiadd
now has a reference.We reach the
iadd
and, because it has a reference, lower it. Our
machine has the ability to merge a load into an ALU operation.
Crucially, we think the load atv0
is mergeable because it has
only one user, the add atv1
(!). So we merge it.We reach the
load
atv0
and because it has been merged into the
iadd
, we do not separately codegen it. The register that holdsv0
is thus never written, and the use of this register by the final load
(Step 1) will see an undefined value.The logic error here is that in the presence of pattern matching that
looks through pure ops, we can end up with multiple uses of a value that
originally had a single use (because we allow lookthrough of pure ops in
all cases). In other words, the multiple-use-ness ofv1
"passes
through" in some sense tov0
. However, the load sinking logic is not
aware of this.The fix, I think, is pretty simple: we disallow an effectful instruction
from sinking/merging if it already has some other use when we look back
at it.If we disallowed lookthrough of any op that had multiple uses, even
pure ones, then we would avoid this scenario; but earlier experiments
showed that to have a non-negligible performance impact, so (given that
we've worked out the logic above) I think this complexity is worth it.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested fitzgen, bnjbvr and julian-seward1 for a review on PR #2473.
cfallin requested fitzgen, bnjbvr and julian-seward1 for a review on PR #2473.
cfallin requested fitzgen, bnjbvr and julian-seward1 for a review on PR #2473.
cfallin updated PR #2473 from fix-lowering
to main
:
This fixes a subtle corner case exposed during fuzzing. If we have a bit
of CLIF like:v0 = load.i64 ... v1 = iadd.i64 v0, ... v2 = do_other_thing v1 v3 = load.i64 v1
and if this is lowered using a machine backend that can merge loads into
ALU ops, and that has an addressing mode that can look through add
ops, then the following can happen:
We lower the load at
v3
. This looks backward at the address
operand tree and finds thatv1
isv0
plus other things; it has an
addressing mode that can addv0
's register and the other things
directly; so it callsput_value_in_reg(v0)
and uses its register in
the amode. At this point, the add producingv1
has no references,
so it will not (yet) be codegen'd.We lower
do_other_thing
, which putsv1
in a register and uses it.
theiadd
now has a reference.We reach the
iadd
and, because it has a reference, lower it. Our
machine has the ability to merge a load into an ALU operation.
Crucially, we think the load atv0
is mergeable because it has
only one user, the add atv1
(!). So we merge it.We reach the
load
atv0
and because it has been merged into the
iadd
, we do not separately codegen it. The register that holdsv0
is thus never written, and the use of this register by the final load
(Step 1) will see an undefined value.The logic error here is that in the presence of pattern matching that
looks through pure ops, we can end up with multiple uses of a value that
originally had a single use (because we allow lookthrough of pure ops in
all cases). In other words, the multiple-use-ness ofv1
"passes
through" in some sense tov0
. However, the load sinking logic is not
aware of this.The fix, I think, is pretty simple: we disallow an effectful instruction
from sinking/merging if it already has some other use when we look back
at it.If we disallowed lookthrough of any op that had multiple uses, even
pure ones, then we would avoid this scenario; but earlier experiments
showed that to have a non-negligible performance impact, so (given that
we've worked out the logic above) I think this complexity is worth it.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
fitzgen submitted PR Review.
cfallin merged PR #2473.
Last updated: Jan 24 2025 at 00:11 UTC