cfallin opened PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
cfallin requested abrown and julian-seward1 for a review on PR #2389.
cfallin requested abrown and julian-seward1 for a review on PR #2389.
cfallin updated PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
cfallin updated PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
cfallin updated PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ditto. Also, can you add a 1-liner saying what
src_inst
is? Is it the load? The insn using the result of the load? etc
julian-seward1 created PR Review Comment:
Confusing that the second operand of the
addl
is missing; can this be added? Ditto for the cases below.
julian-seward1 created PR Review Comment:
Uh, this output is so minimal I don't really understand. If the expected output begins
movzbq
, it seems like this isn't testing load-op merging at all.
julian-seward1 created PR Review Comment:
Please add a 1-liner comment saying roughly what this does.
cfallin updated PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yes, this test is actually testing that the merge doesn't happen (because doing so would cause a 32-bit load as part of a 32-bit add instruction, but the
i8
may be the last byte in the heap, for example). Added a comment to this effect.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The idea here was to make the test invariant to possible regalloc changes; for some reason it's using
r12
here, which seemed odd to me. But, I suppose it's clearer to just do the Simple Thing and match the whole instruction; fixed.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin updated PR #2389 from x64-load-op
to main
:
This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:movq 0(%rdi), %rax addq %rax, %rbx
we want to generate this:
addq 0(%rdi), %rax
As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.Also makes a drive-by improvement in the x64 backend to use
statically-monomorphizedLowerCtx
types rather than a&mut dyn LowerCtx
.On
bz2.wasm
, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.
cfallin merged PR #2389.
Last updated: Dec 23 2024 at 12:05 UTC