Stream: git-wasmtime

Topic: wasmtime / PR #2389 x64 backend: merge loads into ALU ops...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 11 2020 at 01:35):

cfallin opened PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 11 2020 at 01:35):

cfallin requested abrown and julian-seward1 for a review on PR #2389.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 11 2020 at 01:35):

cfallin requested abrown and julian-seward1 for a review on PR #2389.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 11 2020 at 01:36):

cfallin updated PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 00:13):

cfallin updated PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 16:06):

cfallin updated PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 created PR Review Comment:

Ditto. Also, can you add a 1-liner saying what src_inst is? Is it the load? The insn using the result of the load? etc

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 created PR Review Comment:

Confusing that the second operand of the addl is missing; can this be added? Ditto for the cases below.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 created PR Review Comment:

Uh, this output is so minimal I don't really understand. If the expected output begins movzbq, it seems like this isn't testing load-op merging at all.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 17:39):

julian-seward1 created PR Review Comment:

Please add a 1-liner comment saying roughly what this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:03):

cfallin updated PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:04):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:04):

cfallin created PR Review Comment:

Ah, yes, this test is actually testing that the merge doesn't happen (because doing so would cause a 32-bit load as part of a 32-bit add instruction, but the i8 may be the last byte in the heap, for example). Added a comment to this effect.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin created PR Review Comment:

The idea here was to make the test invariant to possible regalloc changes; for some reason it's using r12 here, which seemed odd to me. But, I suppose it's clearer to just do the Simple Thing and match the whole instruction; fixed.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin created PR Review Comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin created PR Review Comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:06):

cfallin updated PR #2389 from x64-load-op to main:

This PR makes use of the support in #2366 for sinking effectful
instructions and merging them with consumers. In particular, on x86, we
want to make use of the ability of many instructions to load one operand
directly from memory. That is, instead of this:

    movq 0(%rdi), %rax
    addq %rax, %rbx

we want to generate this:

    addq 0(%rdi), %rax

As described in more detail in #2366, sinking and merging the load is
only possible under certain conditions. In particular, we need to ensure
that the use is the only use (otherwise the load happens more than
once), and we need to ensure that it does not move across other
effectful ops (see #2366 for how we ensure this).

This change is actually fairly simple, given that all the framework is
in place: we simply pattern-match a load on one operand of an ALU
instruction that takes an RMI (reg, mem, or immediate) operand, and
generate the mem form when we match.

Also makes a drive-by improvement in the x64 backend to use
statically-monomorphized LowerCtx types rather than a &mut dyn LowerCtx.

On bz2.wasm, this results in ~1% instruction-count reduction. More is
likely possible by following up with other instructions that can merge
memory loads as well.

This PR includes #2366 and also #2376 (I built on top of the latter because
otherwise there would be some merge conflicts due to their overlap); both
of those should land before this does.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 17 2020 at 19:42):

cfallin merged PR #2389.


Last updated: Jan 24 2025 at 00:11 UTC