Stream: git-wasmtime

Topic: wasmtime / issue #4133 Cranelift: develop demanded-bits a...


view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:45):

cfallin opened issue #4133:

In some cases, it is possible to perform better optimizations and instruction selection if one knows that only part of a value is needed (demanded bits). At other times, it is possible to do better if one knows that the operation or instruction actually generating a value sets more bits than are necessary in the destination register (defined-bits).

We should develop these analyses and use them during optimization and during lowering. Some examples:

The demanded-bits analysis should be at the CLIF level as it is a machine-independent concept (what bits do the CLIF semantics of the use(s) actually depend on). The defined-bits analysis is fundamentally a lowered-MachInst-level concept, as it has to do (in our case, since no bits within the type are undefined) with bits above a value in a register (e.g., upper 32 for an I32 a 64-bit register).

view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:45):

cfallin labeled issue #4133:

In some cases, it is possible to perform better optimizations and instruction selection if one knows that only part of a value is needed (demanded bits). At other times, it is possible to do better if one knows that the operation or instruction actually generating a value sets more bits than are necessary in the destination register (defined-bits).

We should develop these analyses and use them during optimization and during lowering. Some examples:

The demanded-bits analysis should be at the CLIF level as it is a machine-independent concept (what bits do the CLIF semantics of the use(s) actually depend on). The defined-bits analysis is fundamentally a lowered-MachInst-level concept, as it has to do (in our case, since no bits within the type are undefined) with bits above a value in a register (e.g., upper 32 for an I32 a 64-bit register).

view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:45):

cfallin labeled issue #4133:

In some cases, it is possible to perform better optimizations and instruction selection if one knows that only part of a value is needed (demanded bits). At other times, it is possible to do better if one knows that the operation or instruction actually generating a value sets more bits than are necessary in the destination register (defined-bits).

We should develop these analyses and use them during optimization and during lowering. Some examples:

The demanded-bits analysis should be at the CLIF level as it is a machine-independent concept (what bits do the CLIF semantics of the use(s) actually depend on). The defined-bits analysis is fundamentally a lowered-MachInst-level concept, as it has to do (in our case, since no bits within the type are undefined) with bits above a value in a register (e.g., upper 32 for an I32 a 64-bit register).

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2022 at 17:04):

fitzgen commented on issue #4133:

The defined-bits analysis is fundamentally a lowered-MachInst-level concept, as it has to do (in our case, since no bits within the type are undefined) with bits above a value in a register

I just realized I've been saying "defined bits" in conversation with you when I meant "known bits". I was about to comment that there is nothing fundamentally register-oriented about it, but of course I was talking about the wrong thing.

For posterity/clarity, _known_ bits is a forwards analysis where we logically keep track of this join semi-lattice per bit in a value:

  ?
 / \
0   1

and we would have transfer functions that do things like

    ????0011
bor ????????
------------
    ??????11

It's like per-bit (rather than per-value) partial evaluation.

This is totally something we can do at the CLIF level. Nothing here is fundamentally dealing with registers here.

In addition to being useful for peepholes, we can do neat things like remove CFG edges that we know will never be taken (look at the known bits for a br_table's block index argument and remove edges to the table's blocks that conflict with those known bits). The CFG edge trimming is something that is definitely more of a CLIF thing than a mach inst thing.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2022 at 17:37):

cfallin commented on issue #4133:

That totally makes sense too! Per-bit partial evaluation (or per-bit constant folding I guess) is a good description. A value with all known bits graduates to a full iconst then and can enable further opts.


Last updated: Oct 23 2024 at 20:03 UTC