wasmtime / issue #4123 Cranelift: improve codegen for boo... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #4123 Cranelift: improve codegen for boo...

Wasmtime GitHub notifications bot (May 10 2022 at 21:48):

cfallin labeled issue #4123:

The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:

Masking, like AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;

cmp, setcc (x64) / cset (aarch64) into a register followed by a conditional branch sometime later;

A combination of the above two.

The root causes are:

We do not pattern-match far enough back, in some cases, to fuse the brz and icmp at the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;

We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.

Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.

Wasmtime GitHub notifications bot (May 10 2022 at 21:48):

cfallin labeled issue #4123:

The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:

Masking, like AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;

cmp, setcc (x64) / cset (aarch64) into a register followed by a conditional branch sometime later;

A combination of the above two.

The root causes are:

We do not pattern-match far enough back, in some cases, to fuse the brz and icmp at the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;

We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.

Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.

Wasmtime GitHub notifications bot (May 10 2022 at 21:48):

cfallin labeled issue #4123:

The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:

Masking, like AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;

cmp, setcc (x64) / cset (aarch64) into a register followed by a conditional branch sometime later;

A combination of the above two.

The root causes are:

We do not pattern-match far enough back, in some cases, to fuse the brz and icmp at the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;

We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.

Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.

Wasmtime GitHub notifications bot (May 10 2022 at 21:48):

cfallin opened issue #4123:

The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:

Masking, like AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;

cmp, setcc (x64) / cset (aarch64) into a register followed by a conditional branch sometime later;

A combination of the above two.

The root causes are:

We do not pattern-match far enough back, in some cases, to fuse the brz and icmp at the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;

We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.

Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.

Wasmtime GitHub notifications bot (May 31 2022 at 10:00):

sparker-arm commented on issue #4123:

Would preventing the legalizing of br_icmp also help here? And possibly even do the opposite and combine br and icmp, when the icmp has single user?

Wasmtime GitHub notifications bot (May 31 2022 at 16:43):

cfallin commented on issue #4123:

Possibly... I'm a bit torn because our general principle is to decompose ops at the CLIF level, to allow for better optimization (this has certainly been our rule for SIMD for example). I could imagine a case where a cset produces a bool hoisted out of a loop, and the loop branches on that rather than a fresh compare, which reduces the loop-carried live set by one register. (This actually makes me think we might eventually want to have a notion of "hoisted for perf reasons, do not re-merge" feed from the mid-end into isel pattern matching, but that's a separate conversation!)

An ad-hoc fusing pass is also somewhat brittle; imho it's better to have one place where we reason about macro-op matching (namely isel).

I think we should be able to do OK with better pattern matching, at least to remove the masking to start; but this is still pretty open to investigation!

Last updated: Apr 17 2025 at 08:04 UTC