cfallin labeled issue #4123:
The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:
- Masking, like
AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;cmp,setcc(x64) /cset(aarch64) into a register followed by a conditional branch sometime later;- A combination of the above two.
The root causes are:
- We do not pattern-match far enough back, in some cases, to fuse the
brzandicmpat the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;- We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.
Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.
cfallin labeled issue #4123:
The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:
- Masking, like
AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;cmp,setcc(x64) /cset(aarch64) into a register followed by a conditional branch sometime later;- A combination of the above two.
The root causes are:
- We do not pattern-match far enough back, in some cases, to fuse the
brzandicmpat the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;- We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.
Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.
cfallin labeled issue #4123:
The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:
- Masking, like
AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;cmp,setcc(x64) /cset(aarch64) into a register followed by a conditional branch sometime later;- A combination of the above two.
The root causes are:
- We do not pattern-match far enough back, in some cases, to fuse the
brzandicmpat the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;- We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.
Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.
cfallin opened issue #4123:
The code generation for (i) branches on booleans, and (ii) branches on integer values that come from compares but are not directly observable (e.g. in a different basic block), is suboptimal. We often see:
- Masking, like
AND reg, 1, because we pessimistically assume that the upper bits of a boolean value are undefined;cmp,setcc(x64) /cset(aarch64) into a register followed by a conditional branch sometime later;- A combination of the above two.
The root causes are:
- We do not pattern-match far enough back, in some cases, to fuse the
brzandicmpat the Cranelift level, and this is exacerbated by GVN and LICM that hoist icmps earlier in the function;- We do not have combination patterns that recognize when some producers of bools-as-integers will actually define the high bits.
Some combination of more aggressive pattern matching and demanded-bits analysis could improve the codegen in these cases.
sparker-arm commented on issue #4123:
Would preventing the legalizing of
br_icmpalso help here? And possibly even do the opposite and combinebrandicmp, when theicmphas single user?
cfallin commented on issue #4123:
Possibly... I'm a bit torn because our general principle is to decompose ops at the CLIF level, to allow for better optimization (this has certainly been our rule for SIMD for example). I could imagine a case where a
csetproduces a bool hoisted out of a loop, and the loop branches on that rather than a fresh compare, which reduces the loop-carried live set by one register. (This actually makes me think we might eventually want to have a notion of "hoisted for perf reasons, do not re-merge" feed from the mid-end into isel pattern matching, but that's a separate conversation!)An ad-hoc fusing pass is also somewhat brittle; imho it's better to have one place where we reason about macro-op matching (namely isel).
I think we should be able to do OK with better pattern matching, at least to remove the masking to start; but this is still pretty open to investigation!
Last updated: Dec 13 2025 at 19:03 UTC