fitzgen labeled issue #6051:
I'm seeing a bunch of 3-5 target
br_table
s inspidermonkey.wasm
, and I suspect that a series of conditional branches might perform better than these smallbr_table
s and their indirect jumps. That they are showing up inspidermonkey.wasm
means that LLVM's Wasm backend is regularly emitting them, so this wouldn't be too overly specific of an optimization.Could potentially do this in the mid-end or during lowering. Haven't thought too much about it.
fitzgen opened issue #6051:
I'm seeing a bunch of 3-5 target
br_table
s inspidermonkey.wasm
, and I suspect that a series of conditional branches might perform better than these smallbr_table
s and their indirect jumps. That they are showing up inspidermonkey.wasm
means that LLVM's Wasm backend is regularly emitting them, so this wouldn't be too overly specific of an optimization.Could potentially do this in the mid-end or during lowering. Haven't thought too much about it.
cfallin commented on issue #6051:
We could definitely do it during lowering; we could have pseudoinstruction sequences for powers of two (2, 4, 8 targets at least) that are full binary trees, and if we have e.g. 7 targets instead then fill in the remainders with the default label.
One consideration here is to ensure that we don't lose the Spectre mitigation properties: we have a cmove to short-circuit the loaded target to default on out-of-bounds, and modern CPUs (i) guarantee that cmoves are not data-speculated, generally (or have a special incantation a la aarch64's
csdb
barrier to get this guarantee), and (ii) have thought specifically about the Spectre implications of indirect branch prediction and mitigated that. A tree of conditional branches may also be safe (I'm not aware of any CPUs that feed data from speculative execution back into the direction predictor) but we'd have to be sure.
afonso360 commented on issue #6051:
We have similar logic in the the switch interface where it can emit trees of branches if it thinks is beneficial. We could do a quick test by replacing all
br_table
s with that, it also falls back tobr_table
when necessary.
jameysharp commented on issue #6051:
If you give the switch helper in cranelift-frontend a single contiguous range of cases, it will always generate a single
br_table
for the whole range. The only exceptions are when there are no cases (which just jumps to the default case) or when there's exactly one case (which generates abrif
to choose between the case and the default). So that isn't as helpful as it looks.I think I'd weakly argue against doing this during lowering, because I think that will impede branch optimization. As long as we represent the search tree explicitly in CLIF, the block lowering order will be able to see that control flow and have the opportunity to fuse jump-target blocks into the tree. If we use pseudo-instructions then the branches will be emitted too late for that.
Last updated: Nov 22 2024 at 16:03 UTC