fitzgen opened issue #5783:
Here are some synthesized optimizations for CLIF harvested from spidermonkey.wasm with explicit bounds checks enabled. I won't have time to investigate, generalize, or implement them before I go on vacation, so I want to note them down in an issue for posterity.
Replacing clz(x) == 0 with a comparison:

========= ./2559773154913247904.result =========
%0:i32 = var
%1:i32 = ctlz %0
%2:i1 = eq %1, 0:i32
infer %2
; RHS inferred successfully
%3:i1 = ult 2147483647:i32, %0
result %3
Similarly, we can do this for clz(x) == 1, which wasn't harvested from the CLIF:

%0:i32 = var
%1:i32 = ctlz %0
%2:i1 = eq %1, 1:i32
infer %2
; RHS inferred successfully
%3:i1 = slt 1073741823:i32, %0
result %3
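A quick spot-check of both equivalences (a minimal Rust sketch, not part of the Souper output): clz(x) == 0 just says the top bit is set, and clz(x) == 1 says x is in [2^30, 2^31 - 1].

fn main() {
    for x in [0u32, 1, 0x3FFF_FFFF, 0x4000_0000, 0x7FFF_FFFF, 0x8000_0000, u32::MAX] {
        // clz(x) == 0  <=>  ult 2147483647, x
        assert_eq!(x.leading_zeros() == 0, 0x7FFF_FFFF < x);
        // clz(x) == 1  <=>  slt 1073741823, x (signed compare)
        assert_eq!(x.leading_zeros() == 1, 0x3FFF_FFFF < x as i32);
    }
}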
Note that there is no generalization for clz(x) == C available here (unless C is larger than x's bit width, in which case the equality is always false). It only works for zero and one because we don't need to check for a lower bound on x.

We could probably do a similar rewrite for clz(x) == OP_BIT_WIDTH.
"Reverse" const propagation of a
shl
into aselect
:========= ./11901186248914954319.result ========= %1:i1 = var %2:i32 = select %1, 9:i32, 1:i32 %3:i32 = shl -1:i32, %2 infer %3 ; RHS inferred successfully %3:i32 = select %1, 4294966784:i32, 4294967294:i32 result %3
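The two constants are just the shl folded into each arm: -1 << 9 = 4294966784 and -1 << 1 = 4294967294. A minimal Rust sketch of the check:

fn lhs(c: bool) -> u32 {
    // shl -1, select(c, 9, 1)
    let amt = if c { 9 } else { 1 };
    (-1i32 as u32).wrapping_shl(amt)
}

fn rhs(c: bool) -> u32 {
    // select(c, 4294966784, 4294967294)
    if c { 4294966784 } else { 4294967294 }
}

fn main() {
    for c in [false, true] {
        assert_eq!(lhs(c), rhs(c));
    }
}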
Maybe not profitable if %2 is used multiple times, in which case %1's live range might be extended after this optimization. But I guess this is always true with these sorts of peepholes...
Haven't dug into this one yet:

========= ./2969648510667058023.result =========
%0:i32 = var
%1:i32 = and %0, 2147483647:i32
%2:i32 = shl %1, 1:i32
infer %2
; RHS inferred successfully
%3:i32 = add %0, %0
result %3
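This one looks sound because the top bit masked off by the and is discarded by shl 1 anyway, and x << 1 is x + x in wrapping arithmetic. A minimal Rust sketch:

fn main() {
    for x in [0u32, 1, 0x7FFF_FFFF, 0x8000_0000, 0xDEAD_BEEF, u32::MAX] {
        let lhs = (x & 0x7FFF_FFFF) << 1; // and, then shl
        let rhs = x.wrapping_add(x);      // add %0, %0
        assert_eq!(lhs, rhs);
    }
}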
A bunch of masking off bits that will be shifted out anyways:

========= ./4985840733093600201.result =========
%0:i32 = var
%1:i32 = and %0, 1073741823:i32
%2:i32 = shl %1, 2:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 2:i32
result %3

========= ./17860953418551930245.result =========
%0:i32 = var
%1:i32 = and %0, 268435455:i32
%2:i32 = shl %1, 4:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 4:i32
result %3

========= ./8617343957324108668.result =========
%0:i32 = var
%1:i32 = and %0, 536870911:i32
%2:i32 = shl %1, 3:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 3:i32
result %3
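The general pattern (my reading of these instances, not something Souper itself generalized): the and is dead whenever the mask keeps all of the low 32 - k bits, since shl k discards the high k bits. This also covers the 255-mask examples further below. A minimal Rust sketch:

fn main() {
    let cases: [(u32, u32); 3] = [
        (0x3FFF_FFFF, 2), // ./4985840733093600201
        (0x0FFF_FFFF, 4), // ./17860953418551930245
        (0x1FFF_FFFF, 3), // ./8617343957324108668
    ];
    for (mask, k) in cases {
        let low = (1u32 << (32 - k)) - 1;
        assert_eq!(mask & low, low); // mask keeps every surviving bit
        for x in [0u32, 1, 0xDEAD_BEEF, u32::MAX] {
            assert_eq!((x & mask) << k, x << k);
        }
    }
}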
Replacing an and and an eq with an ult. Haven't thought about these yet.

========= ./9633751011751143656.result =========
%0:i32 = var
%1:i32 = and %0, -2147483648:i32
%2:i1 = eq %1, 0:i32
infer %2
; RHS inferred successfully
%3:i1 = ult %0, 2147483648:i32
result %3

========= ./3840756061434214754.result =========
%0:i32 = var
%1:i32 = and %0, 4294967280:i32
%2:i1 = eq %1, 0:i32
infer %2
; RHS inferred successfully
%3:i1 = ult %0, 16:i32
result %3
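These hold because both masks consist of all bits from some position k upward, so (x & mask) == 0 says exactly that x < 2^k unsigned. A minimal Rust sketch:

fn main() {
    let cases: [(u32, u32); 2] = [
        (0x8000_0000, 0x8000_0000), // ./9633751011751143656
        (0xFFFF_FFF0, 16),          // ./3840756061434214754
    ];
    for (mask, bound) in cases {
        assert_eq!(mask, !(bound - 1)); // mask is all bits >= log2(bound)
        for x in [0u32, 15, 16, 0x7FFF_FFFF, 0x8000_0000, u32::MAX] {
            assert_eq!((x & mask) == 0, x < bound);
        }
    }
}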
Unnecessary ors:

========= ./14551618322220925804.result =========
%0:i32 = var
%1:i32 = or %0, 1024:i32
%2:i32 = and %1, 2048:i32
infer %2
; RHS inferred successfully
%3:i32 = and 2048:i32, %0
result %3

========= ./13190508444419006141.result =========
%0:i32 = var
%1:i32 = or %0, 33032:i32
%2:i32 = and %1, 192:i32
infer %2
; RHS inferred successfully
%3:i32 = and 192:i32, %0
result %3
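The rule here is (x | a) & b == x & b whenever a & b == 0, i.e. every or'ed-in bit is cleared by the mask anyway. A minimal Rust sketch:

fn main() {
    let cases: [(u32, u32); 2] = [
        (1024, 2048), // ./14551618322220925804
        (33032, 192), // ./13190508444419006141
    ];
    for (a, b) in cases {
        assert_eq!(a & b, 0); // the or'ed bits and the mask are disjoint
        for x in [0u32, 0xDEAD_BEEF, u32::MAX] {
            assert_eq!((x | a) & b, x & b);
        }
    }
}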
More masking off bits that will be shifted away:

========= ./3703682576609255056.result =========
%0:i32 = var
%1:i32 = and %0, 255:i32
%2:i32 = shl %1, 24:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 24:i32
result %3

========= ./5310030443405410098.result =========
%0:i32 = var
%1:i32 = and %0, 255:i32
%2:i32 = shl %1, 25:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 25:i32
result %3
A ton of "reverse" const prop with comparisons was found; here are a few:

========= ./13877742632332331899.result =========
%0:i32 = var
%1:i32 = add %0, 1:i32
%2:i1 = eq %1, 1:i32
infer %2
; RHS inferred successfully
%3:i1 = eq 0:i32, %0
result %3

========= ./3568272213429168939.result =========
%0:i32 = var
%1:i32 = add %0, 44:i32
%2:i1 = eq %1, 3608:i32
infer %2
; RHS inferred successfully
%3:i1 = eq 3564:i32, %0
result %3

========= ./12119931432587256453.result =========
%0:i32 = var
%1:i32 = ashr %0, 2:i32
%2:i1 = slt %1, 24:i32
infer %2
; RHS inferred successfully
%3:i1 = slt %0, 96:i32
result %3

========= ./7033427926372082757.result =========
%0:i32 = var
%1:i32 = and %0, -8:i32
%2:i1 = sle %1, 31:i32
infer %2
; RHS inferred successfully
%3:i1 = slt %0, 32:i32
result %3
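Spot-checks for these (a minimal Rust sketch): the add/eq cases just invert the add, while the ashr and and cases work because both operations round toward negative infinity.

fn main() {
    for x in [i32::MIN, -97, -96, -1, 0, 23, 24, 31, 32, 95, 96, 3564, i32::MAX] {
        // add/eq: x + 44 == 3608  <=>  x == 3564
        assert_eq!(x.wrapping_add(44) == 3608, x == 3564);
        // ashr/slt: (x >>s 2) < 24  <=>  x < 96
        assert_eq!((x >> 2) < 24, x < 96);
        // and/sle: (x & -8) <= 31  <=>  x < 32
        assert_eq!((x & -8) <= 31, x < 32);
    }
}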
And also a bunch of "reverse" const prop with other operators as well (not as many as with comparisons, though). Again, here are a few:

========= ./7402500702784523674.result =========
%0:i32 = var
%1:i32 = shl %0, 1:i32
%2:i32 = shl %1, 16:i32
infer %2
; RHS inferred successfully
%3:i32 = shl %0, 17:i32
result %3

========= ./10936462652029262118.result =========
%0:i32 = var
%1:i32 = shl %0, 1:i32
%2:i32 = mul %1, 12:i32
infer %2
; RHS inferred successfully
%3:i32 = mul 24:i32, %0
result %3

========= ./13431533325972989820.result =========
%0:i32 = var
%1:i32 = shl -1:i32, %0
%2:i32 = shl %1, 2:i32
infer %2
; RHS inferred successfully
%3:i32 = shl 4294967292:i32, %0
result %3
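Spot-checks again (a minimal Rust sketch; shift amounts are kept in range so both sides stay well-defined):

fn main() {
    for x in [0u32, 1, 0xDEAD_BEEF, u32::MAX] {
        // shl/shl: (x << 1) << 16 == x << 17
        assert_eq!((x << 1) << 16, x << 17);
        // shl/mul: (x << 1) * 12 == x * 24
        assert_eq!((x << 1).wrapping_mul(12), x.wrapping_mul(24));
    }
    // shl of a constant: (-1 << x) << 2 == -4 << x (4294967292 is -4 as u32)
    for amt in 0u32..30 {
        assert_eq!(((-1i32 as u32) << amt) << 2, 4294967292u32 << amt);
    }
}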
afonso360 commented on issue #5783:
I was looking into souper the other day. I wasn't able to set it up properly, but I noticed we are missing a translation for Bswap in our harvest program. So there might be a few more optimizations possible there.
regehr commented on issue #5783:
hello, please see this list of generalized versions of these:
https://gist.github.com/manasij7479/602e770d45169d5ffa73d8cd100d5b05
maybe read them over and if you have any questions, ask me and @manasij7479 to explain.
in Hydra's output, sext(1) is just a bitwidth-independent way to say -1
hope this is useful!
regehr commented on issue #5783:
I should add that in the general case, generalizing an optimization is not a well-posed problem. there are often multiple solutions, and it is often the case that it's not totally clear which one to prefer. so while the Hydra output is (hopefully) correct, it probably contains some examples where you might want to generalize things a different way in practice.
also, this notation:
(sext(1) << x0) << C2 => (sext(1) << C2) << x0
comes from a pretty-printer and is intended to make things easy to read for humans. the machine-readable form is Hydra's internal IR which is an extended version of Souper IR. what I'm saying is that I'm not sure that writing a parser for the pretty-printed output is the right answer. the right answer is probably to write some C++ that fits into Hydra and directly emits Cranelift pattern-matching / rewriting code. if you want to do this, please talk to us and hopefully also Manasij will help out
fitzgen commented on issue #5783:
Thanks @regehr!! Will take a look at these when I have some free cycles.
Out of curiosity, can you provide some more details on what the reported "profit" scores are?
regehr commented on issue #5783:
ah, sorry, that's the difference in cost between the LHS and RHS, where cost comes from souper's idiosyncratic cost model: most stuff has cost 1, a few things like select have cost 3, and some stuff like intrinsics has cost 5.
we never really arrived at a good cost model for LLVM IR (I talked to many LLVM people about this many times and never really made good progress) and I don't expect it to be a great cost model for Cranelift either. but perhaps it's close enough.
fitzgen commented on issue #5783:
what I'm saying is that I'm not sure that writing a parser for the pretty-printed output is the right answer.
FWIW, if Hydra outputted the same text format as Souper, we would be able to reuse the existing parser I wrote: https://docs.rs/souper-ir/latest/souper_ir/
But yeah, we will definitely reach out again when we look at automating this whole process in the future
manasij7479 commented on issue #5783:
what I'm saying is that I'm not sure that writing a parser for the pretty-printed output is the right answer.
FWIW, if Hydra outputted the same text format as Souper, we would be able to reuse the existing parser I wrote: https://docs.rs/souper-ir/latest/souper_ir/
But yeah, we will definitely reach out again when we look at automating this whole process in the future
It does, and that is what our automation for generating an LLVM pass uses.
It assigns meaning to specific identifier names, and has different semantics for width constraints.
For example, a %symconst prefix means it is a symbolic constant. It is pretty much Souper IR other than some details like this, so your parser could work pretty well.
regehr commented on issue #5783:
this all sounds great.
but note that some things like bitwidth independence are handled outside of Souper IR, and require some care
if there's something we can do on our side, please let us know, but keep in mind that Manasij plans to defend later this spring, so the earlier the better!