T0b1-iOS opened PR #5784 from arith_of
to main
:
Currently, there is not really a way to efficiently detect arithmetic over/underflow which some ISAs (in particular x64) allow and can be important. For example, I'm currently trialing Cranelift as a backend in a database system which needs efficient overflow handling.
There also seems to be more general interest in overflow detection (see #1044).This PR adds instructions for unsigned and signed add/sub/mul which return a second output indicating overflow.
add
/sub
support 8 through 128 bit integers while themul
variants only support up to 64 bit integers. (That's because the most efficient implementation forumul_overflow
with 128bit integers is along the lines of 50 instructions on x64 at which point it is probably better to emit a function call).
I leftiadd_cout
and friends untouched as the semantics are a bit confusing right now, e.g.iadd_cout
seems to be indicating signed overflow while the pseudocode in the description describes unsigned overflow).In detail, this PR adds:
{u,s}{add,sub,mul}_overflow
with an implementation in the interpreter (I hope the naming convention is ok)- runtests for these instructions
- support for 8/16 bit operands for
AluRmiR
/AluRM
instructions in the x64 backend- support for the usigned
mul
instruction on x64- support for
UMAddL
/SMAddL
instructions in the aarch64 backend- lowerings of
{u,s}{add,sub,mul}_overflow
for x64 and aarch64Currently, there are no emit tests since they were not important for the initial testing and I would like to know whether there is interest in these instructions being included first.
Additionally, the 128bit lowerings currently forego the flag helpers since they are, as far as I can tell, not really designed to produce three outputs so getting both the low/high part of an addition and the dst reg for a setcc seemed a bit complicated.
I also do not have a lot of knowledge about RV64/S390X so I don't think I can provide lowerings there.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS has marked PR #5784 as ready for review.
T0b1-iOS updated PR #5784 from arith_of
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
This description does not seem correct, since we use
ints(Interval::All)
below, which does include I128's.The descriptions for
umul_overflow
/smul_overflow
also state something similar.
afonso360 submitted PR review.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
s/a arithmetic/an arithmetic/
cfallin created PR review comment:
small style nit, but I prefer a more functional style here, where instead of a
mut
binding, we dolet (op, size) = match alu_op { ... }
and produce originalsize
or an overridden one in each match arm.
cfallin created PR review comment:
It's somewhat surprising here that we take an arbitrary
Cond
, given that the helper name is specifically "overflow"-related; could we document whichCond
s we expect, or if it's only one in practice, just name it directly?
cfallin created PR review comment:
I usually prefer to avoid
bool
flags unless the meaning is really clear from the name. Here it's not clear from the name whethertrue
means zero or sign-extend. (And we've had one CVE from confusing these so the difference is important!) Could we define a purpose-built enum withZero
andSign
arms, if we don't have one elsewhere that we can use, and use it here and in callers as appropriate?
cfallin created PR review comment:
We usually use the terminology "narrow" to describe types narrower than the native operator/instruction width. Not a huge deal but probably nice for future readability.
cfallin created PR review comment:
Let's extend the helper so we can -- it's the way we use to avoid subtle bugs from out-of-order
emit
s and unexpected side-effects.
cfallin created PR review comment:
Is there any reason we can't use the ordinary
mul
in theAluRmiR
? Alternately, if we do this, we should remove it completely fromAluRmiROpcode
and switch uses over -- there shouldn't be two different ways to build a MachInst for it.
cfallin created PR review comment:
Here as well, let's extend
with_flags
rather than doing an ad-hocemit
.
cfallin created PR review comment:
It might make the tests a little easier to read and maintain if we use multiple-returns here -- if we return
-> i128, i8
and compare against[val, of]
we can collapse this with the below. See e.g.runtests/i128-concat-split.clif
for an example.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
AluRmiROpcode.Mul
is a signed multiply, it emitsimul
. This one emitsmul
, which is unsigned multiplication. I added it as a new MachInst since the implementation for theAluRmiR
mul operation states that it is freeloaded into the MachInst and doesn't really fit it already so I didn't add aAluRmiROpcode.UMul
.
cfallin created PR review comment:
Ah! I had missed that difference; thanks. This is indeed fine then.
cfallin submitted PR review.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
The problem is that currently
with_flags
returns aValueRegs
which can hold at max 2 values but the 128bit implementations for the overflow ops would need three return values.
So I could change thewith_flags
helper to return anInstOutput
maybe and change all the places it's used to accomodate this but that would probably be quite a big change.
Or should I simply implement a new helper (e.g.with_flags_extended
) that can return anInstOutput
?
And should I useInstOutput
or create a new type to handle more than oneValueRegs
. Though I think the notion ofInstOuput
with multipleValueRegs
is already quite fitting.
T0b1-iOS updated PR #5784 from arith_of
to main
.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
Thanks for catching that, it should be correct now. I also added your changes for the fuzzer.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
I added a description which conditions are used.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
I changed that.
T0b1-iOS edited PR review comment.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
I changed it to use the
ArgumentExtension
enum which I thought is fitting enough. I also changed thelower_extend_op
helper to useArgumentExtension
since that was using abool
before also.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
I changed the description. Is it better now?
cfallin submitted PR review.
cfallin created PR review comment:
Hmm -- yeah,
with_flags_extended
seems like a better option here.
cfallin submitted PR review.
cfallin created PR review comment:
Yep, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Much better, thank you!
T0b1-iOS requested elliottt for a review on PR #5784.
T0b1-iOS requested wasmtime-compiler-reviewers for a review on PR #5784.
T0b1-iOS updated PR #5784.
T0b1-iOS created PR review comment:
I added the helper now. I added a new
ProducesFlags
variant that combines twoProducesFlagsReturnsResultWithConsumer
. I also added a few helper to extract the values out of theInstOutput
afterwards which involves a bit of trickery to extractValueRegs
from it using a reference. You might want to look at that (all that mentionsInstOutputRef
).
I also renamedproduces_flags_append
toproduces_flags_concat
to be in-line withconsumes_flags_concat
.
T0b1-iOS submitted PR review.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Hmm, now that I see this with
InstOutputRef
, I don't think this is very readable either; making the "extended" variant of the combinator take only the specific combination we need feels odd too. The secondProducesFlags
also consumes flags from the previous op, which makes it a bit of an impedance mismatch too.I think something like this would (finally!) work though:
(type MultiReg (Three (a Reg) (b Reg) (c Reg)) (decl with_flags_chain (ProducesFlags ProducesAndConsumesFlags ConsumesFlags) MultiReg) ;; inst constructor that turns `adcs` (aarch64) or `add` (x64) into a `ProducesAndConsumesFlags` ;; then: (decl pair_and_single (MultiReg) InstOutput) (rule (pair_and_single (MultiReg.Three a b c) (output_pair (value_regs a b) c))
does that seem reasonable?
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
The second
ProducesFlags
also consumes flags from the previous op, which makes it a bit of an impedance mismatch too.
That is true, though theccmp
constructor on arm does the same thing withProducesFlagsTwiceSideEffect
.The rest is reasonable, i guess, though I'd need to implement more generic inst constructors for
alu_rmi_r
/alu_rrr
that construct aConsumesAndProducesFlags
(probably better ordered like that) since the helpers do not know which alu op they are constructing specifically.
T0b1-iOS edited PR review comment.
T0b1-iOS updated PR #5784.
T0b1-iOS submitted PR review.
T0b1-iOS created PR review comment:
I replaced the old change with what you proposed. It supports more combinations now, though that ended up in a lot of copypaste.
cfallin submitted PR review.
T0b1-iOS updated PR #5784.
cfallin has enabled auto merge for PR #5784.
cfallin merged PR #5784.
Last updated: Jan 24 2025 at 00:11 UTC