afonso360 opened issue #5166:
:wave: Hey,
This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the
iadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Is there any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
afonso360 edited issue #5166:
:wave: Hey,
This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the
iadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
afonso360 edited issue #5166:
:wave: Hey,
This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the
iadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.
For the
carry & 1
it does emit theand
instruction + twoadd
's, and for thecarry != 0
it does do something special on aarch64 and x86, but we can also pattern matchiadd+icmp not zero
and emit that same code?
afonso360 edited issue #5166:
:wave: Hey,
This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the
iadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.
For the
carry & 1
it does emit theand
instruction + twoadd
's.And for the
carry != 0
it does do something special on aarch64 and x86, but we can also pattern matchiadd+icmp not zero
and emit that same code?
afonso360 edited issue #5166:
:wave: Hey,
This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the
iadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.
For the
carry & 1
it does emit theand
instruction + twoadd
's.And for the
carry != 0
it does do something special on aarch64 and x86, but we can also pattern matchiadd+icmp not zero
and emit that same code.
afonso360 edited issue #5166:
:wave: Hey,
This operation is essentially two
add
's chained together, I think that any optimizations that we make based on this operation can also be made on theiadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.
For the
carry & 1
it does emit theand
instruction + twoadd
's.And for the
carry != 0
it does do something special on aarch64 and x86, but we can also pattern matchiadd+icmp not zero
and emit that same code.
bjorn3 commented on issue #5166:
What would be the purpose of removing iadd_cin while keeping iadd_carry? It is useful for terminating a bigint addition.
afonso360 commented on issue #5166:
I'm proposing removing
iadd_cin
because that one feels a bit redundant to me.However
iadd_carry
does have the carry bit output, which makes it quite different, It essentially outputs the carry if either of the twoadd
's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for ouriadd_cin
.
iadd_cout
is also not a substitute ofiadd_carry
since that one computes the overflow over a single add.
afonso360 edited a comment on issue #5166:
I'm proposing removing
iadd_cin
because that one feels a bit redundant to me.However
iadd_carry
does have the carry bit output, which makes it quite different.It outputs the carry if either of the two
add
's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for ouriadd_cin
.
iadd_cout
is also not a substitute ofiadd_carry
since that one computes the overflow over a single add.
afonso360 edited a comment on issue #5166:
I'm proposing removing
iadd_cin
because that one feels a bit redundant to me since it could be replaced with twoadd
's as mentioned above.However
iadd_carry
does have the carry bit output, which makes it quite different.It outputs the carry if either of the two
add
's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for ouriadd_cin
.
iadd_cout
is also not a substitute ofiadd_carry
since that one computes the overflow over a single add.
afonso360 edited a comment on issue #5166:
I'm proposing removing
iadd_cin
because that one feels a bit redundant to me since it could be replaced with twoadd
's as mentioned above.However
iadd_carry
does have the carry bit output, which makes it quite different.It outputs the carry if either of the two
add
's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for ouriadd_cin
.
iadd_cout
is also not a substitute ofiadd_carry
since that one computes the overflow over a single add.
bjorn3 commented on issue #5166:
iadd_cin is iadd_carry except that it doesn't materialize the carry flag. In other words iadd_cin can be implemented using a single adc instruction on x86 while iadd_carry must be implemented as adc+setc if not fused with the instruction using the output carry flag.
afonso360 commented on issue #5166:
In other words iadd_cin can be implemented using a single adc instruction on x86
Right, but in our case we have no guarantee that the previous input has set the carry flag unless we pattern match some previous instruction.
So we need to pattern mach the carry input of
iadd_cin
in order to lower it to anadc
. And if we are doing that, we can also do that oniadd
and useadc
there.
afonso360 commented on issue #5166:
What I'm proposing is replacing the following:
function %add1(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd_cin v1, v1, v2 return v3 }
With this:
function %add2(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd v1, v2 v4 = iadd v1, v3 return v4 }
In the
v3 = iadd v1, v2
line, we know thatv2
is a carry output, so we can pattern match(iadd _ (iadd_cout _ _))
and avoid materializing the carry flag by doing theadc
instruction there.
We actually can't yet do this exact pattern matching since ISLE can't match two output instructions, but the same goes for
iadd_cin
.
afonso360 edited a comment on issue #5166:
What I'm proposing is replacing the following:
function %add1(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd_cin v1, v1, v2 return v3 }
With this:
function %add2(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd v1, v2 v4 = iadd v1, v3 return v4 }
In the
v3 = iadd v1, v2
line, we know thatv2
is a carry output, so we can pattern match(iadd _ (iadd_cout _ _))
and avoid materializing the carry flag by doing theadc
instruction there.
We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for
iadd_cin
.
afonso360 edited a comment on issue #5166:
What I'm proposing is replacing the following:
function %add1(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd_cin v1, v1, v2 return v3 }
With this:
function %add2(i8) -> i8 { block0(v0: i8): v1, v2 = iadd_cout v0, v0 v3 = iadd v1, v2 v4 = iadd v1, v3 return v4 }
~In the
v3 = iadd v1, v2
line, we know thatv2
is a carry output, so we can pattern match(iadd _ (iadd_cout _ _))
and avoid materializing the carry flag by doing theadc
instruction there.~Edit: In the
v4 = iadd v1, v3
line, we can pattern match(iadd _ (iadd _ (iadd_cout _ _)))
and avoid materializing the carry flag by doing theadc
instruction there.
We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for
iadd_cin
.
scottmcm commented on issue #5166:
Maybe the value is more clear here for wider integers?
It seems like
v4, v5 = iadd_cout v0, v1 v6 = iadd_cin v1, v2, v5
would, in general, need to be
v4, v5 = iadd_cout v0, v1 v7 = uextend v5 v8 = iadd v1, v2 v6 = iadd v8, v7
in order to not have
iadd_cin
. Though that's certainly still pattern-matchable, just increasing the number of permutations that would need to be matched.(but it's also possible that I completely don't understand cranelift's type expectations)
afonso360 closed issue #5166:
:wave: Hey,
This operation is essentially two
add
's chained together, I think that any optimizations that we make based on this operation can also be made on theiadd
instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to twoiadd
's + some masking for the carry input?Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either
carry & 1
orcarry != 0
. (This also affects theiadd_carry
instruction).We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about
iadd_cin
I think.Some previous discussion about this mentions that cg-clif may need
iadd_cin
to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementingiadd_carry
which I agree probably needs to remain.@bjorn3 Do you know of any operation that cg-clif would need to do, that needs
iadd_cin
but couldn't efficiently be replaced withiadd+iadd
?
For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.
For the
carry & 1
it does emit theand
instruction + twoadd
's.And for the
carry != 0
it does do something special on aarch64 and x86, but we can also pattern matchiadd+icmp not zero
and emit that same code.
afonso360 commented on issue #5166:
Fixed by #9199
Last updated: Dec 23 2024 at 12:05 UTC