Stream: git-wasmtime

Topic: wasmtime / issue #5166 cranelift: Remove `iadd_cin` instr...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:30):

afonso360 opened issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Is there any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:31):

afonso360 edited issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:43):

afonso360 edited issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's, and for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:43):

afonso360 edited issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:43):

afonso360 edited issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:47):

afonso360 edited issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 11:50):

bjorn3 commented on issue #5166:

What would be the purpose of removing iadd_cin while keeping iadd_carry? It is useful for terminating a bigint addition.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:00):

afonso360 commented on issue #5166:

I'm proposing removing iadd_cin because that one feels a bit redundant to me.

However iadd_carry does have the carry bit output, which makes it quite different, It essentially outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:01):

afonso360 edited a comment on issue #5166:

I'm proposing removing iadd_cin because that one feels a bit redundant to me.

However iadd_carry does have the carry bit output, which makes it quite different.

It outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:04):

afonso360 edited a comment on issue #5166:

I'm proposing removing iadd_cin because that one feels a bit redundant to me since it could be replaced with two add's as mentioned above.

However iadd_carry does have the carry bit output, which makes it quite different.

It outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:05):

afonso360 edited a comment on issue #5166:

I'm proposing removing iadd_cin because that one feels a bit redundant to me since it could be replaced with two add's as mentioned above.

However iadd_carry does have the carry bit output, which makes it quite different.

It outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:09):

bjorn3 commented on issue #5166:

iadd_cin is iadd_carry except that it doesn't materialize the carry flag. In other words iadd_cin can be implemented using a single adc instruction on x86 while iadd_carry must be implemented as adc+setc if not fused with the instruction using the output carry flag.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:19):

afonso360 commented on issue #5166:

In other words iadd_cin can be implemented using a single adc instruction on x86

Right, but in our case we have no guarantee that the previous input has set the carry flag unless we pattern match some previous instruction.

So we need to pattern mach the carry input of iadd_cin in order to lower it to an adc. And if we are doing that, we can also do that on iadd and use adc there.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:37):

afonso360 commented on issue #5166:

What I'm proposing is replacing the following:

function %add1(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd_cin v1, v1, v2
    return v3
}

With this:

function %add2(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd v1, v2
    v4 = iadd v1, v3
    return v4
}

In the v3 = iadd v1, v2 line, we know that v2 is a carry output, so we can pattern match (iadd _ (iadd_cout _ _)) and avoid materializing the carry flag by doing the adc instruction there.


We actually can't yet do this exact pattern matching since ISLE can't match two output instructions, but the same goes for iadd_cin.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:38):

afonso360 edited a comment on issue #5166:

What I'm proposing is replacing the following:

function %add1(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd_cin v1, v1, v2
    return v3
}

With this:

function %add2(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd v1, v2
    v4 = iadd v1, v3
    return v4
}

In the v3 = iadd v1, v2 line, we know that v2 is a carry output, so we can pattern match (iadd _ (iadd_cout _ _)) and avoid materializing the carry flag by doing the adc instruction there.


We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for iadd_cin.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 12:43):

afonso360 edited a comment on issue #5166:

What I'm proposing is replacing the following:

function %add1(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd_cin v1, v1, v2
    return v3
}

With this:

function %add2(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd v1, v2
    v4 = iadd v1, v3
    return v4
}

~In the v3 = iadd v1, v2 line, we know that v2 is a carry output, so we can pattern match (iadd _ (iadd_cout _ _)) and avoid materializing the carry flag by doing the adc instruction there.~

Edit: In the v4 = iadd v1, v3 line, we can pattern match (iadd _ (iadd _ (iadd_cout _ _))) and avoid materializing the carry flag by doing the adc instruction there.


We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for iadd_cin.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2023 at 20:17):

scottmcm commented on issue #5166:

Maybe the value is more clear here for wider integers?

It seems like

    v4, v5 = iadd_cout v0, v1
    v6 = iadd_cin v1, v2, v5

would, in general, need to be

    v4, v5 = iadd_cout v0, v1
    v7 = uextend v5
    v8 = iadd v1, v2
    v6 = iadd v8, v7

in order to not have iadd_cin. Though that's certainly still pattern-matchable, just increasing the number of permutations that would need to be matched.

(but it's also possible that I completely don't understand cranelift's type expectations)

view this post on Zulip Wasmtime GitHub notifications bot (Sep 06 2024 at 20:11):

afonso360 closed issue #5166:

:wave: Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 06 2024 at 20:11):

afonso360 commented on issue #5166:

Fixed by #9199


Last updated: Dec 23 2024 at 12:05 UTC