wasmtime / issue #4602 cranelift: Sign extend immediates ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #4602 cranelift: Sign extend immediates ...

Wasmtime GitHub notifications bot (Aug 04 2022 at 14:22):

As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.

Wasmtime GitHub notifications bot (Aug 04 2022 at 14:53):

As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.

You are right, I didn't read your comment properly, ill mark this as draft until we make that decision.

Wasmtime GitHub notifications bot (Aug 04 2022 at 14:53):

afonso360 edited a comment on issue #4602:

As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.

You are right, I didn't read your comment properly (sorry!), ill mark this as draft until we make that decision.

Wasmtime GitHub notifications bot (Aug 09 2022 at 19:27):

afonso360 commented on issue #4602:

@akirilov-arm I've updated the documentation, is this all you meant, or did I miss something else?

Also, cc @cfallin since I can't request reviews.

Wasmtime GitHub notifications bot (Aug 10 2022 at 13:21):

afonso360 commented on issue #4602:

I also think zero extending the unsigned operations is probably the better choice.

However, what if we instead altered the type of the instruction to accept the full Imm128?

I don't think this would grow the size of InstructionData since we already have Shuffle which also stores a Imm128. However Shuffle stores the immediate in the DataFlowGraph, we could do the same, and in this legalization pass emit the appropriate iconst / iconcat / sextend depending on what the constant is.

Would storing the Imm in the DFG for these operations cause performance issues?

Wasmtime GitHub notifications bot (Aug 10 2022 at 17:43):

cfallin commented on issue #4602:

Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.

However we can't carry that inline in the instruction, as you're hinting, because InstructionData is 16 bytes total (and we want to keep it that way).

I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).

I had another thought, though, that may be more "honest" about the costs: what if we (i) limited op_imm instructions and the iconst instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce two iconsts and iconcat them. In some sense, trying to allow iconst.i128 -1 or the op_imm folded variant is lulling producers into a false sense of what's supported. As soon as one writes iconst.i128 -0x8000_0000_0000_0001 one finds out that there are only half as many bits in the immediate field as the type otherwise implies.

Thoughts on that?

Wasmtime GitHub notifications bot (Aug 10 2022 at 17:44):

cfallin edited a comment on issue #4602:

Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.

However we can't carry that inline in the instruction, as you're hinting, because InstructionData is 16 bytes total (and we want to keep it that way).

I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).

I had another thought, though, that may be more "honest" about the costs: what if we limited (i) op_imm instructions and (ii) the iconst instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce two iconsts and iconcat them. In some sense, trying to allow iconst.i128 -1 or the op_imm folded variant is lulling producers into a false sense of what's supported. As soon as one writes iconst.i128 -0x8000_0000_0000_0001 one finds out that there are only half as many bits in the immediate field as the type otherwise implies.

Thoughts on that?

Wasmtime GitHub notifications bot (Aug 10 2022 at 19:02):

afonso360 commented on issue #4602:

I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).

I was proposing that only for *_imm variants. The way I see it, these instructions are already just sort of "helpers" in the sense that they just construct a pattern that we recognize (i.e. iadd + iconst), they could just emit more advanced patterns to support i128 (i.e. iadd+iconst+sext or iadd+iconst+iconcat all in 64bits).

Having more operations only work up to i64 to me feels like i128 is a second class citizen, which is sort of true for the current backends, but I don't really like it at the lang level.

I don't know, I'm a bit undecided on this, I'd like to hear what other people think.

Wasmtime GitHub notifications bot (Aug 10 2022 at 19:07):

cfallin commented on issue #4602:

Generally I agree at an IR-design level: it would be nice to have orthogonality here, and allow all types up through i128 in all places where integers are allowed.

This compromise feels similar to "small immediates" in RISC ISAs though: we have a relatively small struct size for instructions, and its size is critical for performance, and its entire width would be taken by just the i128 immediate (hence it would need to grow). So in that sense I don't think we can have an inline Imm128.

I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.

Wasmtime GitHub notifications bot (Aug 11 2022 at 17:52):

afonso360 commented on issue #4602:

I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.

Me too, and I think we could mitigate some of the cost (if it is non-negligible) by having two const InstructionData variants, one inline up to 64bits and one in the DFG for larger values, and choose between them when the user emits a iconst (can we do this?).

Although right now I don't have a lot of time to try this out.

I'm planning on addressing the feedback above and getting this merged, is that okay, or would you prefer restricting the types that we accept?

Last updated: Apr 18 2025 at 16:03 UTC