akirilov-arm commented on issue #4602:
As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.
afonso360 commented on issue #4602:
As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.
You are right, I didn't read your comment properly, ill mark this as draft until we make that decision.
afonso360 edited a comment on issue #4602:
As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.
You are right, I didn't read your comment properly (sorry!), ill mark this as draft until we make that decision.
afonso360 commented on issue #4602:
@akirilov-arm I've updated the documentation, is this all you meant, or did I miss something else?
Also, cc @cfallin since I can't request reviews.
afonso360 commented on issue #4602:
I also think zero extending the unsigned operations is probably the better choice.
However, what if we instead altered the type of the instruction to accept the full
Imm128
?I don't think this would grow the size of
InstructionData
since we already haveShuffle
which also stores aImm128
. HoweverShuffle
stores the immediate in theDataFlowGraph
, we could do the same, and in this legalization pass emit the appropriateiconst
/iconcat
/sextend
depending on what the constant is.Would storing the Imm in the DFG for these operations cause performance issues?
cfallin commented on issue #4602:
Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.
However we can't carry that inline in the instruction, as you're hinting, because
InstructionData
is 16 bytes total (and we want to keep it that way).I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).
I had another thought, though, that may be more "honest" about the costs: what if we (i) limited
op_imm
instructions and theiconst
instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce twoiconst
s andiconcat
them. In some sense, trying to allowiconst.i128 -1
or theop_imm
folded variant is lulling producers into a false sense of what's supported. As soon as one writesiconst.i128 -0x8000_0000_0000_0001
one finds out that there are only half as many bits in the immediate field as the type otherwise implies.Thoughts on that?
cfallin edited a comment on issue #4602:
Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.
However we can't carry that inline in the instruction, as you're hinting, because
InstructionData
is 16 bytes total (and we want to keep it that way).I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).
I had another thought, though, that may be more "honest" about the costs: what if we limited (i)
op_imm
instructions and (ii) theiconst
instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce twoiconst
s andiconcat
them. In some sense, trying to allowiconst.i128 -1
or theop_imm
folded variant is lulling producers into a false sense of what's supported. As soon as one writesiconst.i128 -0x8000_0000_0000_0001
one finds out that there are only half as many bits in the immediate field as the type otherwise implies.Thoughts on that?
afonso360 commented on issue #4602:
I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).
I was proposing that only for
*_imm
variants. The way I see it, these instructions are already just sort of "helpers" in the sense that they just construct a pattern that we recognize (i.e.iadd
+iconst
), they could just emit more advanced patterns to support i128 (i.e.iadd+iconst+sext
oriadd+iconst+iconcat
all in 64bits).Having more operations only work up to
i64
to me feels likei128
is a second class citizen, which is sort of true for the current backends, but I don't really like it at the lang level.I don't know, I'm a bit undecided on this, I'd like to hear what other people think.
cfallin commented on issue #4602:
Generally I agree at an IR-design level: it would be nice to have orthogonality here, and allow all types up through
i128
in all places where integers are allowed.This compromise feels similar to "small immediates" in RISC ISAs though: we have a relatively small struct size for instructions, and its size is critical for performance, and its entire width would be taken by just the
i128
immediate (hence it would need to grow). So in that sense I don't think we can have an inlineImm128
.I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.
afonso360 commented on issue #4602:
I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.
Me too, and I think we could mitigate some of the cost (if it is non-negligible) by having two const
InstructionData
variants, one inline up to 64bits and one in the DFG for larger values, and choose between them when the user emits aiconst
(can we do this?).Although right now I don't have a lot of time to try this out.
I'm planning on addressing the feedback above and getting this merged, is that okay, or would you prefer restricting the types that we accept?
Last updated: Jan 24 2025 at 00:11 UTC