sparker-arm opened PR #19 from cranelift-sizeless-vector
to main
:
Copyright (c) 2021, Arm Limited.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Please take it from the special types range instead.
bjorn3 created PR review comment:
Riscv allows smaller vectors, right?
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Will operations like
iadd
work just like with regular vectors?
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Yes, apparently so, but the flexible vector spec does not.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Yes, I think most polymorphic instructions can just be extended for sizeless types too. I've currently implemented: iadd, isub, imul, fadd, fsub and splat.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Also, the flexible vectors specification might introduce the requirement that the underlying vector length is a power of 2, even though the Arm SVE and the RISC-V V extension, I suppose, do not have similar restrictions. Whether our IR should cover more than the common use cases (such as the Wasm flexible vectors, power-of-2 vector sizes, etc.) is something to discuss.
cfallin submitted PR review.
cfallin created PR review comment:
It seems to me that supporting more general expressivity at the CLIF level is a good thing to establish early on if we plan to do it at all, or else we'll bake in a lot of assumptions to code that operates on these types, over time. So non-power-of-2 sizes is something to include (and test for) early. It does seem valuable to me to support this if underlying hardware commonly does.
On the other hand, we don't want to come up with a design that requires handling difficult edge-cases; emulating smaller vector support than the fundamental unit seems like a good example of that. (Edge-cases if one were to try to emulate with bigger ops involve e.g. ensuring loads/stores don't run beyond the end of the heap, and having smaller alignments than usual.) So scaling down to 64 bits seems like an unreasonable goal just because one possible future platform supports it. So, I'd tentatively suggest a 128-bit minimum size.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
To avoid ambiguity, we should rename
is_vector
tois_sized_vector
if we move forward with this.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Updated in latest version.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
What does "vector_scale" mean in this context? Number of basic-vector-register units in the vector value's width, or...?
cfallin created PR review comment:
We can definitely talk about designs that allow for different spill slot sizes; I can imagine a notion of "multiple spill slot spaces" (i.e., different index domains managed separately).
But I think the simplest approach here is what we had talked about before: allocate vector registers as the basic unit; they are as wide as they are, and both v128-typed values and bigger-sizeless-typed values allocate one vector register. The spillslot then needs to be as big as the actual register width but otherwise this scheme seems reasonable enough to me.
(FWIW, regalloc.rs and regalloc2 have the same design regarding spillslot abstractions and API, so the same considerations apply both before and after migration)
cfallin created PR review comment:
Key question that remains for me here just from reading the RFC: at what time do we know the size (i.e., at compile time or only at runtime -- I think in discussion you had said it will be known at compile time?), and then how is this communicated to the ABI code?
Is the size of a sizeless vector actually always given by a method on the
TargetIsa
, and this is meant to be queried by the ABI code? Or does it have to be passed into the prologue somehow by e.g. giving it a global value that loads this config from a vmctx or register or...?
sparker-arm submitted PR review.
sparker-arm created PR review comment:
This is the number of bytes of a 'sizeless' vector, allowing us to take a slot index and calculate an address.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
The current proposal only gets us half way to the runtime length support, as the register allocator wants a fixed number for a the size of a spill slot for a given register class. I've tried to enable 'sizeless' as much as possible with the hope of playing with regalloc2 to enable full runtime support later. So, for now, TargetIsa would have to provide a fixed size and so TargetIsa is passed, instead of settings::Flags, when calling new on ABICalleeImpl.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
I'd add that modifying the backend ABI implementations to further support runtime lengths shouldn't be complicated but there is the issue of the frame size calculation for stack size checks and offset initialization, and I haven't looked into this much.
abrown submitted PR review.
abrown created PR review comment:
@cfallin, @sparker-arm: I just caught up on all of the recent discussion above for how to implement this proposal in a platform-agnostic way, etc. It's good stuff and I think I understand a lot better your thoughts from the examples given.
I do think, though, that this point is unaddressed by the comments about creating dynamic types in the prologue: if we create a dynamic type of a certain length, how then could we change that length dynamically? @penzn included a set of
set_length
instructions (with caveats about needing more discussion). @sunfishcode had an alternate design idea here and, @sparker-arm, you've commented in the issue (https://github.com/WebAssembly/flexible-vectors/issues/21).My point is: I sense a mismatch with this dynamic type idea and
set_length
and I think we should figure that out (here or in the flexible vectors repository) before something gets designed that precludes it.
abrown edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, that is a really interesting twist!
I haven't fully digested the examples in that issue but e.g. the existence of a
vec_loop
primitive together with dynamically-sized vectors makes me think that somehow there should be a lowering step involved: once the code reaches CLIF, there is no more vector-specific loop primitive, so whatever blocking or grouping that thevec_loop
does, and how it interacts with the truly dynamic and resizable size, should be lowered onto "fixed for the function invocation"-sized chunks.Also from a codegen perspective, resizable types have implications on stackframe layout: what does it mean for the regalloc if we end up having to spill
v0
, butv0
has a type whose size changes throughout the function body?So I think there is a difficulty/complexity jump from "dynamically sized, known at function entry" and "dynamically sized, changes size at arbitrary times" and we should figure out if we can lower the latter (at Wasm level) onto the former (at CLIF level) if we can. Curious if there are other ideas here though!
abrown submitted PR review.
abrown created PR review comment:
resizable types have implications on stackframe layout
Brainstorming: it feels like there are two parts here, 1) the size of the slot that needs to be spilled, which could be the "max vector size on this uarch," like the
dt0
example above, and 2) the number of lanes used by that vector, which could be modified byset_length
. Even if only the lowest few lanes are used, the entire vector can be spilled, right?
abrown submitted PR review.
abrown created PR review comment:
I guess I'm saying that the type need not be resizable, but there should be some other way to express how much of that type is being used.
abrown edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Yep, that's true; maybe what we want is an extra parameter on the operations that actually use the vector, to indicate how many lanes to operate on. So the type is
Nxi32
where N is the max size (could be dynamic, but fixed at function entry); and we then do an add onMxi32
units, whereM
was set by the latestset_length
. I think this would imply tracking a separate "current length" variable for each dynamically-typed-and-resizable vector in the Wasm translator when building the SSA for CLIF. Does that seem reasonable?
abrown submitted PR review.
abrown created PR review comment:
I think Petr's idea for
set_length
was even more dynamic than that--it can accept arbitraryi32
values from the stack. I might be misunderstanding what you mean, but we wouldn't know what thatM
is until runtime, so it couldn't be fixed during translation. But maybe you mean that that "variable" in translation would be made available forset_length
to change at runtime?
cfallin submitted PR review.
cfallin created PR review comment:
Yes, exactly, it would be a variable that has an SSA
Value
at any given program point, like other Wasm locals, so truly dynamic. Then we could have something likeadd_dynamic.dt0 v0, v1, v2
wherev0
andv1
have typedt0
(which isNxi32
, set to the max possible size), andv2
is ani32
that tells the op how many lanes to operate on.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Just to note, my understanding was the vec_loop construct won't be pursued and instead predication, as @cfallin is suggesting here, will be the way to go. We can add predicated operations for those if/when it happens. As @abrown said, when it comes to spilling, I also presume predication (set_length) won't affect this, we'll still spill a whole register.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Indeed, "not allowed" rather than just "not expected" -- the semantics are as-if loaded once in the prologue.
cfallin created PR review comment:
s/accidently/accidentally/
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
abrown submitted PR review.
sparker-arm requested fitzgen for a review on PR #19.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
sparker-arm updated PR #19 from cranelift-sizeless-vector
to main
.
penzn submitted PR review.
sparker-arm merged PR #19.
Last updated: Dec 23 2024 at 12:05 UTC