Stream: rfc-notifications

Topic: rfcs / PR #19 RFC: Cranelift sizeless vector types


view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:38):

sparker-arm opened PR #19 from cranelift-sizeless-vector to main:

Copyright (c) 2021, Arm Limited.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:40):

bjorn3 submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:40):

bjorn3 created PR review comment:

Please take it from the special types range instead.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:42):

bjorn3 created PR review comment:

Riscv allows smaller vectors, right?

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:42):

bjorn3 submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:43):

bjorn3 submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 10:43):

bjorn3 created PR review comment:

Will operations like iadd work just like with regular vectors?

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:05):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:05):

sparker-arm created PR review comment:

Yes, apparently so, but the flexible vector spec does not.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:07):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:07):

sparker-arm created PR review comment:

Yes, I think most polymorphic instructions can just be extended for sizeless types too. I've currently implemented: iadd, isub, imul, fadd, fsub and splat.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:27):

akirilov-arm submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 11:27):

akirilov-arm created PR review comment:

Also, the flexible vectors specification might introduce the requirement that the underlying vector length is a power of 2, even though the Arm SVE and the RISC-V V extension, I suppose, do not have similar restrictions. Whether our IR should cover more than the common use cases (such as the Wasm flexible vectors, power-of-2 vector sizes, etc.) is something to discuss.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 21:50):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Dec 17 2021 at 21:50):

cfallin created PR review comment:

It seems to me that supporting more general expressivity at the CLIF level is a good thing to establish early on if we plan to do it at all, or else we'll bake in a lot of assumptions to code that operates on these types, over time. So non-power-of-2 sizes is something to include (and test for) early. It does seem valuable to me to support this if underlying hardware commonly does.

On the other hand, we don't want to come up with a design that requires handling difficult edge-cases; emulating smaller vector support than the fundamental unit seems like a good example of that. (Edge-cases if one were to try to emulate with bigger ops involve e.g. ensuring loads/stores don't run beyond the end of the heap, and having smaller alignments than usual.) So scaling down to 64 bits seems like an unreasonable goal just because one possible future platform supports it. So, I'd tentatively suggest a 128-bit minimum size.

view this post on Zulip RFC notifications bot (Jan 04 2022 at 19:02):

fitzgen submitted PR review.

view this post on Zulip RFC notifications bot (Jan 04 2022 at 19:02):

fitzgen submitted PR review.

view this post on Zulip RFC notifications bot (Jan 04 2022 at 19:02):

fitzgen created PR review comment:

To avoid ambiguity, we should rename is_vector to is_sized_vector if we move forward with this.

view this post on Zulip RFC notifications bot (Feb 01 2022 at 14:18):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Feb 01 2022 at 14:26):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Feb 01 2022 at 14:26):

sparker-arm created PR review comment:

Updated in latest version.

view this post on Zulip RFC notifications bot (Feb 14 2022 at 22:53):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Feb 14 2022 at 22:53):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Feb 14 2022 at 22:53):

cfallin created PR review comment:

What does "vector_scale" mean in this context? Number of basic-vector-register units in the vector value's width, or...?

view this post on Zulip RFC notifications bot (Feb 14 2022 at 22:53):

cfallin created PR review comment:

We can definitely talk about designs that allow for different spill slot sizes; I can imagine a notion of "multiple spill slot spaces" (i.e., different index domains managed separately).

But I think the simplest approach here is what we had talked about before: allocate vector registers as the basic unit; they are as wide as they are, and both v128-typed values and bigger-sizeless-typed values allocate one vector register. The spillslot then needs to be as big as the actual register width but otherwise this scheme seems reasonable enough to me.

(FWIW, regalloc.rs and regalloc2 have the same design regarding spillslot abstractions and API, so the same considerations apply both before and after migration)

view this post on Zulip RFC notifications bot (Feb 14 2022 at 22:53):

cfallin created PR review comment:

Key question that remains for me here just from reading the RFC: at what time do we know the size (i.e., at compile time or only at runtime -- I think in discussion you had said it will be known at compile time?), and then how is this communicated to the ABI code?

Is the size of a sizeless vector actually always given by a method on the TargetIsa, and this is meant to be queried by the ABI code? Or does it have to be passed into the prologue somehow by e.g. giving it a global value that loads this config from a vmctx or register or...?

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:13):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:13):

sparker-arm created PR review comment:

This is the number of bytes of a 'sizeless' vector, allowing us to take a slot index and calculate an address.

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:16):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:16):

sparker-arm created PR review comment:

The current proposal only gets us half way to the runtime length support, as the register allocator wants a fixed number for a the size of a spill slot for a given register class. I've tried to enable 'sizeless' as much as possible with the hope of playing with regalloc2 to enable full runtime support later. So, for now, TargetIsa would have to provide a fixed size and so TargetIsa is passed, instead of settings::Flags, when calling new on ABICalleeImpl.

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:44):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Feb 15 2022 at 10:44):

sparker-arm created PR review comment:

I'd add that modifying the backend ABI implementations to further support runtime lengths shouldn't be complicated but there is the issue of the frame size calculation for stack size checks and offset initialization, and I haven't looked into this much.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:10):

abrown submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:10):

abrown created PR review comment:

@cfallin, @sparker-arm: I just caught up on all of the recent discussion above for how to implement this proposal in a platform-agnostic way, etc. It's good stuff and I think I understand a lot better your thoughts from the examples given.

I do think, though, that this point is unaddressed by the comments about creating dynamic types in the prologue: if we create a dynamic type of a certain length, how then could we change that length dynamically? @penzn included a set of set_length instructions (with caveats about needing more discussion). @sunfishcode had an alternate design idea here and, @sparker-arm, you've commented in the issue (https://github.com/WebAssembly/flexible-vectors/issues/21).

My point is: I sense a mismatch with this dynamic type idea and set_length and I think we should figure that out (here or in the flexible vectors repository) before something gets designed that precludes it.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:11):

abrown edited PR review comment.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:19):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:19):

cfallin created PR review comment:

Ah, that is a really interesting twist!

I haven't fully digested the examples in that issue but e.g. the existence of a vec_loop primitive together with dynamically-sized vectors makes me think that somehow there should be a lowering step involved: once the code reaches CLIF, there is no more vector-specific loop primitive, so whatever blocking or grouping that the vec_loop does, and how it interacts with the truly dynamic and resizable size, should be lowered onto "fixed for the function invocation"-sized chunks.

Also from a codegen perspective, resizable types have implications on stackframe layout: what does it mean for the regalloc if we end up having to spill v0, but v0 has a type whose size changes throughout the function body?

So I think there is a difficulty/complexity jump from "dynamically sized, known at function entry" and "dynamically sized, changes size at arbitrary times" and we should figure out if we can lower the latter (at Wasm level) onto the former (at CLIF level) if we can. Curious if there are other ideas here though!

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:30):

abrown submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:30):

abrown created PR review comment:

resizable types have implications on stackframe layout

Brainstorming: it feels like there are two parts here, 1) the size of the slot that needs to be spilled, which could be the "max vector size on this uarch," like the dt0 example above, and 2) the number of lanes used by that vector, which could be modified by set_length. Even if only the lowest few lanes are used, the entire vector can be spilled, right?

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:32):

abrown submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:32):

abrown created PR review comment:

I guess I'm saying that the type need not be resizable, but there should be some other way to express how much of that type is being used.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 17:32):

abrown edited PR review comment.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 18:40):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 18:40):

cfallin created PR review comment:

Yep, that's true; maybe what we want is an extra parameter on the operations that actually use the vector, to indicate how many lanes to operate on. So the type is Nxi32 where N is the max size (could be dynamic, but fixed at function entry); and we then do an add on Mxi32 units, where M was set by the latest set_length. I think this would imply tracking a separate "current length" variable for each dynamically-typed-and-resizable vector in the Wasm translator when building the SSA for CLIF. Does that seem reasonable?

view this post on Zulip RFC notifications bot (Feb 22 2022 at 18:55):

abrown submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 18:55):

abrown created PR review comment:

I think Petr's idea for set_length was even more dynamic than that--it can accept arbitrary i32 values from the stack. I might be misunderstanding what you mean, but we wouldn't know what that M is until runtime, so it couldn't be fixed during translation. But maybe you mean that that "variable" in translation would be made available for set_length to change at runtime?

view this post on Zulip RFC notifications bot (Feb 22 2022 at 19:19):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Feb 22 2022 at 19:19):

cfallin created PR review comment:

Yes, exactly, it would be a variable that has an SSA Value at any given program point, like other Wasm locals, so truly dynamic. Then we could have something like add_dynamic.dt0 v0, v1, v2 where v0 and v1 have type dt0 (which is Nxi32, set to the max possible size), and v2 is an i32 that tells the op how many lanes to operate on.

view this post on Zulip RFC notifications bot (Feb 23 2022 at 08:38):

sparker-arm submitted PR review.

view this post on Zulip RFC notifications bot (Feb 23 2022 at 08:38):

sparker-arm created PR review comment:

Just to note, my understanding was the vec_loop construct won't be pursued and instead predication, as @cfallin is suggesting here, will be the way to go. We can add predicated operations for those if/when it happens. As @abrown said, when it comes to spilling, I also presume predication (set_length) won't affect this, we'll still spill a whole register.

view this post on Zulip RFC notifications bot (Apr 05 2022 at 10:50):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 12 2022 at 15:16):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 05:06):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 05:06):

cfallin submitted PR review.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 05:06):

cfallin created PR review comment:

Indeed, "not allowed" rather than just "not expected" -- the semantics are as-if loaded once in the prologue.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 05:06):

cfallin created PR review comment:

s/accidently/accidentally/

view this post on Zulip RFC notifications bot (Apr 21 2022 at 08:30):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 08:31):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 10:50):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 21 2022 at 20:21):

abrown submitted PR review.

view this post on Zulip RFC notifications bot (Apr 25 2022 at 07:52):

sparker-arm requested fitzgen for a review on PR #19.

view this post on Zulip RFC notifications bot (Apr 27 2022 at 13:32):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 28 2022 at 07:51):

sparker-arm updated PR #19 from cranelift-sizeless-vector to main.

view this post on Zulip RFC notifications bot (Apr 28 2022 at 17:37):

penzn submitted PR review.

view this post on Zulip RFC notifications bot (May 10 2022 at 17:52):

sparker-arm merged PR #19.


Last updated: Nov 22 2024 at 16:03 UTC