Stream: git-wasmtime

Topic: wasmtime / issue #3783 [RFC] ISLE: Lowering of multi-outp...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 09 2022 at 12:36):

github-actions[bot] commented on issue #3783:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2022 at 13:20):

uweigand commented on issue #3783:

So I think a lower_zero, lower_one, maybe a special lower_two for instructions that statically produce exactly two results (e.g. iadd_ifcout-alikes) and lower_variable for calls (even if they produce 0, 1, or 2 results, this is known only at compile time) would make sense. Thoughts?

I guess what's not quite clear about this to me is just where that distinction would be introduced. If we have different lowering functions with different return types, we won't be able to handle all of them with a single lower_common handler, as the latter depends on the particular return type.

So should we have a lower_common that takes four different lower routines, and then internally decides which of them to use (based on the opcode, I guess?), and then handles each of them separately according to their return value?

Or should lower_common be modified to handle different return types, based on some template magic I guess, and then the various back-ends could use different instantiations, and chose by themselves which version to use (again based on the opcode, but now probably in the already-existing switch statements in the back-ends?)?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 02:06):

cfallin commented on issue #3783:

I guess what's not quite clear about this to me is just where that distinction would be introduced. If we have different lowering functions with different return types, we won't be able to handle all of them with a single lower_common handler, as the latter depends on the particular return type.

So should we have a lower_common that takes four different lower routines, and then internally decides which of them to use (based on the opcode, I guess?), and then handles each of them separately according to their return value?

Or should lower_common be modified to handle different return types, based on some template magic I guess, and then the various back-ends could use different instantiations, and chose by themselves which version to use (again based on the opcode, but now probably in the already-existing switch statements in the back-ends?)?

This is a good question; the general principle would be to dispatch somewhere before we enter ISLE code so we end up with the type-safety once we get to specific rules. My first instinct is to have something like

trait IsleEntrypoints {
    fn lower_zero(...) -> Option<()>;
    fn lower_one(...) -> Option<ValueRegs>;
    fn lower_two(...) -> Option<(ValueRegs, ValueRegs)>;
    fn lower_variable(...) -> Option<SmallVec<[ValueRegs; 4]>>;
}

to replace the IF closure we currently parameterize lower_common on, and then

/// Arity (number of result values) for an instruction. Note that we make special distinctions
/// for:
/// - zero-result instructions: these are used for their side-effects only;
/// - one-result instructions: the most common kind, computing one new value;
/// - two-result instructions: some instructions produce two fixed outputs, like addend and carry;
/// - variable-result instructions: instructions like calls that produce some number of results different
///   for each instance of the instruction. Note that even a call with 0/1/2 results, dynamically, is still
///   `Variable`.
enum InstArity {
    Zero,
    One,
    Two,
    Variable,
}

impl InstData {
    // ...
    fn arity(&self) -> InstArity { ... }
}

Now that I've sketched that out, though, I'm wavering a bit on whether this is really better than just a SmallVec at the lower_common level (as long as the SmallVec is properly sized; 2 ValueRegs is probably optimal, since multivalue calls are rare and a too-large inline vec will cause measurable memcpy cost?), and then let each backend dispatch into specialized entry points as desired. @fitzgen any thoughts here?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:23):

fitzgen commented on issue #3783:

Don't we already know the number of expected results, and therefore don't need the InstArity that you've sketched?

https://github.com/bytecodealliance/wasmtime/blob/37b0fd482d5ba2fafa3c8f73ab366e739116c58d/cranelift/codegen/src/isa/x64/lower.rs#L1189-L1191

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:35):

cfallin commented on issue #3783:

The additional information needed is whether the number of outputs is fixed or variable; e.g. for call it may happen to have one result in this (and most) instances of the instruction, but it still needs the variable path.

The idea with the Arity enum was to allow a clean one-to-one match that dispatches into the trait but I suppose we could also have something like has_variable_results(). (Curiously, we do have inst_variable_args that allows this fixed/variable distinction on the inputs side, but nothing equivalent for results.)

Diving further into this rabbithole though I'm starting to think that just dispatching in the platform-specific glue into its ISLE toplevel is maybe a better answer: match on op, do a special entrypoint for calls (calls are special anyway in lots of ways), nothing else should be variable so then invoke lower_{zero,one,two} based on actual result count as you suggest; and return a SmallVec to lower_common...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:38):

fitzgen commented on issue #3783:

Diving further into this rabbithole though I'm starting to think that just dispatching in the platform-specific glue into its ISLE toplevel is maybe a better answer: match on op, do a special entrypoint for calls (calls are special anyway in lots of ways), nothing else should be variable so then invoke lower_{zero,one,two} based on actual result count as you suggest; and return a SmallVec to lower_common...

SGTM :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:46):

uweigand commented on issue #3783:

Diving further into this rabbithole though I'm starting to think that just dispatching in the platform-specific glue into its ISLE toplevel is maybe a better answer: match on op, do a special entrypoint for calls (calls are special anyway in lots of ways), nothing else should be variable so then invoke lower_{zero,one,two} based on actual result count as you suggest; and return a SmallVec to lower_common...

If you always return a SmallVec to lower_common, then what's even the benefit of multiple lower constructors? Can't we then just go back to my approach (except with SmallVec instead of Vec) plus (the upcoming) default conversions?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:52):

cfallin commented on issue #3783:

If you always return a SmallVec to lower_common, then what's even the benefit of multiple lower constructors? Can't we then just go back to my approach (except with SmallVec instead of Vec) plus (the upcoming) default conversions?

The advantage of at least having the distinction at the ISLE layer is type safety, mainly -- we can at least encode the restriction that, say, iadd should result in just one Value in the lowering rules, even if the toplevel driver is generic. It also keeps the rules in a form that is easier to invoke in a more efficient way if we do want/need to do something better than the generic approach in the future.

I'm not completely tied to this of course, and the way we're thinking of doing this now it would be a per-backend decision, so IMHO at least it'sOK to do the fully generic thing in s390x if you feel really strongly here.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 18:52):

cfallin edited a comment on issue #3783:

If you always return a SmallVec to lower_common, then what's even the benefit of multiple lower constructors? Can't we then just go back to my approach (except with SmallVec instead of Vec) plus (the upcoming) default conversions?

The advantage of at least having the distinction at the ISLE layer is type safety, mainly -- we can at least encode the restriction that, say, iadd should result in just one Value in the lowering rules, even if the toplevel driver is generic. It also keeps the rules in a form that is easier to invoke in a more efficient way if we do want/need to do something better than the generic approach in the future.

I'm not completely tied to this of course, and the way we're thinking of doing this now it would be a per-backend decision, so IMHO at least it's OK to do the fully generic thing in s390x if you feel really strongly here.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 19:00):

uweigand commented on issue #3783:

The advantage of at least having the distinction at the ISLE layer is type safety, mainly -- we can at least encode the restriction that, say, iadd should result in just one Value in the lowering rules, even if the toplevel driver is generic.

But lower_common would continue to verify that it got the correct number of outputs, so I'm sure I see what additional benefit this provides? If you have multiple constructors, you instead have the potential source of error of calling the wrong constructor for some opcode ...

It also keeps the rules in a form that is easier to invoke in a more efficient way if we do want/need to do something better than the generic approach in the future.

I'm not completely tied to this of course, and the way we're thinking of doing this now it would be a per-backend decision, so IMHO at least it's OK to do the fully generic thing in s390x if you feel really strongly here.

I don't really feel strongly either ... I guess once the default conversion patch is in, I can update this PR to use those (and switch to a SmallVec), then we can see how it looks.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 17 2022 at 19:02):

uweigand edited a comment on issue #3783:

The advantage of at least having the distinction at the ISLE layer is type safety, mainly -- we can at least encode the restriction that, say, iadd should result in just one Value in the lowering rules, even if the toplevel driver is generic.

But lower_common would continue to verify that it got the correct number of outputs, so I'm not sure I see what additional benefit this provides? If you have multiple constructors, you instead have the potential source of error of calling the wrong constructor for some opcode ...

It also keeps the rules in a form that is easier to invoke in a more efficient way if we do want/need to do something better than the generic approach in the future.

I'm not completely tied to this of course, and the way we're thinking of doing this now it would be a per-backend decision, so IMHO at least it's OK to do the fully generic thing in s390x if you feel really strongly here.

I don't really feel strongly either ... I guess once the default conversion patch is in, I can update this PR to use those (and switch to a SmallVec), then we can see how it looks.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2022 at 11:51):

uweigand commented on issue #3783:

I guess once the default conversion patch is in, I can update this PR to use those (and switch to a SmallVec), then we can see how it looks.

OK, just pushed an updated version. The changes to the platform lower.isle files actually look very reasonable to me now, the default conversions can indeed handle nearly everything. The only nontrival changes are about iadd_ifcout (which is an improvement anyway), and replacing value_regs_none by side_effect (only used by s390x, and I think also an improvement).

What do you think?


Last updated: Jan 24 2025 at 00:11 UTC