Stream: git-wasmtime

Topic: wasmtime / PR #2504 Draft: I128 support (partial) on x64.


view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 06:32):

cfallin opened PR #2504 from multi-reg-result-2 to main:

This PR generalizes all of the MachInst framework to reason about SSA Values as being located in multiple registers (one, two or four, currently, in an efficient packed form). This is necessary in order to handle CLIF with values wider than the machine width (I128 on 64/32-bit machines and I64 on 32-bit machines), unless we legalize it beforehand.
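To illustrate "one, two or four registers in an efficient packed form", here is a minimal sketch of such a container. `Reg` and `ValueRegs` are hypothetical stand-ins, not necessarily the PR's actual types. It uses a fixed-size array plus a length, so no heap allocation is needed, and unused slots are filler that is never exposed.

```rust
/// Hypothetical stand-in for a machine register (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Reg(u8);

/// Hypothetical packed container for a value held in 1, 2 or 4 registers.
/// Slots past `len` are filler and are never returned by `regs()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueRegs {
    regs: [Reg; 4],
    len: u8,
}

impl ValueRegs {
    pub fn one(r: Reg) -> Self {
        ValueRegs { regs: [r, r, r, r], len: 1 }
    }
    /// Two registers, least-significant half first (e.g. I128 on a 64-bit machine).
    pub fn two(lo: Reg, hi: Reg) -> Self {
        ValueRegs { regs: [lo, hi, hi, hi], len: 2 }
    }
    /// Four registers, least-significant part first (e.g. I128 on a 32-bit machine).
    pub fn four(r0: Reg, r1: Reg, r2: Reg, r3: Reg) -> Self {
        ValueRegs { regs: [r0, r1, r2, r3], len: 4 }
    }
    /// The registers actually in use.
    pub fn regs(&self) -> &[Reg] {
        &self.regs[..self.len as usize]
    }
    /// The register, if the value occupies exactly one.
    pub fn only_reg(&self) -> Option<Reg> {
        if self.len == 1 {
            Some(self.regs[0])
        } else {
            None
        }
    }
}
```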

It also adds support for some basic 128-bit ALU ops to the x64 backend, lowering these directly to open-coded instruction sequences (add/adc, sub/sbb, etc.).
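The add/adc and sub/sbb pairs work on the two 64-bit halves separately, with the carry (or borrow) flag carrying information from the low-half instruction to the high-half one. The sketch below models that in plain Rust; `I128Parts`, `add128` and `sub128` are illustrative names, and `overflowing_add`/`overflowing_sub` play the role of the CPU's carry flag.

```rust
/// A 128-bit value split into two 64-bit halves, as it would sit in two registers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct I128Parts {
    lo: u64,
    hi: u64,
}

impl I128Parts {
    fn from_u128(v: u128) -> Self {
        I128Parts { lo: v as u64, hi: (v >> 64) as u64 }
    }
    fn to_u128(self) -> u128 {
        ((self.hi as u128) << 64) | self.lo as u128
    }
}

/// add/adc: add the low halves (setting carry), then add the high halves plus carry.
fn add128(a: I128Parts, b: I128Parts) -> I128Parts {
    let (lo, carry) = a.lo.overflowing_add(b.lo); // like `add`
    let hi = a.hi.wrapping_add(b.hi).wrapping_add(carry as u64); // like `adc`
    I128Parts { lo, hi }
}

/// sub/sbb: subtract the low halves (setting borrow), then the high halves minus borrow.
fn sub128(a: I128Parts, b: I128Parts) -> I128Parts {
    let (lo, borrow) = a.lo.overflowing_sub(b.lo); // like `sub`
    let hi = a.hi.wrapping_sub(b.hi).wrapping_sub(borrow as u64); // like `sbb`
    I128Parts { lo, hi }
}
```

Because the high-half instruction reads the flag written by the low-half one, nothing that clobbers the flags may be placed between them.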

@julian-seward1 and @bnjbvr: this is the approach we had discussed a long time ago (in January, I think!). It follows from the "every backend accepts the same IR" philosophy, with the idea that maybe we will get to the point where legalization is largely not necessary.

However: I must say that I'm not super-happy with the level of complexity this has added to the framework. The fact that the work we do in the x64 backend to support this will have to be repeated on aarch64 is kind of unfortunate; and this all feels somewhat silly given that we still have the legalization framework's narrowing support, and could use that instead. Philosophically, I think that legalization is actually the right approach here: we should be able to factor out "general machine-independent algorithm for 128-bit multiply with 64-bit pieces" from the specific machine backends.
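As a concrete example of a "general machine-independent algorithm for 128-bit multiply with 64-bit pieces", the low 128 bits of a 128x128 product need only one widening 64x64 multiply plus two ordinary 64-bit multiplies. This is a sketch with illustrative names; `mul_wide_u64` stands in for a widening multiply instruction (for example x64 `mul`, which produces rdx:rax, or aarch64 `mul` plus `umulh`).

```rust
/// Full 64x64 -> 128-bit unsigned multiply, returned as (lo, hi).
/// Stands in for a target's widening multiply instruction(s).
fn mul_wide_u64(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    (p as u64, (p >> 64) as u64)
}

/// Low 128 bits of a 128x128 multiply, built only from 64-bit pieces:
///   lo = lo(a_lo * b_lo)
///   hi = hi(a_lo * b_lo) + lo(a_lo * b_hi) + lo(a_hi * b_lo)   (mod 2^64)
/// The a_hi * b_hi term only affects bits >= 128, so it is dropped.
/// The result is the same for signed and unsigned operands (two's complement).
fn imul128(a_lo: u64, a_hi: u64, b_lo: u64, b_hi: u64) -> (u64, u64) {
    let (lo, hi_of_low_product) = mul_wide_u64(a_lo, b_lo);
    let cross1 = a_lo.wrapping_mul(b_hi);
    let cross2 = a_hi.wrapping_mul(b_lo);
    let hi = hi_of_low_product.wrapping_add(cross1).wrapping_add(cross2);
    (lo, hi)
}
```

Written this way, the sequence mentions no target registers at all, which is the argument for expressing it once (for example as a legalization) instead of per backend.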

So I'm inclined not to go in this direction, but (i) want to see what thoughts anyone might have, and (ii) save this for the record, in case we wire up the old legalizations for now but reconsider or need multi-reg values in the future.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 06:34):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 06:51):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 06:51):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 06:51):

bjorn3 created PR Review Comment:

Can regalloc insert an instruction in between?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 07:36):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 07:36):

julian-seward1 created PR Review Comment:

Regalloc is specifically disallowed from inserting any instructions that change the condition codes. This is specified in comments in the regalloc.rs interface.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 07:43):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 07:56):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 07:56):

julian-seward1 created PR Review Comment:

@cfallin Is it really so bad that we have to duplicate these circa 50 lines per target to do mul.i128? It would be even shorter if rustfmt didn't insist on laying out the calls in this space-inefficient way.

There will be sequences of roughly equivalent length for 128-bit left/right shifts and for 128-bit comparisons. For 128-bit division, we'll have to call a helper on all targets. On 32-bit targets, we'll probably have to use a helper for multiplies, shifts and comparisons too.
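For a sense of what the shift and comparison sequences involve, here is a sketch over 64-bit halves. It assumes the shift amount is taken modulo 128; a real backend lowering would presumably select the two cases with conditional moves rather than Rust branches. Function names are illustrative.

```rust
/// 128-bit logical left shift built from 64-bit halves.
/// Assumption: the shift amount is taken modulo 128.
fn shl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    let amt = amt & 127;
    if amt == 0 {
        // Avoid `lo >> 64`, which is not a valid 64-bit shift.
        (lo, hi)
    } else if amt < 64 {
        // Bits shifted out of the top of `lo` move into the bottom of `hi`.
        (lo << amt, (hi << amt) | (lo >> (64 - amt)))
    } else {
        // The whole low half moves into the high half.
        (0, lo << (amt - 64))
    }
}

/// Unsigned 128-bit less-than: compare high halves, and on a tie compare low halves.
fn ult128(a_lo: u64, a_hi: u64, b_lo: u64, b_hi: u64) -> bool {
    a_hi < b_hi || (a_hi == b_hi && a_lo < b_lo)
}
```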

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 12:15):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 12:15):

bjorn3 created PR Review Comment:

Could this be a Result with something like struct OnlyRegError; as error type? That would give a better panic message when unwrapping.
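A minimal sketch of the suggestion, using hypothetical `Reg` and `ValueRegs` types rather than the PR's real ones: `only_reg` returns `Result<Reg, OnlyRegError>` instead of an `Option`.

```rust
use std::fmt;

/// Hypothetical register type, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Reg(pub u8);

/// Unit error type as suggested in the review.
#[derive(Debug, PartialEq, Eq)]
pub struct OnlyRegError;

impl fmt::Display for OnlyRegError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "value is not held in exactly one register")
    }
}

impl std::error::Error for OnlyRegError {}

/// Hypothetical multi-register value.
pub struct ValueRegs {
    pub regs: Vec<Reg>,
}

impl ValueRegs {
    /// Returns the single register, or `OnlyRegError` if there are zero or several.
    pub fn only_reg(&self) -> Result<Reg, OnlyRegError> {
        match self.regs.as_slice() {
            [r] => Ok(*r),
            _ => Err(OnlyRegError),
        }
    }
}
```

Calling `.unwrap()` on the `Err` case then panics with a message that names `OnlyRegError`, rather than the generic message for unwrapping a `None`.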

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 18:45):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 18:45):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 18:45):

bjorn3 created PR Review Comment:

                let arg_regs = put_input_in_regs(ctx, *input);

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 18:45):

bjorn3 created PR Review Comment:

                abi.emit_copy_regs_to_arg(ctx, i, arg_regs);

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2020 at 18:45):

bjorn3 created PR Review Comment:

                let retval_regs = get_output_reg(ctx, *output);
                abi.emit_copy_retval_to_regs(ctx, i, retval_regs);

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 00:47):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:05):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:07):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:51):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:56):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:56):

cfallin created PR Review Comment:

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 02:58):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 05:14):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 05:15):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 06:48):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 06:48):

bjorn3 created PR Review Comment:

StructArgument indicates that the argument is a pointer to a piece of memory of the given size that needs to be passed on the stack in the argument area. On the caller side, you will need to memcpy it to the right place; on the callee side, you need to get its address.
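The two halves of that contract can be modeled with a byte buffer standing in for the stack argument area. Everything here (`StackArgArea`, the function names, the idea that the slot offset is already computed) is a hypothetical illustration, not Cranelift's ABI code.

```rust
/// Hypothetical model of the outgoing/incoming stack argument area.
struct StackArgArea {
    bytes: Vec<u8>,
}

/// Caller side: memcpy `size` bytes from the pointed-to memory (`src`)
/// into the argument area at the slot's `offset` (assumed already computed).
fn caller_copy_struct_arg(area: &mut StackArgArea, offset: usize, src: &[u8], size: usize) {
    area.bytes[offset..offset + size].copy_from_slice(&src[..size]);
}

/// Callee side: the argument's value is the address of its slot,
/// i.e. the base of the incoming argument area plus the slot offset.
fn callee_struct_arg_addr(area_base: usize, offset: usize) -> usize {
    area_base + offset
}
```

The key point the model captures is that the IR value is a pointer on both sides, but what actually crosses the call boundary is the pointed-to bytes.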

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 06:51):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 06:51):

bjorn3 created PR Review Comment:

The old backend didn't handle this either, but the System V ABI defines a specific register for this argument. In addition, on x86_64 at least, I think you also need to return this value in a different register. (I guess to reduce the need for saving it on the caller side.)
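On System V x86-64, the hidden return-area pointer is passed in %rdi as if it were the first integer argument, and the callee returns that same address in %rax. The sketch below is a simplified model of integer-argument register assignment with and without that hidden pointer; it ignores floating-point/vector arguments and stack layout, and the function name is illustrative.

```rust
/// System V x86-64 integer argument registers, in order.
static INT_ARG_REGS: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];

/// Register holding the return-area address when the callee returns.
static SRET_RETURN_REG: &str = "rax";

/// Simplified model: assigns registers to `n_int_args` integer arguments.
/// With a hidden return-area pointer, that pointer takes the first slot (rdi)
/// and the real arguments shift by one. `None` means "passed on the stack".
fn assign_int_args(
    has_sret: bool,
    n_int_args: usize,
) -> (Option<&'static str>, Vec<Option<&'static str>>) {
    let mut next = 0;
    let mut sret = None;
    if has_sret {
        sret = Some(INT_ARG_REGS[next]);
        next += 1;
    }
    let mut args = Vec::with_capacity(n_int_args);
    for _ in 0..n_int_args {
        args.push(INT_ARG_REGS.get(next).copied());
        next += 1;
    }
    (sret, args)
}
```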

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:36):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:36):

cfallin created PR Review Comment:

OK, yes, this is making more sense now. Do you think you would be up for attempting an implementation? (My bandwidth is somewhat stretched thin at the moment, but I can come back to this at ... some point, eventually, if needed. Incidentally this is similar to what I think we also need for Windows fastcall, which is one of the other remaining new-backend TODOs...).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:36):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:36):

julian-seward1 created PR Review Comment:

Would it be preferable here to key it on the word size of the target that the resulting CL will be compiled for? Is that even possible, given the limitations of Rust's config/cfg system?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:37):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:37):

cfallin created PR Review Comment:

Such an option does exist, but the wrench in the works is cross-compilation: e.g., clif-utils is normally built with all targets enabled (so we need to support arm32 compilation even on an x64 host, etc.).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:38):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 07:38):

cfallin created PR Review Comment:

(Ah, sorry, actually I misread your comment -- pointer size of target, not host; I don't think we have a config option for that as we have custom features for each backend.)
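The distinction here is between Rust's compile-time `cfg!(target_pointer_width = ...)`, which describes the machine the compiler binary itself runs on, and the pointer width of the ISA being compiled for, which a cross-compiling build only knows at runtime. A sketch, with `TargetIsa` and `regs_needed` as hypothetical names:

```rust
/// Pointer width of the *host* (the machine running the compiler), fixed at build time.
fn host_pointer_bits() -> u32 {
    if cfg!(target_pointer_width = "64") {
        64
    } else if cfg!(target_pointer_width = "32") {
        32
    } else {
        16
    }
}

/// Hypothetical runtime description of the ISA being compiled *for*.
struct TargetIsa {
    pointer_bits: u32,
}

/// Registers needed to hold a `ty_bits`-wide value on `isa` (at least one).
/// This has to be decided per target at runtime, since one build may
/// emit code for both 64-bit and 32-bit targets.
fn regs_needed(ty_bits: u32, isa: &TargetIsa) -> u32 {
    ((ty_bits + isa.pointer_bits - 1) / isa.pointer_bits).max(1)
}
```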

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 09:01):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 14 2020 at 09:01):

bjorn3 created PR Review Comment:

            emit_reloc(sink, state, Reloc::ElfX86_64TlsGd, symbol, -4);
            sink.put4(0);
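The `-4` addend with a `put4(0)` placeholder follows the usual pattern for a 32-bit PC-relative field at the end of an instruction: ELF resolves such relocations as S + A - P, where P is the address of the field itself, while the CPU adds the displacement to the address of the *next* instruction, which is 4 bytes later. The generic model below uses hypothetical `MiniSink`/`Reloc` types (not Cranelift's `MachBuffer` API); for `ElfX86_64TlsGd` specifically, the linker substitutes the address of the symbol's TLS GOT entry for S, but the addend plays the same role.

```rust
/// Hypothetical relocation record, for illustration only.
#[derive(Debug, PartialEq)]
struct Reloc {
    offset: usize,
    symbol: &'static str,
    addend: i64,
}

/// Minimal code sink: emitted bytes plus relocations pointing into them.
#[derive(Default)]
struct MiniSink {
    bytes: Vec<u8>,
    relocs: Vec<Reloc>,
}

impl MiniSink {
    fn put4(&mut self, v: u32) {
        self.bytes.extend_from_slice(&v.to_le_bytes());
    }
    /// Record a relocation for the 4 bytes about to be emitted.
    fn reloc_here(&mut self, symbol: &'static str, addend: i64) {
        self.relocs.push(Reloc { offset: self.bytes.len(), symbol, addend });
    }
}

/// Resolve every 32-bit PC-relative relocation as S + A - P against one target address.
fn resolve_pcrel32(sink: &mut MiniSink, code_base: u64, target_addr: u64) {
    for r in &sink.relocs {
        let p = code_base + r.offset as u64; // address of the 4-byte field
        let value = (target_addr as i64 + r.addend - p as i64) as i32;
        sink.bytes[r.offset..r.offset + 4].copy_from_slice(&value.to_le_bytes());
    }
}
```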

view this post on Zulip Wasmtime GitHub notifications bot (Dec 28 2020 at 08:18):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 28 2020 at 08:38):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 28 2020 at 08:38):

cfallin created PR Review Comment:

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 28 2020 at 21:52):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 29 2020 at 02:06):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 29 2020 at 08:51):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 29 2020 at 10:01):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 29 2020 at 10:04):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 29 2020 at 22:12):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2021 at 05:43):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2021 at 05:58):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2021 at 06:02):

cfallin updated PR #2504 from multi-reg-result-2 to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2021 at 17:40):

cfallin closed without merge PR #2504.


Last updated: Nov 22 2024 at 17:03 UTC