wasmtime / Issue #2278 Introduce the Cranelift IR instruc... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / Issue #2278 Introduce the Cranelift IR instruc...

Wasmtime GitHub notifications bot (Oct 07 2020 at 20:19):

@abrown, would you be willing to help with this (and then we can land it all at once)?

Yes, but I might not get around to it until next week? If the delay is not a problem, I don't mind doing that so that the issue is resolved all at once. (I think I can push to this PR, right?)

Wasmtime GitHub notifications bot (Oct 07 2020 at 21:30):

cfallin commented on Issue #2278:

That sounds fine to me at least!

Wasmtime GitHub notifications bot (Oct 08 2020 at 15:30):

akirilov-arm commented on Issue #2278:

I don't mind a one week delay either.

Wasmtime GitHub notifications bot (Oct 09 2020 at 13:15):

akirilov-arm commented on Issue #2278:

I added the necessary legalization rule to the old x86 backend.

Wasmtime GitHub notifications bot (Oct 09 2020 at 13:17):

akirilov-arm edited a comment on Issue #2278:

I added the necessary legalization rule to the old x86 backend. I also removed the temporary work-around in cranelift-wasm, so the lowering logic is no longer conditional.

Wasmtime GitHub notifications bot (Oct 09 2020 at 13:18):

akirilov-arm edited a comment on Issue #2278:

I added the necessary legalization rule to the old x86 backend. I also removed the temporary work-around in cranelift-wasm, so the lowering logic is no longer target-specific.

Wasmtime GitHub notifications bot (Oct 09 2020 at 16:37):

abrown commented on Issue #2278:

@akirilov-arm, nice! So I'm tracking the only thing I need to do next week is lower LoadSplat in the x64 backend, right? (Unless you get to it first...).

Wasmtime GitHub notifications bot (Oct 09 2020 at 17:00):

akirilov-arm commented on Issue #2278:

@abrown Yes, that's the only thing remaining, and no, I don't intend to do any other x86 changes (unless I am addressing review feedback, of course).

However, I have noticed that the peephole optimizer tests are failing and according to the logs the reason seems unrelated to my changes...

Wasmtime GitHub notifications bot (Oct 13 2020 at 17:14):

abrown commented on Issue #2278:

@akirilov-arm, @cfallin: I committed an implementation of LoadSplat on x64--please take a look. Unfortunately I had to thread SourceLocs through more Inst formats in c9c8de6 and I feel there may a better way to do this so I opened #2290.

Wasmtime GitHub notifications bot (Oct 13 2020 at 17:35):

abrown commented on Issue #2278:

Looking at the failures in the CI, I think something is behaving weirdly re: Git. The lines it claims need extra parameters should have those added by c9c8de6 and ci/run-experimental-x64-ci.sh. Perhaps we should rebase on main and re-push?

Wasmtime GitHub notifications bot (Oct 13 2020 at 18:04):

akirilov-arm commented on Issue #2278:

@abrown I have just rebased, but the tests are still failing. Your changes look fine to me, but I am not particularly familiar with the x64 backend.

Wasmtime GitHub notifications bot (Oct 13 2020 at 18:14):

akirilov-arm edited a comment on Issue #2278:

@abrown I have just rebased, but the tests are still failing. Your changes look fine to me otherwise, but I am not particularly familiar with the x64 backend.

Wasmtime GitHub notifications bot (Oct 14 2020 at 12:45):

akirilov-arm commented on Issue #2278:

I am not sure why, but I have moved the new IR instruction definition to the end of the function defining all the operations in cranelift/codegen/meta/src/shared/instructions.rs and now the peephole optimizer tests are no longer failing.

cc @fitzgen

Wasmtime GitHub notifications bot (Oct 14 2020 at 13:08):

akirilov-arm commented on Issue #2278:

@abrown It looks like you missed updating a couple of Inst::xmm_rm_r and Inst::xmm_rm_r_imm call sites, probably because my branch was outdated when you started working on it. I just rebased, so would you be able to fix it?

Wasmtime GitHub notifications bot (Oct 14 2020 at 15:42):

julian-seward1 commented on Issue #2278:

I apologise for getting to this so late in the day.

I would like to request that this be reconsidered. I believe the approach discussed so far works against the design goals of the new backend framework, and that an alternative approach would be preferable.

A primary goal of CLIF, in the new backend framework, is to provide a way to conveniently do redundancy elimination. One key aspect is to provide only a single way to represent any computation. This change runs counter to those goals. For example, consider GVN: with this change in place, there is no way to GVN a single LoadSplat CLIF with an equivalent two-CLIF load and splat sequence.

Of course I want this to become a single instruction on AArch64. The instruction selection framework in the new backend has been designed from the start to allow matching multiple CLIF nodes into a single machine instruction, and so we should use that here, and not add the proposed LoadSplat CLIF node.

@akirilov-arm writes:

In order to generate it, it is necessary to merge the Load and the Splat in the backend, which is not possible because the load may have side effects.

I don't follow .. can you clarify this? What instruction are you referring to?

Wasmtime GitHub notifications bot (Oct 14 2020 at 15:55):

akirilov-arm commented on Issue #2278:

What instruction are you referring to?
Opcode::Load and Opcode::Splat. I actually tried what you are talking about and it didn't work - refer to the discussion in #1175.

Wasmtime GitHub notifications bot (Oct 14 2020 at 15:55):

akirilov-arm edited a comment on Issue #2278:

What instruction are you referring to?

Opcode::Load and Opcode::Splat. I actually tried what you are talking about and it didn't work - refer to the discussion in #1175.

Wasmtime GitHub notifications bot (Oct 14 2020 at 16:10):

julian-seward1 commented on Issue #2278:

Ah, I see. Yes, it's true; the new framework currently doesn't allow what you tried -- it only allows a load to be at the root of a pattern tree. Whereas here, the splat would be at the root.

Having said that, @cfallin was talking recently about lifting that restriction in the framework. This isn't of much significance in the AArch64 case (apart from here, obviously) since AArch64 have very few load-op instructions. But for x64, almost all instructions have a load-op form, and so it's important that the framework can support that idiom effectively. Hence it might be the case that this limitation gets lifted sooner rather than later.

Wasmtime GitHub notifications bot (Oct 14 2020 at 16:11):

julian-seward1 edited a comment on Issue #2278:

Ah, I see. Yes, it's true; the new framework currently doesn't allow what you tried -- it only allows a load to be at the root of a pattern tree. Whereas here, the splat would be at the root.

Having said that, @cfallin was talking recently about lifting that restriction in the framework. This isn't of much significance in the AArch64 case (apart from here, obviously) since AArch64 has very few load-op instructions. But for x64, almost all instructions have a load-op form, and so it's important that the framework can support that idiom effectively. Hence it might be the case that this limitation gets lifted sooner rather than later.

Wasmtime GitHub notifications bot (Oct 14 2020 at 16:46):

abrown commented on Issue #2278:

@akirilov-arm: good call, now d990dd4 should have source locations in more places. @julian-seward1: regardless of what we decide to do with this PR, you should also take a look at #2290--we need a good way to trap at all of the places an x64 instruction could load (which could be quite a few) so I suggest a possible alternative in that issue.

Wasmtime GitHub notifications bot (Oct 15 2020 at 11:14):

akirilov-arm commented on Issue #2278:

Thanks, @abrown!

@cfallin @julian-seward1 How are we going to proceed with this? The implementation is now complete (covering all three major backends) and passes all tests. Based on the discussion in #1175, I believed that this would be an acceptable approach, but even if it is not, most of my code is actually independent of whether there is a new CLIF operation or not, and I think that's the case for @abrown's code as well. Are there plans to change the new framework soon, so that it supports the type of pattern matching that would be applicable to this case?

Wasmtime GitHub notifications bot (Oct 15 2020 at 18:51):

cfallin commented on Issue #2278:

@akirilov-arm @julian-seward1 @abrown so, I'm off for this week and next (but I'll try to keep up with this when I can). At a high level, I am fine with the pragmatic approach here (and in #1175) of creating these merged opcodes for now, and then eventually removing them when we have the story for load-related code motion sorted out.

In other words, I don't necessarily think that we are so close to having the "right thing" built that we should hold back true codegen improvements in the meantime and create a dependence on a to-be-built part of the framework; I think the cost of removing these opcodes later is minimal, and can be done as part of a PR that introduces the proper pattern-matching. It's not the ideal end-state, but it lets us move forward in the meantime.

@julian-seward1, what do you think?

Wasmtime GitHub notifications bot (Oct 28 2020 at 04:09):

cfallin commented on Issue #2278:

So, @julian-seward1 and I discussed this last week; apologies for taking so long to get back to it (I'm back to work again, albeit somewhat busy with onboarding stuff.)

In general I agree with @julian-seward1 (and it seems others would be on board with) a better approach eventually that can actually handle pattern-matching load sources to instructions. This has been on my to-do list for a while; but in theory is possible if we loosen the load-motion conditions a bit.

However, I think that having the better codegen as a starting point is valuable; if we take this as-is for now, then we have a baseline whose codegen output we should match when we do have the proper pattern-matching in place.

Thus, I'm happy to take this, and I'll commit to removing the LoadSplat as part of my pattern-matching fixup later, as long as @julian-seward1 doesn't strongly object to this transitional state. Thoughts?

Wasmtime GitHub notifications bot (Oct 28 2020 at 18:55):

akirilov-arm commented on Issue #2278:

@cfallin Sounds good to me. It's always better to have a good test case before adding non-trivial functionality and, if I understand correctly, previously there were no actual use cases for the enhanced pattern matching.

Wasmtime GitHub notifications bot (Oct 28 2020 at 19:24):

julian-seward1 commented on Issue #2278:

Well, I can't say I'm enthusiastic; I think it moves the design in the wrong direction. But if it's only temporary, and since it's useful to land the rest of it anyway, I won't stand in the way.

I should add .. Today I implemented the proposed wasm instructions v128.load32_zero and v128.load64_zero, although the patch isn't on any PR yet. These are almost identical to the load-splat insns, except that all lanes other than lane zero are zeroed out, not made into clones of it. I used a two-CLIF-insn scheme -- a scalar load, and a move-scalar-to-lowest-lane-and-zero-others. Once the isel framework can do load-op merging, I'll add a matching rule so as to only generate a single insn.

So .. if/when that lands, it'll create further CLIF-level inconsistency, at least in the short term :-/

Wasmtime GitHub notifications bot (Oct 28 2020 at 19:51):

cfallin commented on Issue #2278:

@julian-seward1 understood, and yes, not an ideal situation; but I think this should be a useful test case in the sense that if a new matching framework can handle this case (allow LoadSplat to be removed while retaining identical output), it will have succeeded. I definitely want to get to that point as soon as possible :-)

I'll go ahead and approve and merge now; thanks for the discussion, all!

Last updated: Apr 18 2025 at 12:05 UTC