Stream: git-wasmtime

Topic: wasmtime / PR #4953 egraph-based midend: draw the rest of...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 04 2022 at 06:13):

cfallin edited PR #4953 from mid-end to main:

This PR is a draft of an updated version of the egraph patch (and thus supersedes #4249) with the two parts already merged (multi-etors and the egraph crate proper) removed; it includes the Cranelift integration, the egraph build (CLIF to egraph) and elaboration (egraph to CLIF) algorithms, and rule application engine, as well as a set of rewrite rules that replaces the existing mid-end optimizations.

It still needs a bit more productionizing:

The purpose of this draft PR is to be a place to do this work on a rebased and up-to-date basis. (Lots happened since the original egraph work branched off in May, including incremental compilation and a good number of smaller changes.)

While patch-wrangling this week, I tried pulling this apart into smaller pieces, but the remaining bits are pretty cyclically entangled, and/or some of the intermediate points that might make sense (e.g. egraph build and elaboration without rule application) require re-synthesizing some scaffolding that would then disappear in the final state, so that seems a bit counterproductive. Once we have a polished state I can try pulling it apart into separate logical commits at least.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 04 2022 at 17:02):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 04 2022 at 18:11):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 04 2022 at 21:22):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 04 2022 at 21:31):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

I'm a little confused about this optimized flag.

I guess I'd like to at least see a comment about why fn optimize is pub, perhaps similar to the one on compile_stencil.

If fn optimize were not public, would it still be necessary to guard against it being called multiple times? Under what circumstances would it get called again?

Should the "already optimized" check be at the beginning of optimize instead of in compile_stencil?

I wonder, would it make sense to set self.optimized at the beginning of fn optimize? If an error occurs, I wouldn't expect the optimizer would work any better if you run it a second time. Maybe after failed optimization one might want to retry compilation without optimization?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

Why is it necessary to re-run all of these passes after three of the four have just been run? Could you let the earlier DCE pass run if isa.flags().use_egraphs() || opt_level != OptLevel::None and then not need any of these?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

Is "at the cost of a longer compile time" still true?

I think the main "cost" of turning this flag on right now is that there's a lot of room for bugs still.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

I'm concerned about marking Ieee32 and Ieee64 as Ord. Their PartialOrd implementation really is partial, since it's based on IEEE754 ordering. We might need to split these into two versions: one that respects IEEE754 except for preserving exact NaN representations, and another where ordering and equality are defined entirely by bitwise representation. I had a really hard time balancing those two competing goals recently.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

I really appreciate you making sure that git grep 'enum InstructionData' finds the right thing!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 00:42):

jameysharp created PR review comment:

Now you have cranelift-egraph in this list in two places. I think that would only be necessary if there's a cyclic dependency between crates, and I think Cargo would refuse to build that... right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 01:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 01:06):

cfallin created PR review comment:

Different list actually! There is the PUBLIC_CRATES list (the line here) and the CRATES_TO_PUBLISH list and they are different for reasons I don't fully grok, but the crate needs to be in both for the publish script to be happy.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 01:12):

jameysharp created PR review comment:

Ohhh, that makes sense.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 05 2022 at 01:12):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:33):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:33):

cfallin created PR review comment:

Good point -- fixed!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:33):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin created PR review comment:

Yes, this was an attempt to make the API "smarter" in some sense, but I agree that the don't-re-optimize behavior is in the end just unnecessary complexity. I've marked optimize() as "only for testing" in the doc-comment and removed the optimized flag.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin created PR review comment:

Removed! This was needed for an earlier egraph implementation but no longer used, so I've removed all of the now-extraneous Ord derivations on IR types.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 06 2022 at 21:34):

cfallin created PR review comment:

Yep, that bit of text is outdated! Updated to "is currently considered experimental" instead.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

I'd have guessed that if you're handed a DominatorTree that says this block has an immediate dominator, then it would be an error if that dominating instruction is not anywhere in the layout. Is there a situation where that's actually expected? If not, I think I'd prefer an unwrap on the call to inst_block.

BTW this module's use of Default makes me very happy, so thank you. :grin: Also I quite like the simplicity and elegance of the implementation.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

I think this should pass the borrow-checker; what do you think of it?

        self.block.expand().map(|block| {
            self.block = self.domtree.nodes[block].next;
            block
        })

Otherwise I'd at least prefer this version, as I have a preference for avoiding calls to unwrap if it's easy to prove they'll never fail:

        if let Some(block) = self.block.expand() {
            self.block = self.domtree.nodes[block].next;
            Some(block)
        } else {
            None
        }

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

The equivalent function on Vec is called dedup; would you mind renaming it to match that?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

Just to check, does SourceLoc need Ord for something or was that also left over from earlier versions?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

Would you mind double-checking for me that you need Analysis::Value to implement all three of Debug, Clone, and Default? I naively wouldn't have expected Default to be necessary since initial analysis results should come from Analysis::for_node.

I assume both Default and Clone are because of the requirements for SecondaryMap.

Is it useful to decide at runtime whether to do analysis or not? If you know statically whether you'll need an analysis, then I'd prefer to have a trivial implementation of Analysis on () (or perhaps on PhantomData<L>, if necessary) where the Value associated type is also (). That way you can store both A and A::Value without wrapping them in Option, and they won't occupy any memory because they're zero-sized types.

Then you could declare EGraph::nodes as a vector of (L::Node, A::Value), so that it's clear at the type level that if a node has been added, its analysis result exists too. And then you shouldn't need Default or Clone.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 01:06):

jameysharp created PR review comment:

Nit: missing space in "other".Any

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:21):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:21):

cfallin created PR review comment:

Would you mind double-checking for me that you need Analysis::Value to implement all three of Debug, Clone, and Default? I naively wouldn't have expected Default to be necessary since initial analysis results should come from Analysis::for_node.

I assume both Default and Clone are because of the requirements for SecondaryMap.

Indeed, they both come from SecondaryMap's constraints on its item type V; verified just now by attempting to remove them.

It turns out Debug was only needed for, well, my debugging (some logs I had in place); those are no longer present so I've removed that constraint. We can always put it back if we want to check in some persistent logs, I think.

Then you could declare EGraph::nodes as a vector of (L::Node, A::Value), so that it's clear at the type level that if a node has been added, its analysis result exists too. And then you shouldn't need Default or Clone.

There is a nice appeal to this for sure! I actually had built the thing this way originally. I'm hesitant to go back to it chiefly because it makes the API for for_node more awkward: we need to provide it access to the analysis values for prior nodes (so it can depend on e.g. arguments' analysis values) and I don't like the idea of exposing internal data-structure details, like a tuple-of-node-and-value. Unfortunately we can't parameterize it on the "getter" because we can't provide a function trait-constrained type parameter to the trait method. And we can't just pass in the whole EGraph because the EGraph is parameterized on the particular Analysis and if I try to use Self there, I run into issues with unsized types (??). Basically, it becomes a type-system mess. So I'd kind of prefer to at least run with what we have for now and maybe explore more advanced solutions here later, if possible :-)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:21):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin created PR review comment:

Nope, removed!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin created PR review comment:

Fixed!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin created PR review comment:

Much better, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:22):

cfallin created PR review comment:

Yep, I've added an expect instead; we shouldn't have dangling instructions at this point. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 17:23):

cfallin created PR review comment:

Actually it turns out this was completely unused, as was sort above it (they were left over from an earlier approach that stored ... parent pointers, I think? ... in EntityLists, and used sort/dedup to do merging -- parent pointers are gone now!). Removed this part of the diff.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 23:56):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 01:31):

jameysharp created PR review comment:

I just had a go at doing this, and while I think it's feasible (with some ?Sized constraints to allow unsized types if needed), the current implementation is tangled up enough in assumptions about analysis working this way that I don't want to try harder right now. So yes, let's come back to this later. :laughing:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 01:31):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp created PR review comment:

This confused me a moment because I mixed up Inst with InstructionData, and I wondered why you were storing the whole instruction. I think it was the use of the phrase, "The original instruction", that threw me.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp created PR review comment:

I think I'd put id_eq and hash_id on UnionFind instead of on NodeCtx since they don't use anything from the latter. It's only the plural versions (ids_eq, hash_ids) that need self.args. Not important, just a thought.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp created PR review comment:

I'm not a huge fan of the pattern of stripping off a layer of reference by sticking an & at the beginning of a pattern, and then putting it back with the ref keyword. I rather like Node::Pure { op, args, types } here, for instance. But I don't feel anywhere near strong enough to, y'know, fight about it or whatever.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp created PR review comment:

I appreciated the comment above that types are fully determined by op and args. I think it'd be nice if there's a corresponding comment here explaining why that isn't also true for Load ops. I assume it's that the type is the only thing indicating the width of the load; multiple loads with the same address can have different widths and must not be deduplicated.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:07):

jameysharp created PR review comment:

I don't see the same sort of guard here in at_level, for saturating at less than "infinity", that I see in the implementation for add. Is that okay?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

I think this map/filter/map is equivalent to a single .filter(|&lp| self.loop_header(lp) == block)… isn't it?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

I think this rule should be deleted: you have below that x^x is zero, which is correct, but this rule says it's also equal to x.

I also was surprised to not find rules for multiplication or bitwise-and by zero. Would that be unsound because it could delete integer divisions that might otherwise trap? If so I'm curious if there's an easy way to preserve the traps while otherwise deleting all the arithmetic in subexpressions whose results get annihilated.

But also, then, is the x^x=0 rule okay? Or other rules like x^not(x), where the right-hand side is a constant?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

I'm tempted to write this as LoopLevel::root().inc() instead of enshrining the knowledge that the result is 1.

More broadly, you could write this without a stack, though I'm not sure if the result is better. Do one loop to chase parents to the root or the first loop that's already been assigned. Count how many iterations that loop did and add to the level of the upper-most loop reached. Then start over at lp and chase the parent pointers again to fill in the loop depths for all the loops visited the first time.

I think that might be a little easier for me to understand than the pseudo-recursion. Maybe?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

I guess this function does something reasonable even if self equals other: it doubles the list, right? I was thinking there should be an assertion that the two lists are different but I guess that isn't necessary.

I would be inclined to use copy_within rather than a hand-rolled loop full of bounds checks and unwraps. You can actually guarantee the copy is non-overlapping, but I don't think it's worth using an unsafe block here for whatever gains that brings.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

How do you feel about using map_or instead of map+unwrap_or?

Also about maybe using innermost_loop, like in is_loop_header.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 07:13):

jameysharp created PR review comment:

I'm curious, why pick 0x80 as the invalid level, instead of 0xFF?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Updated comment, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Good idea, done!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

So in isolation I'd definitely agree! However I tend to like the explicit match-the-borrow form in other cases, so we don't get Copy types that nonetheless have to be dereferenced. E.g. elsewhere there is a &Node::Inst { inst, .. } -- if we instead did Node::Inst { inst, .. } then we'd need *inst everywhere. Here I'm trying to remain consistent with that form elsewhere.

(I also think that really confusing errors can arise from this bit of automatic/ergonomic magic and I tend to fall back to pre-2018 Rust out of habit as a result, but that's another discussion :-) )

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Good point, fixed! I added the finite() helper here for this.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Actually this was another addition from earlier work that is no longer used so I've gone ahead and deleted it. Sorry about that!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Good catch -- x^x does not actually equal x at all, except in one rare case :-)

Multiplication by zero

Added!

bitwise-and by zero

Added!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Goodness, yes, I don't know why I wrote it that way!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Ah, nice, I didn't know about map_or! Done (and using the helper).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

Yep, I agree that using inc() once we've defined it is cleaner here.

Re: alternative algorithm -- maybe, but the details aren't completely clear to me; happy to take a followup PR (or a suggestion here) if you can sketch it out? Although if I'm understanding what you've proposed correctly, it sounds like it may have quadratic worst-case (the algorithm here sets the level on all loops it visits as it comes back down the stack).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin created PR review comment:

To be honest, I have no idea -- lost to the mists of time! I've reworked this to use a sentinel of 0xff and hopefully be a little clearer.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 19:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Would you add comments explaining what each of these states mean? In particular, the module comment doesn't talk about the before/after instruction states. After reading the whole module, I still don't understand exactly what invariant these states are supporting.

When I saw "Top", I assumed that would correspond to the "unknown" state described in the module comment, so I was a little surprised it wasn't called "Bottom". But apparently the "unknown" state is actually BeforeInst, judging by the implementation of meet_from. I'd suggest renaming "Top" to something more descriptive and less lattice-y.

Better yet, remove "Top" entirely. Use Option<LastStores> during compute_block_input_states, and .unwrap() it at the beginning of the loop, because before any block is added to the queue, it first has its state set to something else. And compute_load_last_stores can just skip blocks where the block_input is None, because those blocks are unreachable.

I'd also use a derived implementation of Default: just add the #[default] attribute to whichever variant you want. If you remove "Top", then you could use "Entry" as the default variant, and remove the LastStores::entry method in favor of LastStores::default.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

I see three opcodes for which can_store is true that don't have a MemFlags field: Debugtrap, DynamicStackStore, and StackStore. I can understand treating Debugtrap like a memory fence, but shouldn't stack stores just be categorized as "other"? I assume that's what a load whose address comes from a StackAddr op would be, anyway.

For that matter, should has_memory_fence_semantics be updated to return true for Debugtrap? Then maybe we could treat as "other" any instruction that 1) is not a memory fence, 2) can store, and 3) has no memflags.

I guess there's currently no must-alias analysis for stack loads and stores. I assume that would be useful for future work, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

compute_block_input_states never uses self, and compute_load_last_stores only uses self to write into the load_mem_state map. If you move both functions to top-level, drop their self parameters, and make compute_load_last_stores return a new map, then you can write this function this way, which I think is more clear:

        let block_input = compute_block_input_states(func, cfg);
        let load_mem_state = compute_load_last_stores(func, block_input);
        AliasAnalysis {
            load_mem_state,
        }

Also: why are those two functions marked #[inline(never)]?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

I'd like a comment here on why this is currently limited to the Load opcode specifically, rather than any instruction with that format. I assume it's something about store-to-load forwarding requiring that the load and store access the same memory width, and also have the same Value type, so that the value passed to the store can be substituted for the load. (I imagine future work could include replacing widening loads with sign- or zero-extend instructions given a suitable narrow store.) But it's not obvious to me how this interacts with redundant-load elimination.

This is the only call to get_load_input_state. It looks up the InstructionData twice: once to check that its opcode is Load, and a second time in get_load_input_state to dig out the memflags.

I'd rather see something like this, with corresponding changes to get_load_input_state. The main reason I'd prefer this is that it statically proves that mem_state can never be MemoryState::AfterInst(inst), which seems like it would be non-sensical. (The load sees the state after itself?)

                if let InstructionData::Load { opcode: Opcode::Load, flags, .. } = func.dfg[inst] {
                    let mem_state = state.get_load_input_state(flags);

One step further would be to replace get_load_input_state with a function like fn for_flags(&mut self, flags: MemFlags) -> &mut MemoryState, which can be shared between here and LastStores::update. The use here doesn't need mutability and it's generally bad style to get a mutable borrow you'll only read from. But state has to be mutable anyway to update it below, and this is a fairly small source file, so I think it's worth just eliminating the duplication.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Does this work?

                let succ_first_inst = func.layout.first_inst(succ).unwrap();

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Nit: I prefer to use .copied() instead of .cloned() for types that are Copy. That serves as documentation, checked at compile-time, that we expected the copy to be cheap.

There are several other calls to .clone() in this module that I'd prefer to remove for the same reason. They're all on either MemoryState or LastStores, which are both Copy.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

I think this should work just as well:

        let mut block_input = SecondaryMap::with_capacity(func.dfg.num_blocks());

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

After the end of this loop it'd be nice if state is not mutable, so the compiler will catch any bugs that would be caused by using different states in different successor blocks.

We could use Iterator::fold for that:

            let state = func.layout.block_insts(block).fold(state, |mut state, inst| {
                state.update(func, inst);
                log::trace!("after {}: state is {:?}", inst, state);
                state
            });

Or just let state = state; after the loop.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Nit: this queue is being used as a stack. I like todo or perhaps worklist as names that don't have quite the same data structure connotations.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

I'm not sure this comment is still accurate. This alias analysis does do some sort of fixpoint analysis, and doesn't check dominance anywhere.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Should this say something more like:

if at a given load or store, we look backward to the "last store" of the same type, *AND* we find that it has exactly the same address expression,

As it stands, it sounds like we can't conclude must-alias if there's been an intervening store of a different type.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

Nit: I believe the Display impl on Inst is equivalent to formatting "inst{}" with inst.index():

                    "alias analysis: scanning at {} with state {:?} ({:?})",
                    inst,

There are other trace! invocations in this module (and maybe elsewhere?) with the same pattern. Block also formats with "block{}", for example. This applies to any type that uses the two-argument version of the entity_impl! macro, as almost everything in cranelift/ does. (Though nothing in crates/ does.)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:28):

jameysharp created PR review comment:

If you get rid of MemoryState::Top it becomes pretty straightforward to have meet_from return whether any of the states changed, which could simplify compute_block_input_states:

    fn update(&mut self, other: &LastStores, loc: Inst) -> bool {
        let meet = |a: &mut MemoryState, b: MemoryState| -> MemoryState {
            if *a == b {
                false
            } else {
                *a = MemoryState::BeforeInst(loc);
                true
            }
        };

        meet(&mut self.heap, other.heap)
            | meet(&mut self.table, other.table)
            | meet(&mut self.vmctx, other.vmctx)
            | meet(&mut self.other, other.other)
    }

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:36):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:36):

jameysharp created PR review comment:

Wait, so it _is_ okay to delete instructions that might trap? Where would I find something describing whether Cranelift is allowed to do optimizations like that?

If so you could add x|not(x)=-1 and x&not(x)=0, I guess.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:46):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:46):

cfallin created PR review comment:

Oh, no, we definitely don't delete any side-effecting instructions (aside from redundant loads but that's separate). I'm not sureI understand why including x*0 == 0 or x&0 == 0 has any effect on that?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:48):

cfallin edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:56):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 20:59):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:02):

cfallin created PR review comment:

I think that updated version is actually subtly mismatching how the analysis works, and the version in the code is correct: the "last store" analysis doesn't consider store types, so we really are looking a the "last store" to the category, regardless of type or address. Then we compare address and type ("it has exactly the same address expression and type") and forward only if those match.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:02):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:03):

cfallin created PR review comment:

And specifically "an intervening store of a different type" would be caught by the logical condition described in the code, but not the modified text above.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:03):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:27):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:27):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:27):

cfallin created PR review comment:

Right, it's not fully a meet-lattice (or a proper "meet" binary operator), because meet takes a location and generates a different "bottom" value for each location.

Or more properly, it is a family of meet semilattices, indexed on location; each location actually meets over a slightly different semilattice with its own private "bottom".

All that's a fancy and fairly useless way of saying that we want to not fall back to a generic "unknown, we can't conclude anything" state: we can actually still name the state (every program point has some symbolic state), we just don't have a better name for it or a proof that it is equal to any other state in the program.

So -- given that it's definitely not a meet-semilattice -- your suggestions are great actually and I've updated things (and added doc-comments). Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:27):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:27):

cfallin created PR review comment:

I'd like a comment here on why this is currently limited to the Load opcode specifically, rather than any instruction with that format. I assume it's something about store-to-load forwarding requiring that the load and store access the same memory width, and also have the same Value type, so that the value passed to the store can be substituted for the load. (I imagine future work could include replacing widening loads with sign- or zero-extend instructions given a suitable narrow store.) But it's not obvious to me how this interacts with redundant-load elimination.

Yep, that's exactly it. I've added a comment to that effect.

The interaction with RLE is that the RLE pass performs an action only if this analysis provides a "memory state" for an instruction, so if we withhold that state, then the analysis is effectively saying "this is not an analyzable load".

[removing redundant match, and for_flags update]

Awesome, I like this much better -- done!

(I think this is a Lens in the Haskell sense, or at least a Functor, but I'm gonna look away from the abyss before it swallows me here)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:32):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:32):

cfallin created PR review comment:

[handle stack stores specially]

Yep, we definitely could do this. None of the benchmarks I had been developing with use them (actually, cranelift-wasm doesn't use them at all) so I didn't have an easy way to test or evaluate the effect of this; it can definitely be done as future work.

[treat some instructions only as "other"]

Unfortunately I don't think this is correct in all circumstances: an instruction about which we don't know anything has to clobber all segments of the state, not just "other" (i.e., it could alias with any of the categories).

It's possible that reasoning about the specific instruction formats that exist could lead us to something more precise here, but I really don't want to go down that road, because it bakes in knowledge of the opcode set as static and unchanging, and thus sets up an action-at-a-distance timebomb for when we later add some other instruction kind that involves memory. I'd rather have fallback logic here that basically says "if we don't know, clobber everything" -- it's the safe and obviously-correct (for some value of "obvious") approach :-)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin created PR review comment:

It's almost like these convenience functions exist to increase my convenience! I should know our APIs better; thanks :-)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin created PR review comment:

Oh, neat, I didn't know about .copied(); new idiom noted!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin created PR review comment:

Needs a constructor that takes a default and capacity, with refactors now; I was about to complain that such a thing doesn't exist on SecondaryMap but I just went to add it instead!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin created PR review comment:

Ah, that's much cleaner!

I'm not sure why I had #[inline(never)] here; perhaps that's from some performance tweaking but I should have left a comment why. In the absence of such a comment, I'll delete these (it seems like it shouldn't matter); and can always add them back if something blows up in final measurements.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:44):

cfallin created PR review comment:

Fixed!

I also went ahead and converted these log::trace!'s to trace!'s, to ensure we use our custom-defined macro that conditionally compiles (I often forget this!).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp created PR review comment:

I think the following implementation is correct. Asymptotically, it has the same time complexity, but its space complexity is O(1) instead of O(n). Whether it's faster or slower in practice is another question entirely, of course. My guess is it's about the same.

        for lp in self.loops.keys() {
            if self.loops[lp].level == LoopLevel::invalid() {
                let mut assigned = self.loops[lp].parent;
                let mut level = 0usize;
                while let Some(parent) = assigned.expand() {
                    level += 1;
                    if self.loops[parent].level != LoopLevel::invalid() {
                        level += self.loops[parent].level.0.into();
                        break;
                    }
                    assigned = self.loops[parent].parent;
                }
                let mut cur = PackedOption::from(lp);
                while cur != assigned {
                    self.loops[cur.unwrap()].level = level.into();
                    cur = self.loops[cur.unwrap()].parent;
                    level -= 1;
                }
            }
        }

In writing this, though, I realized that LoopLevel::root() can't appear in loops[lp].level after analysis is done, which means I think zero would be a better choice for an "invalid" value. This would all be a little simpler if a loop level were represented as Option<NonZeroU8>. Then you can even replace LoopLevel::inc with NonZeroU8::saturating_add, which apparently was just added in Rust 1.64.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp created PR review comment:

Okay, it's clear that these optimizations can potentially delete the only transitive data dependency on a potentially side-effecting instruction like integer division, right?

I guess the reason you're saying we don't delete those in spite of these optimizations is because they're kept even if there is no data dependency on them from any return instruction. Is that right?

In that case I think I'm satisfied. Though maybe a comment to that effect, in this source file specifically, would help somebody later.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp created PR review comment:

I weakly disagree about avoiding e.g. *inst, and weakly agree about the hazards of this particular magic. :laughing: Regardless, fair enough!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:45):

jameysharp created PR review comment:

This rematerialization test case seems like an odd example since block2 is strictly worse on any machine that has... um... at least one register, right?

It's a nice example of re-associating and constant-folding, though.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

jameysharp created PR review comment:

But you're passing the, um, default default to with_default. So the existing with_capacity is fine, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin created PR review comment:

Ah, yes, good catch; updated.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin created PR review comment:

The second version reads more confusingly to me; and it seems less error-prone to retain the top-level "if old != new" check than to thread a bool through the recomputation. If you really feel strongly here I'm happy to tweak further though...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:50):

cfallin created PR review comment:

Fixed (with worklist).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:51):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:53):

cfallin created PR review comment:

Right, side-effecting ops are kept in the top-level "side-effect skeleton", so they are never deleted.

Do you think we need a comment specifically on these rules, or ...? The reason I ask is that strictly speaking, any optimization at all that removes a reference to an input could trigger this confusion. I guess a top-level note on what rules are allowed to do (rewrite pure ops arbitrarily, remove references...) could help though.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:53):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:55):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:55):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:55):

cfallin created PR review comment:

(Added a comment; feel free to suggest better clarifications if you have more ideas)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:57):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:57):

cfallin created PR review comment:

Oh, huh, None is indeed the default for an Option. I guess I could have guessed that but it didn't occur to me. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:57):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:58):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:58):

jameysharp created PR review comment:

Maybe I'm confused by the use of the word "type". In this paragraph, I read that as referring to heap/table/vmctx/other. If you meant Cranelift type, like I8/I16/etc, that would make sense. Maybe saying "value type" disambiguates enough that I wouldn't have tripped over it?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:59):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 21:59):

jameysharp created PR review comment:

Nah, I don't feel strongly about this.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 22:03):

jameysharp created PR review comment:

Somehow I assumed cranelift-wasm used stack slots, but I guess there really is no reason for it to. Huh!

Well, I don't think it matters for this PR, but perhaps has_memory_fence_semantics should still be updated to include Debugtrap? It does seem like an instruction that shouldn't have memory accesses reordered across it.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 22:03):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 22:17):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 22:17):

jameysharp created PR review comment:

I think this should be .get_or_insert(state), because we want meet_from to leave the state unchanged if this is the first time we're setting it. Or just:

                if let Some(succ_state) = succ_state.as_mut() {
                    succ_state.meet_from(&state, succ_first_inst);
                } else {
                    *succ_state = state;
                }

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:02):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:02):

cfallin created PR review comment:

Ah, yes, it is indeed a value type; I'll clarify the wording. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:03):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:04):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:05):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 23:05):

cfallin created PR review comment:

Ah, yes, I agree -- updated (and actually added op if op.can_trap() as an additional match arm).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

opts/store_to_load.isle doesn't exist. It looks like after writing this comment you changed things so the store-to-load optimization is called directly from egraph.rs, not from ISLE.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

Why is at_loop_level fallible?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

Why doesn't eclass_type return types found in Param or Result nodes? It seems like the only case where it ought to fail is if all nodes are Pure or Inst with multiple results. I'd also think that if any node in the class has multiple results, they all should.

If all of that's true, then I think you can match on any node in the class and determine the result without looking further.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

This is an option now, right?

                                    if egraph.egraph.unionfind.equiv_id_mut(store_addr, *load_addr) {

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

Is there an implicit precondition that egraph.subsume_ids is empty on entry to optimize_eclass?

For this sort of memory reuse, I like to call .clear() on entry to a function like optimize_eclass, instead of on exit, because that's one less precondition I have to keep track of and verify in callers.

Although, weird thought: couldn't you put a subsume bool in the IsleContext and just check that after each call to next?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

Nice: having can_trap imply has_memory_fence_semantics seems like a good idea. Though actually, should Debugtrap be in can_trap as well?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

Why does store_to_load only fire if there's a Load as the _first_ enode encountered while traversing the eclass? Is this because the node was added to the egraph immediately before this function is called?

I think that could use a comment.

Is it necessary for correctness to add the Load node to the egraph if the store gets forwarded to the load? Hypothetically, could the caller pass load_op, load_ty, etc to this function, and entirely skip constructing the Node::Load if it succeeds? I'm not asking you to change that in this PR, I just want to know if there are invariants that would break if someone did that.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:04):

jameysharp created PR review comment:

This is an intimidating amount of nesting for one function.

Studying it carefully, I think almost all of the branches are necessary, although I'd suggest .unwrap() on the two calls to get_node(): in both cases, the eclass is provided from a source that just added the appropriate kind of node.

The only other significant improvement I see is here: matching the load_op and store_op against appropriate InstructionImms cases can be folded into the earlier matches against Node::Load and Node::Inst, respectively.

If you make this function return Option<Id>, I believe the ? syntax works for unwrapping Options. So that could help some too, although if you use unwrap on the get_node calls I believe there's only one Option left.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:58):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:58):

cfallin created PR review comment:

Ah, I see, this is the same trick we saw in the famous single-predecessor-chain traversals :-) OK, yeah, avoiding the SmallVec might be worthwhile here. I agree that the shallow loop nests are unlikely to stress either algorithm (so e.g. pointer-chasing overhead of traversing twice is likely negligible) so on balance let's go with yours as it requires less storage.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 01:59):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:22):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:22):

jameysharp created PR review comment:

Yup, same trick! I might be over-eager to apply that trick now, but it's a pretty good trick. :grinning_face_with_smiling_eyes:

I hope my version works; I meant it to be illustrative, more than an exact solution. I typed it directly into the GitHub comment, so I didn't even know whether it would compile…

I was briefly tempted to suggest a different trick: temporarily overwrite the parent pointers during the traversal so they instead point to the child you need to return to, then put back the original parent pointers as you walk back up the stack. That gives you the stack traversal pattern in O(1) additional space. It's also too clever/confusing to justify using here, I think. :sweat_smile: I believe I first learned about it in the context of some garbage collector, where it makes more sense.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:23):

cfallin created PR review comment:

Right, it's testing specifically that remat is preferred even when GVN would do the opposite naturally. It looks silly here where register pressure is low, but it happened to help significantly with SpiderMonkey, so...

(The case where it's most interesting is when there are a whole bunch of values like x+1, x+2, ... rematerialized inside a loop or otherwise deeper into a function, and when these all would otherwise be hoisted by LICM or GVN; then we have just one register x live instead of all of the values)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:27):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:27):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:28):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:32):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:33):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:33):

cfallin created PR review comment:

It almost worked! The last level -= 1 would underflow before we take the backedge and hit a loop-exiting condition; reconfigured things a bit to break before that.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:37):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:37):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:37):

cfallin created PR review comment:

Ah, right; I still had the Top default state in my mind and had figured that meeting with an initial Top was harmless; but here the default is Entry so we need to special-case the first set.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

No reason at all -- updated!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

The optimization entrypoints are invoked for every new eclass node created (i.e., every time an enode is added to an eclass), so we only need to (and in fact should only, to avoid redundancy) examine just the latest enode in the eclass. I'll add a comment regarding this!

Is it necessary for correctness to add the Load node to the egraph if the store gets forwarded to the load? Hypothetically, could the caller pass load_op, load_ty, etc to this function, and entirely skip constructing the Node::Load if it succeeds? I'm not asking you to change that in this PR, I just want to know if there are invariants that would break if someone did that.

I think that's technically possible, yeah, although there's one important case it would pessimize (in compile time, not code quality): having the Load node in the egraph as well lets other later Loads (with the same memory state) deduplicate with it. If we did not add the Load node to the egraph at all, then later Loads would eventually still end up with the forwarded stored value if the original did, because store_to_load is deterministic; but we would redo the work each time to determine that.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

That's a great question! I think I simply did not hit these cases while experimenting with this extractor with rewrite rules, but they absolutely should exist.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

Actually, thinking about this has made me realize why both this line (and the subsume bool also) don't work: reentrancy of the rewriting! From within the call to simplify we can create a new node and enter optimize_eclass recursively; I think both a clear-at-beginning and clear-at-end potentially prematurely clear subsume directives that should be seen by an outer invocation. The most precise memory-reuse approach is probably to clear it when exiting the top level (based on the reentrancy limiter / depth counter) but in order to make reasoning about this easier I think we can probably just let it live for the duration of the egraph (function) rewrite.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin created PR review comment:

Ah, I had missed that, thanks! (Yeah, I tried to write store-to-load forwarding as a rule at first, but it turned out to be too... bespoke?... to build a bunch of extractors and special support for side-effecting insts in the general rewrite framework to reason about loads and stores.)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 04:56):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 05:05):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 05:05):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 05:05):

cfallin created PR review comment:

I golfed this a bit more, starting from your suggestions (thanks!); I combined the three conditionals (type, offset, address) into one as well. Much better now!

I'm still noodling over the idea of making this an ISLE rule instead; maybe eventually it will be better that way, when we have more infrastructure to support non-pure ops, but I think I want to approach that idea incrementally / as a followup.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 06:47):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 06:47):

jameysharp created PR review comment:

I think that means I got the edge cases wrong a different way: the smallest level it should assign is 1, so decrementing it an extra time in the last loop iteration should never underflow.

Also, I see you changed my usize to a u8 so you could directly construct a LoopLevel. I'd imagined a From impl that would saturate when converting from a usize that was too large. But this is fine too, at least in debug builds where overflow panics. Loop depth comes from untrusted input so I think that probably should be checked in release builds too, right? Restricting to 255-deep nested loops seems reasonable to me, anyway.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

    /// Last-store instruction (or none) for a given load. Use a hash map
    /// instead of a `SecondaryMap` because this is sparse.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

If we are computing this for ~every instruction, then a SecondaryMap should perform better here.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

Is there a reason we aren't using std::ops::Range here?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

Is always really needed over just #[inline]? In excess, always tends to slow down debug builds.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

Ah okay I see that this is just for load instructions.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

Are these meant to happen before we land this PR or in a follow up? Because IIUC right now if egraphs are enabled then we don't run simple preopt which means that we won't do either of these optimizations at all.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:08):

fitzgen created PR review comment:

Default is in the std prelude so the namespace shouldn't need to be fully elaborated here.

impl Default for AnalysisValue {

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:26):

jameysharp created PR review comment:

I think there should be a stat for max rewrite_depth, or for number of times REWRITE_LIMIT was exceeded, or both.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:26):

jameysharp created PR review comment:

I think letting subsume_ids live for the duration of the function rewriting is a fine idea; as you say, it's simplest to reason about.

That said: I think putting a bool in the IsleContext is actually reentrancy-safe, because there's a separate instance of that context for every call to optimize_eclass. They all share the same reference to the egraph, which is why a shared HashSet there is _not_ safe, but we can add things that are local to each rewrite.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:26):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:29):

jameysharp created PR review comment:

I'm not Chris, but I think they're intended for a follow-up. Compiling with egraphs is disabled by default for now, so it doesn't have to have every optimization yet. They're probably necessary before egraphs can be the default though, I'd think.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 15:29):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 16:46):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 16:46):

cfallin created PR review comment:

I'm not sure; it's pre-existing though (this definition was copied from the ISLE definition preludes for the backends). That change could be made across all backends if we want to look into it after this PR, maybe?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 16:46):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 16:46):

cfallin created PR review comment:

This one is actually load-bearing IIRC, from performance tuning!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 17:40):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 17:40):

cfallin created PR review comment:

Yep, I'd prefer to get this in-tree and then continue adding to the body of opts, rather than holding back even longer.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:19):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:20):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:21):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:21):

cfallin created PR review comment:

Ah, that's a good point; I had been imagining a bool flag on the egraph itself rather than the per-ISLE-invocation context. Happy to revisit this (and actually the whole reentrancy mechanism maybe) in followup work...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:23):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:23):

cfallin created PR review comment:

Done (the latter)!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 18:50):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 19:57):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

How about replacing NodeCtx::with_capacity with a NodeCtx::for_dfg, and passing in &func.dfg? Right now the two usize arguments are easy to mix up. Even looking at the only caller, I can't immediately tell if func.dfg.num_values() is the right argument for types and func.dfg.value_lists.capacity() is the right argument for args.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

As far as I can tell, this Default impl for NodeCtx is unused. But if it were needed, it should be derivable.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

All the uses of NodeKey::node, like this one, would be a little nicer if type inference just worked. I think there are several ways to do that.

One is if, instead of using L: Language as the type parameter and using L::Node, we just let the node type N be the type parameter, without constraining it to come from any language. Callers already pass egraph.nodes, which uniquely determines the node type, just not the language type.

Another is to have that function take a reference to the EGraph<L, A>, and access the nodes field itself. That's a little tedious for NodeKeyCtx which would need to borrow the whole egraph instead of just the nodes, but that probably works.

The last option would be to give NodeKey and EClass a phantom type parameter for the language they come from, and thread that type parameter through everywhere. I think that's appealing in the long run for ensuring that these node and class IDs can't be misused, but I wouldn't want to do it in this PR.

I think either of the first two would be nice to do now.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

Nit: I'd prefer not to hard-code the type here so we could fiddle with it by just changing the definition of Node::Param.

Also, I recently noticed there's guidance on error message style in the Rust docs. I'm not entirely sure how to apply that advice here, but I intend to think about it more when I'm writing these sorts of things.

So maybe:

                            index: i.try_into().expect("blockparam index should fit in Node::Param"),

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

I was curious, so I checked: ops::Range has a derived implementation of Default, so 0..0 is what you'd get from SecondaryMap::new.

I keep harping on this because I kind of want to remove the SecondaryMap::with_default constructor. I believe every existing use either is redundant with an existing implementation of Default or would be natural to implement.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:00):

jameysharp created PR review comment:

I'd prefer to avoid the unwrap here by explicitly matching on the InstructionData:

                let is_readonly_load = match func.dfg[inst] {
                    InstructionData::Load {
                        opcode: Opcode::Load,
                        flags,
                        ..
                    } => flags.readonly() && flags.notrap(),
                    _ => false,
                };

I'm guessing that, unlike in the alias analysis, here it really isn't important that the opcode be exactly Opcode::Load. Sign- or zero-extending loads and vector loads should also be okay to treat as pure if they're both readonly and notrap, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:04):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:07):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:07):

cfallin created PR review comment:

I'm guessing that, unlike in the alias analysis, here it really isn't important that the opcode be exactly Opcode::Load. Sign- or zero-extending loads and vector loads should also be okay to treat as pure if they're both readonly and notrap, right?

In theory yeah, that's absolutely true. However making that change is a little more than op.can_load() because we want exactly those loads that are "normal" (take a value from memory with no special semantics or other side-effects), and .can_load() is a much broader predicate that includes e.g. atomics and CAS; and there's no "normal load but maybe extending" predicate. So I wanted to avoid untested, uncertain complexity.

Happy to address this in a followup if we find cases where it's important (or just in general), though!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:08):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:08):

cfallin created PR review comment:

(The suggestion is great though, applying that; thanks!)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:08):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:10):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin created PR review comment:

Indeed, that works!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin created PR review comment:

Ah! I had fought with this for a bit when I first wrote it to try to get the type-inference to work; it never occurred to me to just parameterize directly on the node type. That indeed works. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin created PR review comment:

I like that; named it with_capacity_for_dfg to make its semantics clearer (just for_dfg might imply that it somehow translates the contents of the DFG or something like that).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 20:18):

cfallin created PR review comment:

Yep, verified it's not needed; deleted!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp created PR review comment:

I believe this unwrap() will panic on StackLoad instructions.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp created PR review comment:

Assuming the post-processing of the new eclass ID doesn't use this side_effect flag, instead checking whether the added node was Node::Pure or not... Maybe this use of the flag can be replaced like so, and the definition of the flag above can be deleted:

                } else if has_side_effect(func, inst) || opcode.can_load() {

The flag is currently the disjunction of three cases:

  1. Does this instruction have side effects according to has_side_effect?
  2. Is this a non-readonly load?
  3. Can this instruction store?

Case 3 is already covered by case 1: opcode.can_store() implies has_side_effect. So that case should be deleted regardless.

<details>
<summary>The relationship between case 1 and 2 is confusing though.</summary>

If an instruction:
- can_load, and
- it is not a StackLoad, and
- it is either not the Load instruction format, or it does not have the notrap flag set,
then has_side_effect is implied.

At this else, we've already ruled out the possibility that the instruction is a load with both readonly and notrap flags set. So there are two remaining cases where case 2 is not implied by case 1:
- it could be a Load-format instruction with notrap but without readonly
- it could be a StackLoad

In those cases, this PR currently uses Node::Inst, but if we just use has_side_effect, those cases would use Node::Pure instead.

</details>

I think it makes sense that all loads need their order and execution preserved unless they fall into one of the two earlier special cases:
- either they're in read-only memory (so they can be re-ordered past stores) and they can't trap (so they can be deleted if unused),
- or alias analysis proves that it's okay to delete or forward a load.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp created PR review comment:

I'm not suggesting using op.can_load(). The version I proposed here checks that the instruction uses the Load instruction format, which excludes all the atomics and other magic loads. So that actually is a "normal load but maybe extending" predicate, as far as I can tell. I think you just need to delete the opcode: Opcode::Load pattern if you want to cover the other cases.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:01):

jameysharp created PR review comment:

I'm having trouble convincing myself that these conditions agree with corresponding conditionals above. So, let's say after the let node = ... block above, there's something like:

let is_pure = matches!(&node, Node::Pure { .. });

And assume id is mut above. Then this block could look something like the following. I'm choosing to call store_to_load on all new enodes: that function already has to guard against being handed an eclass ID that is not a load, so this doesn't affect correctness. It might affect performance, but that's not clear to me, and it's hard to be precise here without duplicating big parts of store_to_load.

                if let NewOrExisting::New(new_id) = id {
                    // Loads: do store-to-load forwarding.
                    let opt_id = crate::opts::store_to_load(new_id, self);
                    trace!("store_to_load: {} -> {}", new_id, opt_id);
                    if opt_id != new_id {
                        id = NewOrExisting::Existing(opt_id);
                    }
                }
                let id = match id {
                    NewOrExisting::Existing(id) => id,
                    NewOrExisting::New(id) if is_pure => {
                        // Apply all optimization rules immediately; the
                        // aegraph (acyclic egraph) works best when we do
                        // this so all uses pick up the eclass with all
                        // possible enodes.
                        crate::opts::optimize_eclass(id, self)
                    }
                    NewOrExisting::New(id) => {
                        self.side_effect_ids.push(id);
                        self.stats.side_effect_nodes += 1;
                        id
                    }
                };

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:17):

cfallin created PR review comment:

Unfortunately I think this will be a meaningful performance regression -- I remember specifically restricting store-to-load logic to run only on loads while tuning.

Happy to add comments here to clarify as needed -- any thoughts (while keeping the structure the same)?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:17):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:23):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:24):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:24):

cfallin created PR review comment:

Fixed!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:32):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:32):

cfallin created PR review comment:

I'm afraid I'm not completely following this suggestion, sorry. If I understand correctly, you're proposing a change, then going through a (fairly complex) case analysis to argue why it's correct? Or is the concern that the code as-written is incorrect? (I agree that the can_store() check is redundant and I've removed it, thanks!)

I think that at the point we are looking at the definition of predicates and their exhaustive cases, and effectively "inlining" that in our reasoning, we are doing something wrong here: we're writing more brittle code that has less modularity. E.g. I'd prefer not to rely on the intersection of specific load flags in two different files to reason about removing a condition here.

I think that as-written, the conditions are not too bad to understand from first principles: we want to handle the four categories of (i) readonly loads, (ii) all other loads, (iii) all side-effecting ops, and (iv) all pure ops. I like that there are boolean flags that indicate each of these (except the implicit Some-ness of mem_state for loads; I can add is_load if you'd prefer there). I don't like inlining e.g. a || can_load condition in this if-ladder: it makes the reasoning a lot muddier to me.

It's possible I've missed the whole point of this comment though; please do let me know if so!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 21:32):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 22:59):

cfallin created PR review comment:

OK, talked with @jameysharp offline about this one and I think we can get around the perf regression with another conditional; and the rest of this is actually nice and clean and I like it better. So I'll accept this then tweak it a bit.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 22:59):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 22:59):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:02):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:06):

cfallin created PR review comment:

OK, talked with @jameysharp about this one too and it makes more sense now -- accepting suggestion. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:06):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:11):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:24):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 23:30):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp created PR review comment:

Alternatively:

                            .fold(Cost::zero(), Cost::add);

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp created PR review comment:

This return value isn't used in the only caller of this function (elaborate_block). Given that there's already a debug_assert, how about this? Instead of a conditional branch to check the condition, just write a 0 to the Vec's length field:

        self.elab_result_stack.clear();

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp created PR review comment:

There's an Opcode::is_terminator which is true for some of the same cases as is_term, but not all the varieties of branches. So I think this is a confusing name for this variable.

Also, is_terminator includes Trap, which this condition does not. Should it?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp created PR review comment:

You can make result_tys a slice in both cases and just loop over it later without having to re-check which case you're in:

        let (instdata, result_tys) = match node {
            Node::Pure { op, types, .. } | Node::Inst { op, types, .. } => (
                op.with_args(args, &mut self.func.dfg.value_lists),
                types.as_slice(&self.node_ctx.types),
            ),
            Node::Load { op, ty, .. } => (
                op.with_args(args, &mut self.func.dfg.value_lists),
                std::slice::from_ref(ty),
            ),

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:03):

jameysharp created PR review comment:

Out of curiosity: since block params are unique, each such enode _should_ be absent at this point, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:08):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:09):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:10):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:12):

cfallin created PR review comment:

Ah, here we are actually returning the result, so unfortunately we can't do that...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:12):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:12):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:13):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:13):

cfallin created PR review comment:

Err, apparently I'm reading too fast because I see you noted this above and further noted it isn't used :-) Yes, actually, this is a nice cleanup, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:14):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:23):

cfallin updated PR #4953 from mid-end to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:23):

cfallin created PR review comment:

This is different than is_terminator I think as it needs to cover conditional branches as well.

The mechanism actually has a pretty interesting and subtle reason for existing, which I've added in a longer comment here -- I had forgotten as well until I rediscovered it via test failures :-)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:25):

cfallin created PR review comment:

Yep, but the only API we have is insert_if_absent, not insert_if_absent_and_assert_otherwise :-)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:25):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 00:28):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 01:15):

cfallin merged PR #4953.


Last updated: Nov 22 2024 at 16:03 UTC