cfallin edited PR #4953 from `mid-end` to `main`:
This PR is a draft of an updated version of the egraph patch (and thus supersedes #4249) with the two parts already merged (multi-etors and the egraph crate proper) removed; it includes the Cranelift integration, the egraph build (CLIF to egraph) and elaboration (egraph to CLIF) algorithms, and rule application engine, as well as a set of rewrite rules that replaces the existing mid-end optimizations.
It still needs a bit more productionizing:
- removal of recursion in elaboration;
- removal of recursion in rule application (This one is trickier! Immediate rule application on the sub-nodes created from constructors means more than one ISLE invocation can be on the stack, in a reentrant way. My thought is to use a sort of workqueue to "unstack" it.);
- generalization of the several ad-hoc egraph analyses (loop depth, etc) into a framework.
The purpose of this draft PR is to be a place to do this work on a rebased and up-to-date basis. (Lots happened since the original egraph work branched off in May, including incremental compilation and a good number of smaller changes.)
While patch-wrangling this week, I tried pulling this apart into smaller pieces, but the remaining bits are pretty cyclically entangled, and/or some of the intermediate points that might make sense (e.g. egraph build and elaboration without rule application) require re-synthesizing some scaffolding that would then disappear in the final state, so that seems a bit counterproductive. Once we have a polished state I can try pulling it apart into separate logical commits at least.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I'm a little confused about this `optimized` flag. I guess I'd like to at least see a comment about why `fn optimize` is `pub`, perhaps similar to the one on `compile_stencil`.

If `fn optimize` were not public, would it still be necessary to guard against it being called multiple times? Under what circumstances would it get called again? Should the "already optimized" check be at the beginning of `optimize` instead of in `compile_stencil`?

I wonder, would it make sense to set `self.optimized` at the beginning of `fn optimize`? If an error occurs, I wouldn't expect the optimizer would work any better if you run it a second time. Maybe after failed optimization one might want to retry compilation without optimization?
jameysharp created PR review comment:
Why is it necessary to re-run all of these passes after three of the four have just been run? Could you let the earlier DCE pass run if `isa.flags().use_egraphs() || opt_level != OptLevel::None` and then not need any of these?
jameysharp created PR review comment:
Is "at the cost of a longer compile time" still true?
I think the main "cost" of turning this flag on right now is that there's a lot of room for bugs still.
jameysharp created PR review comment:
I'm concerned about marking `Ieee32` and `Ieee64` as `Ord`. Their `PartialOrd` implementation really is partial, since it's based on IEEE754 ordering. We might need to split these into two versions: one that respects IEEE754 except for preserving exact NaN representations, and another where ordering and equality are defined entirely by bitwise representation. I had a really hard time balancing those two competing goals recently.
jameysharp created PR review comment:
I really appreciate you making sure that `git grep 'enum InstructionData'` finds the right thing!
jameysharp created PR review comment:
Now you have cranelift-egraph in this list in two places. I think that would only be necessary if there's a cyclic dependency between crates, and I think Cargo would refuse to build that... right?
cfallin submitted PR review.
cfallin created PR review comment:
Different list actually! There is the `PUBLIC_CRATES` list (the line here) and the `CRATES_TO_PUBLISH` list and they are different for reasons I don't fully grok, but the crate needs to be in both for the publish script to be happy.
jameysharp created PR review comment:
Ohhh, that makes sense.
jameysharp submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
Good point -- fixed!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, this was an attempt to make the API "smarter" in some sense, but I agree that the don't-re-optimize behavior is in the end just unnecessary complexity. I've marked `optimize()` as "only for testing" in the doc-comment and removed the `optimized` flag.
cfallin created PR review comment:
Removed! This was needed for an earlier egraph implementation but no longer used, so I've removed all of the now-extraneous `Ord` derivations on IR types.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Yep, that bit of text is outdated! Updated to "is currently considered experimental" instead.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I'd have guessed that if you're handed a `DominatorTree` that says this block has an immediate dominator, then it would be an error if that dominating instruction is not anywhere in the layout. Is there a situation where that's actually expected? If not, I think I'd prefer an `unwrap` on the call to `inst_block`.

BTW this module's use of `Default` makes me very happy, so thank you. :grin: Also I quite like the simplicity and elegance of the implementation.
jameysharp created PR review comment:
I think this should pass the borrow-checker; what do you think of it?

```rust
self.block.expand().map(|block| {
    self.block = self.domtree.nodes[block].next;
    block
})
```

Otherwise I'd at least prefer this version, as I have a preference for avoiding calls to `unwrap` if it's easy to prove they'll never fail:

```rust
if let Some(block) = self.block.expand() {
    self.block = self.domtree.nodes[block].next;
    Some(block)
} else {
    None
}
```
jameysharp created PR review comment:
The equivalent function on `Vec` is called `dedup`; would you mind renaming it to match that?
jameysharp created PR review comment:
Just to check, does `SourceLoc` need `Ord` for something or was that also left over from earlier versions?
jameysharp created PR review comment:
Would you mind double-checking for me that you need `Analysis::Value` to implement all three of `Debug`, `Clone`, and `Default`? I naively wouldn't have expected `Default` to be necessary since initial analysis results should come from `Analysis::for_node`. I assume both `Default` and `Clone` are because of the requirements for `SecondaryMap`.

Is it useful to decide at runtime whether to do analysis or not? If you know statically whether you'll need an analysis, then I'd prefer to have a trivial implementation of `Analysis` on `()` (or perhaps on `PhantomData<L>`, if necessary) where the `Value` associated type is also `()`. That way you can store both `A` and `A::Value` without wrapping them in `Option`, and they won't occupy any memory because they're zero-sized types.

Then you could declare `EGraph::nodes` as a vector of `(L::Node, A::Value)`, so that it's clear at the type level that if a node has been added, its analysis result exists too. And then you shouldn't need `Default` or `Clone`.
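To illustrate the zero-sized-analysis idea in isolation, here is a sketch with a hypothetical, simplified `Analysis` trait (not the one in cranelift-egraph; the node type and signature are stand-ins):

```rust
/// Hypothetical, simplified analysis trait for illustration only.
trait Analysis {
    type Value;
    fn for_node(&self, node: &str, prior: &[Self::Value]) -> Self::Value;
}

/// A "no analysis" implementation: both the analysis and its values are
/// zero-sized, so storing `(Node, A::Value)` pairs costs nothing extra.
struct NoAnalysis;

impl Analysis for NoAnalysis {
    type Value = ();
    fn for_node(&self, _node: &str, _prior: &[()]) {}
}

fn main() {
    // A node paired with its (zero-sized) analysis value occupies the
    // same space as the node alone.
    assert_eq!(
        std::mem::size_of::<(u32, <NoAnalysis as Analysis>::Value)>(),
        std::mem::size_of::<u32>()
    );
}
```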
jameysharp created PR review comment:
Nit: missing space in `"other".Any`.
cfallin submitted PR review.
cfallin created PR review comment:
> Would you mind double-checking for me that you need Analysis::Value to implement all three of Debug, Clone, and Default? I naively wouldn't have expected Default to be necessary since initial analysis results should come from Analysis::for_node.
> I assume both Default and Clone are because of the requirements for SecondaryMap.

Indeed, they both come from `SecondaryMap`'s constraints on its item type `V`; verified just now by attempting to remove them.

It turns out `Debug` was only needed for, well, my debugging (some `log`s I had in place); those are no longer present so I've removed that constraint. We can always put it back if we want to check in some persistent `log`s, I think.

> Then you could declare EGraph::nodes as a vector of (L::Node, A::Value), so that it's clear at the type level that if a node has been added, its analysis result exists too. And then you shouldn't need Default or Clone.

There is a nice appeal to this for sure! I actually had built the thing this way originally. I'm hesitant to go back to it chiefly because it makes the API for `for_node` more awkward: we need to provide it access to the analysis values for prior nodes (so it can depend on e.g. arguments' analysis values) and I don't like the idea of exposing internal data-structure details, like a tuple-of-node-and-value. Unfortunately we can't parameterize it on the "getter" because we can't provide a function trait-constrained type parameter to the trait method. And we can't just pass in the whole `EGraph` because the `EGraph` is parameterized on the particular `Analysis`, and if I try to use `Self` there, I run into issues with unsized types (??). Basically, it becomes a type-system mess. So I'd kind of prefer to at least run with what we have for now and maybe explore more advanced solutions here later, if possible :-)
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
Nope, removed!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin submitted PR review.
cfallin created PR review comment:
Much better, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Yep, I've added an `expect` instead; we shouldn't have dangling instructions at this point. Thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Actually it turns out this was completely unused, as was `sort` above it (they were left over from an earlier approach that stored ... parent pointers, I think? ... in EntityLists, and used sort/dedup to do merging -- parent pointers are gone now!). Removed this part of the diff.
cfallin updated PR #4953 from `mid-end` to `main`.
jameysharp created PR review comment:
I just had a go at doing this, and while I think it's feasible (with some `?Sized` constraints to allow unsized types if needed), the current implementation is tangled up enough in assumptions about analysis working this way that I don't want to try harder right now. So yes, let's come back to this later. :laughing:
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
This confused me a moment because I mixed up `Inst` with `InstructionData`, and I wondered why you were storing the whole instruction. I think it was the use of the phrase, "The original instruction", that threw me.
jameysharp created PR review comment:
I think I'd put `id_eq` and `hash_id` on `UnionFind` instead of on `NodeCtx` since they don't use anything from the latter. It's only the plural versions (`ids_eq`, `hash_ids`) that need `self.args`. Not important, just a thought.
jameysharp created PR review comment:
I'm not a huge fan of the pattern of stripping off a layer of reference by sticking an `&` at the beginning of a pattern, and then putting it back with the `ref` keyword. I rather like `Node::Pure { op, args, types }` here, for instance. But I don't feel anywhere near strong enough to, y'know, fight about it or whatever.
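For concreteness, a standalone sketch of the two pattern styles being discussed (the `Node` enum here is a hypothetical stand-in, not the PR's):

```rust
enum Node {
    Pure { op: u8, args: Vec<u32> },
}

fn demo(node: &Node) -> (u8, usize) {
    // Style 1: strip the reference explicitly with `&` and re-add it with `ref`.
    // `op` binds by value (u8 is Copy), `args` by reference.
    let (op1, len1) = match node {
        &Node::Pure { op, ref args } => (op, args.len()),
    };
    // Style 2: 2018-era match ergonomics. Both fields bind by reference,
    // so the Copy field needs an explicit deref.
    let (op2, len2) = match node {
        Node::Pure { op, args } => (*op, args.len()),
    };
    assert_eq!((op1, len1), (op2, len2));
    (op1, len1)
}

fn main() {
    demo(&Node::Pure { op: 3, args: vec![1, 2] });
}
```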
jameysharp created PR review comment:
I appreciated the comment above that `types` are fully determined by `op` and `args`. I think it'd be nice if there's a corresponding comment here explaining why that isn't also true for Load ops. I assume it's that the type is the only thing indicating the width of the load; multiple loads with the same address can have different widths and must not be deduplicated.
jameysharp created PR review comment:
I don't see the same sort of guard here in `at_level`, for saturating at less than "infinity", that I see in the implementation for `add`. Is that okay?
jameysharp submitted PR review.
jameysharp created PR review comment:
I think this map/filter/map is equivalent to a single `.filter(|&lp| self.loop_header(lp) == block)` … isn't it?
jameysharp created PR review comment:
I think this rule should be deleted: you have below that x^x is zero, which is correct, but this rule says it's also equal to x.
I also was surprised to not find rules for multiplication or bitwise-and by zero. Would that be unsound because it could delete integer divisions that might otherwise trap? If so I'm curious if there's an easy way to preserve the traps while otherwise deleting all the arithmetic in subexpressions whose results get annihilated.
But also, then, is the x^x=0 rule okay? Or other rules like x^not(x), where the right-hand side is a constant?
jameysharp created PR review comment:
I'm tempted to write this as `LoopLevel::root().inc()` instead of enshrining the knowledge that the result is 1.

More broadly, you could write this without a stack, though I'm not sure if the result is better. Do one loop to chase parents to the root or the first loop that's already been assigned. Count how many iterations that loop did and add to the level of the upper-most loop reached. Then start over at `lp` and chase the parent pointers again to fill in the loop depths for all the loops visited the first time.

I think that might be a little easier for me to understand than the pseudo-recursion. Maybe?
jameysharp created PR review comment:
I guess this function does something reasonable even if self equals other: it doubles the list, right? I was thinking there should be an assertion that the two lists are different but I guess that isn't necessary.
I would be inclined to use `copy_within` rather than a hand-rolled loop full of bounds checks and unwraps. You can actually guarantee the copy is non-overlapping, but I don't think it's worth using an unsafe block here for whatever gains that brings.
jameysharp submitted PR review.
jameysharp created PR review comment:
How do you feel about using map_or instead of map+unwrap_or?
Also about maybe using innermost_loop, like in is_loop_header.
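For reference, the two forms mentioned here are equivalent (a standalone example, not the PR's code):

```rust
fn main() {
    let some: Option<u32> = Some(3);
    let none: Option<u32> = None;
    // `x.map(f).unwrap_or(default)` is the same as `x.map_or(default, f)`.
    assert_eq!(some.map(|x| x + 1).unwrap_or(0), some.map_or(0, |x| x + 1));
    assert_eq!(none.map(|x| x + 1).unwrap_or(0), none.map_or(0, |x| x + 1));
}
```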
jameysharp created PR review comment:
I'm curious, why pick 0x80 as the invalid level, instead of 0xFF?
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Updated comment, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Good idea, done!
cfallin submitted PR review.
cfallin created PR review comment:
So in isolation I'd definitely agree! However I tend to like the explicit match-the-borrow form in other cases, so we don't get `Copy` types that nonetheless have to be dereferenced. E.g. elsewhere there is a `&Node::Inst { inst, .. }` -- if we instead did `Node::Inst { inst, .. }` then we'd need `*inst` everywhere. Here I'm trying to remain consistent with that form elsewhere.

(I also think that really confusing errors can arise from this bit of automatic/ergonomic magic and I tend to fall back to pre-2018 Rust out of habit as a result, but that's another discussion :-) )
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin created PR review comment:
Good point, fixed! I added the `finite()` helper here for this.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Actually this was another addition from earlier work that is no longer used so I've gone ahead and deleted it. Sorry about that!
cfallin submitted PR review.
cfallin created PR review comment:
Good catch -- `x^x` does not actually equal `x` at all, except in one rare case :-)

> Multiplication by zero

Added!

> bitwise-and by zero

Added!
cfallin submitted PR review.
cfallin created PR review comment:
Goodness, yes, I don't know why I wrote it that way!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, nice, I didn't know about `map_or`! Done (and using the helper).
cfallin submitted PR review.
cfallin created PR review comment:
Yep, I agree that using `inc()` once we've defined it is cleaner here.

Re: alternative algorithm -- maybe, but the details aren't completely clear to me; happy to take a followup PR (or a suggestion here) if you can sketch it out? Although if I'm understanding what you've proposed correctly, it sounds like it may have quadratic worst-case (the algorithm here sets the level on all loops it visits as it comes back down the stack).
cfallin created PR review comment:
To be honest, I have no idea -- lost to the mists of time! I've reworked this to use a sentinel of `0xff` and hopefully be a little clearer.
cfallin submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Would you add comments explaining what each of these states mean? In particular, the module comment doesn't talk about the before/after instruction states. After reading the whole module, I still don't understand exactly what invariant these states are supporting.
When I saw "Top", I assumed that would correspond to the "unknown" state described in the module comment, so I was a little surprised it wasn't called "Bottom". But apparently the "unknown" state is actually `BeforeInst`, judging by the implementation of `meet_from`. I'd suggest renaming "Top" to something more descriptive and less lattice-y.

Better yet, remove "Top" entirely. Use `Option<LastStores>` during `compute_block_input_states`, and `.unwrap()` it at the beginning of the loop, because before any block is added to the queue, it first has its state set to something else. And `compute_load_last_stores` can just skip blocks where the `block_input` is `None`, because those blocks are unreachable.

I'd also use a derived implementation of `Default`: just add the `#[default]` attribute to whichever variant you want. If you remove "Top", then you could use "Entry" as the default variant, and remove the `LastStores::entry` method in favor of `LastStores::default`.
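For reference, the derived-`Default`-with-attribute pattern being suggested looks like this (a sketch with a simplified stand-in enum, not the PR's actual state type):

```rust
/// Simplified stand-in for the analysis state enum, for illustration only.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum MemoryStateSketch {
    /// Function entry: nothing is known to have been stored yet.
    #[default]
    Entry,
    /// State just before the given instruction index.
    BeforeInst(u32),
}

fn main() {
    // The `#[default]` attribute picks the default variant (Rust 1.62+),
    // so no hand-written `Default` impl or `entry()` constructor is needed.
    assert_eq!(MemoryStateSketch::default(), MemoryStateSketch::Entry);
}
```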
jameysharp created PR review comment:
I see three opcodes for which `can_store` is true that don't have a `MemFlags` field: `Debugtrap`, `DynamicStackStore`, and `StackStore`. I can understand treating `Debugtrap` like a memory fence, but shouldn't stack stores just be categorized as "other"? I assume that's what a load whose address comes from a `StackAddr` op would be, anyway.

For that matter, should `has_memory_fence_semantics` be updated to return true for `Debugtrap`? Then maybe we could treat as "other" any instruction that 1) is not a memory fence, 2) can store, and 3) has no memflags.

I guess there's currently no must-alias analysis for stack loads and stores. I assume that would be useful for future work, right?
jameysharp created PR review comment:
`compute_block_input_states` never uses `self`, and `compute_load_last_stores` only uses `self` to write into the `load_mem_state` map. If you move both functions to top-level, drop their `self` parameters, and make `compute_load_last_stores` return a new map, then you can write this function this way, which I think is more clear:

```rust
let block_input = compute_block_input_states(func, cfg);
let load_mem_state = compute_load_last_stores(func, block_input);
AliasAnalysis { load_mem_state }
```

Also: why are those two functions marked `#[inline(never)]`?
jameysharp created PR review comment:
I'd like a comment here on why this is currently limited to the `Load` opcode specifically, rather than any instruction with that format. I assume it's something about store-to-load forwarding requiring that the load and store access the same memory width, and also have the same Value type, so that the value passed to the store can be substituted for the load. (I imagine future work could include replacing widening loads with sign- or zero-extend instructions given a suitable narrow store.) But it's not obvious to me how this interacts with redundant-load elimination.

This is the only call to `get_load_input_state`. It looks up the `InstructionData` twice: once to check that its opcode is `Load`, and a second time in `get_load_input_state` to dig out the memflags.

I'd rather see something like this, with corresponding changes to `get_load_input_state`. The main reason I'd prefer this is that it statically proves that `mem_state` can never be `MemoryState::AfterInst(inst)`, which seems like it would be non-sensical. (The load sees the state after itself?)

```rust
if let InstructionData::Load {
    opcode: Opcode::Load,
    flags,
    ..
} = func.dfg[inst]
{
    let mem_state = state.get_load_input_state(flags);
```

One step further would be to replace `get_load_input_state` with a function like `fn for_flags(&mut self, flags: MemFlags) -> &mut MemoryState`, which can be shared between here and `LastStores::update`. The use here doesn't need mutability and it's generally bad style to get a mutable borrow you'll only read from. But `state` has to be mutable anyway to update it below, and this is a fairly small source file, so I think it's worth just eliminating the duplication.
jameysharp created PR review comment:
Does this work?

```rust
let succ_first_inst = func.layout.first_inst(succ).unwrap();
```
jameysharp submitted PR review.
jameysharp created PR review comment:
Nit: I prefer to use `.copied()` instead of `.cloned()` for types that are `Copy`. That serves as documentation, checked at compile-time, that we expected the copy to be cheap.

There are several other calls to `.clone()` in this module that I'd prefer to remove for the same reason. They're all on either `MemoryState` or `LastStores`, which are both `Copy`.
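For reference, `.copied()` is a drop-in for `.cloned()` on iterators of references to `Copy` types, and it documents the cheapness at compile time (standalone example, not the PR's code):

```rust
fn main() {
    let xs = [1u32, 2, 3];
    // `.copied()` only compiles because u32 is Copy; `.cloned()` would also
    // accept expensive-to-clone types without complaint.
    let sum: u32 = xs.iter().copied().sum();
    assert_eq!(sum, 6);
}
```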
jameysharp created PR review comment:
I think this should work just as well:

```rust
let mut block_input = SecondaryMap::with_capacity(func.dfg.num_blocks());
```
jameysharp created PR review comment:
After the end of this loop it'd be nice if `state` is not mutable, so the compiler will catch any bugs that would be caused by using different states in different successor blocks.

We could use `Iterator::fold` for that:

```rust
let state = func.layout.block_insts(block).fold(state, |mut state, inst| {
    state.update(func, inst);
    log::trace!("after {}: state is {:?}", inst, state);
    state
});
```

Or just `let state = state;` after the loop.
jameysharp created PR review comment:
Nit: this `queue` is being used as a stack. I like `todo` or perhaps `worklist` as names that don't have quite the same data structure connotations.
jameysharp created PR review comment:
I'm not sure this comment is still accurate. This alias analysis does do some sort of fixpoint analysis, and doesn't check dominance anywhere.
jameysharp created PR review comment:
Should this say something more like:

> if at a given load or store, we look backward to the "last store" of the same type, *AND* we find that it has exactly the same address expression,

As it stands, it sounds like we can't conclude must-alias if there's been an intervening store of a different type.
jameysharp created PR review comment:
Nit: I believe the `Display` impl on `Inst` is equivalent to formatting "inst{}" with `inst.index()`:

```rust
"alias analysis: scanning at {} with state {:?} ({:?})", inst,
```

There are other `trace!` invocations in this module (and maybe elsewhere?) with the same pattern. `Block` also formats with "block{}", for example. This applies to any type that uses the two-argument version of the `entity_impl!` macro, as almost everything in `cranelift/` does. (Though nothing in `crates/` does.)
jameysharp created PR review comment:
If you get rid of `MemoryState::Top` it becomes pretty straightforward to have `meet_from` return whether any of the states changed, which could simplify `compute_block_input_states`:

```rust
fn update(&mut self, other: &LastStores, loc: Inst) -> bool {
    let meet = |a: &mut MemoryState, b: MemoryState| -> bool {
        if *a == b {
            false
        } else {
            *a = MemoryState::BeforeInst(loc);
            true
        }
    };
    meet(&mut self.heap, other.heap)
        | meet(&mut self.table, other.table)
        | meet(&mut self.vmctx, other.vmctx)
        | meet(&mut self.other, other.other)
}
```
jameysharp submitted PR review.
jameysharp created PR review comment:
Wait, so it _is_ okay to delete instructions that might trap? Where would I find something describing whether Cranelift is allowed to do optimizations like that?

If so you could add `x|not(x)=-1` and `x&not(x)=0`, I guess.
cfallin submitted PR review.
cfallin created PR review comment:
Oh, no, we definitely don't delete any side-effecting instructions (aside from redundant loads but that's separate). I'm not sure I understand why including `x*0 == 0` or `x&0 == 0` has any effect on that?
cfallin edited PR review comment.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
I think that updated version is actually subtly mismatching how the analysis works, and the version in the code is correct: the "last store" analysis doesn't consider store types, so we really are looking at the "last store" to the category, regardless of type or address. Then we compare address and type ("it has exactly the same address expression and type") and forward only if those match.
cfallin submitted PR review.
cfallin created PR review comment:
And specifically "an intervening store of a different type" would be caught by the logical condition described in the code, but not the modified text above.
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Right, it's not fully a meet-lattice (or a proper "meet" binary operator), because meet takes a location and generates a different "bottom" value for each location.
Or more properly, it is a family of meet semilattices, indexed on location; each location actually meets over a slightly different semilattice with its own private "bottom".
All that's a fancy and fairly useless way of saying that we want to not fall back to a generic "unknown, we can't conclude anything" state: we can actually still name the state (every program point has some symbolic state), we just don't have a better name for it or a proof that it is equal to any other state in the program.
So -- given that it's definitely not a meet-semilattice -- your suggestions are great actually and I've updated things (and added doc-comments). Thanks!
cfallin submitted PR review.
cfallin created PR review comment:
> I'd like a comment here on why this is currently limited to the Load opcode specifically, rather than any instruction with that format. I assume it's something about store-to-load forwarding requiring that the load and store access the same memory width, and also have the same Value type, so that the value passed to the store can be substituted for the load. (I imagine future work could include replacing widening loads with sign- or zero-extend instructions given a suitable narrow store.) But it's not obvious to me how this interacts with redundant-load elimination.

Yep, that's exactly it. I've added a comment to that effect.

The interaction with RLE is that the RLE pass performs an action only if this analysis provides a "memory state" for an instruction, so if we withhold that state, then the analysis is effectively saying "this is not an analyzable load".

> [removing redundant match, and `for_flags` update]

Awesome, I like this much better -- done!

(I think this is a Lens in the Haskell sense, or at least a Functor, but I'm gonna look away from the abyss before it swallows me here)
cfallin submitted PR review.
cfallin created PR review comment:
> [handle stack stores specially]

Yep, we definitely could do this. None of the benchmarks I had been developing with use them (actually, `cranelift-wasm` doesn't use them at all) so I didn't have an easy way to test or evaluate the effect of this; it can definitely be done as future work.

> [treat some instructions only as "other"]

Unfortunately I don't think this is correct in all circumstances: an instruction about which we don't know anything has to clobber all segments of the state, not just "other" (i.e., it could alias with any of the categories).
It's possible that reasoning about the specific instruction formats that exist could lead us to something more precise here, but I really don't want to go down that road, because it bakes in knowledge of the opcode set as static and unchanging, and thus sets up an action-at-a-distance timebomb for when we later add some other instruction kind that involves memory. I'd rather have fallback logic here that basically says "if we don't know, clobber everything" -- it's the safe and obviously-correct (for some value of "obvious") approach :-)
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
It's almost like these convenience functions exist to increase my convenience! I should know our APIs better; thanks :-)
cfallin submitted PR review.
cfallin created PR review comment:
Oh, neat, I didn't know about `.copied()`; new idiom noted!
cfallin submitted PR review.
cfallin created PR review comment:
This needs a constructor that takes a default and a capacity now, with the refactors; I was about to complain that such a thing doesn't exist on `SecondaryMap`, but I just went to add it instead!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, that's much cleaner!
I'm not sure why I had `#[inline(never)]` here; perhaps that's from some performance tweaking but I should have left a comment why. In the absence of such a comment, I'll delete these (it seems like it shouldn't matter); and can always add them back if something blows up in final measurements.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
I also went ahead and converted these `log::trace!`'s to `trace!`'s, to ensure we use our custom-defined macro that conditionally compiles (I often forget this!).
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I think the following implementation is correct. Asymptotically, it has the same time complexity, but its space complexity is O(1) instead of O(n). Whether it's faster or slower in practice is another question entirely, of course. My guess is it's about the same.
```rust
for lp in self.loops.keys() {
    if self.loops[lp].level == LoopLevel::invalid() {
        let mut assigned = self.loops[lp].parent;
        let mut level = 0usize;
        while let Some(parent) = assigned.expand() {
            level += 1;
            if self.loops[parent].level != LoopLevel::invalid() {
                level += self.loops[parent].level.0.into();
                break;
            }
            assigned = self.loops[parent].parent;
        }
        let mut cur = PackedOption::from(lp);
        while cur != assigned {
            self.loops[cur.unwrap()].level = level.into();
            cur = self.loops[cur.unwrap()].parent;
            level -= 1;
        }
    }
}
```

In writing this, though, I realized that `LoopLevel::root()` can't appear in `loops[lp].level` after analysis is done, which means I think zero would be a better choice for an "invalid" value. This would all be a little simpler if a loop level were represented as `Option<NonZeroU8>`. Then you can even replace `LoopLevel::inc` with `NonZeroU8::saturating_add`, which apparently was just added in Rust 1.64.
jameysharp created PR review comment:
Okay, it's clear that these optimizations can potentially delete the only transitive data dependency on a potentially side-effecting instruction like integer division, right?
I guess the reason you're saying we don't delete those in spite of these optimizations is because they're kept even if there is no data dependency on them from any return instruction. Is that right?
In that case I think I'm satisfied. Though maybe a comment to that effect, in this source file specifically, would help somebody later.
jameysharp created PR review comment:
I weakly disagree about avoiding e.g. `*inst`, and weakly agree about the hazards of this particular magic. :laughing: Regardless, fair enough!
jameysharp created PR review comment:
This rematerialization test case seems like an odd example since `block2` is strictly worse on any machine that has... um... at least one register, right?

It's a nice example of re-associating and constant-folding, though.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
jameysharp submitted PR review.
jameysharp created PR review comment:
But you're passing the, um, default `default` to `with_default`. So the existing `with_capacity` is fine, right?
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, good catch; updated.
cfallin created PR review comment:
The second version reads more confusingly to me; and it seems less error-prone to retain the top-level "if old != new" check than to thread a bool through the recomputation. If you really feel strongly here I'm happy to tweak further though...
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed (with `worklist`).
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
Right, side-effecting ops are kept in the top-level "side-effect skeleton", so they are never deleted.
Do you think we need a comment specifically on these rules, or ...? The reason I ask is that strictly speaking, any optimization at all that removes a reference to an input could trigger this confusion. I guess a top-level note on what rules are allowed to do (rewrite pure ops arbitrarily, remove references...) could help though.
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
(Added a comment; feel free to suggest better clarifications if you have more ideas)
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
Oh, huh, `None` is indeed the default for an `Option`. I guess I could have guessed that but it didn't occur to me. Thanks!
cfallin submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Maybe I'm confused by the use of the word "type". In this paragraph, I read that as referring to heap/table/vmctx/other. If you meant Cranelift type, like I8/I16/etc, that would make sense. Maybe saying "value type" disambiguates enough that I wouldn't have tripped over it?
jameysharp submitted PR review.
jameysharp created PR review comment:
Nah, I don't feel strongly about this.
jameysharp created PR review comment:
Somehow I assumed cranelift-wasm used stack slots, but I guess there really is no reason for it to. Huh!
Well, I don't think it matters for this PR, but perhaps `has_memory_fence_semantics` should still be updated to include `Debugtrap`? It does seem like an instruction that shouldn't have memory accesses reordered across it.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I think this should be `.get_or_insert(state)`, because we want `meet_from` to leave the state unchanged if this is the first time we're setting it. Or just:

```rust
if let Some(succ_state) = succ_state.as_mut() {
    succ_state.meet_from(&state, succ_first_inst);
} else {
    *succ_state = state;
}
```
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, it is indeed a value type; I'll clarify the wording. Thanks!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, I agree -- updated (and actually added `op if op.can_trap()` as an additional match arm).
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
`opts/store_to_load.isle` doesn't exist. It looks like after writing this comment you changed things so the store-to-load optimization is called directly from egraph.rs, not from ISLE.
jameysharp created PR review comment:
Why is `at_loop_level` fallible?
jameysharp created PR review comment:
Why doesn't `eclass_type` return types found in `Param` or `Result` nodes? It seems like the only case where it ought to fail is if all nodes are `Pure` or `Inst` with multiple results. I'd also think that if any node in the class has multiple results, they all should.

If all of that's true, then I think you can match on any node in the class and determine the result without looking further.
jameysharp created PR review comment:
This is an option now, right?

```rust
if egraph.egraph.unionfind.equiv_id_mut(store_addr, *load_addr) {
```
jameysharp created PR review comment:
Is there an implicit precondition that `egraph.subsume_ids` is empty on entry to `optimize_eclass`?

For this sort of memory reuse, I like to call `.clear()` on entry to a function like `optimize_eclass`, instead of on exit, because that's one less precondition I have to keep track of and verify in callers.

Although, weird thought: couldn't you put a `subsume` bool in the `IsleContext` and just check that after each call to `next`?
jameysharp created PR review comment:
Nice: having `can_trap` imply `has_memory_fence_semantics` seems like a good idea. Though actually, should `Debugtrap` be in `can_trap` as well?
jameysharp created PR review comment:
Why does `store_to_load` only fire if there's a Load as the _first_ enode encountered while traversing the eclass? Is this because the node was added to the egraph immediately before this function is called? I think that could use a comment.

Is it necessary for correctness to add the Load node to the egraph if the store gets forwarded to the load? Hypothetically, could the caller pass `load_op`, `load_ty`, etc to this function, and entirely skip constructing the `Node::Load` if it succeeds? I'm not asking you to change that in this PR, I just want to know if there are invariants that would break if someone did that.
jameysharp created PR review comment:
This is an intimidating amount of nesting for one function.

Studying it carefully, I think almost all of the branches are necessary, although I'd suggest `.unwrap()` on the two calls to `get_node()`: in both cases, the eclass is provided from a source that just added the appropriate kind of node.

The only other significant improvement I see is here: matching the `load_op` and `store_op` against appropriate `InstructionImms` cases can be folded into the earlier matches against `Node::Load` and `Node::Inst`, respectively.

If you make this function return `Option<Id>`, I believe the `?` syntax works for unwrapping `Option`s. So that could help some too, although if you use `unwrap` on the `get_node` calls I believe there's only one `Option`.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I see, this is the same trick we saw in the famous single-predecessor-chain traversals :-) OK, yeah, avoiding the `SmallVec` might be worthwhile here. I agree that the shallow loop nests are unlikely to stress either algorithm (so e.g. pointer-chasing overhead of traversing twice is likely negligible) so on balance let's go with yours as it requires less storage.
cfallin updated PR #4953 from `mid-end` to `main`.
jameysharp submitted PR review.
jameysharp created PR review comment:
Yup, same trick! I might be over-eager to apply that trick now, but it's a pretty good trick. :grinning_face_with_smiling_eyes:
I hope my version works; I meant it to be illustrative, more than an exact solution. I typed it directly into the GitHub comment, so I didn't even know whether it would compile…
I was briefly tempted to suggest a different trick: temporarily overwrite the parent pointers during the traversal so they instead point to the child you need to return to, then put back the original parent pointers as you walk back up the stack. That gives you the stack traversal pattern in O(1) additional space. It's also too clever/confusing to justify using here, I think. :sweat_smile: I believe I first learned about it in the context of some garbage collector, where it makes more sense.
cfallin submitted PR review.
cfallin created PR review comment:
Right, it's testing specifically that remat is preferred even when GVN would do the opposite naturally. It looks silly here where register pressure is low, but it happened to help significantly with SpiderMonkey, so...
(The case where it's most interesting is when there are a whole bunch of values like `x+1`, `x+2`, ... rematerialized inside a loop or otherwise deeper into a function, and when these all would otherwise be hoisted by LICM or GVN; then we have just one register `x` live instead of all of the values)
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
It almost worked! The last `level -= 1` would underflow before we take the backedge and hit a loop-exiting condition; reconfigured things a bit to break before that.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, right; I still had the `Top` default state in my mind and had figured that meeting with an initial `Top` was harmless; but here the default is `Entry` so we need to special-case the first set.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
No reason at all -- updated!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
The optimization entrypoints are invoked for every new eclass node created (i.e., every time an enode is added to an eclass), so we only need to (and in fact should only, to avoid redundancy) examine just the latest enode in the eclass. I'll add a comment regarding this!
> Is it necessary for correctness to add the Load node to the egraph if the store gets forwarded to the load? Hypothetically, could the caller pass load_op, load_ty, etc to this function, and entirely skip constructing the Node::Load if it succeeds? I'm not asking you to change that in this PR, I just want to know if there are invariants that would break if someone did that.

I think that's technically possible, yeah, although there's one important case it would pessimize (in compile time, not code quality): having the `Load` node in the egraph as well lets other later `Load`s (with the same memory state) deduplicate with it. If we did not add the `Load` node to the egraph at all, then later `Load`s would eventually still end up with the forwarded stored value if the original did, because `store_to_load` is deterministic; but we would redo the work each time to determine that.
cfallin submitted PR review.
cfallin created PR review comment:
That's a great question! I think I simply did not hit these cases while experimenting with this extractor with rewrite rules, but they absolutely should exist.
cfallin created PR review comment:
Actually, thinking about this has made me realize why both this line (and the `subsume` bool also) don't work: reentrancy of the rewriting! From within the call to `simplify` we can create a new node and enter `optimize_eclass` recursively; I think both a clear-at-beginning and clear-at-end potentially prematurely clear `subsume` directives that should be seen by an outer invocation. The most precise memory-reuse approach is probably to clear it when exiting the top level (based on the reentrancy limiter / depth counter) but in order to make reasoning about this easier I think we can probably just let it live for the duration of the egraph (function) rewrite.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I had missed that, thanks! (Yeah, I tried to write store-to-load forwarding as a rule at first, but it turned out to be too... bespoke?... to build a bunch of extractors and special support for side-effecting insts in the general rewrite framework to reason about loads and stores.)
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
I golfed this a bit more, starting from your suggestions (thanks!); I combined the three conditionals (type, offset, address) into one as well. Much better now!
I'm still noodling over the idea of making this an ISLE rule instead; maybe eventually it will be better that way, when we have more infrastructure to support non-pure ops, but I think I want to approach that idea incrementally / as a followup.
jameysharp submitted PR review.
jameysharp created PR review comment:
I think that means I got the edge cases wrong a different way: the smallest level it should assign is 1, so decrementing it an extra time in the last loop iteration should never underflow.
Also, I see you changed my usize to a u8 so you could directly construct a LoopLevel. I'd imagined a From impl that would saturate when converting from a usize that was too large. But this is fine too, at least in debug builds where overflow panics. Loop depth comes from untrusted input so I think that probably should be checked in release builds too, right? Restricting to 255-deep nested loops seems reasonable to me, anyway.
fitzgen submitted PR review.
fitzgen created PR review comment:
```rust
/// Last-store instruction (or none) for a given load. Use a hash map
/// instead of a `SecondaryMap` because this is sparse.
```
fitzgen created PR review comment:
If we are computing this for ~every instruction, then a `SecondaryMap` should perform better here.
fitzgen created PR review comment:
Is there a reason we aren't using `std::ops::Range` here?
fitzgen created PR review comment:
Is `always` really needed over just `#[inline]`? In excess, `always` tends to slow down debug builds.
fitzgen created PR review comment:
Ah okay I see that this is just for load instructions.
fitzgen created PR review comment:
Are these meant to happen before we land this PR or in a follow up? Because IIUC right now if egraphs are enabled then we don't run simple preopt which means that we won't do either of these optimizations at all.
fitzgen submitted PR review.
fitzgen created PR review comment:
`Default` is in the `std` prelude so the namespace shouldn't need to be fully elaborated here.

```rust
impl Default for AnalysisValue {
```
jameysharp created PR review comment:
I think there should be a stat for max `rewrite_depth`, or for number of times `REWRITE_LIMIT` was exceeded, or both.
jameysharp created PR review comment:
I think letting `subsume_ids` live for the duration of the function rewriting is a fine idea; as you say, it's simplest to reason about.

That said: I think putting a bool in the `IsleContext` is actually reentrancy-safe, because there's a separate instance of that context for every call to `optimize_eclass`. They all share the same reference to the egraph, which is why a shared `HashSet` there is _not_ safe, but we can add things that are local to each rewrite.
jameysharp submitted PR review.
jameysharp created PR review comment:
I'm not Chris, but I think they're intended for a follow-up. Compiling with egraphs is disabled by default for now, so it doesn't have to have every optimization yet. They're probably necessary before egraphs can be the default though, I'd think.
jameysharp submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I'm not sure; it's pre-existing though (this definition was copied from the ISLE definition preludes for the backends). That change could be made across all backends if we want to look into it after this PR, maybe?
cfallin submitted PR review.
cfallin created PR review comment:
This one is actually load-bearing IIRC, from performance tuning!
cfallin submitted PR review.
cfallin created PR review comment:
Yep, I'd prefer to get this in-tree and then continue adding to the body of opts, rather than holding back even longer.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, that's a good point; I had been imagining a bool flag on the egraph itself rather than the per-ISLE-invocation context. Happy to revisit this (and actually the whole reentrancy mechanism maybe) in followup work...
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Done (the latter)!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
How about replacing `NodeCtx::with_capacity` with a `NodeCtx::for_dfg`, and passing in `&func.dfg`? Right now the two `usize` arguments are easy to mix up. Even looking at the only caller, I can't immediately tell if `func.dfg.num_values()` is the right argument for `types` and `func.dfg.value_lists.capacity()` is the right argument for `args`.
jameysharp created PR review comment:
As far as I can tell, this `Default` impl for `NodeCtx` is unused. But if it were needed, it should be derivable.
jameysharp created PR review comment:
All the uses of `NodeKey::node`, like this one, would be a little nicer if type inference just worked. I think there are several ways to do that.

One is if, instead of using `L: Language` as the type parameter and using `L::Node`, we just let the node type `N` be the type parameter, without constraining it to come from any language. Callers already pass `egraph.nodes`, which uniquely determines the node type, just not the language type.

Another is to have that function take a reference to the `EGraph<L, A>`, and access the `nodes` field itself. That's a little tedious for `NodeKeyCtx` which would need to borrow the whole egraph instead of just the nodes, but that probably works.

The last option would be to give `NodeKey` and `EClass` a phantom type parameter for the language they come from, and thread that type parameter through everywhere. I think that's appealing in the long run for ensuring that these node and class IDs can't be misused, but I wouldn't want to do it in this PR.

I think either of the first two would be nice to do now.
jameysharp created PR review comment:
Nit: I'd prefer not to hard-code the type here so we could fiddle with it by just changing the definition of `Node::Param`.

Also, I recently noticed there's guidance on error message style in the Rust docs. I'm not entirely sure how to apply that advice here, but I intend to think about it more when I'm writing these sorts of things.

So maybe:

```rust
index: i.try_into().expect("blockparam index should fit in Node::Param"),
```
jameysharp created PR review comment:
I was curious, so I checked: `ops::Range` has a derived implementation of `Default`, so `0..0` is what you'd get from `SecondaryMap::new`.

I keep harping on this because I kind of want to remove the `SecondaryMap::with_default` constructor. I believe every existing use either is redundant with an existing implementation of `Default` or would be natural to implement.
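For reference, `ops::Range` does derive `Default`, so the default value really is the empty `0..0` range (a standalone check, not the PR's code):

```rust
fn main() {
    // The derived Default for Range uses Default for both endpoints.
    let r: std::ops::Range<u32> = Default::default();
    assert_eq!(r, 0..0);
    assert!(r.is_empty());
}
```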
jameysharp created PR review comment:
I'd prefer to avoid the `unwrap` here by explicitly matching on the `InstructionData`:

```rust
let is_readonly_load = match func.dfg[inst] {
    InstructionData::Load {
        opcode: Opcode::Load,
        flags,
        ..
    } => flags.readonly() && flags.notrap(),
    _ => false,
};
```

I'm guessing that, unlike in the alias analysis, here it really isn't important that the opcode be exactly `Opcode::Load`. Sign- or zero-extending loads and vector loads should also be okay to treat as pure if they're both readonly and notrap, right?
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
> I'm guessing that, unlike in the alias analysis, here it really isn't important that the opcode be exactly Opcode::Load. Sign- or zero-extending loads and vector loads should also be okay to treat as pure if they're both readonly and notrap, right?

In theory yeah, that's absolutely true. However making that change is a little more than `op.can_load()`, because we want exactly those loads that are "normal" (take a value from memory with no special semantics or other side-effects), and `.can_load()` is a much broader predicate that includes e.g. atomics and CAS; and there's no "normal load but maybe extending" predicate. So I wanted to avoid untested, uncertain complexity.

Happy to address this in a followup if we find cases where it's important (or just in general), though!
cfallin submitted PR review.
cfallin created PR review comment:
(The suggestion is great though, applying that; thanks!)
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Indeed, that works!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Ah! I had fought with this for a bit when I first wrote it to try to get the type-inference to work; it never occurred to me to just parameterize directly on the node type. That indeed works. Thanks!
cfallin created PR review comment:
I like that; named it `with_capacity_for_dfg` to make its semantics clearer (just `for_dfg` might imply that it somehow translates the contents of the DFG or something like that).
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Yep, verified it's not needed; deleted!
jameysharp submitted PR review.
jameysharp created PR review comment:
I believe this `unwrap()` will panic on `StackLoad` instructions.
jameysharp submitted PR review.
jameysharp created PR review comment:
Assuming the post-processing of the new eclass ID doesn't use this `side_effect` flag, instead checking whether the added node was `Node::Pure` or not... Maybe this use of the flag can be replaced like so, and the definition of the flag above can be deleted:

```rust
} else if has_side_effect(func, inst) || opcode.can_load() {
```

The flag is currently the disjunction of three cases:

- Does this instruction have side effects according to `has_side_effect`?
- Is this a non-`readonly` load?
- Can this instruction store?

Case 3 is already covered by case 1: `opcode.can_store()` implies `has_side_effect`. So that case should be deleted regardless.

<details>
<summary>The relationship between case 1 and 2 is confusing though.</summary>

If an instruction:
- `can_load`, and
- it is not a `StackLoad`, and
- it is either not the `Load` instruction format, or it does not have the `notrap` flag set,

then `has_side_effect` is implied.

At this `else`, we've already ruled out the possibility that the instruction is a load with both `readonly` and `notrap` flags set. So there are two remaining cases where case 2 is not implied by case 1:
- it could be a `Load`-format instruction with `notrap` but without `readonly`
- it could be a `StackLoad`

In those cases, this PR currently uses `Node::Inst`, but if we just use `has_side_effect`, those cases would use `Node::Pure` instead.
</details>
I think it makes sense that all loads need their order and execution preserved unless they fall into one of the two earlier special cases:
- either they're in read-only memory (so they can be re-ordered past stores) and they can't trap (so they can be deleted if unused),
- or alias analysis proves that it's okay to delete or forward a load.
jameysharp created PR review comment:
I'm not suggesting using `op.can_load()`. The version I proposed here checks that the instruction uses the `Load` instruction format, which excludes all the atomics and other magic loads. So that actually is a "normal load but maybe extending" predicate, as far as I can tell. I think you just need to delete the `opcode: Opcode::Load` pattern if you want to cover the other cases.
jameysharp created PR review comment:
I'm having trouble convincing myself that these conditions agree with corresponding conditionals above. So, let's say after the `let node = ...` block above, there's something like:

```rust
let is_pure = matches!(&node, Node::Pure { .. });
```

And assume `id` is `mut` above. Then this block could look something like the following. I'm choosing to call `store_to_load` on all new enodes: that function already has to guard against being handed an eclass ID that is not a load, so this doesn't affect correctness. It might affect performance, but that's not clear to me, and it's hard to be precise here without duplicating big parts of `store_to_load`.

```rust
if let NewOrExisting::New(new_id) = id {
    // Loads: do store-to-load forwarding.
    let opt_id = crate::opts::store_to_load(new_id, self);
    trace!("store_to_load: {} -> {}", new_id, opt_id);
    if opt_id != new_id {
        id = NewOrExisting::Existing(opt_id);
    }
}
let id = match id {
    NewOrExisting::Existing(id) => id,
    NewOrExisting::New(id) if is_pure => {
        // Apply all optimization rules immediately; the
        // aegraph (acyclic egraph) works best when we do
        // this so all uses pick up the eclass with all
        // possible enodes.
        crate::opts::optimize_eclass(id, self)
    }
    NewOrExisting::New(id) => {
        self.side_effect_ids.push(id);
        self.stats.side_effect_nodes += 1;
        id
    }
};
```
cfallin created PR review comment:
Unfortunately I think this will be a meaningful performance regression -- I remember specifically restricting store-to-load logic to run only on loads while tuning.
Happy to add comments here to clarify as needed -- any thoughts (while keeping the structure the same)?
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin submitted PR review.
cfallin created PR review comment:
I'm afraid I'm not completely following this suggestion, sorry. If I understand correctly, you're proposing a change, then going through a (fairly complex) case analysis to argue why it's correct? Or is the concern that the code as-written is incorrect? (I agree that the `can_store()` check is redundant and I've removed it, thanks!)

I think that at the point we are looking at the definition of predicates and their exhaustive cases, and effectively "inlining" that in our reasoning, we are doing something wrong here: we're writing more brittle code that has less modularity. E.g. I'd prefer not to rely on the intersection of specific load flags in two different files to reason about removing a condition here.

I think that as-written, the conditions are not too bad to understand from first principles: we want to handle the four categories of (i) readonly loads, (ii) all other loads, (iii) all side-effecting ops, and (iv) all pure ops. I like that there are boolean flags that indicate each of these (except the implicit `Some`-ness of `mem_state` for loads; I can add `is_load` if you'd prefer there). I don't like inlining e.g. a `|| can_load` condition in this if-ladder: it makes the reasoning a lot muddier to me.

It's possible I've missed the whole point of this comment though; please do let me know if so!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
OK, talked with @jameysharp offline about this one and I think we can get around the perf regression with another conditional; and the rest of this is actually nice and clean and I like it better. So I'll accept this then tweak it a bit.
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
OK, talked with @jameysharp about this one too and it makes more sense now -- accepting suggestion. Thanks!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
jameysharp submitted PR review.
jameysharp created PR review comment:
Alternatively: `.fold(Cost::zero(), Cost::add);`
jameysharp submitted PR review.
jameysharp created PR review comment:
This return value isn't used in the only caller of this function (`elaborate_block`). Given that there's already a `debug_assert`, how about this? Instead of a conditional branch to check the condition, just write a 0 to the Vec's length field:

```rust
self.elab_result_stack.clear();
```
jameysharp created PR review comment:
There's an `Opcode::is_terminator` which is true for some of the same cases as `is_term`, but not all the varieties of branches. So I think this is a confusing name for this variable.

Also, `is_terminator` includes `Trap`, which this condition does not. Should it?
jameysharp created PR review comment:
You can make `result_tys` a slice in both cases and just loop over it later without having to re-check which case you're in:

```rust
let (instdata, result_tys) = match node {
    Node::Pure { op, types, .. } | Node::Inst { op, types, .. } => (
        op.with_args(args, &mut self.func.dfg.value_lists),
        types.as_slice(&self.node_ctx.types),
    ),
    Node::Load { op, ty, .. } => (
        op.with_args(args, &mut self.func.dfg.value_lists),
        std::slice::from_ref(ty),
    ),
```
jameysharp created PR review comment:
Out of curiosity: since block params are unique, each such enode _should_ be absent at this point, right?
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin created PR review comment:
Ah, here we are actually returning the result, so unfortunately we can't do that...
cfallin submitted PR review.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Err, apparently I'm reading too fast because I see you noted this above and further noted it isn't used :-) Yes, actually, this is a nice cleanup, thanks!
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin updated PR #4953 from `mid-end` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
This is different than `is_terminator` I think as it needs to cover conditional branches as well.

The mechanism actually has a pretty interesting and subtle reason for existing, which I've added in a longer comment here -- I had forgotten as well until I rediscovered it via test failures :-)
cfallin created PR review comment:
Yep, but the only API we have is `insert_if_absent`, not `insert_if_absent_and_assert_otherwise` :-)
cfallin submitted PR review.
jameysharp submitted PR review.
cfallin merged PR #4953.