cfallin requested fitzgen for a review on PR #4163.
cfallin opened PR #4163 from alias-analysis
to main
:
This PR adds a basic alias analysis, and optimizations that use it.
This is a "mid-end optimization": it operates on CLIF, the
machine-independent IR, before lowering occurs.The alias analysis (or maybe more properly, a sort of memory-value
analysis) determines when it can prove a particular memory
location is equal to a given SSA value, and when it can, it replaces any
loads of that location.This subsumes two common optimizations:
Redundant load elimination: when the same memory address is loaded two
times, and it can be proven that no intervening operations will write
to that memory, then the second load is redundant and its result
must be the same as the first. We can use the first load's result and
remove the second load.Store-to-load forwarding: when a load can be proven to access exactly
the memory written by a preceding store, we can replace the load's
result with the store's data operand, and remove the load.Both of these optimizations rely on a "last store" analysis that is a
sort of coloring mechanism, split across disjoint categories of abstract
state. The basic idea is that every memory-accessing operation is put
into one of N disjoint categories; it is disallowed for memory to ever
be accessed by an op in one category and later accessed by an op in
another category. (The frontend must ensure this.)Then, given this, we scan the code and determine, for each
memory-accessing op, when a single prior instruction is a store to the
same category. This "colors" the instruction: it is, in a sense, a
static name for that version of memory.This analysis provides an important invariant: if two operations access
memory with the same last-store, then no other store can alias in the
time between that last store and these operations. This must-not-alias
property, together with a check that the accessed address is *exactly
the same* (same SSA value and offset), and other attributes of the
access (type, extension mode) are the same, let us prove that the
results are the same.Given last-store info, we scan the instructions and build a table from
"memory location" key (last store, address, offset, type, extension) to
known SSA value stored in that location. A store inserts a new mapping.
A load may also insert a new mapping, if we didn't already have one.
Then when a load occurs and an entry already exists for its "location",
we can reuse the value. This will be either RLE or St-to-Ld depending on
where the value came from.Note that this does work across basic blocks: the last-store analysis
is a full iterative dataflow pass, and we are careful to check dominance
of a previously-defined value before aliasing to it at a potentially
redundant load. So we will do the right thing if we only have a
"partially redundant" load (loaded already but only in one predecessor
block), but we will also correctly reuse a value if there is a store or
load above a loop and a redundant load of that value within the loop, as
long as no potentially-aliasing stores happen within the loop.Fixes #4131.
Passes tests and runs SpiderMonkey correctly locally; benchmarks TBD.
Creating this as a draft for early feedback; will likely refine the comments
and explanations a bit more, and benchmark this.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin updated PR #4163 from alias-analysis
to main
.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
It would be nice if all stack slots would be their own category until their address is leaked using stack_addr.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, that would be nice, but entails an escape analysis, which is outside the scope of this PR. Happy to discuss further (and review PRs!) once this is in, of course.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Even stopping optimization entirely if stack_addr is used on a stack slot would likely help cg_clif a lot. Having each stack slot whose address is never leaked be optimized independently is the most important for cg_clif. I'm fine with leaving that for a later PR.
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin has marked PR #4163 as ready for review.
lqd submitted PR review.
lqd created PR review comment:
Tiny typo
//! to the abstract state category occurred, *and* no other trapping
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
For clarity:
/// Accesses only the "heap" part of abstract state. Used for alias
Same for the other abstract state parts.
fitzgen created PR review comment:
Slightly unfortunate that we can't easily differentiate between two different memories or two different tables. I don't think it will matter too much in practice, but unfortunate none the less.
File a follow up issue for having an alias analysis bit for each memory/table instead of lumping all memories and all tables together?
fitzgen created PR review comment:
I think this already exists as https://docs.rs/cranelift-codegen/latest/cranelift_codegen/ir/instructions/enum.InstructionData.html#method.memflags no?
fitzgen created PR review comment:
Could you add some comments about what cases this is testing and what is expected? That will be really helpful for folks updating tests for unrelated reasons and trying to determine whether the new precise output is valid or not. Also useful for reviewers trying to figure out which cases do or don't have tests ;)
fitzgen created PR review comment:
These are becoming very verbose. Can we make
set_table
et al take and returnSelf
so that we can chain the methods likeir::MemFlags::trusted().set_table()
fitzgen created PR review comment:
I think these tests would be a lot easier to read as CLIF to CLIF tests. That way we don't have to figure out what changes are from unrelated passes or just related to aarch64-specific lowering, or even have to know aarch64 to understand these tests that are logically portable and architecture-independent.
Adding
test alias_analysis
that just does alias analysis followed by GVN by callingContext::remove_redundant_loads
should be really straightforward. Should be able to pretty much copy+paste thetest simple_gvn
implementation:
fitzgen created PR review comment:
File a follow up issue to track/discuss dead-store elimination?
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Thanks!
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
That's a great idea, thanks -- done!
cfallin submitted PR review.
cfallin created PR review comment:
Done (here and in other new tests)!
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah yes, I had missed that; using that one now, thanks.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, good idea; I added
with_<flag>
methods toMemFlags
.
cfallin submitted PR review.
cfallin created PR review comment:
Sure! There are efficiency reasons why this is not trivial -- bits in a bitset (
MemFlags
is au8
, but even with au32
oru64
that's still a small fixed number of heaps/tables), and the vector of last-stores (does it scale arbitrarily?). But maybe there's a hybrid approach (small number of privileged heaps/tables) or maybe we can find a way to scale it well enough.Filed #4166.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, #4167.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Can we add a test for loads of the same address but of different types? I.e. something like
v1 = load.i32 v0 v2 = load.f32 v0 ;; check: ... the second load was not eliminated ...
Probably would be good to test signed vs unsigned extensions and 8-to-32 vs 16-to-32 extensions as well.
fitzgen created PR review comment:
Can we also have tests for the other fence-semantics instructions too? Just tiny functions for each of them like
v1 = load.i32 v0 <fence instruction> v2 = load.i32 v0 ;; check: ...
fitzgen created PR review comment:
It might make sense to reduce indentation here, since this is basically the rest of the function:
let (address, offset, ty) = match inst_addr_offset_type(pos.func, inst) { None => continue, Some(x) => x, }; // ...
Totally up to you.
fitzgen created PR review comment:
Just to double check ourselves
fn get_ext_opcode(op: Opcode) -> Option<Opcode> { debug_assert!(op.can_load() || op.can_store()); match op {
cfallin updated PR #4163 from alias-analysis
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done! Added
filetests/alias/fence.clif
.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done! Added
filetests/alias/extends.clif
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, there's actually a final call to
update_state
below the long if-let, which this would skip, and factoring the body out is a bit complex because of mutable borrows of self. Left it as-is for now, but thanks in any case!
cfallin merged PR #4163.
Last updated: Dec 23 2024 at 12:05 UTC