cfallin opened PR #18 from `cranelift-2022` to `main`:
This is a roadmap for work on Cranelift in 2022, highlighting areas of interest and outlining the major projects that we want to focus on in the next year. It is meant to be part guiding document, part aspirational collection of interesting ideas and possibilities. As with last year's roadmap, we likely won't accomplish all of it, though if we're fortunate we may achieve a good amount of it.
Thanks to @fitzgen for feedback on a draft, and many others (see Acknowledgments at end) for general input on these topics.
Apologies if this misses anything important -- I tried to be as comprehensive as possible, while still coalescing into general themes. I'm very curious what the community thinks and how folks feel about these ideas!
bjorn3 submitted PR review.
bjorn3 created PR review comment:
There is also another design constraint: inlining shouldn't break parallel compilation, so that we don't drastically regress compilation times. Function compilation is currently entirely parallelizable as everything is local. Inlining, however, is an inherently non-local optimization.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
DWARF exception handling also requires a tail call to `_Unwind_Resume`.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Also the landingpad must be able to take parameters. The x86_64 DWARF exception handling allows up to 4. Rustc requires 2.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
You could also differentially fuzz between perturbations of the same input.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
A moonshot idea, but what about exposing these details to the user? Then they could try to make better decisions than the compiler while still being guaranteed (assuming no bugs in clif) that the output isn't a miscompilation.
cfallin submitted PR review.
cfallin created PR review comment:
This is definitely way-out-there, but I do like the idea of a programmable meta layer as a first-class user-exposed interface. In the EDA (electronic design automation) world, tools have pretty much all standardized on Tcl as a command language, and one can write scripts like "optimization1; optimization2; simplify x y z; ...". Similarly for the "strategy layer" in proof-automation/verification tools. This would need a lot of thought and is lower-priority than any of the work that directly benefits correctness testing, but, yes, cool idea!
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
I think we actually don't want to encourage the use of things like `ref_to_int` and `int_to_ref`. This will still either suffer from the "whoops, no stack map entry for this value because it's an `int` and not a `ref`" problem, or we will have to start adding stack map entries for `int`s derived from a `ref`, but then where do we stop? What about `int`s derived from an `int` derived from a `ref`? It gets messy really quick. Instead, I think we want to allow using `ref`s as address operands for loads and stores, and maybe a few other places where `int` operands are currently allowed, but `ref`s aren't (maybe even make `ref` usable everywhere `int` is, basically making them exactly the same except that `ref`s get stack map entries).
But all of this is a bit too much in the weeds for a roadmap RFC. I think it suffices to say that we should make `r{32,64}` usable for non-opaque GC objects where we can load from/store to their fields (including fields that are themselves `r{32,64}`s) inline instead of only with out-of-line VM functions like what would be required now.
cfallin submitted PR review.
cfallin created PR review comment:
Yeah, that's a good point. I guess I had been imagining the lowering of a field access to be a "cast to int, mask off the tag bits, and load, while carefully avoiding safepoints" sort of operation (because the ref is dead/unrooted after the cast). But that's a big footgun. First-class operators that use refs as pointers are almost certainly better in that regard.
One consideration I had in mind is that the conversion from ref to actual pointer actually needs to be a little bit flexible to work in various runtimes: I'm aware of at least low-bit tagging (classical "pointers are aligned so use 2/3 tag bits"), NaN boxing or upper-bit tagging generally (top 16 bits), and the JVM's "compressed oop" design where a 32-bit pointer is used in 64-bit space with `base + (oop << 3)` (allowing a 2^35-byte, i.e. 32 GiB, heap). We could in theory have a load_ref/store_ref that takes a mask, a shift, and an addend, but maybe there's something better.
Another possibility that occurs to me: allow casts to ints, and allow the user to generate CLIF as needed, but also take the ref itself as an arg to the load/store. This requires keeping the ref alive up to the point of use, at least at the CLIF level. (If the load/store isn't a safepoint for the particular runtime then the lowering can drop it.) So we'd have `int := ref_to_int r; actual_ptr := arbitrary_conversion(int); value := load_ref actual_ptr, r`.
Anyway, yeah, I'll update the RFC to be more generic on this point but this will be an interesting design discussion to have later!
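To make the "mask, a shift, and an addend" idea above concrete, here is a minimal sketch in plain Rust. It is only an illustration of the arithmetic: `RefDecoding` and the three constructor functions are hypothetical names, not Cranelift APIs, and the bit layouts are simplified versions of the three tagging schemes listed above.

```rust
/// Hypothetical description of how a runtime encodes a pointer in a reference.
/// None of these names exist in Cranelift; this only illustrates the
/// "mask, shift, addend" parameterization discussed in the comment above.
#[derive(Clone, Copy, Debug)]
pub struct RefDecoding {
    /// Bits of the reference to keep (clears any tag bits).
    pub mask: u64,
    /// Left shift applied after masking (e.g. 3 for compressed oops).
    pub shift: u32,
    /// Value added after shifting (e.g. the heap base).
    pub addend: u64,
}

impl RefDecoding {
    /// Computes `((r & mask) << shift) + addend` with wrapping addition.
    pub fn decode(&self, r: u64) -> u64 {
        ((r & self.mask) << self.shift).wrapping_add(self.addend)
    }
}

/// Low-bit tagging: pointers are 8-byte aligned, so the low 3 bits hold a tag.
pub fn low_bit_tagged() -> RefDecoding {
    RefDecoding { mask: !0b111, shift: 0, addend: 0 }
}

/// Upper-bit tagging: the top 16 bits hold a tag, the low 48 bits the pointer.
pub fn upper_bit_tagged() -> RefDecoding {
    RefDecoding { mask: (1u64 << 48) - 1, shift: 0, addend: 0 }
}

/// Compressed-oop style: 32-bit reference, address = base + (oop << 3).
pub fn compressed(base: u64) -> RefDecoding {
    RefDecoding { mask: 0xFFFF_FFFF, shift: 3, addend: base }
}
```

With these assumptions, `compressed(0x1_0000_0000).decode(2)` gives `0x1_0000_0010`, and `low_bit_tagged().decode(0x1005)` strips the tag to give `0x1000`. Whether a single mask/shift/addend triple is expressive enough for real runtimes is exactly the open design question raised above.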
eqrion submitted PR review.
eqrion created PR review comment:
I’d like to add another point on the design space of deriving pointers from GC objects, although agreed that it warrants a deeper discussion elsewhere.
An issue SpiderMonkey will have with the Wasm-GC proposal when we add optimizing support is that some of our GC objects are represented as a GC ref to a cell which contains a regular heap pointer to the actual data. The outline data’s lifetime is tied to the GC object and is freed with a finalizer. So when we access a field/element of an object, we need to acquire the data pointer and then get the field/element from the data pointer.
There’s a potential hazard here however if the GC ref’s lifetime ends before the use of the data pointer and there is a safepoint/collection after the GC ref’s lifetime ends and before the use of the data pointer. This can’t be solved with just allowing r32/r64 to be manipulated as integers as the derived data pointer is not a ref that can be traced and shouldn’t get a stackmap entry. SM solves this by adding a synthetic use of the original GC ref to any loads using the data pointer. This extends the lifetime of the GC ref to match all uses of the data pointer. There may be other solutions here as well.
Additionally, the outline data representation for GC objects is a pain for codegen efficiency and ideally would be eliminated. But that will require some work in the SM GC that may not be feasible right away.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Should we mention the flexible vectors proposal (given that @sparker-arm is looking at it)?
cfallin updated PR #18 from `cranelift-2022` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
That's a good point, thanks! I added a paragraph that describes the tradeoff. The three design points I can think of right away are:
- Some sort of snapshotting, so that each function can be mutated separately while also accessing the function-to-be-inlined in a read-only way; or
- Some kind of partially-parallel design, where we "stratify" the callgraph into a DAG of SCCs then scan a parallel frontier of inlining up the DAG, from leaves to roots, processing serially within each SCC (this is maximally parallel while still allowing the inliner to see already-inlined versions of functions it might in turn inline); or
- Make inlining a special, optional, more costly pass that just operates serially, and takes a mutable borrow of the whole module or modules (collection of functions). This might be the best way to start.
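The second option above (stratifying the call graph into SCCs and walking a parallel frontier from leaves to roots) can be sketched roughly as follows. This is only an illustration: `inlining_waves`, `Tarjan`, and the index-based call-graph representation are hypothetical and not part of Cranelift. It uses Tarjan's SCC algorithm, which emits components callees-first, and then gives each component a level one above its deepest callee component.

```rust
/// State for a recursive Tarjan SCC traversal (hypothetical helper type).
/// Recursion depth grows with call-chain length; a real pass would likely
/// use an explicit stack instead.
struct Tarjan<'a> {
    callees: &'a [Vec<usize>],
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    scc_of: Vec<usize>,
    sccs: Vec<Vec<usize>>,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, f: usize) {
        self.index[f] = Some(self.next_index);
        self.low[f] = self.next_index;
        self.next_index += 1;
        self.stack.push(f);
        self.on_stack[f] = true;
        let callees = self.callees; // copy the shared slice reference
        for &g in &callees[f] {
            let g_index = self.index[g];
            match g_index {
                None => {
                    self.visit(g);
                    self.low[f] = self.low[f].min(self.low[g]);
                }
                Some(gi) if self.on_stack[g] => {
                    self.low[f] = self.low[f].min(gi);
                }
                Some(_) => {} // g already belongs to a finished SCC
            }
        }
        if Some(self.low[f]) == self.index[f] {
            // f is the root of an SCC: pop its members off the stack.
            let id = self.sccs.len();
            let mut scc = Vec::new();
            loop {
                let g = self.stack.pop().expect("stack holds the current SCC");
                self.on_stack[g] = false;
                self.scc_of[g] = id;
                scc.push(g);
                if g == f {
                    break;
                }
            }
            self.sccs.push(scc);
        }
    }
}

/// `callees[f]` lists the functions that `f` calls. Returns "waves" of SCCs:
/// every SCC in wave k only calls into SCCs in waves < k (or itself), so the
/// SCCs of one wave could be handed to parallel workers, while the functions
/// inside one SCC would be processed serially.
pub fn inlining_waves(callees: &[Vec<usize>]) -> Vec<Vec<Vec<usize>>> {
    let n = callees.len();
    let mut t = Tarjan {
        callees,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        scc_of: vec![0; n],
        sccs: Vec::new(),
    };
    for f in 0..n {
        if t.index[f].is_none() {
            t.visit(f);
        }
    }
    // Callee SCCs are emitted before their callers, so their levels are final.
    let mut level = vec![0usize; t.sccs.len()];
    for (id, scc) in t.sccs.iter().enumerate() {
        for &f in scc {
            for &g in &callees[f] {
                let callee_scc = t.scc_of[g];
                if callee_scc != id {
                    level[id] = level[id].max(level[callee_scc] + 1);
                }
            }
        }
    }
    let depth = level.iter().copied().max().map_or(0, |m| m + 1);
    let mut waves = vec![Vec::new(); depth];
    for (id, scc) in t.sccs.into_iter().enumerate() {
        waves[level[id]].push(scc);
    }
    waves
}
```

For a call graph where function 0 calls 1 and 2, both of those call 3, and 3 and 4 call each other, this yields three waves: the {3, 4} cycle first, then 1 and 2 (independent of each other), then 0. The sketch only computes the schedule; how the inliner would share already-inlined bodies between waves is left open, as in the discussion above.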
cfallin submitted PR review.
cfallin created PR review comment:
Added "and dataflow", thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Yes, that's a good idea; added a mention of this.
cfallin submitted PR review.
cfallin created PR review comment:
Updated this description to note that there is a design space to explore, and some constraints (avoiding rooting issues, also being flexible across different reference representations, such as indirection or tagging approaches).
cfallin submitted PR review.
cfallin created PR review comment:
Added, thanks!
bjorn3 submitted PR review.
abrown submitted PR review.
radu-matei submitted PR review.
mingqiusun closed without merge PR #18.
mingqiusun reopened PR #18 from `cranelift-2022` to `main`.
pchickey submitted PR review.
cfallin merged PR #18.
cfallin edited PR #18 from `cranelift-2022` to `main`:
This is a roadmap for work on Cranelift in 2022, highlighting areas of interest and outlining the major projects that we want to focus on in the next year. It is meant to be part guiding document, part aspirational collection of interesting ideas and possibilities. As with last year's roadmap, we likely won't accomplish all of it, though if we're fortunate we may achieve a good amount of it.
Thanks to @fitzgen for feedback on a draft, and many others (see Acknowledgments at end) for general input on these topics.
Apologies if this misses anything important -- I tried to be as comprehensive as possible, while still coalescing into general themes. I'm very curious what the community thinks and how folks feel about these ideas!
Last updated: Nov 22 2024 at 16:03 UTC