bnjbvr opened PR #2742 from mac-aarch64 to main:
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8 bytes; they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).
Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
Fixes #2734.
(Note: more work will likely be needed to accommodate non-wasm uses: sign- or zero-extension of arguments narrower than 32 bits, and proper register passing for i128 values. Happy to try PRs and confirm whether they work here.)
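The slot layout rule described in this PR description can be sketched in Rust as follows. This is only an illustration under stated assumptions, not the code from the PR: the Ty enum, stack_arg_offsets, and the min_slot_bytes parameter are hypothetical names. Passing a minimum slot size of 8 models the convention where every stack argument takes at least 8 bytes, while a minimum of 1 models the Apple behavior of packing arguments at their natural size and alignment.

```rust
/// Hypothetical argument types for this sketch (not Cranelift's type system).
#[derive(Clone, Copy)]
enum Ty {
    I32,
    I64,
}

impl Ty {
    /// Size in bytes; for these scalars the natural alignment equals the size.
    fn bytes(self) -> u32 {
        match self {
            Ty::I32 => 4,
            Ty::I64 => 8,
        }
    }
}

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Returns the offset of each stack argument and the total bytes used.
/// `min_slot_bytes` is an assumption of this sketch: 8 for "every slot is
/// at least 8 bytes", 1 for "slots are as small as the data type".
fn stack_arg_offsets(args: &[Ty], min_slot_bytes: u32) -> (Vec<u32>, u32) {
    let mut next = 0;
    let mut offsets = Vec::with_capacity(args.len());
    for ty in args {
        let size = ty.bytes().max(min_slot_bytes);
        // Each slot must still be aligned on the size of the data it holds.
        next = align_up(next, size);
        offsets.push(next);
        next += size;
    }
    (offsets, next)
}

fn main() {
    // Packed slots: (i32, i32) land at +0 and +4.
    println!("{:?}", stack_arg_offsets(&[Ty::I32, Ty::I32], 1)); // ([0, 4], 8)
    // The i64 cannot start at +4; it is pushed to +8.
    println!("{:?}", stack_arg_offsets(&[Ty::I32, Ty::I64], 1)); // ([0, 8], 16)
    // With 8-byte minimum slots, (i32, i32) land at +0 and +8.
    println!("{:?}", stack_arg_offsets(&[Ty::I32, Ty::I32], 8)); // ([0, 8], 16)
}
```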
bnjbvr updated PR #2742 from mac-aarch64 to main.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This bumps more than target-lexicon. Could the other updates be split into a new PR?
bnjbvr created PR Review Comment:
Yeah, I just ran cargo update. I'm happy to do so, or to revert the updates unrelated to target-lexicon. But in general patch bumps shouldn't bring in breaking changes (unfortunately that's not the case for this target-lexicon patch bump, which brings a breaking API change), so what advantages do you see in doing so?
bnjbvr submitted PR Review.
bjorn3 created PR Review Comment:
When depending on new features of a dependency, could you also update the dependency requirement in Cargo.toml? That would be useful for users like cg_clif that only update a single crate at a time and keep the rest pinned using Cargo.lock.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Could the fast and cold call conv use this too? They are unstable anyway and this saves stack usage.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
For 0.x.y, y bumps may still add features. Only breaking changes are not allowed.
bnjbvr updated PR #2742 from mac-aarch64 to main.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I've opened https://github.com/bytecodealliance/target-lexicon/issues/71 to discuss this, since in fact I think that adding the new enum variant is a breaking API change (this may break users' match statements, since the enum isn't marked non-exhaustive), but it's subtle. In any case, I've reverted the other package updates.
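To illustrate why a new enum variant can break downstream code, here is a minimal Rust sketch. The CallConv enum and both functions are hypothetical stand-ins, not target-lexicon's actual definitions. A match that lists every variant stops compiling as soon as the dependency adds one, whereas a match with a wildcard arm keeps compiling; the #[non_exhaustive] attribute forces downstream crates to write that wildcard arm (it has no effect inside the defining crate, as in this single-file sketch).

```rust
// Hypothetical stand-in for a dependency's enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
enum CallConv {
    SystemV,
    WindowsFastcall,
    AppleAarch64, // the newly added variant in this sketch
}

// Exhaustive match: before the `AppleAarch64` arm was written, this function
// would have failed to compile once the variant appeared upstream.
fn name_exhaustive(cc: CallConv) -> &'static str {
    match cc {
        CallConv::SystemV => "system_v",
        CallConv::WindowsFastcall => "windows_fastcall",
        CallConv::AppleAarch64 => "apple_aarch64",
    }
}

// Wildcard match: unknown or future variants fall through to the last arm,
// so adding a variant upstream does not break the build.
fn name_forward_compatible(cc: CallConv) -> &'static str {
    match cc {
        CallConv::SystemV => "system_v",
        CallConv::WindowsFastcall => "windows_fastcall",
        _ => "other",
    }
}

fn main() {
    for cc in [CallConv::SystemV, CallConv::WindowsFastcall, CallConv::AppleAarch64] {
        println!("{:?}: {} / {}", cc, name_exhaustive(cc), name_forward_compatible(cc));
    }
}
```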
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Mind adding an additional test here that uses get_typed_func to ensure that the ABI of Rust-calling-wasm works as well?
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
sure!
cfallin submitted PR Review.
cfallin created PR Review Comment:
One thing I noticed here: "If the total number of bytes for stack-based arguments is not a multiple of 8 bytes, insert padding on the stack to maintain the 8-byte alignment requirements." However below we align the final stack-arg area size up to a 16-byte alignment.
(Related to this, my understanding is that the trap-on-not-16-aligned-SP behavior of aarch64 is configurable with a mode bit as well; maybe this means Apple runs with only an 8-aligned stack?)
I think this is OK as it should be fine to reserve extra space at a callsite, but we should document that we diverge and why it's OK (and verify that it is!).
cfallin submitted PR Review.
cfallin created PR Review Comment:
apple_aarch64 to be consistent here?
bnjbvr created PR Review Comment:
Maybe? It's a bit out of scope for this PR, so I'll try not to accidentally introduce new changes there. Plus, I am not sure if these conventions are used; I seem to recall that fast implies the default calling convention in some cases, and if we're not being careful that might mean subtly breaking other calling conventions. We should probably audit the "fast"/"cold" calling conventions at some point, and re-design them from the ground up.
bnjbvr submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
+1 -- I actually think we should do something about the "fast" and "cold" conventions, but we should be a little more explicit in designing them, and probably give them better names. I'm not a huge fan of generic terms like "fast ABI" because the ambiguity is confusing -- both on the user/embedder side ("what guarantees does this have? how fast is fast? when can I use it?") and on the implementer side (do we just choose an array of features that lead to better speed, and add more as we think of them? IMHO an evolving ABI is a recipe for subtle bugs as we mutate invariants over time).
So I'd rather design a "fast internal Cranelift ABI v1", implement it, and then keep that as a first-class, well-defined ABI alongside the others, and retire names like "fast" and "cold", just for clarity's sake. But, that's a deeper discussion for another day, I think!
bnjbvr updated PR #2742 from mac-aarch64 to main.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Yes, and it maintains our invariant that the stack is always allocated in 16-byte chunks, thus always aligned, which is nice. (Otherwise it would require more changes when generating prologues and epilogues.)
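As a rough check of the alignment point discussed above, the following Rust sketch compares padding the stack-argument area to 8 bytes with rounding it up to 16 bytes. The align_to helper is a hypothetical name, not the function used in the PR. Under the assumption that over-reserving space at a callsite is harmless, the 16-byte result is never smaller than the 8-byte one and is itself a multiple of 8, so it also satisfies the 8-byte requirement.

```rust
/// Rounds `size` up to the next multiple of `align` (a power of two).
/// Hypothetical helper for this sketch.
fn align_to(size: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (size + align - 1) & !(align - 1)
}

fn main() {
    // Bytes actually occupied by stack arguments at a callsite.
    for used in [0u32, 4, 12, 20, 24] {
        let padded_8 = align_to(used, 8); // what the quoted 8-byte rule asks for
        let padded_16 = align_to(used, 16); // what a 16-byte-chunk allocator reserves
        // Reserving 16-byte chunks never under-allocates relative to the
        // 8-byte rule, and keeps the area 8-byte aligned as well.
        assert!(padded_16 >= padded_8);
        assert_eq!(padded_16 % 8, 0);
        println!("used={used} pad8={padded_8} pad16={padded_16}");
    }
}
```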
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I've bumped target-lexicon to 0.12.0, which is the new pre-major release for this new enum variant.
bnjbvr merged PR #2742.
Last updated: Dec 23 2024 at 12:05 UTC