Hi! My colleagues are telling me that I should align my fields in the structs of my language and I should probably listen to them. I can do the alignment myself, but I'd need to know what the alignment should be on a given architecture. Does cranelift expose this information anywhere?
Alignments for various types are not just architecture specific, but also OS specific. Cranelift doesn't have any knowledge about this. TargetFrontendConfig only contains the default calling convention and the pointer size. target_lexicon::CDataModel only contains the sizes for various primitive C types on the given target, not the alignments.
In rustc_codegen_cranelift I'm getting all this information from rustc itself which parses the LLVM data layout specifications stored in the target spec, but the relevant crate (rustc_target) doesn't compile on stable, so it probably won't be too useful for you.
rustc also handles a fair part of the calling convention for me. (cranelift doesn't have a way to represent struct arguments, instead expecting that the lowering to primitive types is done by the producer of clif ir)
@Terts Diepraam a pretty good heuristic to start with in a new design IMHO is "natural alignment": align each type to its own size (so u32s are 4-aligned, u64s are 8-aligned, etc). Most platforms/OSes are pretty close to that too, with weirdness occurring primarily around larger types (u128s, SIMD vectors) or unexpectedly smaller than usual alignments
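A minimal sketch of the "natural alignment" heuristic in Rust, assuming every field is a primitive whose size is a power of two. The names `align_up` and `natural_layout` are hypothetical helpers for illustration and are not part of Cranelift:

```rust
/// Round `offset` up to the next multiple of `align` (must be a power of two).
fn align_up(offset: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Lay out fields in declaration order under natural alignment.
/// `field_sizes` holds the size in bytes of each primitive field (1, 2, 4, 8, ...).
/// Returns (offset of each field, padded struct size, struct alignment).
fn natural_layout(field_sizes: &[u64]) -> (Vec<u64>, u64, u64) {
    let mut offsets = Vec::with_capacity(field_sizes.len());
    let mut offset = 0;
    let mut struct_align = 1;
    for &size in field_sizes {
        // Natural alignment: a field's alignment equals its size.
        let align = size;
        offset = align_up(offset, align);
        offsets.push(offset);
        offset += size;
        // The struct as a whole needs the largest alignment of any field.
        struct_align = struct_align.max(align);
    }
    // Pad the tail so consecutive structs in an array stay aligned.
    (offsets, align_up(offset, struct_align), struct_align)
}
```

For a struct with fields of sizes 1, 4, 8 and 2 bytes, this places them at offsets 0, 4, 8 and 16, with a total size of 24 bytes and an alignment of 8.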
Ah I see, that's unfortunate. Is there any interest in porting some of that information from that rust crate over to cranelift? I could maybe work on that with a bit of help, since I might need this anyway.
Also, I'm curious, what part of alignment is up to the OS?
Natural alignment is a good start. I'll start out with that! Thanks!
it can be up to the OS in the same sense that calling convention can be -- just a standard set for all programs/libraries to interop (e.g. under Windows the standard calling convention is fastcall and so DLLs and EXEs agree on that interface, whereas SysV is used on Linux). A "tool historian" could probably say more about why but my sense is it's a combination of some backward compat (platforms that evolved from smaller word sizes, etc) and arbitrary choices :-)
That does raise the point that it's completely up to you on the internal part of your system but one has to be a little careful if you have direct calling (FFI) to functions in e.g. libc or your runtime -- to match the layout expected by the system compiler
Right, I guess then mostly what I'm asking about is the FFI-less part, where it's mostly about performance of memory accesses. Nonetheless, good to know that this is something to keep in mind if I start doing FFI.
The natural alignment worked great! At least, my code now runs with the aligned memflags set, so seems to be alright. I could still work on porting rustc_target (in some form) to cranelift if there's demand for that. Just let me know. I could open an issue if we need to discuss this further (where that could live, which parts are relevant etc.).
I've got another question :) I just realized that I hadn't thought about the alignment of the stack slot itself. Looking at the docs[1], I would assume that I would be able to pass the desired alignment to cranelift, but I don't see that option anywhere. But it also says:
For example, the alignment of these stack memory accesses can be inferred from the offsets and stack slot alignments.
Does that mean that it will be aligned automatically?
1: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/docs/ir.md#explicit-stack-slots
Stack slots should be aligned according to natural alignment, I believe
(And if not in some case, that'd be a Cranelift bug)
Ah, interesting, they're actually only machine word-aligned: https://github.com/bytecodealliance/wasmtime/blob/91ec9a589cc6c7f031ef4cacdb295331c07b6063/cranelift/codegen/src/machinst/abi.rs#L1181-L1183 (so 8 bytes on all our 64-bit targets)
@bjorn3 does this cause problems for cg_clif with 128-bit values?
we could easily fix this if so
(or, probably should regardless)
We've had an issue open for a while to add the capability of supporting stack slot alignment
https://github.com/bytecodealliance/wasmtime/issues/6716
cg_clif gets around this by overallocating stack slots to align the values internally
https://github.com/rust-lang/rustc_codegen_cranelift/blob/64c73d0b3c4b91fee0e9a840be30e1d6faac7957/src/abi/pass_mode.rs#L187-L195
ah, there we go! so the full idea there is to take alignment as a parameter, but I see no reason not to do natural alignment as a baseline
which would solve the issue as well I think?
I think it would solve this particular instance, yes! But there are some other open issues that require, for example, 32-byte alignment. At least 16-byte alignment would help here, I think.
(I haven't looked at this in a while, so @bjorn3 would probably be better qualified to answer)
Chris Fallin said:
bjorn3 does this cause problems for cg_clif with 128-bit values?
Yes! I am over allocating and manually aligning to work around this.
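The over-allocate-and-align workaround can be sketched with plain address arithmetic. This is an illustration only, not the code cg_clif actually uses; `overallocated_size` and `align_addr_up` are hypothetical names, and both alignments are assumed to be powers of two:

```rust
/// Bytes to reserve so that a `size`-byte region aligned to `align` always fits,
/// when the slot itself is only guaranteed to be `slot_align`-aligned.
fn overallocated_size(size: u64, align: u64, slot_align: u64) -> u64 {
    if align <= slot_align {
        size
    } else {
        // Worst case: the slot start is `align - slot_align` bytes short
        // of the next `align` boundary.
        size + (align - slot_align)
    }
}

/// Round an address up to the next multiple of `align` (a power of two).
fn align_addr_up(addr: u64, align: u64) -> u64 {
    (addr + align - 1) & !(align - 1)
}
```

For a 16-byte value in a slot that is only 8-byte aligned, this reserves 24 bytes; rounding the slot address up to a multiple of 16 then leaves at least 16 usable bytes inside the slot.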
I guess natural alignment would work, but also might not be very precise. If I use a stack slot to create a record with a couple of i8 for example, the only alignment that I need is 1 byte right? (Still new to this, please correct me if I'm wrong :sweat_smile: )
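Rust's own `repr(C)` layout shows the same rule: an aggregate needs only the largest alignment among its fields, so a record of single bytes is 1-aligned. The struct names below are made up for illustration, and the `Mixed` numbers assume a common target where `u32` is 4-byte aligned:

```rust
use std::mem::{align_of, size_of};

/// A record made only of single-byte fields.
#[allow(dead_code)]
#[repr(C)]
struct Bytes {
    a: i8,
    b: i8,
    c: i8,
}

/// Adding one 4-byte field raises the alignment of the whole record.
#[allow(dead_code)]
#[repr(C)]
struct Mixed {
    a: i8,
    b: u32,
}

/// Returns (alignment, size) of `Bytes`.
fn bytes_layout() -> (usize, usize) {
    (align_of::<Bytes>(), size_of::<Bytes>())
}

/// Returns (alignment, size) of `Mixed`.
fn mixed_layout() -> (usize, usize) {
    (align_of::<Mixed>(), size_of::<Mixed>())
}
```

`bytes_layout()` gives an alignment of 1 and a size of 3, while `mixed_layout()` gives 4 and 8 on such targets.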
Right, actually I'm just paging this bit of CLIF back in now and seeing the size is arbitrary, not one CLIF type (obviously in hindsight, for aggregates etc)
so really it does call for an alignment parameter
https://github.com/bytecodealliance/wasmtime/pull/8635
Hi! It's me again :)
I've finally hit a case where I need to align stack slots for 128 bit values. Is it correct that this is not (easily) possible yet because of https://github.com/bytecodealliance/wasmtime/issues/6716? If so, I could try to pick that up.
I believe it should be correct up to 16-byte (128-bit) alignment; it's limited by the stackframe alignment, as we align the offset from start of stackframe per the user's request but stackframe may not be (e.g.) 64-byte-aligned if that's what the user requests. But both x86-64 and aarch64 16-align stackframes
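A small sketch of why the stackframe alignment is the limit: an aligned offset only yields an aligned address if the frame base is aligned at least as strictly. The function names and addresses here are hypothetical and do not reflect Cranelift's actual frame layout code:

```rust
/// Address of a slot, given the frame base and the slot's offset within the frame.
fn slot_address(frame_base: u64, slot_offset: u64) -> u64 {
    frame_base + slot_offset
}

/// True if `addr` is a multiple of `align` (a power of two).
fn is_aligned(addr: u64, align: u64) -> bool {
    addr & (align - 1) == 0
}
```

With a frame base of 0x1010 (16-aligned but not 64-aligned) and a slot offset of 64, the slot lands at 0x1050, which is 16-aligned but not 64-aligned even though the offset itself is a multiple of 64.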
It seems like 16-byte alignment is not working. I have the following CLIF code as a reproduction:
function %foo() -> i64 {
    ss0 = explicit_slot 8
    ss1 = explicit_slot 32, align=16
block0():
    v1 = stack_addr.i64 ss1
    return v1
}
; run: %foo() == 0
(I don't want to check whether it's zero, just an error message so I can inspect the pointer)
When I run that with clif-util, I see (for example) that the pointer is 140735794843864, which is 0x7fff9b0f10d8 in hex. That's not aligned to 16 bytes, right? I might very well be using this wrong, so please correct me if I'm wrong. This is on main by the way, I just pulled with git.
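The alignment of a pointer like that can be checked mechanically by looking at its low bits. A minimal sketch, with `misalignment` and `is_aligned` as hypothetical helper names and `align` assumed to be a power of two:

```rust
/// Number of bytes by which `addr` overshoots the previous `align` boundary.
fn misalignment(addr: u64, align: u64) -> u64 {
    addr & (align - 1)
}

/// True if `addr` is a multiple of `align`.
fn is_aligned(addr: u64, align: u64) -> bool {
    misalignment(addr, align) == 0
}
```

For the address 140735794843864, `misalignment(addr, 16)` is 8: the pointer is 8-byte aligned but not 16-byte aligned.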
In my actual application, I give pointers to stack slots to some Rust code, which panics with a message about unaligned data, which is why I'm interested in this in the first place.
hmm, indeed, that address isn't properly aligned... I don't have spare cycles to look at this at the moment but perhaps someone else does, or if you're interested in diving into the ABI code yourself...
I dove into it :big_smile: I think this should fix it: https://github.com/bytecodealliance/wasmtime/pull/9279
I'm working on a separate fix for larger alignments, where I truncate rsp to the largest requested alignment. I think that should work for x86_64 since rbp is used to restore the frame there, but on aarch64 (and maybe others) that doesn't seem to be the case, so I'll need some help there. I'll post a draft PR for that in a bit.
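The truncation idea can be modelled with integer arithmetic: save the incoming stack pointer as the frame pointer, reserve the frame, then mask off the low bits. This is only a sketch of the arithmetic under the assumption of a downward-growing stack, not the actual prologue code; `align_down` and `enter_frame` are hypothetical names:

```rust
/// Round `sp` down to a multiple of `align` (a power of two).
fn align_down(sp: u64, align: u64) -> u64 {
    sp & !(align - 1)
}

/// Model of a prologue: returns (frame pointer, new stack pointer).
/// The frame pointer keeps the incoming stack pointer so the epilogue can
/// restore it, since the truncation discards an unknown number of bytes.
fn enter_frame(sp: u64, frame_size: u64, max_align: u64) -> (u64, u64) {
    let fp = sp;
    let new_sp = align_down(sp - frame_size, max_align);
    (fp, new_sp)
}
```

Because the stack grows downward, rounding down can only enlarge the frame, so the reserved space is never smaller than `frame_size`.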
I'll take a look, thanks a ton for diving in to this!
re: SP alignment on non-x86, I believe we do the "leaf function" optimization where if no other calls happen and no stack storage / spill slots need to be allocated, we don't save SP in FP; but I think/suspect that if we have stackslots, that should already be disabled, unless we've further optimized this since I last looked :-)
Last updated: Jan 24 2025 at 00:11 UTC