sezna opened issue #8730:
Hello, thank you very much for all of the incredible software y'all are writing. I'm currently using Cranelift to generate native binaries for my project, and I'm continually impressed at the level of detail and effort that goes into these things. Not to mention the super helpful Zulip chat.
In the aforementioned chat, I was receiving help from @bjorn3 and @cfallin on generating executables for an Apple silicon Mac. The following code does not successfully execute on an M-series processor Mac: https://gist.github.com/sezna/e358a9167d4ef3063f6cfe4ce4c2ad37
Using the legacy linker, that is, adding
-Wl -ld_classic
to the clang invocation, _does_ work. However, as the legacy linker will likely be going away someday, I'm filing this issue in the hopes that we can figure out why it doesn't work with the current linker. Note thatcg_clif
also uses-Wl -ld_classic
, so when the legacy linker is dropped,cg_clif
could also have issues.
philipc commented on issue #8730:
I've reproduced the issue with arm64. If I change to use x86_64 it works with either linker.
The linker error is:
0 0x100fa2f2c __assert_rtn + 72 1 0x100ee9ec4 ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 22976 2 0x100ef6404 ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_7::operator()(unsigned long, ld::FileInfo const&) const + 420 3 0x1975dc440 _dispatch_client_callout2 + 20 4 0x1975f1544 _dispatch_apply_invoke_and_wait + 224 5 0x1975f084c _dispatch_apply_with_attr_f + 1180 6 0x1975f0a38 dispatch_apply + 96 7 0x100f741e0 ld::AtomFileConsolidator::parseFiles(bool) + 292 8 0x100f12b08 main + 9252 ld: Assertion failed: (pattern[0].addrMode == addr_other), function addFixupFromRelocations, file Relocations.cpp, line 700. clang: error: linker command failed with exit code 1 (use -v to see invocation)
clang generates relocations like this:
18: 90000000 adrp x0, 0x0 <ltmp0+0x18> 0000000000000018: ARM64_RELOC_PAGE21 l_.str 1c: 91000000 add x0, x0, #0 000000000000001c: ARM64_RELOC_PAGEOFF12 l_.str 20: 94000000 bl 0x20 <ltmp0+0x20> 0000000000000020: ARM64_RELOC_BRANCH26 _puts
cranelift generates:
1c: 90000000 adrp x0, 0x0 <_main+0xc> 000000000000001c: ARM64_RELOC_GOT_LOAD_PAGE21 _hello_world 20: f9400000 ldr x0, [x0] 0000000000000020: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _hello_world 24: 90000002 adrp x2, 0x0 <_main+0x14> 0000000000000024: ARM64_RELOC_GOT_LOAD_PAGE21 _puts 28: f9400042 ldr x2, [x2] 0000000000000028: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _puts 2c: d63f0040 blr x2
If I hack cranelift to generate this:
1c: 90000000 adrp x0, 0x0 <_main+0xc> 000000000000001c: ARM64_RELOC_PAGE21 _hello_world 20: 91000000 add x0, x0, #0 0000000000000020: ARM64_RELOC_PAGEOFF12 _hello_world 24: 90000002 adrp x2, 0x0 <_main+0x14> 0000000000000024: ARM64_RELOC_PAGE21 _puts 28: 91000042 add x2, x2, #0 0000000000000028: ARM64_RELOC_PAGEOFF12 _puts 2c: d63f0040 blr x2
then the linker error changes to:
ld: invalid use of ADRP in '_main' to '_puts'
If I hack cranelift to generate this:
1c: 90000000 adrp x0, 0x0 <_main+0xc> 000000000000001c: ARM64_RELOC_PAGE21 _hello_world 20: 91000000 add x0, x0, #0 0000000000000020: ARM64_RELOC_PAGEOFF12 _hello_world 24: 94000000 bl 0x24 <_main+0x14> 0000000000000024: ARM64_RELOC_BRANCH26 _puts
then it links and runs successfully.
Complete hack:
diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs index 922e4ebfc..ea0bc22f3 100644 --- a/cranelift/codegen/src/isa/aarch64/abi.rs +++ b/cranelift/codegen/src/isa/aarch64/abi.rs @@ -1027,7 +1027,8 @@ impl ABIMachineSpec for AArch64MachineDeps { ) -> SmallVec<[Inst; 2]> { let mut insts = SmallVec::new(); match &dest { - &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call { + //&CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call { + &CallDest::ExtName(ref name, _) => insts.push(Inst::Call { info: Box::new(CallInfo { dest: name.clone(), uses, @@ -1039,6 +1040,7 @@ impl ABIMachineSpec for AArch64MachineDeps { callee_pop_size, }), }), +/* &CallDest::ExtName(ref name, RelocDistance::Far) => { insts.push(Inst::LoadExtName { rd: tmp, @@ -1058,6 +1060,7 @@ impl ABIMachineSpec for AArch64MachineDeps { }), }); } +*/ &CallDest::Reg(reg) => insts.push(Inst::CallInd { info: Box::new(CallIndInfo { rn: *reg, diff --git a/cranelift/codegen/src/isa/aarch64/inst/emit.rs b/cranelift/codegen/src/isa/aarch64/inst/emit.rs index f9df1a8d2..0f89903a4 100644 --- a/cranelift/codegen/src/isa/aarch64/inst/emit.rs +++ b/cranelift/codegen/src/isa/aarch64/inst/emit.rs @@ -3177,6 +3177,7 @@ impl MachInstEmit for Inst { let inst = Inst::Adrp { rd, off: 0 }; inst.emit(sink, emit_info, state); +/* // ldr rd, [rd, :got_lo12:X] sink.add_reloc(Reloc::Aarch64Ld64GotLo12Nc, &**name, 0); let inst = Inst::ULoad64 { @@ -3185,6 +3186,17 @@ impl MachInstEmit for Inst { flags: MemFlags::trusted(), }; inst.emit(sink, emit_info, state); +*/ + // add rd, rd, :lo12:X + sink.add_reloc(Reloc::Aarch64Ld64GotLo12Nc, &**name, 0); + Inst::AluRRImm12 { + alu_op: ALUOp::Add, + size: OperandSize::Size64, + rd, + rn: rd.to_reg(), + imm12: Imm12::maybe_from_u64(0).unwrap(), + } + .emit(sink, emit_info, state); } else { // With absolute offsets we set up a load from a preallocated space, and then jump // over it. diff --git a/cranelift/object/src/backend.rs b/cranelift/object/src/backend.rs index 2d48e9b61..921c5273e 100644 --- a/cranelift/object/src/backend.rs +++ b/cranelift/object/src/backend.rs @@ -764,7 +764,8 @@ impl ObjectModule { r_type: object::elf::R_AARCH64_ADR_GOT_PAGE, }, object::BinaryFormat::MachO => RelocationFlags::MachO { - r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGE21, + //r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGE21, + r_type: object::macho::ARM64_RELOC_PAGE21, r_pcrel: true, r_length: 2, }, @@ -775,7 +776,8 @@ impl ObjectModule { r_type: object::elf::R_AARCH64_LD64_GOT_LO12_NC, }, object::BinaryFormat::MachO => RelocationFlags::MachO { - r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGEOFF12, + //r_type: object::macho::ARM64_RELOC_GOT_LOAD_PAGEOFF12, + r_type: object::macho::ARM64_RELOC_PAGEOFF12, r_pcrel: false, r_length: 2, },
So it appears that cranelift isn't generating the correct relocations for ARM64 on macOS. I'll let those who know more about this stuff fix it properly.
bjorn3 commented on issue #8730:
A simple
#[inline(never)] pub unsafe fn foo() -> (unsafe extern "C" fn(), &'static u8) { bar(); (bar, &FOO) } extern "C" { fn bar(); static FOO: u8; }
produces
rust_out.o: file format mach-o arm64 Disassembly of section __TEXT,__text: 0000000000000000 <ltmp0>: 0: a9bf7bfd stp x29, x30, [sp, #-16]! 4: 910003fd mov x29, sp 8: 94000000 bl 0x8 <ltmp0+0x8> 0000000000000008: ARM64_RELOC_BRANCH26 _bar c: 90000000 adrp x0, 0x0 <ltmp0+0xc> 000000000000000c: ARM64_RELOC_GOT_LOAD_PAGE21 _bar 10: f9400000 ldr x0, [x0] 0000000000000010: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _bar 14: 90000001 adrp x1, 0x0 <ltmp0+0x14> 0000000000000014: ARM64_RELOC_GOT_LOAD_PAGE21 _FOO 18: f9400021 ldr x1, [x1] 0000000000000018: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _FOO 1c: a8c17bfd ldp x29, x30, [sp], #16 20: d65f03c0 ret
with the LLVM backend. If I try to link it I get a linker error as expected:
Undefined symbols for architecture arm64: "_FOO", referenced from: rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o "_bar", referenced from: rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o rust_out::foo::he5a9b1ed9cb33bc0 in rust_out.o "_main", referenced from: <initial-undefines> ld: symbol(s) not found for architecture arm64
If I apply the following patch to Cranelift:
diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs index 922e4ebfc..6d43c1db6 100644 --- a/cranelift/codegen/src/isa/aarch64/abi.rs +++ b/cranelift/codegen/src/isa/aarch64/abi.rs @@ -1027,7 +1027,7 @@ impl ABIMachineSpec for AArch64MachineDeps { ) -> SmallVec<[Inst; 2]> { let mut insts = SmallVec::new(); match &dest { - &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call { + &CallDest::ExtName(ref name, _/*RelocDistance::Near*/) => insts.push(Inst::Call { info: Box::new(CallInfo { dest: name.clone(), uses,
Relocations that print identical by
objdump -dr
are generated by Cranelift:rust_out.o: file format mach-o arm64 Disassembly of section __TEXT,__text: 0000000000000000 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E>: 0: a9bf7bfd stp x29, x30, [sp, #-16]! 4: 910003fd mov x29, sp 8: d10043ff sub sp, sp, #16 c: 94000000 bl 0xc <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0xc> 000000000000000c: ARM64_RELOC_BRANCH26 _bar 10: 90000000 adrp x0, 0x0 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0x10> 0000000000000010: ARM64_RELOC_GOT_LOAD_PAGE21 _bar 14: f9400000 ldr x0, [x0] 0000000000000014: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _bar 18: 90000007 adrp x7, 0x0 <__ZN8rust_out3foo17he5a9b1ed9cb33bc0E+0x18> 0000000000000018: ARM64_RELOC_GOT_LOAD_PAGE21 _FOO 1c: f94000e7 ldr x7, [x7] 000000000000001c: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _FOO 20: 910003e8 mov x8, sp 24: f9000107 str x7, [x8] 28: 910003e8 mov x8, sp 2c: f9400101 ldr x1, [x8] 30: 910043ff add sp, sp, #16 34: a8c17bfd ldp x29, x30, [sp], #16 38: d65f03c0 ret
Yet the object file produced by LLVM gives a regular linker error due to undefined symbols, while the one produced by Cranelift crashes the linker:
0 0x100e02074 __assert_rtn + 72 1 0x100d3edb8 ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 22712 2 0x100d4b830 ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_8::operator()(unsigned long, ld::FileInfo const&) const + 440 3 0x189fa6428 _dispatch_client_callout2 + 20 4 0x189fba850 _dispatch_apply_invoke3 + 336 5 0x189fa63e8 _dispatch_client_callout + 20 6 0x189fa7c68 _dispatch_once_callout + 32 7 0x189fbaeec _dispatch_apply_invoke_and_wait + 372 8 0x189fb9e9c _dispatch_apply_with_attr_f + 1212 9 0x189fba08c dispatch_apply + 96 10 0x100dd0564 ld::AtomFileConsolidator::parseFiles(bool) + 292 11 0x100d6bee8 main + 9532 ld: Assertion failed: (pattern[0].addrMode == addr_other), function addFixupFromRelocations, file Relocations.cpp, line 701.
This means there has to be a difference somewhere. If I run bingrep on both binaries, the relocations are shown in opposite order for both executable. For LLVM:
Relocations(5): Segment Section Count __TEXT __text 5 Type Offset Length PIC Extern SymbolNum Symbol ARM64_RELOC_GOT_LOAD_PAGEOFF12 0x18 2 false true 0x2 _FOO ARM64_RELOC_GOT_LOAD_PAGE21 0x14 2 true true 0x2 _FOO ARM64_RELOC_GOT_LOAD_PAGEOFF12 0x10 2 false true 0x3 _bar ARM64_RELOC_GOT_LOAD_PAGE21 0xc 2 true true 0x3 _bar ARM64_RELOC_BRANCH26 0x8 2 true true 0x3 _bar
and for Cranelift:
Relocations(5): Segment Section Count __TEXT __text 5 Type Offset Length PIC Extern SymbolNum Symbol ARM64_RELOC_BRANCH26 0xc 2 true true 0x2 _bar ARM64_RELOC_GOT_LOAD_PAGE21 0x10 2 true true 0x2 _bar ARM64_RELOC_GOT_LOAD_PAGEOFF12 0x14 2 false true 0x2 _bar ARM64_RELOC_GOT_LOAD_PAGE21 0x18 2 true true 0x1 _FOO ARM64_RELOC_GOT_LOAD_PAGEOFF12 0x1c 2 false true 0x1 _FOO
As hack I tried to reverse the order of relocations emitted by Cranelift too, which worked.
diff --git a/cranelift/codegen/src/isa/aarch64/abi.rs b/cranelift/codegen/src/isa/aarch64/abi.rs index 922e4ebfc..6d43c1db6 100644 --- a/cranelift/codegen/src/isa/aarch64/abi.rs +++ b/cranelift/codegen/src/isa/aarch64/abi.rs @@ -1027,7 +1027,7 @@ impl ABIMachineSpec for AArch64MachineDeps { ) -> SmallVec<[Inst; 2]> { let mut insts = SmallVec::new(); match &dest { - &CallDest::ExtName(ref name, RelocDistance::Near) => insts.push(Inst::Call { + &CallDest::ExtName(ref name, _/*RelocDistance::Near*/) => insts.push(Inst::Call { info: Box::new(CallInfo { dest: name.clone(), uses, diff --git a/cranelift/object/src/backend.rs b/cranelift/object/src/backend.rs index 2d48e9b61..c5a84f14c 100644 --- a/cranelift/object/src/backend.rs +++ b/cranelift/object/src/backend.rs @@ -501,7 +501,7 @@ impl ObjectModule { ref name, flags, addend, - } in &symbol.relocs + } in symbol.relocs.iter().rev() { let target_symbol = self.get_symbol(name); self.object
bjorn3 commented on issue #8730:
Looks like the change to cranelift/codegen/src/isa/aarch64/abi.rs isn't even necessary at all. Only the relocation order reversing is required.
philipc commented on issue #8730:
That sounds better. I was comparing clang output instead of looking at what rustc with llvm backend did, so you can probably ignore my results.
bjorn3 commented on issue #8730:
What would be the best place to add this relocation reversing code? object or cranelift-object? On the one hand it is something that would affect other projects too, so object makes more sense, on the other hand reversing the relocation order right before writing would require anyone who wants to round trip an object file through the object crate to reverse relocation order again before the
add_relocation
calls to produce a working object file.
philipc commented on issue #8730:
LLVM does the reverse in its object writer. I'm not certain which place is best, for the same reasons you give, but I'm inclined to fix it in
object
, since its primary use is for generating objects, not for round trips. It already does at least one other change that round trips need to account for (symbol name decorations).
bjorn3 commented on issue #8730:
bjorn3 commented on issue #8730:
This has been fixed in object 0.36.1.
sezna commented on issue #8730:
Thank you all who helped!
sezna edited a comment on issue #8730:
Thank you all who looked into this!
bjorn3 commented on issue #8730:
This issue can be closed now, right?
camelid commented on issue #8730:
I'm now getting
ld: warning: no platform load command found in '/path/to/cranelift/output.o', assuming: macOS
. This happens only with the default modern linker, but not the legacy linker. The executable still works, but it's unfortunate to get this warning. It seems like Julia encountered a similar issue and fixed it by including the macOS version number in the OS segment of the triple: julialang/julia#51830. I'm usingtarget_lexicon::Triple::host
, so it could be an issue with that crate rather than cranelift?
bjorn3 commented on issue #8730:
We will need to call set_macho_build_version in cranelift-object to emit the LC_BUILD_VERSION load command.
camelid commented on issue #8730:
It seems like rustc has implemented a fix for the same issue: rust-lang/rust#111384.
camelid commented on issue #8730:
It's unclear to me if cranelift-object should be exposing some function to set the macho build version or if it should handle it automatically. Do we have enough information in cranelift-object to target the exact version?
madsmtm commented on issue #8730:
I've looked into this recently, see https://github.com/rust-lang/rust/issues/129432 for the broader issue of deployment targets and SDKs.
I'm pretty sure the correct behaviour for Cranelift would be to set the values on
MachOBuildVersion
as follows:
platform
, depending on the target triple'sOperatingSystem
andEnvironment
.minos
(deployment target), extracted from target_lexicon::OperatingSystem::MacOSX.
- Will need some work in
target_lexicon
to better support iOS/tvOS/watchOS/visionOS here too.- I think you might need to error if the version isn't present in the given triple?
- The system tooling (linker, assembler,
otool
etc.) requires a version here.- At the very least don't emit the
LC_BUILD_VERSION
command in this case, and warn very strongly.sdk
(SDK version) to0
. This is also what LLVM does by default if no SDK version is given, and the tooling showsn/a
in that case.Note that this will not support older tooling that uses
LC_VERSION_MIN_IPHONEOS
,LC_VERSION_MIN_MACOSX
etc. (we're talking somewhere around Xcode 9 or 10), but that's probably _fine_, we still support older OS versions the linker is going to patch it up such that you can still run the binaries on older OS versions.
madsmtm edited a comment on issue #8730:
I've looked into this recently, see https://github.com/rust-lang/rust/issues/129432 for the broader Rust issue of deployment targets and SDKs.
I'm pretty sure the correct behaviour for Cranelift would be to set the values on
MachOBuildVersion
as follows:
platform
, depending on the target triple'sOperatingSystem
andEnvironment
.minos
(deployment target), extracted from target_lexicon::OperatingSystem::MacOSX.
- Will need some work in
target_lexicon
to better support iOS/tvOS/watchOS/visionOS here too.- I think you might need to error if the version isn't present in the given triple?
- The system tooling (linker, assembler,
otool
etc.) requires a version here.- At the very least don't emit the
LC_BUILD_VERSION
command in this case, and warn very strongly.sdk
(SDK version) to0
. This is also what LLVM does by default if no SDK version is given, and the tooling showsn/a
in that case.Note that this will not support older tooling that uses
LC_VERSION_MIN_IPHONEOS
,LC_VERSION_MIN_MACOSX
etc. (we're talking somewhere around Xcode 9 or 10), but that's probably _fine_, we still support older OS versions the linker is going to patch it up such that you can still run the binaries on older OS versions.
madsmtm commented on issue #8730:
It's unclear to me if cranelift-object should be exposing some function to set the macho build version or if it should handle it automatically. Do we have enough information in cranelift-object to target the exact version?
So to answer this directly, Cranelift should have enough information to set this _if_ it requires the deployment target / minimum OS version to be set in the target triple (e.g. that users pass
aarch64-apple-macosx12.0.0
, and not justaarch64-apple-macosx
).
bjorn3 commented on issue #8730:
Cranelift uses target-lexicon for target triples, which aims to math rustc. Rustc target triples do not allow passing in the OS version. Instead rustc has a default for each target and additionally allows setting the same env vars that clang reads. There is no way currently to pass this information to Cranelift.
madsmtm commented on issue #8730:
Cranelift uses target-lexicon for target triples, which aims to math rustc. Rustc target triples do not allow passing in the OS version. Instead rustc has a default for each target and additionally allows setting the same env vars that clang reads. There is no way currently to pass this information to Cranelift.
Hmm, isn't
rustc
passing the LLVM target to Cranelift? https://github.com/rust-lang/rust/blob/612796c42077605fdd3c6f7dda05745d8f4dc4d8/compiler/rustc_codegen_cranelift/src/lib.rs#L262
bjorn3 commented on issue #8730:
Yes, but as it turns out target-lexicon is just lenient with parsing to parse most LLVM triples fine too. I'm pretty sure it would parse aarch64-apple-macosx12.0.0 as having the OS field set to macosx12.0.1, which is not an OS known to rustc or target-lexicon and thus would fail parsing.
madsmtm commented on issue #8730:
Yes, but as it turns out target-lexicon is just lenient with parsing to parse most LLVM triples fine too. I'm pretty sure it would parse aarch64-apple-macosx12.0.0 as having the OS field set to macosx12.0.1, which is not an OS known to rustc or target-lexicon and thus would fail parsing.
Nope, they do parse the version number, see https://github.com/bytecodealliance/target-lexicon/blob/7c80d459a9fdd121e9f23feb680c3db13c1baa39/src/targets.rs#L1393-L1421.
In any case, I have opened https://github.com/bytecodealliance/target-lexicon/issues/111 for figuring out how to resolve the
rustc
/LLVM discrepancy intarget-lexicon
.
But if it is as you say, that Cranelift wants to accept
rustc
target triples, then it needs _some_ way to specify the deployment target on Apple platforms too.A simple way to do this would be to add
ObjectBuilder::apple_deployment_target(&mut self, major: u16, minor: u8, patch: u8) -> &mut Self
, but I'd argue that it might be worthwhile to add such a setter to theTriple
itself directly? At least, if Cranelift wants to at some point do an optimization based on the deployment target (which LLVM does all the time, though mostly for Objective-C code), then it'd be nice to have available earlier in the compilation pipeline.
Last updated: Nov 22 2024 at 17:03 UTC