Hello all!
I've been trying out cranelift for codegen and really like it. I'm able to produce CLIF and verify it with `clif-util`. I'm now trying to figure out how to take the results and produce an executable on aarch64 macOS. I was wondering if anybody here had any wisdom or perhaps a minimal example of what's required to just generate a main function that returns 0 or 1, or similar.
I've used cranelift_object to create an object file, and verified with hexdump that it is not empty (although I'm not sure how to verify the content), but when I try to link it with `clang -o output output.o`, I get:
ld: warning: no platform load command found in 'codegen/output.o', assuming: macOS
Undefined symbols for architecture arm64:
"_main", referenced from:
<initial-undefines>
ld: symbol(s) not found for architecture arm64
I interpreted this as "no main function exists".
This is my current minimal code for generating an object file: https://gist.github.com/sezna/9bbd1001a3170adcb99e6d524830706b
Thanks for your time and help. I look forward to getting deeper with cranelift.
You need to define the `main` function, not `_main`, on the Cranelift side. Mach-O adds an `_` prefix to all exported symbols, but you still need to use the unprefixed name on the Cranelift side.
Oh my goodness that did it. Hours were spent on this last night, haha. Thank you so much!
Ok, i have another question, but I fear it might be dumb/misinformed. I'm probably missing some basic knowledge here.
Using cranelift IR, how do I produce a binary which has access to stdout? I'm not too worried about different arches and OSs yet, I'm just on aarch64 macos. I'm looking to compile a println-type function into cranelift IR.
I tried to search the docs, but to be honest, I'm not sure what I'm looking for -- I've never interacted with stdout at this low of a level before, without a VM, so I'm not even sure what type of interface to look for.
If anybody here has any idea how this works, or has any pointers to resources about this kind of thing, I'd greatly appreciate it. Thanks.
This is an operating-system-specific question, but each OS will define a canonical way to access its set of system calls; stdout is ultimately an open file descriptor that one writes to using system calls
On macOS, the usual way is to link to (iirc) libSystem; libc in turn uses that so one can link to libc and call the C-level functions as well if one wants
On Linux, for example, it's a bit different: the kernel guarantees that the syscall ABI is stable, so one can write `syscall` instructions (on x86-64, or `svc` on aarch64, or ...) directly with the appropriate values
Cranelift doesn't give an interface for the latter, it expects you to emit calls to outside functionality
so a safe cross-platform bet would be to expect to link to libc directly, or else write a "runtime library" for your language that you call and that in turn does whatever needed
(the reason none of this is described in Cranelift documentation is because the operating system is an orthogonal concept to the compiler; Cranelift only knows about "calls to some other function")
It makes sense that it is orthogonal. I wasn't sure if there'd be an abstraction layer here or not. So if I link libc when I invoke clang, and I lower println calls to cranelift external function calls...that...should work? Do you know if there's an example of that kind of linking to libc anywhere floating around? Not sure what the calling convention is.
Once again...might be out of my depth here...but hey this is how we learn. Thanks for your time and response by the way.
For sure happy to help; I don't know of any examples off the top of my head (maybe others do?). But yeah, calling arbitrary functions in libc should just work, provided the linking details are done right (func defined as external, gets emitted as relocation, etc). `printf` will be tricky because we don't support the variadic calling convention but you could call `puts`, or write your own runtime function in C and call that
My language is not rust-like, and it doesn't support variadic macros; I'm just hoping to be able to print out ASCII (unicode if I'm ambitious?) given a pointer and len. Nothing fancier than that (although as I understand it, that's apparently pretty fancy :) )
if you want to use a ptr/len, using `write` or `fwrite` may be better than using `puts` since that way you don't need to come up with a nul-terminated string
if you're on non-windows normal unix, printing unicode is easy, just write utf-8 just like any other text
(if you're on an ebcdic system, i feel sorry for you)
with windows you may have to use special functions and/or put special things in your .exe's manifest because it's weird like that
if you use `write` don't forget to retry it if you get EINTR
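A minimal sketch of the ptr/len approach discussed above, written as a small C runtime helper. The names `rt_write_all` and `rt_print` are hypothetical (nothing in the thread defines them); the only libc call used is POSIX `write`, retried on `EINTR` and continued after short writes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical runtime helper: write exactly `len` bytes from `ptr` to `fd`.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int rt_write_all(int fd, const char *ptr, size_t len) {
    assert(ptr != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, ptr, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry the same write */
            }
            return -1;    /* any other error is reported to the caller */
        }
        /* write may accept fewer bytes than requested; advance and loop */
        ptr += (size_t)n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical entry point for a println-style builtin: ptr/len to stdout. */
int rt_print(const char *ptr, size_t len) {
    return rt_write_all(STDOUT_FILENO, ptr, len);
}
```

Compiled to an object file and passed on the same `clang` link line, a helper like this could be declared as an imported function on the Cranelift side and called with a pointer and a length, with no nul terminator needed.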
Thanks @Jacob Lifshay ! For now I'm still figuring out how to call libc functions. I've been bouncing around the API in the rust docs.
I thought I'd use import_function, and that requires an ExternalName. Of the options for ExternalName, it seems LibCall and KnownSymbol are a no-go (they're not present in the docs), I'd rather not use the one for testing only, so that leaves `UserExternalNameRef` to be created via `declare_imported_user_function`. Does this sound like I'm barking up the correct tree?
If so, my next question follows pretty quickly. I'm not sure how to provide a user-defined symbol table corresponding to `libc`...
So this runs into the layering of functionality: `cranelift-codegen` is the compiler core, and knows about "references to external functions", but basically just reflects those right back out the other end in the relocations. It's intentionally not knowledgeable about any particular symbol or object file format; it just says "these bytes here are your machine code for this function, and patch in the address of what you call `function 23` here"
The next layer up includes at least `cranelift-object` and `cranelift-jit`, for generation of object files and in-process JIT, respectively; each of them has its own way of defining references to external symbols (or in-process symbols) and using them
or perhaps `cranelift-module` for AOT, I always mix them up; I think what you want is `declare_function` (with "external linkage")
Oh thank you. That makes sense. Is it expected that most cranelift consumers will use the JIT? I thought "I'm not building a JIT compiler so I don't need that" -- but most resources reference it.
I was able to get this to compile, but I'm producing invalid binaries. I am guessing this is because I messed up the data section?
If anybody has the time, here's exactly what I've come up with: https://gist.github.com/sezna/e358a9167d4ef3063f6cfe4ce4c2ad37
A lot of the API calls I'm guessing on, because I don't fully understand the docs. Do you happen to see anything obviously wrong?
FWIW, my error is:
ld: warning: alignment (4) of atom '_main' from '/Users/alexanderhansen/code/swim/swim-codegen/output.o' is too small and may result in unaligned pointers
ld: warning: no platform load command found in '/Users/alexanderhansen/code/swim/swim-codegen/output.o', assuming: macOS
ld: building fixups: pointer not aligned at _main+0x14 from /Users/alexanderhansen/code/swim/swim-codegen/output.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Alignment is part of the function metadata, I think the module builder has a way of specifying that? it looks like that's the only error (I'm not sure what the 'platform load command' warning implies or how to fix it)
re: using cranelift-jit vs. cranelift-module, both are valid uses, just depends on whether you want to run code right away or produce a .o / binary! the good news is that the majority of the logic you need to write (the CLIF generation) is the same either way so you can build out both eventually...
Things were "working" (producing a binary that NOOPs) with the "no platform command" thing, so I'm not concerned about that -- the new thing is the clang error and I think it's coming from "building fixups"
So I think `ld: building fixups: pointer not aligned at _main+0x14 from codegen/output.o` is what's getting me now
`-v` on `clang` unfortunately didn't give any new information
can you disassemble (objdump -d if you have GNU binutils, otherwise otool -tv on macOS) and see what's at that offset?
Ohhh, a new tool, yay.
output.o: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000000000010 <_main>:
10: d503237f pacibsp
14: a9bf7bfd stp x29, x30, [sp, #-16]!
18: 910003fd mov x29, sp
1c: 58000040 ldr x0, 0x24 <_main+0x14>
20: 14000003 b 0x2c <_main+0x1c>
...
2c: 58000042 ldr x2, 0x34 <_main+0x24>
30: 14000003 b 0x3c <_main+0x2c>
...
3c: d63f0040 blr x2
40: a8c17bfd ldp x29, x30, [sp], #16
44: d65f0fff retab
huh, weird, `_main+0x14` is `0x24`, just after the branch at `0x20`; whatever it is is unconditionally branched around
the disassembler seems to be too smart, skipping dead code; I'm not sure if there's an option to show it?
I don't see anything in the module builder, but is this what you were referring to? https://docs.rs/cranelift-codegen/latest/cranelift_codegen/isa/struct.FunctionAlignment.html
maybe some combination of `-D` (disassemble all) and `--disassemble-zeroes`?
ah, yes, exactly, that's what we need to set somehow
reading `objdump --help`...one sec, I'll figure it out
for what it is worth, here's `otool`'s output:
output.o:
(__TEXT,__text) section
_main:
0000000000000010 pacibsp
0000000000000014 stp x29, x30, [sp, #-0x10]!
0000000000000018 mov x29, sp
000000000000001c ldr x0, #0x8
0000000000000020 b 0x2c
0000000000000024 udf #0x0
0000000000000028 udf #0x0
000000000000002c ldr x2, #0x8
0000000000000030 b 0x3c
0000000000000034 udf #0x0
0000000000000038 udf #0x0
000000000000003c blr x2
0000000000000040 ldp x29, x30, [sp], #0x10
0000000000000044 retab
the `--help` on `objdump` is immense, give me a minute to read through it haha
`objdump -Dd --disassemble-zeroes output.o`:
output.o: file format mach-o arm64
Disassembly of section __TEXT,__const:
0000000000000000 <_hello_world>:
0: 6c6c6548 ldnp d8, d25, [x10, #-320]
4: 77202c6f <unknown>
8: 646c726f <unknown>
c: 21 0a 00 <unknown>
Disassembly of section __TEXT,__text:
0000000000000010 <_main>:
10: d503237f pacibsp
14: a9bf7bfd stp x29, x30, [sp, #-16]!
18: 910003fd mov x29, sp
1c: 58000040 ldr x0, 0x24 <_main+0x14>
20: 14000003 b 0x2c <_main+0x1c>
24: 00000000 udf #0
28: 00000000 udf #0
2c: 58000042 ldr x2, 0x34 <_main+0x24>
30: 14000003 b 0x3c <_main+0x2c>
34: 00000000 udf #0
38: 00000000 udf #0
3c: d63f0040 blr x2
40: a8c17bfd ldp x29, x30, [sp], #16
44: d65f0fff retab
ok, `udf` ("undefined", intentionally traps, used to implement `unreachable` sometimes)
not sure why that would cause ld problems (it shouldn't require any fixups?)
at this point it might be necessary to page @bjorn3 to the courtesy phone for help (they have done way more with object-file generation in cg_clif than I have, I only write core compiler bits, sorry!)
I don't think I have sufficient knowledge to debug this. Is a fixup just a pass that fixes invalid code generation? And I'm guessing `udf` is just there because those are null bytes, maybe padding for alignment?
No need to say sorry...you've been super helpful. This zulip page is awesome
Here I was debating if it was worth it to make an account just to ask a question
patiently awaits bjorn3
Also, is the alignment we are referring to here the offset counter on the left? I notice it is incrementing by 4 in hex -- not sure if that's just the word size, or if that's a property of the alignment
alignment in this context could either mean alignment of a location being patched, which as you've observed on aarch64 is always at least 4-aligned because instructions are all 32 bits; or it could be alignment of some value being patched in (also I'm assuming that fixup means something like a relocation, I don't actually know though)
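To make the arithmetic concrete: the linker complains about `_main+0x14`, and `_main` starts at `0x10`, so the location in question is `0x24`. The sketch below checks that offset against 4-byte and 8-byte alignment. Treating the patched value as an 8-byte pointer is an assumption here, not something the thread confirms, and `is_aligned` is a helper made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when `offset` is a multiple of `align`.
 * `align` must be a nonzero power of two. */
bool is_aligned(uint64_t offset, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset & (align - 1)) == 0;
}
```

`is_aligned(0x24, 4)` holds but `is_aligned(0x24, 8)` does not, which would be consistent with the "pointer not aligned" message if the linker expects pointer-sized fixups to sit on 8-byte boundaries.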
and, yes indeed, I'm remembering now that the encoding on aarch64 for udf is zeroes so ... yep, those are just zeroes
oh! you know what, these may be PLT relocations
can you dump relocs from objdump too? `--reloc` I think
(the udfs in that case would be placeholders for other instructions to be patched in)
with `--reloc`:
output.o: file format mach-o arm64
Disassembly of section __TEXT,__const:
0000000000000000 <_hello_world>:
0: 6c6c6548 ldnp d8, d25, [x10, #-320]
4: 77202c6f <unknown>
8: 646c726f <unknown>
c: 21 0a 00 <unknown>
Disassembly of section __TEXT,__text:
0000000000000010 <_main>:
10: d503237f pacibsp
14: a9bf7bfd stp x29, x30, [sp, #-16]!
18: 910003fd mov x29, sp
1c: 58000040 ldr x0, 0x24 <_main+0x14>
20: 14000003 b 0x2c <_main+0x1c>
24: 00000000 udf #0
0000000000000024: ARM64_RELOC_UNSIGNED _hello_world
28: 00000000 udf #0
2c: 58000042 ldr x2, 0x34 <_main+0x24>
30: 14000003 b 0x3c <_main+0x2c>
34: 00000000 udf #0
0000000000000034: ARM64_RELOC_UNSIGNED _puts
38: 00000000 udf #0
3c: d63f0040 blr x2
40: a8c17bfd ldp x29, x30, [sp], #16
44: d65f0fff retab
Relocations are just moving bits of code around to be more optimal, I'm guessing?
Chris Fallin said:
and, yes indeed, I'm remembering now that the encoding on aarch64 for udf is zeroes so ... yep, those are just zeroes
ah yeah I was reading the hex code output. Thanks for all the explanations btw, I'm having a great time working through this
ah, a relocation is a record that says "patch in a reference to this thing I don't have the address of yet", and the relocation kind says how to patch
so `ARM64_RELOC_UNSIGNED` here has some definition of semantics ("add the offset" or "give a relative offset" or "absolute address" or ... I'm not sure, it's defined in some tool conventions spec) and it means patch the machine code there to refer to the given symbol
a linker's main job is to concatenate object files then process all the relocations so code properly refers to stuff in other object files
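As a rough illustration of "process all the relocations", here is a sketch of applying one absolute-address relocation to a section's bytes. The `struct reloc` layout and `apply_abs64` are invented for this example and do not match Mach-O or any real linker; the sketch only assumes the relocation kind means "store the symbol's 64-bit address at this offset".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative relocation record; not any real object format's layout. */
struct reloc {
    size_t offset;       /* where in the section to patch */
    size_t symbol_index; /* which symbol the patch refers to */
};

/* Patch the resolved 64-bit address of the referenced symbol into
 * `section` at `r->offset`, in host byte order. */
void apply_abs64(uint8_t *section, size_t section_len,
                 const struct reloc *r, const uint64_t *symbol_addrs) {
    assert(r->offset + sizeof(uint64_t) <= section_len);
    uint64_t addr = symbol_addrs[r->symbol_index];
    memcpy(section + r->offset, &addr, sizeof addr);
}
```

Under that assumption, the zero words shown as `udf #0` in the dump are the placeholder bytes that such a patch would overwrite once the symbol addresses are known.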
Chris Fallin said:
a linker's main job is to concatenate object files then process all the relocations so code properly refers to stuff in other object files
I have wondered this for literally a decade, and it just clicked haha. Linking has always been magic to me
Seeing the actual objdump plus your explanation makes it make sense
cool, yep, no magic only grungy details :-)
it's possible we have the wrong kind of relocation here, or maybe the kind specified to cranelift-module is wrong, I'm not sure
So it is unable to find either `_hello_world` or `_puts`, then?
And the question now is why?
my strategy at this point would be to objdump a file produced by compiling C that invokes some external func, and see what kind of reloc that produces, then work backward (look in the Cranelift source to see where the reloc kind comes from)
unable to find or unable to process; it's possible those reloc kinds have other requirements (at a guess, alignment) which we don't satisfy either because we're accidentally generating the wrong kind of reloc or ... some other reason
I see. Uhhhh. Another dumb question incoming. Is `puts` considered an external func in C, considering it comes from libc? If I just do:
    #include <stdio.h>
    int main() {
        puts("Hello World!");
        return 0;
    }
would that give me a reasonable `.o` to compare to?
Yep, for sure, sorry by external I just meant not defined in the same .o that calls it :-)
here's `objdump -Dd --disassemble-zeroes --reloc a.out` for the C file above:
a.out: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000100003f58 <_main>:
100003f58: d10083ff sub sp, sp, #32
100003f5c: a9017bfd stp x29, x30, [sp, #16]
100003f60: 910043fd add x29, sp, #16
100003f64: 52800008 mov w8, #0
100003f68: b9000be8 str w8, [sp, #8]
100003f6c: b81fc3bf stur wzr, [x29, #-4]
100003f70: 90000000 adrp x0, 0x100003000 <_main+0x18>
100003f74: 913e6000 add x0, x0, #3992
100003f78: 94000005 bl 0x100003f8c <_puts+0x100003f8c>
100003f7c: b9400be0 ldr w0, [sp, #8]
100003f80: a9417bfd ldp x29, x30, [sp, #16]
100003f84: 910083ff add sp, sp, #32
100003f88: d65f03c0 ret
Disassembly of section __TEXT,__stubs:
0000000100003f8c <__stubs>:
100003f8c: b0000010 adrp x16, 0x100004000 <__stubs+0x4>
100003f90: f9400210 ldr x16, [x16]
100003f94: d61f0200 br x16
Disassembly of section __TEXT,__cstring:
0000000100003f98 <__cstring>:
100003f98: 6c6c6548 ldnp d8, d25, [x10, #-320]
100003f9c: 6f57206f umlal2.4s v15, v3, v7[1]
100003fa0: 21646c72 <unknown>
100003fa4: 00 <unknown>
Disassembly of section __TEXT,__unwind_info:
0000000100003fa8 <__unwind_info>:
100003fa8: 00000001 udf #1
100003fac: 0000001c udf #28
100003fb0: 00000000 udf #0
100003fb4: 0000001c udf #28
100003fb8: 00000000 udf #0
100003fbc: 0000001c udf #28
100003fc0: 00000002 udf #2
100003fc4: 00003f58 udf #16216
100003fc8: 00000040 udf #64
100003fcc: 00000040 udf #64
100003fd0: 00003f8c udf #16268
100003fd4: 00000000 udf #0
100003fd8: 00000040 udf #64
100003fdc: 00000000 udf #0
100003fe0: 00000000 udf #0
100003fe4: 00000000 udf #0
100003fe8: 00000003 udf #3
100003fec: 0001000c <unknown>
100003ff0: 00010010 <unknown>
100003ff4: 00000000 udf #0
100003ff8: 04000000 add z0.b, p0/m, z0.b, z0.b
100003ffc: 00000000 udf #0
Disassembly of section __DATA_CONST,__got:
0000000100004000 <__got>:
100004000: 00000000 udf #0
100004004: 80000000 <unknown>
This looks very different. For starters, the offsets are all way larger?
but no relocations!
Oh, that's for an `a.out` which is the whole linked program, I guess build with `cc -c -o test.o test.c` to get a .o?
(I have to head out for the day but hopefully others can continue helping! best of luck)
Thanks so much for your help today, Chris.
Here's the correct objdump:
test.o: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000000000000 <ltmp0>:
0: d10083ff sub sp, sp, #32
4: a9017bfd stp x29, x30, [sp, #16]
8: 910043fd add x29, sp, #16
c: 52800008 mov w8, #0
10: b9000be8 str w8, [sp, #8]
14: b81fc3bf stur wzr, [x29, #-4]
18: 90000000 adrp x0, 0x0 <ltmp0+0x18>
0000000000000018: ARM64_RELOC_PAGE21 l_.str
1c: 91000000 add x0, x0, #0
000000000000001c: ARM64_RELOC_PAGEOFF12 l_.str
20: 94000000 bl 0x20 <ltmp0+0x20>
0000000000000020: ARM64_RELOC_BRANCH26 _puts
24: b9400be0 ldr w0, [sp, #8]
28: a9417bfd ldp x29, x30, [sp, #16]
2c: 910083ff add sp, sp, #32
30: d65f03c0 ret
Disassembly of section __TEXT,__cstring:
0000000000000034 <ltmp1>:
34: 6c6c6548 ldnp d8, d25, [x10, #-320]
38: 6f57206f umlal2.4s v15, v3, v7[1]
3c: 21646c72 <unknown>
40: 00 <unknown>
Disassembly of section __LD,__compact_unwind:
0000000000000048 <ltmp2>:
48: 00000000 udf #0
0000000000000048: ARM64_RELOC_UNSIGNED __text
4c: 00000000 udf #0
50: 00000034 udf #52
54: 04000000 add z0.b, p0/m, z0.b, z0.b
58: 00000000 udf #0
5c: 00000000 udf #0
60: 00000000 udf #0
64: 00000000 udf #0
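The `ARM64_RELOC_PAGE21` / `ARM64_RELOC_PAGEOFF12` pair in the dump above splits one address across an `adrp` (the 4 KiB page) and an `add` (the offset within that page). A small sketch of that split, with helper names invented for illustration; the sample address is the `__cstring` location from the linked `a.out` dump earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of the 4 KiB page containing `addr` (the adrp half). */
uint64_t page_of(uint64_t addr) {
    return addr & ~UINT64_C(0xfff);
}

/* Low 12 bits of `addr`, i.e. its offset within the page (the add half). */
uint64_t page_offset(uint64_t addr) {
    return addr & UINT64_C(0xfff);
}

/* Sanity check: the two halves always recombine to the original address. */
void check_split(uint64_t addr) {
    assert(page_of(addr) + page_offset(addr) == addr);
}
```

For `0x100003f98` this gives page `0x100003000` and offset `3992`, matching the `adrp x0, 0x100003000` and `add x0, x0, #3992` pair in the linked binary. In the unlinked `test.o` both fields are still zero, with the relocations telling the linker what to fill in.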
Is my help still needed? If so please ping me again. Was asleep when I got pinged.
Hi @bjorn3 ! I have not yet figured this out, if you have any wisdom to impart it would be very appreciated.
To recap, I'm trying to produce a runnable macOS binary from cranelift IR, and my current code is here.
Linking is currently failing, and Chris helped me figure out it is likely alignment related to relocations. Above, I've pasted the objdump for my binary, as well as an objdump for a C `.o` that does the same thing, for comparison.
I think I might need to use this function alignment struct but I'm not sure
I see you don't have the is_pic flag enabled. I do have it enabled in cg_clif: https://github.com/rust-lang/rustc_codegen_cranelift/blob/ab10da27a11133add161bc6f9b2b7580ba455d58/src/lib.rs#L266 Can you try enabling it?
trying it now -- will report back
Ok, I added that to the flags builder. Here's the verbose clang output, which is indeed different than before, but it still seems to have failed:
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
"/Library/Developer/CommandLineTools/usr/bin/ld" -demangle -lto_library /Library/Developer/CommandLineTools/usr/lib/libLTO.dylib -dynamic -arch arm64 -platform_version macos 14.0.0 14.4 -syslibroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -o output -L/usr/local/lib output.o -lSystem /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/lib/darwin/libclang_rt.osx.a
0 0x100136074 __assert_rtn + 72
1 0x100072db8 ld::InputFiles::SliceParser::parseObjectFile(mach_o::Header const*) const + 22712
2 0x10007f830 ld::InputFiles::parseAllFiles(void (ld::AtomFile const*) block_pointer)::$_8::operator()(unsigned long, ld::FileInfo const&) const + 440
3 0x192836428 _dispatch_client_callout2 + 20
4 0x19284a850 _dispatch_apply_invoke3 + 336
5 0x1928363e8 _dispatch_client_callout + 20
6 0x192837c68 _dispatch_once_callout + 32
7 0x19284aeec _dispatch_apply_invoke_and_wait + 372
8 0x192849e9c _dispatch_apply_with_attr_f + 1212
9 0x19284a08c dispatch_apply + 96
10 0x100104564 ld::AtomFileConsolidator::parseFiles(bool) + 292
11 0x10009fee8 main + 9532
ld: Assertion failed: (pattern[0].addrMode == addr_other), function addFixupFromRelocations, file Relocations.cpp, line 701.
clang: error: linker command failed with exit code 1 (use -v to see invocation)
the objdump of the new output.o:
output.o: file format mach-o arm64
Disassembly of section __TEXT,__const:
0000000000000000 <_hello_world>:
0: 6c6c6548 ldnp d8, d25, [x10, #-320]
4: 77202c6f <unknown>
8: 646c726f <unknown>
c: 21 0a 00 <unknown>
Disassembly of section __TEXT,__text:
0000000000000010 <_main>:
10: d503237f pacibsp
14: a9bf7bfd stp x29, x30, [sp, #-16]!
18: 910003fd mov x29, sp
1c: 90000000 adrp x0, 0x0 <_main+0xc>
000000000000001c: ARM64_RELOC_GOT_LOAD_PAGE21 _hello_world
20: f9400000 ldr x0, [x0]
0000000000000020: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _hello_world
24: 90000002 adrp x2, 0x0 <_main+0x14>
0000000000000024: ARM64_RELOC_GOT_LOAD_PAGE21 _puts
28: f9400042 ldr x2, [x2]
0000000000000028: ARM64_RELOC_GOT_LOAD_PAGEOFF12 _puts
2c: d63f0040 blr x2
30: a8c17bfd ldp x29, x30, [sp], #16
34: d65f0fff retab
May I ask why position independent code might help here?
Alex Hansen said:
May I ask why position independent code might help here?
I'm not sure if non-PIC executables are still supported on macOS at all.
I see, that makes sense. Thanks. clang does seem to have gotten further in the process, at least the verbose output has more info
I'm currently thinking about what other differences there are between what cg_clif does and what you are doing. I know for a fact that what cg_clif does works.
Just to check, can you pass `-Wl,-ld_classic` to clang to use the old linker?
Is the file you linked what you do for all OSes? I could just copy it and try
While that might unblock me, we wouldn't get the same feeling of enlightenment :)
With those flags, it worked!!!
./output
Hello, world!
Yes, that is how I build the `TargetIsa` for all targets. It is not quite copy-paste-able though as it depends on rustc internals. I don't think the difference in `TargetIsa` is the issue here though.
Is it generally unpreferred to use the old linker? I'm in very unfamiliar territory, not sure if I should just keep the flags, or dig to find out why it isn't working with the new linker
> I know for a fact that what cg_clif does works.
Or at least it does with the old linker. There was a bug in the object crate that was fixed relatively recently: https://github.com/rust-lang/rustc_codegen_cranelift/issues/1456 I haven't tested it since.
> Is it generally unpreferred to use the old linker?
It is likely going away at some point.
> I'm in very unfamiliar territory, not sure if I should just keep the flags, or dig to find out why it isn't working with the new linker
It may well be that there is a bug in cranelift or object. It would be great to know if that is the case, but I don't have a mac myself to dig deeper. So I think using the old linker for now is your best option.
Ok, that makes sense. Thank you for your help. Should I file an issue or otherwise put the above Gist somewhere more public, so there's a reproducible way to dig into this issue? In case somebody more advanced with a Mac would like to look into it
Yeah, opening an issue on the wasmtime repo would make sense.
https://github.com/bytecodealliance/wasmtime/issues/8730
Thanks for all the help, both of you. Have a great weekend! I'm sure I'll be back in the near future.
Happy to help!
Last updated: Dec 23 2024 at 12:05 UTC