a general architecture question i've had about Cranelift: why does Cranelift have embedded assemblers for its architectures, rather than using assemblers maintained elsewhere (like XED, for x86/x86-64)?
my assumption is "assemblers as part of gas or xed are slower than we'd want", but i've never seen that actually stated. i'm sure there are good reasons, i'm just not sure what they are :D
the followup then is: does it make sense to parcel out the assemblers in Cranelift as their own crates? i'm sure there are other projects that would like high-performance assemblers, like dynasm-rs (which itself has an x86 assembler embedded in it!)
why does Cranelift have embedded assemblers for its architectures
External assemblers need to be installed independently, which is hard on Windows, for example. They also often don't support cross-compilation, and they emit object files rather than machine code + relocations, which makes them pretty much useless for JIT compilation. Finally, we have to model all instructions anyway to perform regalloc correctly. Doing the final assembly step is relatively easy once you have modelled every instruction you want to emit.
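To make the "machine code + relocations" point concrete, here is a minimal sketch of what an in-process assembler can hand to a JIT: a byte buffer plus a list of patch sites, instead of an object file. The names `CodeBuffer`, `Reloc`, `RelocKind` and `apply_relocs` are invented for illustration and are not Cranelift's actual API; only the x86-64 `call rel32` (0xE8) and `ret` (0xC3) encodings are taken as given.

```rust
/// Hypothetical relocation kinds; a real backend would have many more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RelocKind {
    /// 32-bit PC-relative displacement, as used by x86-64 `call rel32`.
    X86PcRel4,
}

/// A patch site in the code buffer that refers to a not-yet-known address.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Reloc {
    pub offset: usize, // byte offset of the 4-byte field to patch
    pub kind: RelocKind,
    pub symbol: String,
    pub addend: i64,
}

/// Raw machine code plus the relocations that still need resolving.
#[derive(Default)]
pub struct CodeBuffer {
    pub bytes: Vec<u8>,
    pub relocs: Vec<Reloc>,
}

impl CodeBuffer {
    /// Emit `call rel32` to a symbol whose address is not known yet.
    pub fn call_symbol(&mut self, symbol: &str) {
        self.bytes.push(0xE8);
        self.relocs.push(Reloc {
            offset: self.bytes.len(),
            kind: RelocKind::X86PcRel4,
            symbol: symbol.to_string(),
            // rel32 is relative to the end of the 4-byte field.
            addend: -4,
        });
        self.bytes.extend_from_slice(&[0; 4]); // placeholder displacement
    }

    /// Emit `ret`.
    pub fn ret(&mut self) {
        self.bytes.push(0xC3);
    }

    /// Patch all relocations, given the address the buffer will live at
    /// and a symbol lookup supplied by the embedder.
    pub fn apply_relocs(
        &mut self,
        base: u64,
        lookup: impl Fn(&str) -> Option<u64>,
    ) -> Result<(), String> {
        for r in &self.relocs {
            let target = lookup(&r.symbol)
                .ok_or_else(|| format!("unresolved symbol {}", r.symbol))?;
            let field_addr = base.wrapping_add(r.offset as u64) as i64;
            let disp = (target as i64).wrapping_add(r.addend).wrapping_sub(field_addr);
            if disp < i32::MIN as i64 || disp > i32::MAX as i64 {
                return Err(format!("{} is out of rel32 range", r.symbol));
            }
            self.bytes[r.offset..r.offset + 4]
                .copy_from_slice(&(disp as i32).to_le_bytes());
        }
        Ok(())
    }
}
```

A JIT would copy `bytes` into executable memory and call something like `apply_relocs` with the final address; an ahead-of-time backend could instead write the same relocation list into an object file.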
One other really important consideration is latency -- we really wouldn't want to shell out to as (or even use a library in-process that parses text) when JIT'ing small Wasm modules in a browser, for example
i don't mean "external" in the "shell out to gas" sense - i mean a library or crate we could depend on, so that we don't need to spell out how to encode simd instructions and prefix emission for the dozenth time. even so, exposing an API to emit machine code and relocations is an interesting point, and exposing instruction constraints in a library-friendly way doesn't seem straightforward
(another assumption i'm making here: an in-process library to assemble instructions ought to have an interface that lets you bypass parsing a string of text. but then you'd have to translate the in-compiler instruction representation to whatever the assembler expects, and that does add latency either way)
Another part of the answer is that Cranelift already needs to know a lot about each ISA: the opcodes and their operands, and how many bytes of encoding they take, including cases like x86 where the size depends on which registers you choose. Once it has all that, adding the information and logic to actually encode the instructions is, relatively speaking, not that much more work.
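As a small illustration of the register-dependent size on x86-64, consider a 32-bit register-to-register `mov` (opcode 0x89 /r): it takes two bytes when both registers are among the low eight, and three when either is r8d..r15d, because a REX prefix is then needed. The sketch below is not Cranelift code; the function names are made up, and registers are passed as raw hardware numbers (0 = eax, 1 = ecx, 8 = r8d, and so on).

```rust
/// Encode `mov dst, src` between 32-bit general-purpose registers
/// using opcode 0x89 (MOV r/m32, r32). `dst` and `src` are hardware
/// register numbers in 0..=15.
pub fn encode_mov_r32_r32(dst: u8, src: u8) -> Vec<u8> {
    assert!(dst < 16 && src < 16, "register number out of range");
    let mut out = Vec::with_capacity(3);
    // REX.R extends the ModRM.reg field (src), REX.B extends ModRM.rm (dst).
    let rex = 0x40 | ((src >> 3) << 2) | (dst >> 3);
    if rex != 0x40 {
        out.push(rex); // only needed when a register is r8d..r15d
    }
    out.push(0x89);
    // ModRM: mod = 11 (register direct), reg = src, rm = dst.
    out.push(0xC0 | ((src & 7) << 3) | (dst & 7));
    out
}

/// The size query a compiler needs before final emission; once this is
/// known, the encoder above is only a few more lines.
pub fn mov_r32_r32_len(dst: u8, src: u8) -> usize {
    if dst >= 8 || src >= 8 { 3 } else { 2 }
}
```

For example, `encode_mov_r32_r32(0, 1)` (`mov eax, ecx`) gives `89 C8`, while `encode_mov_r32_r32(8, 1)` (`mov r8d, ecx`) gives `41 89 C8`.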
Also a built-in assembler only needs to handle the usually relatively small subset of the machine's insns that the insn selector can actually produce. Generalizing it would mean you'd have to have more complete coverage, but that doesn't benefit any single user.
FYI there is already an assembler and disassembler library for the Arm architecture (A-profile, both 32- and 64-bit) that provides most (if not all) of the features discussed here - VIXL:
https://git.linaro.org/arm/vixl.git
Since it has been developed precisely with the JIT use case in mind, it doesn't generate object files, but operates on memory buffers that contain machine code; there's also no text parsing involved. It is actively maintained by Arm (currently there is ongoing work on Scalable Vector Extension support, for instance) and is used by other open source projects, the most high-profile of which is probably the Android Runtime (ART). Since cross-compilation was mentioned - VIXL also includes an AArch64 simulator, so machine code could be emitted and tested in an x86 environment, for example.
VIXL is a C++ library. It doesn't export a C API, which means that it can't be used directly from Rust. Also, I can't find any API to query the size of an instruction.
VIXL also includes an AArch64 simulator
From the README:
The VIXL simulator supports only those instructions that the VIXL assembler can generate.
This means that it will likely not be able to emulate Wasmtime. Also qemu-aarch64 works fine already.
Also from the README:
Limited support for synchronisation instructions.
This will make multithreading support harder.
I don't think Anton was suggesting Cranelift use it, just that such a library exists.
Yes, exactly - just thought that it was worth mentioning.
The simulator does not serve the same purpose as the QEMU userspace emulation - the idea is that you generate a code buffer and then use the simulator to execute the generated code, not that you take an arbitrary executable and run it. Arguably that's more suited to the JIT use case and has considerably less overhead. As for an API to query the size of an instruction - that's not necessary for AArch64 because the architecture has fixed-size instructions. However, there's a constant that defines the size.
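To illustrate the fixed-size point: every A64 instruction is one 32-bit word, so the size of a code sequence is just four bytes times the instruction count, whatever registers are used. The sketch below is Rust rather than VIXL's C++ and does not mirror VIXL's API; `INSTRUCTION_SIZE`, `add_x` and `emit` are hypothetical names, while the `add` (shifted register) and `ret` encodings are the standard A64 ones.

```rust
/// Every A64 instruction is one 32-bit word (hypothetical constant name).
pub const INSTRUCTION_SIZE: usize = 4;

/// `ret` (return via x30).
pub const RET: u32 = 0xD65F_03C0;

/// Encode `add xd, xn, xm` (64-bit, shifted-register form, no shift).
/// Restricted to x0..x30 here; register number 31 means xzr in this form.
pub fn add_x(rd: u32, rn: u32, rm: u32) -> u32 {
    assert!(rd <= 30 && rn <= 30 && rm <= 30, "register out of range");
    0x8B00_0000 | (rm << 16) | (rn << 5) | rd
}

/// Append one instruction to a code buffer in little-endian byte order.
pub fn emit(buf: &mut Vec<u8>, insn: u32) {
    buf.extend_from_slice(&insn.to_le_bytes());
}

/// The size of a sequence depends only on the instruction count.
pub fn code_size(instruction_count: usize) -> usize {
    instruction_count * INSTRUCTION_SIZE
}
```

Emitting `add_x(0, 1, 2)` followed by `RET` yields an 8-byte buffer, which is the kind of thing one could then feed to a simulator instead of real hardware.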
For what it's worth, VIXL is what we use in Spidermonkey to run most of our aarch64 tests. It has been quite useful, and in the past it meant we didn't need aarch64 hardware. Knowing that it doesn't support synchronization instructions well might be helpful when we want to implement wasm SIMD in the new backend.
Last updated: Jan 24 2025 at 00:11 UTC