Stream: cranelift

Topic: assemblers


iximeow (Jun 08 2020 at 16:55):

a general architecture question i've had about Cranelift: why does Cranelift have embedded assemblers for its architectures, rather than using assemblers maintained elsewhere (like XED, for x86/x86-64)?

my assumption is "assemblers as part of gas or xed are slower than we'd want", but i've never seen that actually stated. i'm sure there are good reasons, i'm just not sure what they are :D

the followup then is: does it make sense to parcel out the assemblers in Cranelift as their own crates? i'm sure there are other projects that would like high-performance assemblers, like dynasm-rs (which itself has an x86 assembler embedded in it!)

bjorn3 (Jun 08 2020 at 17:00):

> why does Cranelift have embedded assemblers for its architectures

External assemblers need to be installed independently, which is hard on, for example, Windows. External assemblers also often don't support cross-compilation. They also emit object files, rather than machine code + relocations, which makes them pretty much useless for JIT compilation. Finally, we have to model all instructions anyway to correctly perform regalloc. Doing the final assembly step is relatively easy once you have modelled all the instructions you want to emit.
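To illustrate the "machine code + relocations" point, here is a minimal Rust sketch of an in-memory code sink. The `CodeBuffer` and `Reloc` types and their methods are hypothetical names made up for this example and are not Cranelift's API; the only encodings used are x86-64 `call rel32` (0xE8) and `ret` (0xC3).

```rust
/// Hypothetical relocation record: the 4 bytes at `offset` must be patched
/// once the address of `symbol` is known.
#[derive(Debug, PartialEq)]
struct Reloc {
    offset: usize,
    symbol: String,
}

/// Hypothetical in-memory sink: raw bytes plus relocations, no object file.
#[derive(Default)]
struct CodeBuffer {
    bytes: Vec<u8>,
    relocs: Vec<Reloc>,
}

impl CodeBuffer {
    /// Emit x86-64 `call rel32` with a zeroed displacement and record
    /// a relocation pointing at the displacement field.
    fn call(&mut self, symbol: &str) {
        self.bytes.push(0xE8);
        self.relocs.push(Reloc {
            offset: self.bytes.len(),
            symbol: symbol.to_string(),
        });
        self.bytes.extend_from_slice(&[0u8; 4]);
    }

    /// Emit x86-64 `ret`.
    fn ret(&mut self) {
        self.bytes.push(0xC3);
    }

    /// Resolve one relocation in place, as a JIT might after deciding where
    /// the code lives. `code_addr` is the address of `bytes[0]`, `target`
    /// is the address of the callee.
    fn patch(&mut self, reloc_index: usize, code_addr: u64, target: u64) {
        let off = self.relocs[reloc_index].offset;
        // rel32 is relative to the end of the 4-byte displacement field.
        let next = code_addr.wrapping_add(off as u64 + 4);
        let rel = target.wrapping_sub(next) as u32;
        self.bytes[off..off + 4].copy_from_slice(&rel.to_le_bytes());
    }
}
```

A JIT can copy `bytes` into executable memory and patch the recorded offsets directly, whereas an object file would first have to be parsed or linked.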

Chris Fallin (Jun 08 2020 at 17:04):

One other really important consideration is latency -- we really wouldn't want to shell out to as (or even use a library in-process that parses text) when JIT'ing small Wasm modules in a browser, for example

iximeow (Jun 08 2020 at 17:05):

i don't mean "external" in the "shell out to gas" sense - i mean a library or crate we could depend on, so that we don't need to spell out how to encode simd instructions and prefix emission for the dozenth time. even so, exposing an API to emit machine code and relocations is an interesting point, and exposing instruction constraints in a library-friendly way doesn't seem straightforward

iximeow (Jun 08 2020 at 17:06):

(another assumption i'm making here: an in-process library to assemble instructions ought to have an interface that lets you bypass parsing a string of text. but then you'd have to translate the in-compiler instruction representation to whatever the assembler expects, and that does add latency either way)

Dan Gohman (Jun 08 2020 at 17:07):

Another part of the answer is that Cranelift needs to know a lot about ISAs anyway: the opcodes and their operands, and how many bytes of encoding they take, including things like x86 encodings where the size depends on which registers you choose. Once it has all that, adding the information and logic to actually encode the instructions is, relatively speaking, not that much more work.
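As a concrete case of size depending on register choice, here is a small Rust sketch for x86-64 `push r64`. The `Gpr` type and the `encode_push` / `push_size` helpers are hypothetical names for this example, not Cranelift code. The encoding itself is standard: opcode 0x50 plus the low three bits of the register number, with a REX.B prefix (0x41) required for r8 through r15.

```rust
/// x86-64 general-purpose register, identified by hardware number (0..=15).
#[derive(Clone, Copy)]
struct Gpr(u8);

const RAX: Gpr = Gpr(0);
const RBX: Gpr = Gpr(3);
const R8: Gpr = Gpr(8);
const R15: Gpr = Gpr(15);

/// Size in bytes of `push reg`: the compiler needs this before emission,
/// for example to compute branch offsets.
fn push_size(reg: Gpr) -> usize {
    if reg.0 >= 8 { 2 } else { 1 }
}

/// Encode `push reg`. Once the size rule above is modelled, producing
/// the actual bytes takes only a few more lines.
fn encode_push(reg: Gpr) -> Vec<u8> {
    let mut out = Vec::with_capacity(2);
    if reg.0 >= 8 {
        out.push(0x41); // REX.B selects the upper eight registers
    }
    out.push(0x50 | (reg.0 & 7));
    out
}
```

Here `push rax` takes one byte and `push r15` takes two, so the size query and the encoder have to share the same knowledge about registers.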

Julian Seward (Jun 08 2020 at 18:01):

Also a built-in assembler only needs to handle the usually relatively small subset of the machine's insns that the insn selector can actually produce. Generalizing it would mean you'd have to have more complete coverage, but that doesn't benefit any single user.

Anton Kirilov (Jun 09 2020 at 00:44):

FYI there is already an assembler and disassembler library for the Arm architecture (A-profile, both 32- and 64-bit) that provides most (if not all) of the features discussed here - VIXL:
https://git.linaro.org/arm/vixl.git
Since it has been developed precisely with the JIT use case in mind, it doesn't generate object files, but operates on memory buffers that contain machine code; there's also no text parsing involved. It is actively maintained by Arm (currently there is ongoing work on the Scalable Vector Extension support, for instance) and is used by other open source projects, the most high-profile of which is probably the Android Runtime (ART). Since cross-compilation was mentioned - VIXL also includes an AArch64 simulator, so machine code could be emitted and tested in an x86 environment, for example.

bjorn3 (Jun 09 2020 at 09:47):

VIXL is a C++ library. It doesn't export a C API, which means that it can't be used directly from Rust. Also, I can't find any API to query the size of an instruction.

bjorn3 (Jun 09 2020 at 09:53):

> VIXL also includes an AArch64 simulator

From the README:

> The VIXL simulator supports only those instructions that the VIXL assembler can generate.

This means that it will likely not be able to emulate Wasmtime. Also qemu-aarch64 works fine already.

Also from the README:

> Limited support for synchronisation instructions.

This will make multithreading support harder.

Joey Gouly (Jun 09 2020 at 10:04):

I don't think Anton was suggesting Cranelift use it, just that such a library exists.

Anton Kirilov (Jun 09 2020 at 12:40):

Yes, exactly - just thought that it was worth mentioning.
The simulator does not serve the same purpose as the QEMU userspace emulation - the idea is that you generate a code buffer and then use the simulator to execute the generated code, not that you take an arbitrary executable and run it. Arguably that's more suited to the JIT use case and has considerably less overhead. As for an API to query the size of an instruction - that's not necessary for AArch64 because the architecture has fixed-size instructions. However, there's a constant that defines the size.
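A short Rust sketch of why a per-instruction size query is unnecessary on AArch64: every instruction is one 32-bit word, so code size is just the instruction count times a constant. `INSTRUCTION_SIZE`, `encode_add_x`, `emit` and `code_size` are hypothetical names for this example and are not VIXL's identifiers; the two encodings used (`add xd, xn, xm` and `ret`) are the standard A64 ones.

```rust
/// Every A64 instruction is 4 bytes (hypothetical constant name).
const INSTRUCTION_SIZE: usize = 4;

/// A64 `ret` (return via x30).
const RET: u32 = 0xD65F_03C0;

/// Encode `add xd, xn, xm` (64-bit, shifted-register form, no shift).
/// Register number 31 means xzr in this form.
fn encode_add_x(rd: u32, rn: u32, rm: u32) -> u32 {
    assert!(rd < 32 && rn < 32 && rm < 32);
    0x8B00_0000 | (rm << 16) | (rn << 5) | rd
}

/// Append one instruction to a code buffer in little-endian byte order.
fn emit(buf: &mut Vec<u8>, insn: u32) {
    buf.extend_from_slice(&insn.to_le_bytes());
}

/// Size of any sequence is known without looking at operands.
fn code_size(insn_count: usize) -> usize {
    insn_count * INSTRUCTION_SIZE
}
```

Unlike the x86 case, the register operands only change bits inside the word and never the length.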

Benjamin Bouvier (Jun 09 2020 at 13:04):

To wit, VIXL is what we use in Spidermonkey to run most of our aarch64 tests; it has been quite useful and has saved us from needing aarch64 hardware in the past. Knowing that it doesn't support synchronization instructions well might be helpful when we want to implement wasm SIMD in the new backend.


Last updated: Oct 23 2024 at 20:03 UTC