Stream: cranelift

Topic: Basic block level code generation


Madushan Nishantha (Oct 11 2024 at 12:55):

I'm trying to write a dynamically recompiling emulator like QEMU. QEMU's JIT compiler generates code at the basic-block level, not at the subroutine/function level.
I'm trying to figure out how to do this with Cranelift; all the tutorials/examples I see use a function-level generator. Is it possible at all?

Chris Fallin (Oct 11 2024 at 14:50):

Unfortunately not in a simple way: Cranelift is a whole-function compiler, and that's a very deeply embedded, fundamental design point. However, you could potentially compile blocks as small functions that end with tail calls to the next block; you'll have a little overhead from the prologue/epilogue, but maybe not too bad.
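
A minimal sketch of the block-as-small-function idea, added for illustration and not taken from the thread. It uses only the Rust standard library and no Cranelift API. The names `Cpu`, `Next`, `Block`, `translate`, and `run` are all hypothetical, and the hand-written `block_*` functions stand in for code a JIT would emit. Since Rust has no guaranteed tail calls, each block returns the next guest PC to a dispatcher loop instead of tail-calling its successor, so this is a trampoline and not the tail-call scheme itself.

```rust
use std::collections::HashMap;

/// Hypothetical guest CPU state shared by all translated blocks.
struct Cpu {
    regs: [u64; 4],
}

/// What a translated block asks the dispatcher to do next.
enum Next {
    Jump(u64), // continue at this guest PC
    Halt,      // leave the emulation loop
}

/// One translated basic block: a small function over the CPU state.
type Block = fn(&mut Cpu) -> Next;

// Stand-ins for JIT-compiled blocks of a small countdown loop.
fn block_00(cpu: &mut Cpu) -> Next {
    cpu.regs[0] = 5; // loop counter
    cpu.regs[1] = 0; // accumulator
    Next::Jump(0x10)
}

fn block_10(cpu: &mut Cpu) -> Next {
    cpu.regs[1] += cpu.regs[0];
    cpu.regs[0] -= 1;
    if cpu.regs[0] == 0 {
        Next::Jump(0x20)
    } else {
        Next::Jump(0x10)
    }
}

fn block_20(_cpu: &mut Cpu) -> Next {
    Next::Halt
}

/// Hypothetical translator: a real one would decode guest code at `pc`
/// and compile it; here it only maps known PCs to the stand-ins above.
fn translate(pc: u64) -> Option<Block> {
    match pc {
        0x00 => Some(block_00 as Block),
        0x10 => Some(block_10 as Block),
        0x20 => Some(block_20 as Block),
        _ => None,
    }
}

/// Dispatcher loop with a translation cache keyed by guest PC.
/// Returns the number of blocks executed, or the PC that could not
/// be translated.
fn run(cpu: &mut Cpu, entry: u64) -> Result<usize, u64> {
    let mut cache: HashMap<u64, Block> = HashMap::new();
    let mut pc = entry;
    let mut executed = 0;
    loop {
        let block = match cache.get(&pc).copied() {
            Some(b) => b,
            None => {
                let b = translate(pc).ok_or(pc)?;
                cache.insert(pc, b);
                b
            }
        };
        executed += 1;
        match block(cpu) {
            Next::Jump(target) => pc = target,
            Next::Halt => return Ok(executed),
        }
    }
}

fn main() {
    let mut cpu = Cpu { regs: [0; 4] };
    let blocks = run(&mut cpu, 0x00).expect("untranslatable guest PC");
    println!("sum = {}, blocks executed = {}", cpu.regs[1], blocks);
}
```

The dispatcher pays one cache lookup per block. Direct block-to-block transfers (tail calls, or QEMU-style patching) would avoid that cost on hot paths.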

Madushan Nishantha (Oct 11 2024 at 14:56):

Thank you for the clarification :)
Is there an easy way to patch out the prologue/epilogue? QEMU also patches the generated epilogue directly to jump to the next basic block once that block has been generated, to speed up emulation.

bjorn3 (Oct 11 2024 at 15:14):

The prologue is necessary to set up stack space and save clobbered registers.

Madushan Nishantha (Oct 11 2024 at 15:18):

I'm planning to do that part manually when entering and exiting the emulation, so having it in each basic block is redundant if I use small functions to represent basic blocks.

Madushan Nishantha (Oct 11 2024 at 15:25):

Hmm, after thinking about it a bit more, I guess I wouldn't know which registers the generated code is going to clobber, so I will have to keep the prologue and possibly the epilogue.

Chris Fallin (Oct 11 2024 at 17:56):

Right, the issue is that they're tightly coupled: Cranelift expects that it can generate a blob of code and that you'll execute all of it, and you're violating the interface if you don't. Any patching would be ad hoc and liable to break with any Cranelift update, or in a corner case you haven't tested.

Chris Fallin (Oct 11 2024 at 17:57):

On all non-x86 platforms we skip generating the frame-pointer setup in leaf functions (those that make no calls) with no clobbers or spills, so in many cases very small functions will be just "<body> ; ret".

Chris Fallin (Oct 11 2024 at 17:57):

On x86 we can do this optimization eventually too; I don't remember what the issue was, but it was probably minor.

Madushan Nishantha (Oct 11 2024 at 18:57):

Okay, I will do some experiments and figure out whether the basic-block-as-a-function pattern will work well for my use case. Thank you again for the help.

Afonso Bordado (Oct 11 2024 at 20:21):

"I don't remember what the issue was but probably minor"

The last time we discussed this was here:
https://github.com/bytecodealliance/wasmtime/pull/8516

We just need to figure out whether we are going to emit a call in the function body, so that we know if we can skip the prologue.

This makes cranelift omit the function prologue and epilogue on the x64 architecture if possible, like it already does for aarch64. However, this is not quite right yet, because (if I understand co...

Amanieu (Oct 12 2024 at 08:42):

@Madushan Nishantha You may want to have a look at https://github.com/Amanieu/a-tale-of-binary-translation which does just this.

Simple RISC-V emulator presented at Rust Nation 2023 - Amanieu/a-tale-of-binary-translation

Amanieu (Oct 12 2024 at 08:43):

There are more details in the presentation.

Amanieu (Oct 12 2024 at 08:44):

At the time Cranelift didn't support tail calls. Rust still doesn't.

Madushan Nishantha (Oct 12 2024 at 11:20):

Nice! Thank you. I'll have a look.


Last updated: Oct 23 2024 at 20:03 UTC