Stream: cranelift

Topic: patching generated code for JIT block linking


view this post on Zulip vx (Nov 19 2025 at 22:42):

hi! i'm currently writing a GameCube emulator and using cranelift to JIT the PowerPC Gekko CPU. currently, i've been able to boot my first few games (Luigi's Mansion, Zelda Wind Waker, etc) and while that's awesome, i've found out my JIT performance is insufficient - games run around 20-30% speed in the more intensive parts.

now, i'm implementing optimizations on my JIT and one of the most common ones is block linking: most block exit points jump to fixed guest addresses, so instead of returning to the dispatcher and searching the block mappings for the block of the target, the executed block could be patched to instead just jump (or tail call) to the block of the target directly.

i'm looking for ways to implement this within cranelift, since i'd preferably want to not do patching manually (i.e. inspecting the generated machine code and writing out bytes, that sort of thing). i have thought of two possibilities:

  1. keep the block's Function in memory, modify it and recompile the whole block when i need to patch one of the exits. this seems like the easiest option, but also the worst performing - compiling blocks multiple times for a few changes to call targets isn't ideal...
  2. create a "trampoline" function for each exit point which could then be easily patched, as it's outside the block itself. i'm worried the indirection would hurt performance.

i'm looking for some insights on other ways to approach the problem, or maybe improve the ones i've listed.
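For reference, the dispatcher round trip that block linking tries to avoid can be sketched roughly as below. This is a plain-Rust model, not Cranelift code: `Cpu`, `Block`, `Dispatcher`, `loop_body`, `demo_dispatch` and the guest addresses are all hypothetical names made up for illustration, and a real translated block would be a pointer to JIT-compiled machine code rather than a Rust function.

```rust
use std::collections::HashMap;

/// Hypothetical guest CPU state (a real Gekko state has far more registers).
pub struct Cpu {
    pub pc: u32,
    pub r3: u32,
}

/// A translated block runs some guest code and returns the next guest PC.
type Block = fn(&mut Cpu) -> u32;

pub struct Dispatcher {
    blocks: HashMap<u32, Block>,
    /// Counts block-cache lookups so the per-exit cost is visible.
    pub lookups: u64,
}

impl Dispatcher {
    /// Every block exit returns here and pays for one lookup.
    pub fn run(&mut self, cpu: &mut Cpu, max_blocks: usize) {
        for _ in 0..max_blocks {
            self.lookups += 1;
            let block = match self.blocks.get(&cpu.pc) {
                Some(&b) => b,
                None => break, // no translated block: stop in this sketch
            };
            cpu.pc = block(cpu);
        }
    }
}

/// Stand-in for a compiled guest loop body at 0x8000_0000:
/// increments r3 and branches back to itself until r3 reaches 4.
fn loop_body(cpu: &mut Cpu) -> u32 {
    cpu.r3 += 1;
    if cpu.r3 < 4 { 0x8000_0000 } else { 0x8000_0010 }
}

/// Runs the loop through the dispatcher; returns (r3, lookups).
pub fn demo_dispatch() -> (u32, u64) {
    let mut blocks: HashMap<u32, Block> = HashMap::new();
    blocks.insert(0x8000_0000, loop_body as Block);
    let mut dispatcher = Dispatcher { blocks, lookups: 0 };
    let mut cpu = Cpu { pc: 0x8000_0000, r3: 0 };
    dispatcher.run(&mut cpu, 100);
    (cpu.r3, dispatcher.lookups)
}
```

In this model the fixed-target back edge of the loop goes through the hash map on every iteration, which is the cost a linked exit would skip.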

view this post on Zulip Amanieu (Nov 19 2025 at 22:49):

You may be interested in my talk at Rust Nation UK where I show how to build a binary translator JIT based on Cranelift. You can find the code here.

Simple RISC-V emulator presented at Rust Nation 2023 - Amanieu/a-tale-of-binary-translation

view this post on Zulip Amanieu (Nov 19 2025 at 22:50):

Essentially you want to have a function pointer in global memory for each exit out of a function that initially points to a resolver function. The resolver function will then JIT the block if needed and store the resulting function pointer in the global.
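A minimal sketch of that idea, modeled in plain Rust rather than with Cranelift: each exit owns a slot holding a function pointer that starts out pointing at a resolver, and the resolver overwrites the slot after "compiling" the target. All names here (`Cpu`, `ExitSlot`, `ExitFn`, `resolve`, `compile_block`, `translated_block`, `demo`) are hypothetical, and `compile_block` is only a stand-in for actually compiling the guest block. In a real JIT the generated code would presumably load the pointer from the slot's address and call (or tail call) it indirectly; a multi-threaded emulator would also need an atomic slot instead of a `Cell`.

```rust
use std::cell::Cell;

/// Hypothetical guest CPU state plus counters used only by this sketch.
pub struct Cpu {
    pub pc: u32,
    pub resolver_calls: u32,
    pub blocks_run: u32,
}

/// Signature shared by the resolver and translated blocks. The slot is
/// passed along so the resolver knows which exit to patch.
type ExitFn = fn(&mut Cpu, &ExitSlot);

/// One patchable slot per block exit with a fixed guest target.
pub struct ExitSlot {
    target_pc: u32,
    target: Cell<ExitFn>,
}

impl ExitSlot {
    /// A fresh slot points at the resolver.
    pub fn new(target_pc: u32) -> Self {
        ExitSlot { target_pc, target: Cell::new(resolve as ExitFn) }
    }

    /// What a block exit does: call through whatever the slot holds.
    pub fn take(&self, cpu: &mut Cpu) {
        (self.target.get())(cpu, self)
    }
}

/// Stand-in for JIT compilation of the block at `_guest_pc` (assumption:
/// the real version would look up or compile the block and return its
/// entry point).
fn compile_block(_guest_pc: u32) -> ExitFn {
    translated_block
}

/// Stand-in for the compiled target block.
fn translated_block(cpu: &mut Cpu, via: &ExitSlot) {
    cpu.pc = via.target_pc;
    cpu.blocks_run += 1;
}

/// Resolver: compile the target, patch the slot, then continue into it.
fn resolve(cpu: &mut Cpu, slot: &ExitSlot) {
    cpu.resolver_calls += 1;
    let block = compile_block(slot.target_pc);
    slot.target.set(block); // later exits skip the resolver entirely
    block(cpu, slot);
}

/// Takes the same exit three times; returns (resolver_calls, blocks_run, pc).
pub fn demo() -> (u32, u32, u32) {
    let mut cpu = Cpu { pc: 0, resolver_calls: 0, blocks_run: 0 };
    let slot = ExitSlot::new(0x8000_3100);
    for _ in 0..3 {
        slot.take(&mut cpu);
    }
    (cpu.resolver_calls, cpu.blocks_run, cpu.pc)
}
```

Calling `demo()` takes the exit three times; only the first call goes through `resolve`, the other two land directly in the patched target.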

view this post on Zulip Amanieu (Nov 19 2025 at 22:51):

That's as much as you can do without modifying Cranelift itself.

view this post on Zulip vx (Nov 19 2025 at 22:54):

that's pretty similar to the second idea i listed, except better! indeed, there's no need for a trampoline function - calling a function pointer and then modifying it is a great idea.

view this post on Zulip vx (Nov 19 2025 at 22:55):

i also think this would be pretty good performance-wise. i'll take a look at your talk, too. thanks for the insight :)


Last updated: Dec 06 2025 at 07:03 UTC