Stream: git-wasmtime

Topic: wasmtime / issue #1074 Cranelift: Patchpoint instruction?


view this post on Zulip Wasmtime GitHub notifications bot (May 04 2022 at 20:44):

cfallin edited issue #1074:

In order to support tiering JITs efficiently, it may be useful to support a "patchpoint" instruction that is lowered to a call, padded with some nops if the reserved space is larger than a call needs, and that emits some data about its location in the compiled code so that it can be patched later on.
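A rough sketch of what that emitted location data and a later patch could look like, in Rust, with hypothetical names and illustrative x86-64 encodings (this is not an existing Cranelift API):

```rust
/// Hypothetical metadata a patchpoint could emit alongside the compiled code.
struct PatchpointRecord {
    offset: usize, // byte offset of the reserved region in the code buffer
    size: usize,   // bytes reserved (call + padding nops)
}

/// Overwrite a reserved patchpoint region with an x86-64 `call rel32`,
/// padding the remainder with single-byte nops. The buffer must be
/// writable when this runs.
fn patch_call(code: &mut [u8], rec: &PatchpointRecord, target: usize) {
    const CALL_LEN: usize = 5; // E8 + rel32
    assert!(rec.size >= CALL_LEN);
    let next = rec.offset + CALL_LEN;
    // rel32 is relative to the address of the instruction *after* the call.
    let rel32 = (target as i64 - (code.as_ptr() as usize + next) as i64) as i32;
    code[rec.offset] = 0xE8;
    code[rec.offset + 1..next].copy_from_slice(&rel32.to_le_bytes());
    for b in &mut code[next..rec.offset + rec.size] {
        *b = 0x90; // nop
    }
}
```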

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:11):

jgarvin commented on issue #1074:

> For security reasons and for performance on some platforms, it's desirable to avoid dynamically patching live code when we don't need to, so I'm kind of inclined to want Cranelift to steer users away from it

Should Cranelift just be avoided by anyone wanting to JIT compile? Or is there some sort of alternative on those platforms? AFAICT the main reason those platforms forbid it is to make portability more difficult for monopolistic reasons; there is usually more guaranteed security at higher levels nowadays.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:30):

bjorn3 commented on issue #1074:

I don't think that comment is talking about platforms where the patching would be impossible (on those, any kind of JIT is impossible), but rather about platforms that enforce W^X for security reasons. These platforms do allow toggling between the two with a syscall (SELinux) or a CPU instruction (arm64 macOS, stably exposed as a libc function call), but this is much slower than a plain write. Enforcing W^X is genuinely useful: back when an executable stack was common, for example, getting arbitrary code execution with shellcode was trivial as soon as you had a stack buffer overflow. Nowadays, because of W^X, you have to put in much more effort for something like ROP.
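For illustration, a minimal sketch of that toggle on mprotect-style platforms, using the `libc` crate; the arm64 macOS path would use MAP_JIT plus `pthread_jit_write_protect_np` instead, and aarch64 additionally needs an icache flush after making the region executable:

```rust
use std::io;

/// Flip a page-aligned JIT code region to writable (and non-executable),
/// the classic W^X dance on platforms that enforce it via mprotect.
unsafe fn set_writable(ptr: *mut u8, len: usize) -> io::Result<()> {
    if libc::mprotect(ptr.cast(), len, libc::PROT_READ | libc::PROT_WRITE) != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Flip it back to executable (and non-writable) before running the code.
unsafe fn set_executable(ptr: *mut u8, len: usize) -> io::Result<()> {
    if libc::mprotect(ptr.cast(), len, libc::PROT_READ | libc::PROT_EXEC) != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

Each flip is a syscall (and on multicore systems implies cross-core synchronization), which is why doing it per-patch is so much slower than a plain store.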

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:32):

cfallin commented on issue #1074:

Hi @jgarvin -- note that the discussion you're replying to is six years old. Cranelift does work well in JIT contexts; Wasmtime uses it this way to great effect.

Patchpoints are distinct from JIT: one can JIT in a way that publishes code once and does not modify it later. Patchpoints are useful mostly for avoiding any dynamic cost of branches when a branch changes very rarely -- e.g., turning on debugging mode and adding breakpoint-check logic, or adding a new type-specialized behavior.

The point made above about performance implications still holds -- publishing code (turning it from writable to executable) is very expensive on some platforms, e.g. aarch64 requires a syscall and an IPI to all cores to do an icache synchronization barrier (and possibly a full icache flush on cheaper microarchitectures). Even modern x86 does W^X (as bjorn3 says above) and that implies a TLB shootdown across all cores. That said, all those are system design concerns, and Cranelift as a compiler library probably isn't the place to issue an edict and ban this technique for those reasons.

My main consideration for a design of this feature would be the IR semantics -- it's not clear what it would look like, other than a giant "insert code here" hole; one would then have to carefully specify what the ABI is and how that code interacts with the rest of the code that Cranelift generates. That's a hard thing to do in the abstract, as opposed to designing it in concert with a specific VM.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:32):

fitzgen commented on issue #1074:

Backing up a bit: you can always explicitly guard against your speculative optimizations becoming invalid and, when a guard fails, call out to a native function, passing a reference to a stack slot containing all the live variables that need to be preserved across OSR (or, better yet, use stack maps), and have that native function replace the Cranelift code's stack frame and do the final OSR jump. This doesn't require any changes to Cranelift, and it avoids doing the trickiest bits in the generated code directly, where things are generally harder than in plain Rust (or whatever) code.
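A minimal sketch of the host side of that scheme, with hypothetical names and layout; the actual frame replacement and final OSR jump are platform-specific and omitted:

```rust
/// Hypothetical layout of the live state spilled by the generated code into
/// a stack slot just before it bails out of a failed speculation guard.
#[repr(C)]
struct OsrState {
    bytecode_pc: u64, // where the slower tier should resume
    locals: [u64; 4], // live values the new frame needs (size is illustrative)
}

/// Native function that the Cranelift-generated guard calls on failure.
/// It reads the spilled state and resumes execution in a slower tier.
extern "C" fn osr_exit(state: *const OsrState) -> u64 {
    let state = unsafe { &*state };
    // A real VM would rebuild an interpreter/baseline frame from `state`
    // and continue from `state.bytecode_pc`; this stub just stands in.
    interpret_from(state.bytecode_pc, &state.locals)
}

fn interpret_from(_pc: u64, locals: &[u64]) -> u64 {
    // Stand-in for the slower tier.
    locals.iter().sum()
}
```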

This is what I would do, rather than patch Cranelift's generated code at runtime. Patching code is tricky in multi-threaded scenarios and involves icache flushing and the like, which makes it a more expensive operation than it might first seem.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:49):

jgarvin commented on issue #1074:

I understand that on W^X platforms you have to jump through extra hoops to make the memory writable again, but I don't see how you can do a JIT that allows redefining things at runtime (a typical benefit of JIT languages) without some version of this. If I start with a function f that is called by a function g, and I now want to change the definition of f, either I must overwrite f in place or overwrite g to call the new version, no? Assuming I don't want to introduce the overhead of making everything indirect with function pointers.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 17:01):

bjorn3 commented on issue #1074:

Or you have a table in which functions are looked up when you make a call. This is required for dynamic languages like JS anyway. (Inline caches can be used to speed up the repeated lookups, but those aren't generated on the first call, only for hot code, since writing them is slow for the reasons Chris mentioned.)
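A minimal sketch of such a table in plain Rust, with illustrative names; generated code would perform the equivalent load-and-call-indirect, so redefining a function is just an atomic store and never touches already-published machine code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

type JitFn = extern "C" fn(u64) -> u64;

/// One slot per redefinable function; code for `g` loads the slot and calls
/// indirectly instead of embedding `f`'s address directly.
static F_SLOT: AtomicUsize = AtomicUsize::new(0);

extern "C" fn f_v1(x: u64) -> u64 { x + 1 }
extern "C" fn f_v2(x: u64) -> u64 { x * 2 }

fn install(f: JitFn) {
    F_SLOT.store(f as usize, Ordering::Release);
}

fn call_f(x: u64) -> u64 {
    let f: JitFn = unsafe { std::mem::transmute(F_SLOT.load(Ordering::Acquire)) };
    f(x)
}

fn main() {
    install(f_v1);
    assert_eq!(call_f(10), 11);
    install(f_v2); // "redefine" f without patching any code
    assert_eq!(call_f(10), 20);
}
```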

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 17:31):

cfallin commented on issue #1074:

> Assuming I don't want to introduce the overhead of making everything indirect with function pointers.

That's actually how many real JITs do dynamism -- e.g., in SpiderMonkey (which I've worked on, so I know well), the inline-cache slots are function pointers in an array. Likewise you'd want to use a PLT-like structure to allow function updates.

Indirect branch predictors are quite good on modern CPUs -- I'd encourage you to benchmark!

