Stream: cranelift

Topic: how to use stack map for GC


view this post on Zulip dwuggh (Sep 12 2025 at 19:01):

hello, I'm writing a JIT Lisp interpreter using cranelift, and Lisp is a GC'd language. I see that cranelift has declare_var_in_stack_map and a UserStackMap struct, but how should I actually use them? I only found that MachBuffer has some functions and fields related to stack maps, but I still don't know how to get the stack map information.

view this post on Zulip Till Schneidereit (Sep 13 2025 at 13:46):

@fitzgen (he/him) would probably be the best person to give good answers, but one thing you might want to look at is the PR that introduced Wasmtime's usage of Cranelift's user stack maps, as well as the current state of the files introduced by that PR

This moves Wasmtime over from the old, regalloc-based stack maps system to the new "user" stack maps system. Removing the old regalloc-based stack maps system is left for follow-up work. ...

view this post on Zulip dwuggh (Sep 13 2025 at 14:43):

thanks, I'll look into it right away

view this post on Zulip fitzgen (he/him) (Sep 15 2025 at 17:08):

you can get them from the MachBufferFinalized compilation output via the user_stack_maps and take_user_stack_maps methods

https://docs.rs/cranelift-codegen/latest/cranelift_codegen/struct.MachBufferFinalized.html#method.user_stack_maps
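
A minimal sketch of what one might do with those stack maps once they are out of the compilation output: index them by absolute return address so that a stack walker can find the map for a given call site. It assumes each map has already been flattened into a (code offset, span, SP-relative offsets) triple. StackMapEntry, StackMapRegistry and their methods are hypothetical names for this sketch, not Cranelift APIs.

```rust
use std::collections::HashMap;

/// One stack map in flattened form (hypothetical type, not a Cranelift type).
#[derive(Debug, Clone, PartialEq)]
pub struct StackMapEntry {
    /// Size in bytes of the stack region the map covers.
    pub span: u32,
    /// Offsets from SP of the slots holding live GC references.
    pub sp_offsets: Vec<u32>,
}

/// Stack maps for all compiled functions, keyed by absolute return address.
#[derive(Default)]
pub struct StackMapRegistry {
    by_return_address: HashMap<usize, StackMapEntry>,
}

impl StackMapRegistry {
    /// Register the maps of one function whose machine code starts at `func_ptr`.
    /// Each element of `maps` is `(code_offset, span, sp_offsets)`.
    pub fn register_function(&mut self, func_ptr: usize, maps: &[(u32, u32, Vec<u32>)]) {
        for (code_offset, span, sp_offsets) in maps {
            // Assumption: the code offset marks the return address of the call.
            let return_address = func_ptr + *code_offset as usize;
            let entry = StackMapEntry {
                span: *span,
                sp_offsets: sp_offsets.clone(),
            };
            self.by_return_address.insert(return_address, entry);
        }
    }

    /// Find the stack map for a return address seen during a stack walk.
    pub fn lookup(&self, return_address: usize) -> Option<&StackMapEntry> {
        self.by_return_address.get(&return_address)
    }
}
```

A lookup that returns None would mean the frame is not at a call site with GC references, or was not produced by the JIT at all.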

view this post on Zulip fitzgen (he/him) (Sep 15 2025 at 17:09):

happy to answer more specific questions if you have them

view this post on Zulip dwuggh (Oct 02 2025 at 09:31):

@fitzgen (he/him) Hi, could you please check whether my understanding below is correct:

My understanding:

builder.ins().call()
builder.ins().call_indirect()
// in foo
builder().declare_var_in_stack_map(var); // record local variable
builder.ins().call_indirect(bar, &[var]); // pseudo code, call bar with value of var
// this call will emit a stackmap used by calling bar
ra = func_ptr + CodeOffset

then the stack map is found. Then I can get the entries by

parent_fp = fp.parent() // the fp linked list, accessed by (fp)
sp = parent_fp - span;
(sp + sp_offset) // access stack slot

question:

thanks in advance!

view this post on Zulip dwuggh (Oct 02 2025 at 09:51):

another question:

// inside foo
declare_variable_needs_stack_map(a); // a is some local variable here
call(bar, a)
// or it should just be:
call(bar, a)
// and inside bar, suppose bar is defined like
// def bar(arg):
//      # codes
declare_variable_needs_stack_map(arg)

in my understanding, I need to use the first approach because a and arg are different variables, and cranelift does not compile across function boundaries, so the stack map in approach 2 cannot be found in foo. I currently declare everything, i.e. both approach 1 and 2. Is this redundant, and do I only have to declare those local variables which are passed as arguments?

view this post on Zulip fitzgen (he/him) (Oct 02 2025 at 17:00):

dwuggh said:

Correct

dwuggh said:

I mean you can also implement your own DWARF-based stack walker without using libunwind, but if you are using frame pointers to walk the stack then yeah you need to get the first frame pointer somehow. Note that, unless you can guarantee that all code on the stack (rust, C, whatever shared libraries, etc) is using frame pointers, then you need to take care to only load from the FP when you 100% know it is actually a frame pointer, otherwise you risk UB (treating the contents of a general purpose register as a pointer and derefing it). The way wasmtime handles this is with trampolines at the host<-->wasm boundary that record FP/SP in a runtime structure on the side and let us bound the stack walk (and its FP-derefing) to within only the cranelift-emitted frames
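
The bounding idea can be sketched against a simulated stack. This assumes an x86-64 System V style layout where the word at FP holds the caller's FP and the word at FP + 8 holds the return address. Memory and walk_frames are hypothetical names; real code would do raw pointer loads instead of map lookups, and the two FP values would come from whatever the entry and exit trampolines recorded.

```rust
use std::collections::HashMap;

/// Simulated stack memory: address -> 64-bit word (stand-in for raw loads).
pub type Memory = HashMap<u64, u64>;

/// Walk the frame-pointer chain from `exit_fp` (recorded when JIT code called
/// into the host) up to `entry_fp` (recorded when the host entered JIT code).
/// Returns one `(return_address, frame_fp)` pair per step, where `frame_fp`
/// is the FP of the frame that the return address points into. The last pair
/// belongs to the entry frame itself. Returns `None` if the chain leaves the
/// bounded region, so no word outside it is ever treated as a frame pointer.
pub fn walk_frames(mem: &Memory, exit_fp: u64, entry_fp: u64) -> Option<Vec<(u64, u64)>> {
    if exit_fp > entry_fp {
        return None;
    }
    let mut frames = Vec::new();
    let mut fp = exit_fp;
    while fp < entry_fp {
        let caller_fp = *mem.get(&fp)?;
        let return_address = *mem.get(&(fp + 8))?;
        // The stack grows down, so each caller must sit strictly above its
        // callee and never above the recorded entry frame.
        if caller_fp <= fp || caller_fp > entry_fp {
            return None;
        }
        frames.push((return_address, caller_fp));
        fp = caller_fp;
    }
    Some(frames)
}
```

Each returned pair is what a stack map lookup needs: the return address selects the map and the frame's FP locates its slots.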

dwuggh said:

// in foo
builder().declare_var_in_stack_map(var); // record local variable
builder.ins().call_indirect(bar, &[var]); // pseudo code, call bar with value of var
// this call will emit a stackmap used by calling bar

we will only emit a stack map entry for var if it is live across the call; if it is only passed in to bar as an argument, then it is bar's responsibility to keep it alive (via inclusion in stack maps or the host rooting it or whatever) or not (if the program doesn't actually keep using the value)
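
To make the "live across the call" rule concrete, here is a toy backward liveness pass over straight-line code. It is only an illustration of the rule as stated above, not how Cranelift implements it; Inst and stack_map_contents are hypothetical names, and every variable is assumed to be a GC reference.

```rust
use std::collections::BTreeSet;

/// A toy straight-line instruction (hypothetical, for illustration only).
pub enum Inst {
    /// Defines a GC-reference variable.
    Def(&'static str),
    /// Reads a GC-reference variable.
    Use(&'static str),
    /// Calls `callee`, passing `args` by value.
    Call { callee: &'static str, args: Vec<&'static str> },
}

/// For every call, in program order, compute the set of variables that are
/// live across it, meaning they are still used after the call returns.
pub fn stack_map_contents(code: &[Inst]) -> Vec<(&'static str, BTreeSet<&'static str>)> {
    let mut live: BTreeSet<&'static str> = BTreeSet::new();
    let mut maps = Vec::new();
    // Scan backwards so `live` always holds the variables needed later on.
    for inst in code.iter().rev() {
        match inst {
            Inst::Use(var) => {
                live.insert(*var);
            }
            Inst::Def(var) => {
                live.remove(var);
            }
            Inst::Call { callee, args } => {
                // Whatever is live right after the call is live across it.
                maps.push((*callee, live.clone()));
                // Arguments are only needed up to the call itself.
                for arg in args {
                    live.insert(*arg);
                }
            }
        }
    }
    maps.reverse();
    maps
}
```

With the sequence "def a, def b, call bar(a), use b", the map at the call contains only b. The variable a is merely passed in, so keeping it alive is up to bar.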

dwuggh said:

these details are architecture and ABI dependent, but yes that describes e.g. x86-64 with sys-v
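
The FP/SP arithmetic from the question (sp = parent_fp - span, then sp + sp_offset) can be written down as a small helper. This is a sketch for the x86-64 System V case only; root_slot_addresses is a hypothetical name, and the bounds check assumes that every offset lies inside the region covered by the span.

```rust
/// Compute the addresses of the stack slots holding GC references in one frame.
///
/// `frame_fp` is the frame pointer of the frame that made the call, `span` is
/// the size of the stack region covered by the stack map, and `sp_offsets`
/// are the SP-relative offsets taken from that map.
pub fn root_slot_addresses(frame_fp: u64, span: u32, sp_offsets: &[u32]) -> Option<Vec<u64>> {
    // SP sits `span` bytes below the frame pointer.
    let sp = frame_fp.checked_sub(u64::from(span))?;
    sp_offsets
        .iter()
        .map(|&offset| {
            // Sanity check (an assumption of this sketch): offsets stay in the span.
            if offset < span {
                Some(sp + u64::from(offset))
            } else {
                None
            }
        })
        .collect()
}
```

For example, a frame with FP 0x1000, span 0x40 and offsets 0 and 0x10 yields slots at 0xFC0 and 0xFD0.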

dwuggh said:

there are roughly two cases:

  1. you are calling from a source language function to another source language function (eg wasm function calling another wasm function in wasmtime) and the callee triggers a GC somehow. in this case, the caller doesn't need to do anything, the GC will walk the stack and see the caller frame and its stack maps at the call site
  2. you are calling from a source language function to a runtime function (e.g. a wasm function calling the wasmtime internal helper for allocating a GC object). in this case, if the allocation cannot be satisfied because there isn't space in the gc heap or whatever, then the host routine should trigger a GC. again, the caller function doesn't need to do anything special, it just makes the call to the host routine and the GC will walk the stack and see the stack maps at the call site.

a sub-case of (2) is "I want to call the GC routine directly from a source function, for whatever reason". I think this is what you mean by gc_collect. In this sub-case, you do the exact same thing as the general (2) case: just call the host routine and then the host routine will trigger GC, walk the stack, see the stack maps at the caller's call site, and after it has collected all the on-stack GC roots can proceed with garbage collection
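
Case (2) can be sketched as a host allocation routine that collects only when it runs out of space. Heap and its methods are hypothetical, the scan_stack closure stands in for "walk the stack and read the stack maps", and objects in this toy heap hold no references to each other, so marking is just the root set.

```rust
/// A toy fixed-capacity GC heap (hypothetical, for illustration only).
pub struct Heap {
    /// `in_use[i]` is true while object `i` is allocated.
    in_use: Vec<bool>,
}

impl Heap {
    pub fn with_capacity(capacity: usize) -> Self {
        Heap { in_use: vec![false; capacity] }
    }

    /// Claim the first free object, if there is one.
    fn try_alloc(&mut self) -> Option<usize> {
        let index = self.in_use.iter().position(|used| !*used)?;
        self.in_use[index] = true;
        Some(index)
    }

    /// Free every object that is not named in `roots`.
    pub fn collect(&mut self, roots: &[usize]) {
        for (index, used) in self.in_use.iter_mut().enumerate() {
            *used = *used && roots.contains(&index);
        }
    }

    /// The host routine that JIT code calls. The caller does nothing special:
    /// if the heap is full, the routine gathers the roots, collects, and retries.
    pub fn alloc(&mut self, scan_stack: impl FnOnce() -> Vec<usize>) -> Option<usize> {
        if let Some(index) = self.try_alloc() {
            return Some(index);
        }
        let roots = scan_stack();
        self.collect(&roots);
        self.try_alloc()
    }
}
```

A direct gc_collect entry point would be the same routine minus the allocation: scan the stack, then collect.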

dwuggh said:

roughly. it is the region of the stack that is covered by the stack map. in theory the end could be trimmed to stop after the last stack slot containing a gc ref, rather than cover the whole stack frame.
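
The trimming mentioned here is a hypothetical optimization, not something the thread says Cranelift does. As a sketch of the arithmetic only, with trimmed_span and slot_size as assumed names:

```rust
/// Shrink a span so that it ends right after the last slot holding a GC ref.
///
/// `sp_offsets` are SP-relative slot offsets and `slot_size` is the size of
/// one GC reference in bytes. The result never exceeds `full_span`.
pub fn trimmed_span(full_span: u32, sp_offsets: &[u32], slot_size: u32) -> u32 {
    sp_offsets
        .iter()
        .map(|&offset| offset.saturating_add(slot_size))
        .max()
        .unwrap_or(0) // no GC refs at all: nothing needs covering
        .min(full_span)
}
```

With a 64-byte frame and 8-byte references at offsets 0 and 16, the trimmed span would be 24 bytes.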

dwuggh said:

I'm not sure there is a great reason. this code has evolved organically over the years and it is possible that we've ended up in some local maximums. I don't off the top of my head remember what things are relative to, I'd review the doc comments and if that doesn't give you an answer, then look at what wasmtime does (and use rust-analyzer or github search to follow function and type references to get a larger picture of how things are used):

https://github.com/bytecodealliance/wasmtime/blob/7cebfa206fe4a40ab54e9862f30b05c5fefb9043/crates/wasmtime/src/runtime/store.rs#L1867

dwuggh said:

almost definitely both. you technically can try and do global reasoning about "I don't need to keep this in stack maps because all callees that I call will already do so" but this is pretty tricky and fragile. best to follow the discipline of "is this a gc value? then put it in stack maps"

A lightweight WebAssembly runtime that is fast, secure, and standards-compliant - bytecodealliance/wasmtime

view this post on Zulip dwuggh (Oct 04 2025 at 10:47):

thanks! I've succeeded in using frame pointers to retrieve the stack map.

The way wasmtime handles this is with trampolines at the host<-->wasm boundary that record FP/SP in a runtime structure on the side and let us bound the stack walk (and its FP-derefing) to within only the cranelift-emitted frames

Is this the technique used in the unwinder crate? I read about it in https://github.com/bytecodealliance/wasmtime/pull/11710, and there seems to be some relevant code there, but cfallin also said there that

* No, Wasmtime's unwinder has nothing to do with native stack frames

I mean you can also implement your own DWARF-based stack walker without using libunwind

not familiar with this for now, adding it to my roadmap

Adds a documentation entry for how stack maps might be used to implement a garbage collector. Adds an example project which shows off how a simple garbage collector might actually be implemented. C...

view this post on Zulip Chris Fallin (Oct 04 2025 at 19:27):

Is this the technique [saving FP in a trampoline] used in the unwinder crate?

It's what Wasmtime does, yes. The actual save/restore happens in Cranelift-generated trampolines -- see here.

I'll note as well re this

cfallin there also said that
* No, Wasmtime's unwinder has nothing to do with native stack frames
I mean you can also implement your own DWARF-based stack walker without using libunwind
not familiar with this for now, adding it to my roadmap

that in the context I said that, someone was asking about observing frames of functions compiled by rustc; and I was stating that Wasmtime's functionality has nothing to offer here unfortunately, as system-native compilers (like rustc) have different ways of describing frame layouts.

That said, in most language-runtime situations, you would probably use a different technique to root GC refs in your runtime/host-side code anyway (e.g. see our Rooted abstraction in Wasmtime) so it's not necessary to directly trace host frames. In other words, the direct stack-walk may only have to visit frames from Cranelift-compiled code, depending on how you architect things.
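
As a rough illustration of rooting on the host side instead of tracing host frames, here is a tiny LIFO root set. It is loosely inspired by the idea only and is not Wasmtime's Rooted API; RootSet, Handle and all method names are hypothetical, and GC references are modeled as plain u64 values.

```rust
/// Index of a rooted GC reference inside a `RootSet`.
pub struct Handle(usize);

/// Host-side roots that the collector scans in addition to JIT stack maps.
#[derive(Default)]
pub struct RootSet {
    slots: Vec<u64>,
}

impl RootSet {
    /// Keep `gc_ref` alive until the enclosing scope is exited.
    pub fn root(&mut self, gc_ref: u64) -> Handle {
        self.slots.push(gc_ref);
        Handle(self.slots.len() - 1)
    }

    /// Read the current value behind a handle, if it is still rooted.
    pub fn get(&self, handle: &Handle) -> Option<u64> {
        self.slots.get(handle.0).copied()
    }

    /// Remember the current depth so that later roots can be dropped together.
    pub fn enter_scope(&self) -> usize {
        self.slots.len()
    }

    /// Drop every root created since the matching `enter_scope`.
    pub fn exit_scope(&mut self, mark: usize) {
        self.slots.truncate(mark);
    }

    /// What the collector sees: it may read the roots and, for a moving
    /// collector, overwrite them with relocated references.
    pub fn roots_mut(&mut self) -> &mut [u64] {
        &mut self.slots
    }
}
```

A real design would also need to detect stale handles, for example with a generation counter, since a slot index can be reused after exit_scope.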


view this post on Zulip dwuggh (Oct 04 2025 at 19:37):

thanks, I understand it now. I probably misunderstood what "native" actually means because of my bad English

view this post on Zulip Chris Fallin (Oct 04 2025 at 19:46):

sorry, really it was my imprecision -- "native" in the sense of "native system compilers" like rustc/clang as opposed to a language runtime's builtin one. Cranelift code's frames are still "native" in that they are frames created by real machine code :-)

view this post on Zulip dwuggh (Oct 04 2025 at 20:01):

just saw https://github.com/bytecodealliance/wasmtime/commit/2a2e8f62b9f20606f89a2e6619d6ece22eb57001, is the enable_safepoints option deprecated/removed because of the new user stack maps?

* Remove unused shared flags * Get rid of predicate settings They were important in the old backend framework, but with the new backend framework if we need a combination of multiple settings, th...

view this post on Zulip Chris Fallin (Oct 04 2025 at 21:42):

Right, at a high level, "user stack maps" means that Cranelift doesn't have a concept of safepoints anymore. It only has the ability to lower (user) stack maps to refer to compiled function frame offsets. Nick's post at https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime details more of the reasoning for this

As part of implementing the WebAssembly garbage collection proposal in Wasmtime, which is an ongoing process, we’ve overhauled the stack map infrastructure in Cranelift. This post will explain what stack maps are, why we needed to change th...

Last updated: Dec 06 2025 at 07:03 UTC