hello, I'm writing a JIT Lisp interpreter using cranelift; Lisp is a GC'd language. I see that cranelift has declare_var_needs_stack_map and the UserStackMap struct, but how should I actually use them? I only found that MachBuffer has some functions and fields related to stack maps, but I still don't know how to get the stack map information.
@fitzgen (he/him) would probably be the best person to give good answers, but one thing you might want to look at is the PR that introduced Wasmtime's usage of Cranelift's user stack maps, as well as the current state of the files introduced by that PR
thanks, I'll look into it right away
you can get them from the MachBufferFinalized compilation output via the user_stack_maps and take_user_stack_maps methods
happy to answer more specific questions if you have them
@fitzgen (he/him) Hi, could you please check whether my following statement is correct:
My understanding:
- a safepoint is placed at every (non-tail) call; in cranelift, that is
builder.ins().call()
builder.ins().call_indirect()
- to do stack walking (without libunwind), I need to pass fp to gc_collect and walk through the fp linked list
- the user stack maps can be saved globally, in the form [(CodeOffset, span, ir::UserStackMap)]; the CodeOffset here is the offset from the function pointer retrieved by get_finalized_function.
- this list records a stack map for every function call made inside the function we are compiling; in other words, the stack maps are recorded per call to a callee. Specifically, suppose I'm compiling foo, which calls bar:
// in foo
builder.declare_var_needs_stack_map(var); // record local variable
builder.ins().call_indirect(bar, &[var]); // pseudocode: call bar with the value of var
// this call will emit a stack map used when calling bar
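- during stack walking, I can get ip by *(fp + 8) (loading from the address), which is the return address (RA) of this funcall; if
ra = func_ptr + CodeOffset
then the stack map is found. Then I can get the entries by
parent_fp = fp.parent() // the fp linked list, accessed by *(fp)
sp = parent_fp - span;
*(sp + sp_offset) // access stack slot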
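The global table described above can be sketched with the standard library alone. This is a minimal illustration, not Cranelift or Wasmtime code: `StackMapEntry`, `StackMapRegistry`, and all addresses are hypothetical names and made-up numbers, and the sketch assumes each recorded code offset is the return address of a call, relative to the finalized function pointer.

```rust
use std::collections::BTreeMap;

/// One stack map as described in the question: the frame span it covers and
/// the SP-relative offsets of the slots holding live GC references.
/// (Hypothetical type, standing in for the data taken from `ir::UserStackMap`.)
#[derive(Debug, Clone, PartialEq)]
struct StackMapEntry {
    span: u32,
    sp_offsets: Vec<u32>,
}

/// Hypothetical global registry, keyed by absolute return address.
#[derive(Default)]
struct StackMapRegistry {
    by_return_address: BTreeMap<usize, StackMapEntry>,
}

impl StackMapRegistry {
    /// Register all stack maps of one compiled function. `func_ptr` is the
    /// address of the finalized function; each code offset is relative to it.
    fn register_function(&mut self, func_ptr: usize, maps: Vec<(u32, StackMapEntry)>) {
        for (code_offset, entry) in maps {
            self.by_return_address
                .insert(func_ptr + code_offset as usize, entry);
        }
    }

    /// Look up the stack map for a return address found during a stack walk.
    fn lookup(&self, return_address: usize) -> Option<&StackMapEntry> {
        self.by_return_address.get(&return_address)
    }
}

fn main() {
    let mut registry = StackMapRegistry::default();
    // Made-up numbers: `foo` is placed at 0x1000 and contains one call whose
    // return address is 0x24 bytes into the function.
    registry.register_function(
        0x1000,
        vec![(0x24, StackMapEntry { span: 32, sp_offsets: vec![0, 16] })],
    );
    println!("{:?}", registry.lookup(0x1024)); // a call site with a stack map
    println!("{:?}", registry.lookup(0x1025)); // not a recorded return address
}
```

Keying by absolute return address means the walker does not need to know which function a frame belongs to before looking up its stack map.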
questions:
- How to properly call gc_collect at a safepoint? Should I call it right before the call or call_indirect instruction, or place it at the beginning of a function? The documentation says that safepoints cannot be skipped, but if we have to call the GC on our own, then obviously we are able to choose to skip it.
- Is span just the callee's frame size? In the source code, it is computed via emit_state.frame_layout().active_size(); I assume they are identical.
- I'm especially uncertain of the last understanding. Should I use fp or parent_fp? Why are we using SP-relative offsets instead of FP-relative ones, which would make span unnecessary?
thanks in advance!
another question: should I mark the variables in the caller or the callee?
// inside foo
declare_var_needs_stack_map(a); // a is some local variable here
call(bar, a)
// or should it just be:
call(bar, a)
// and inside bar, suppose bar is defined like
// def bar(arg):
// # code
declare_var_needs_stack_map(arg)
in my understanding, I need to use the first approach because a and arg are different variables, and cranelift's compilation does not cross function boundaries, so the stack map in approach 2 cannot be found in foo. I currently declare everything, i.e. both approach 1 and 2. Is this redundant? Would it be enough to declare only those local variables which are passed as arguments?
dwuggh said:
- safepoint is placed at every (non-tail) call, in cranelift, it is
Correct
dwuggh said:
- to do stack walking(without libunwind), I need to pass fp to
gc_collect, and walk through
the fp linked list
I mean you can also implement your own DWARF-based stack walker without using libunwind, but if you are using frame pointers to walk the stack then yeah you need to get the first frame pointer somehow. Note that, unless you can guarantee that all code on the stack (rust, C, whatever shared libraries, etc) is using frame pointers, you need to take care to only load from the FP when you 100% know it is actually a frame pointer, otherwise you risk UB (treating the contents of a general purpose register as a pointer and derefing it). The way wasmtime handles this is with trampolines at the host<-->wasm boundary that record FP/SP in a runtime structure on the side and let us bound the stack walk (and its FP-derefing) to within only the cranelift-emitted frames
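A bounded frame-pointer walk of this kind can be illustrated on simulated memory. This is only a sketch under stated assumptions, not Wasmtime's implementation: `SimStack`, `demo_stack`, and `walk_frames` are hypothetical names, the frame record layout is the x86-64 System V one (saved FP at `[fp]`, return address at `[fp + 8]`), and `exit_pc`, `exit_fp`, and `entry_fp` stand for values that trampolines would have recorded on the side.

```rust
/// Simulated stack memory: one 64-bit word per cell, address = index * 8.
struct SimStack {
    words: Vec<u64>,
}

impl SimStack {
    fn load(&self, addr: u64) -> u64 {
        self.words[(addr / 8) as usize]
    }
}

/// Build a tiny stack with two JIT frames: `bar` (youngest) called by `foo`,
/// which was entered through an entry trampoline whose FP is 96.
fn demo_stack() -> SimStack {
    let mut words = vec![0u64; 16];
    // bar's frame record at address 32
    words[32 / 8] = 64; // saved FP, pointing at foo's frame
    words[40 / 8] = 0x1024; // return address into foo
    // foo's frame record at address 64
    words[64 / 8] = 96; // saved FP, pointing at the entry trampoline's frame
    words[72 / 8] = 0xAAAA; // return address into the entry trampoline
    SimStack { words }
}

/// Walk only the frames between the recorded exit and entry points and
/// return (pc, fp) pairs, youngest frame first. No FP outside that range is
/// ever dereferenced.
fn walk_frames(stack: &SimStack, exit_pc: u64, exit_fp: u64, entry_fp: u64) -> Vec<(u64, u64)> {
    let mut frames = Vec::new();
    let (mut pc, mut fp) = (exit_pc, exit_fp);
    while fp < entry_fp {
        frames.push((pc, fp));
        let next_pc = stack.load(fp + 8);
        let next_fp = stack.load(fp);
        // The stack grows down, so an older frame must sit at a higher address.
        assert!(next_fp > fp, "FP chain must move toward older frames");
        pc = next_pc;
        fp = next_fp;
    }
    frames
}

fn main() {
    let stack = demo_stack();
    // 0x2010 is a made-up pc inside bar at the point where it called the host.
    for (pc, fp) in walk_frames(&stack, 0x2010, 32, 96) {
        println!("pc = {pc:#x}, fp = {fp}");
    }
}
```

Each returned pc is a return address that can be used to look up a stack map, and the paired fp is the frame pointer of the frame that stack map describes.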
dwuggh said:
// in foo
builder.declare_var_needs_stack_map(var); // record local variable
builder.ins().call_indirect(bar, &[var]); // pseudocode: call bar with the value of var
// this call will emit a stack map used when calling bar
we will only emit a stack map entry for var if it is live across the call; if it is only passed in to bar as an argument, then it is bar's responsibility to keep it alive (via inclusion in stack maps or the host rooting it or whatever) or not (if the program doesn't actually keep using the value)
dwuggh said:
- during stack walking, I can get ip by *(fp + 8) (loading from the address), which is the return address (RA) of this funcall
these details are architecture and ABI dependent, but yes that describes e.g. x86-64 with sys-v
dwuggh said:
- How to properly call gc_collect at safepoint? Should I call them right before call or call_indirect instruction, or place them at the beginning of a function? The document said that safepoints cannot be skipped, but if we have to call gc on our own, then obviously we are able to choose to skip it.
there are roughly two cases:
a sub-case of (2) is "I want to call the GC routine directly from a source function, for whatever reason". I think this is what you mean by gc_collect. In this sub-case, you do the exact same thing as the general (2) case: just call the host routine and then the host routine will trigger GC, walk the stack, see the stack maps at the caller's call site, and after it has collected all the on-stack GC roots can proceed with garbage collection
dwuggh said:
- is span just the callee's frame size? In the source code, it is computed via emit_state.frame_layout().active_size(), I assume they are identical
roughly. it is the region of the stack that is covered by the stack map. in theory the end could be trimmed to stop after the last stack slot containing a gc ref, rather than cover the whole stack frame.
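The arithmetic from the question (`sp = parent_fp - span`, then `sp + sp_offset`) can be written out as a small helper. This is a sketch under the assumption that the span covers the frame from its SP up to its FP, as the question describes; `live_slot_addresses` and the numbers in `main` are hypothetical, and the doc comments in Cranelift remain the authority on what the offsets are relative to.

```rust
/// Compute the addresses of the stack slots holding live GC references in
/// one frame, given that frame's FP, the span recorded with the stack map,
/// and the SP-relative offsets from the stack map.
fn live_slot_addresses(frame_fp: usize, span: u32, sp_offsets: &[u32]) -> Vec<usize> {
    // Assumption: the covered region starts at SP and ends at FP.
    let sp = frame_fp - span as usize;
    sp_offsets
        .iter()
        .map(|&offset| {
            // Every slot must lie inside the region the stack map covers.
            assert!(offset < span, "slot offset outside the covered span");
            sp + offset as usize
        })
        .collect()
}

fn main() {
    // Made-up frame: FP at 0x7000, a 32-byte span, GC refs at SP+0 and SP+16.
    let slots = live_slot_addresses(0x7000, 32, &[0, 16]);
    println!("{slots:x?}");
}
```

A collector would read (and, if it moves objects, rewrite) the GC reference stored at each returned address.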
dwuggh said:
- I'm especially uncertain of the last understanding. should I use fp or parent_fp? Why are we using SP-relative offsets, instead of FP-relative which saves span?
I'm not sure there is a great reason. this code has evolved organically over the years and it is possible that we've ended up in some local maximums. I don't off the top of my head remember what things are relative to, I'd review the doc comments and if that doesn't give you an answer, then look at what wasmtime does (and use rust-analyzer or github search to follow function and type references to get a larger picture of how things are used):
dwuggh said:
- I should mark the variables on caller or callee?
almost definitely both. you technically can try and do global reasoning about "I don't need to keep this in stack maps because all callees that I call will already do so" but this is pretty tricky and fragile. best to follow the discipline of "is this a gc value? then put it in stack maps"
thanks! I've succeeded in using frame pointers to retrieve the stack maps.
The way wasmtime handles this is with trampolines at the host<-->wasm boundary that record FP/SP in a runtime structure on the side and let us bound the stack walk (and its FP-derefing) to within only the cranelift-emitted frames
Is this the technique used in the unwinder crate? I've read about it in https://github.com/bytecodealliance/wasmtime/pull/11710, and there seems to be some relevant code, but cfallin there also said that
- No, Wasmtime's unwinder has nothing to do with native stack frames
I mean you can also implement your own DWARF-based stack walker without using libunwind
not familiar with this for now, adding it to my roadmap
Is this the technique [saving FP in a trampoline] used in the unwinder crate?
It's what Wasmtime does, yes. The actual save/restore happens in Cranelift-generated trampolines -- see here.
I'll note as well re this
cfallin there also said that
* No, Wasmtime's unwinder has nothing to do with native stack frames
I mean you can also implement your own DWARF-based stack walker without using libunwind
not familiar with this for now, adding it to my roadmap
that in the context I said that, someone was asking about observing frames of functions compiled by rustc; and I was stating that Wasmtime's functionality has nothing to offer here unfortunately, as system-native compilers (like rustc) have different ways of describing frame layouts.
That said, in most language-runtime situations, you would probably use a different technique to root GC refs in your runtime/host-side code anyway (e.g. see our Rooted abstraction in Wasmtime) so it's not necessary to directly trace host frames. In other words, the direct stack-walk may only have to visit frames from Cranelift-compiled code, depending on how you architect things.
thanks, I understand it now. I probably misunderstood what "native" actually means because of my bad English
sorry, really it was my imprecision -- "native" in the sense of "native system compilers" like rustc/clang as opposed to a language runtime's builtin one. Cranelift code's frames are still "native" in that they are frames created by real machine code :-)
just saw https://github.com/bytecodealliance/wasmtime/commit/2a2e8f62b9f20606f89a2e6619d6ece22eb57001, is the enable_safepoints option deprecated/removed because of the new user stack maps?
Right, at a high level, "user stack maps" means that Cranelift doesn't have a concept of safepoints anymore. It only has the ability to lower (user) stack maps to refer to compiled function frame offsets. Nick's post at https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime details more of the reasoning for this
Last updated: Dec 06 2025 at 07:03 UTC