Hello, I am considering using Cranelift for the new backend of GRIN, a whole-program compiler I am working on. I gather stack maps are the canonical way to scan the stack for GC, so I would like to know how code is supposed to use them at runtime.
E.g., at runtime will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?
We actually removed stackmaps at the Cranelift level recently! See @fitzgen (he/him)'s post here: https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime
the idea is that the embedder/user of Cranelift has a better idea of exactly what the invariants are and what they need, so it's better to push it up one level
I was referring to that documentation. I gather the front end to Cranelift is responsible for spilling pointers and asking for the stack map to be generated at safepoints. But Cranelift still generates it, presumably for runtime code to embed in a custom format such as the ones I mentioned.
No, Cranelift no longer does any of that -- it doesn't have the concept of safepoints or stackmaps anymore, all of that lives at the Wasmtime level
it lives at the cranelift-frontend level
ah! that's right, sorry, my brain is too often scoped to "cranelift-codegen == universe"
so as long as you are using cranelift-frontend to generate your clif, you can use the declare_{value,var}_needs_stack_map methods and get them spilled around safepoints appropriately
Chris Fallin said:
No, Cranelift no longer does any of that -- it doesn't have the concept of safepoints or stackmaps anymore, all of that lives at the Wasmtime level
;; NEW: stack map annotation on the safepoint.
call $f(), stack_map = [i64 @ ss0]
I thought this was Cranelift generating stack maps when asked to
Yeah, sorry, ignore me, I (i) need more sleep and (ii) have already paged much of this out
@fitzgen (he/him) is authoritative here
Diego Antonio Rosario Palomino said:
E.g., at runtime will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?
you get the stack maps as part of the result of compilation from cranelift-codegen:
it is a table of (code offset, number of words mapped, stack map) triples
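A minimal sketch of how a runtime might consume such a table, assuming it has already been converted into a runtime-owned structure. The names `StackMapEntry`, `lookup`, and `live_slots` are hypothetical and are not part of Cranelift's API; the bitmap layout (one bit per stack word, counted from SP at the safepoint) is also an assumption made for illustration.

```rust
/// One entry of a hypothetical runtime table: the safepoint's code offset,
/// how many stack words the map covers, and a bitmap with one bit per word.
/// (Illustrative layout only; not Cranelift's own type.)
#[derive(Debug, Clone, PartialEq)]
pub struct StackMapEntry {
    pub code_offset: u32,
    pub mapped_words: u32,
    /// Bit i set => stack word i (counted from SP) holds a GC pointer.
    pub bitmap: Vec<u64>,
}

/// Find the entry for a safepoint. `table` must be sorted by `code_offset`.
pub fn lookup(table: &[StackMapEntry], code_offset: u32) -> Option<&StackMapEntry> {
    table
        .binary_search_by_key(&code_offset, |e| e.code_offset)
        .ok()
        .map(|i| &table[i])
}

/// Word indices, relative to SP at the safepoint, that hold GC pointers.
pub fn live_slots(entry: &StackMapEntry) -> Vec<u32> {
    (0..entry.mapped_words)
        .filter(|&i| {
            entry
                .bitmap
                .get((i / 64) as usize)
                .map_or(false, |word| (word >> (i % 64)) & 1 == 1)
        })
        .collect()
}
```

During a collection, the runtime would walk the frames, map each return address to a code offset, call `lookup`, and treat every slot returned by `live_slots` as a root.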
GRIN is written in Haskell and (soon) Idris2. To avoid having to write bindings, can I access stack map generation at the textual representation level, i.e. generate Cranelift's textual representation in a way that requests stack map generation? Or is that something that can only be done with the crate API?
let me double check
yeah you should be able to specify them in the CLIF text format: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/filetests/filetests/parser/user_stack_maps.clif#L19-L20
fitzgen (he/him) said:
Diego Antonio Rosario Palomino said:
E.g., at runtime will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?
you get the stack maps as part of the result of compilation from cranelift-codegen:
I see, but what are possible runtime representations?
(note: we are only targeting 64-bit platforms)
there is no stable serialized format defined by Cranelift, although the various bits do derive(serde::Serialize)
so you could write a little bit of Rust code to take them and encode them to json/bincode/whatever format you want
https://docs.rs/cranelift-codegen/latest/src/cranelift_codegen/ir/user_stack_maps.rs.html#126-129
I gather that when I use a runtime with GC, I am supposed to convert the compile-time stack map representation into one that is efficient for the combination of runtime and platforms used by my project. If so, I would like to know which ones are reasonable options.
A vector of indexes would be faster to access and create than bitmaps, but would take more space
wasmtime doesn't really change the format too much, leaves it as a bitmap:
Thanks. I was curious whether bitmaps would be efficient
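To make the two representations discussed above concrete, here is a small sketch that converts between a bitmap and a vector of slot indices. The function names are hypothetical, and the one-bit-per-stack-word layout is an assumption for illustration rather than Cranelift's or Wasmtime's actual encoding.

```rust
/// Expand a bitmap into the explicit list of pointer-holding slot indices.
/// Uses trailing_zeros so the cost is proportional to the number of set bits.
pub fn bitmap_to_indices(bitmap: &[u64]) -> Vec<u32> {
    let mut out = Vec::new();
    for (word_idx, &word) in bitmap.iter().enumerate() {
        let mut bits = word;
        while bits != 0 {
            let bit = bits.trailing_zeros();
            out.push(word_idx as u32 * 64 + bit);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    out
}

/// Pack slot indices back into a bitmap covering `num_slots` slots.
pub fn indices_to_bitmap(indices: &[u32], num_slots: u32) -> Vec<u64> {
    let mut bitmap = vec![0u64; (num_slots as usize + 63) / 64];
    for &i in indices {
        assert!(i < num_slots, "slot index out of range");
        bitmap[(i / 64) as usize] |= 1u64 << (i % 64);
    }
    bitmap
}
```

The bitmap costs one bit per mapped slot regardless of how many hold pointers, while the index vector costs one integer per pointer slot, so which one is smaller depends on how dense the pointers are in a frame.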
Another question. If user stack maps already require Cranelift code to spill and reload pointers at safepoints, why shouldn't I spill them to a structure dedicated to holding pointers? That way keeping track of stack maps becomes unnecessary.
I think this is similar to the technique of using two stacks, one for values and another for pointers
it should be efficient, but if you want the absolute fastest stack-tracing possible, then you can avoid cranelift stack maps and just keep a custom shadow stack for GC refs that you push to / pop from at runtime and then your stack scanning is literally just iterating over flat memory with zero stack map interpretation or anything. this ofc has higher runtime overheads (classic mutator throughput vs GC latency style of trade off)
Diego Antonio Rosario Palomino said:
Another question. If user stack maps already require Cranelift code to spill and reload pointers at safepoints, why shouldn't I spill them to a structure dedicated to holding pointers? That way keeping track of stack maps becomes unnecessary.
I think this is similar to the technique of using two stacks, one for values and another for pointers
this is basically what I just said above, replying to your "curious whether bitmaps would be efficient"
it is more runtime overhead because of less locality, needing to pop, higher register pressure, etc... but it is a very valid implementation choice. also pretty easy to implement and debug.
Locality could be preserved with a flat representation such as a vector. I think in principle this isn't slower than spilling and reloading to the canonical language stack (commonly called the C stack?). Would it be slower because Cranelift performs a wide array of optimizations on the language stack?
it is another stack, different from the current one, which is going to hurt locality and increase cache pressure. might not be visible all the time (depends on the rest of the working set and its cache usage). you also have to keep around a pointer to the GC ref shadow stack, which increases register pressure. compare this to the native stack, which is always right there in SP (and also sometimes in FP). again, this might not be visible if the rest of the program is only using 4 live values in its hot loop or something, but might have a larger impact on others.
but using a shadow stack is definitely a valid approach
honestly, I suggest just doing whatever you think will be easiest, and then coming back and revisiting if it is ever a perf issue. should be easy enough to switch from one approach to the other.
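A minimal sketch of the shadow stack approach described above, assuming GC references can be modeled as plain `usize` addresses. `ShadowStack` and its methods are hypothetical names for illustration; in a real backend, the generated code would perform the pushes and frame pops itself.

```rust
/// A hypothetical shadow stack that holds only GC references.
pub struct ShadowStack {
    /// Raw addresses standing in for GC pointers.
    refs: Vec<usize>,
}

impl ShadowStack {
    pub fn new() -> Self {
        ShadowStack { refs: Vec::new() }
    }

    /// On function entry: remember where this frame's references start.
    pub fn enter_frame(&self) -> usize {
        self.refs.len()
    }

    /// Record a GC reference that must stay alive across a safepoint.
    pub fn push(&mut self, gc_ref: usize) {
        self.refs.push(gc_ref);
    }

    /// On function exit: drop everything pushed since `mark`.
    pub fn leave_frame(&mut self, mark: usize) {
        self.refs.truncate(mark);
    }

    /// Root scanning is a flat walk over memory, with no stack map lookup.
    /// The visitor may rewrite a slot, as a moving collector would.
    pub fn scan_roots(&mut self, mut visit: impl FnMut(&mut usize)) {
        for slot in self.refs.iter_mut() {
            visit(slot);
        }
    }
}
```

The trade-off mentioned in the discussion shows up here as the extra `push` and `leave_frame` work on every call, plus the need to keep a pointer to the shadow stack available, in exchange for a trivial `scan_roots`.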
GRIN will need a GC fit for functional languages, which probably means generational collection. Plus, a longer-term goal is becoming a full Haskell backend. If we wanted to use the existing runtime this would impose heavy restrictions
To aid in targeting Cranelift from languages without traditional bindings, could the IR support serialization from existing formats such as JSON and S-expressions?
A pretty printer wouldn't be needed in the many systems that support automatic (de)serialization
you mean instead of the .clif text format?
As an alternative, yes
if you enable the serde feature of cranelift-codegen, then the IR all has derive(Serialize, Deserialize), which can be used with serde_json or bincode or etc...
no guarantees that the serialization format won't change across cranelift versions, ofc, same as the .clif text format
In that case an Idris2 or Haskell Cranelift package would need integration testing to be practical
The usage of serde we have is incompatible with serde_json. In particular we use integers as map keys, which JSON doesn't allow. Also the encoding of the Layout is funky. And the header contains the Cranelift version string, which needs to match the consumer. I also think we are using integers in place of instruction names. It is mostly useful as a format for serializing IR you already created using Cranelift's API rather than for external producers. The text format is more stable.
Last updated: Jan 24 2025 at 00:11 UTC