Using stackmaps at runtime · cranelift

Diego Antonio Rosario Palomino (Oct 30 2024 at 15:52):

Hello, I am considering using Cranelift for the new backend of GRIN, a whole-program compiler I am working on. I gather stackmaps are the canonical way to scan the stack for GC, so I would like to know how code is supposed to use them at runtime.

E.g., at runtime, will stackmaps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks which slots hold pointers?

Chris Fallin (Oct 30 2024 at 15:55):

New Stack Maps for Wasmtime and Cranelift

As part of implementing the WebAssembly garbage collection proposal in Wasmtime, which is an ongoing process, we’ve overhauled the stack map infrastructure in Cranelift. This post will explain what stack maps are, why we needed to change th...

Chris Fallin (Oct 30 2024 at 15:55):

the idea is that the embedder/user of Cranelift has a better idea of exactly what the invariants are and what they need, so it's better to push it up one level

Diego Antonio Rosario Palomino (Oct 30 2024 at 15:58):

I was referring to that documentation. I gather the frontend to Cranelift is responsible for spilling pointers and asking for the stackmap to be generated at safepoints. But Cranelift still generates it, presumably for runtime code to embed in a custom format such as the ones I mentioned.

Chris Fallin (Oct 30 2024 at 15:59):

No, Cranelift no longer does any of that -- it doesn't have the concept of safepoints or stackmaps anymore, all of that lives at the Wasmtime level

fitzgen (he/him) (Oct 30 2024 at 15:59):

Chris Fallin (Oct 30 2024 at 16:00):

ah! that's right, sorry, my brain is too often scoped to "cranelift-codegen == universe"

fitzgen (he/him) (Oct 30 2024 at 16:00):

so as long as you are using cranelift-frontend to generate your clif, you can use declare_{value,var}_needs_stack_map methods and get them spilled around safepoints appropriately

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:01):

;; NEW: stack map annotation on the safepoint.
call $f(), stack_map = [i64 @ ss0]

Chris Fallin (Oct 30 2024 at 16:02):

Yeah, sorry, ignore me, I (i) need more sleep and (ii) have already paged much of this out

Chris Fallin (Oct 30 2024 at 16:02):

fitzgen (he/him) (Oct 30 2024 at 16:03):

you get the stack maps as part of the result of compilation from cranelift-codegen:

fitzgen (he/him) (Oct 30 2024 at 16:04):

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:05):

GRIN is written in Haskell and (soon) Idris 2. To avoid having to write bindings, can I access stackmap generation at the textual representation level, i.e. generate textual Cranelift IR that requests stackmap generation? Or is that something that can only be done with the crate API?

fitzgen (he/him) (Oct 30 2024 at 16:06):

fitzgen (he/him) (Oct 30 2024 at 16:07):

wasmtime/cranelift/filetests/filetests/parser/user_stack_maps.clif at main · bytecodealliance/wasmtime

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:17):

I see, but what are the possible runtime representations?
(Note: we are only targeting 64-bit platforms.)

fitzgen (he/him) (Oct 30 2024 at 16:19):

there is no stable serialized format defined by Cranelift, although the various bits do derive(serde::Serialize) so you could write a little bit of Rust code to take them and encode them to json/bincode/whatever format you want
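A minimal sketch of what such a hand-rolled encoding could look like, using only the standard library. `SafepointRecord` is a hypothetical stand-in for whatever you extract from the compilation result (it is not a Cranelift type), and the byte layout is an assumption chosen for illustration, not a format defined by Cranelift or Wasmtime.

```rust
/// Hypothetical stand-in for one stack map entry extracted at compile time:
/// the code offset of a safepoint plus the SP-relative byte offsets of the
/// stack slots that hold live GC references there.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SafepointRecord {
    pub code_offset: u32,
    pub slot_offsets: Vec<u32>,
}

/// Assumed layout, all little-endian:
/// [record count: u32], then per record
/// [code_offset: u32][n: u32][slot offset: u32; n].
pub fn encode(records: &[SafepointRecord]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(records.len() as u32).to_le_bytes());
    for r in records {
        out.extend_from_slice(&r.code_offset.to_le_bytes());
        out.extend_from_slice(&(r.slot_offsets.len() as u32).to_le_bytes());
        for off in &r.slot_offsets {
            out.extend_from_slice(&off.to_le_bytes());
        }
    }
    out
}

/// Read one little-endian u32 at `pos`, advancing `pos`; None if truncated.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let end = pos.checked_add(4)?;
    let chunk = bytes.get(*pos..end)?;
    *pos = end;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Inverse of `encode`; returns None on truncated input.
pub fn decode(bytes: &[u8]) -> Option<Vec<SafepointRecord>> {
    let mut pos = 0;
    let count = read_u32(bytes, &mut pos)?;
    let mut records = Vec::new();
    for _ in 0..count {
        let code_offset = read_u32(bytes, &mut pos)?;
        let n = read_u32(bytes, &mut pos)?;
        let mut slot_offsets = Vec::new();
        for _ in 0..n {
            slot_offsets.push(read_u32(bytes, &mut pos)?);
        }
        records.push(SafepointRecord { code_offset, slot_offsets });
    }
    Some(records)
}
```

A non-Rust runtime would only need to implement the `decode` side of whatever layout is chosen.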

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:24):

I gather that when I use a runtime with GC, I am supposed to convert the compile-time stackmap representation into one that is efficient for the combination of runtime and platforms used by my project. If so, I would like to know which ones are reasonable options.

A vector of indices would be faster to access and create than a bitmap, but would take more space.
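To make the two candidates concrete, here is a small sketch assuming 64-bit platforms with 8-byte stack slots numbered from SP. All function names are hypothetical and nothing here is a Cranelift API. The index list costs 16 bits per live slot, while the bitmap costs one bit per slot in the frame whether live or not, so which one is smaller depends on how dense the GC references are.

```rust
/// Build a bitmap from a list of live slot indices: bit `i` set means
/// slot `i` (at byte offset 8 * i from SP) holds a GC pointer.
pub fn bitmap_from_indices(indices: &[u16]) -> Vec<u64> {
    let num_words = indices
        .iter()
        .map(|&i| i as usize / 64 + 1)
        .max()
        .unwrap_or(0);
    let mut words = vec![0u64; num_words];
    for &i in indices {
        words[i as usize / 64] |= 1u64 << (i % 64);
    }
    words
}

/// Recover the live slot indices from a bitmap, in ascending order.
pub fn bitmap_live_slots(words: &[u64]) -> Vec<u16> {
    let mut out = Vec::new();
    for (w, &word) in words.iter().enumerate() {
        let mut bits = word;
        while bits != 0 {
            let bit = bits.trailing_zeros() as usize;
            out.push((w * 64 + bit) as u16);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    out
}

/// Collect the GC roots of one frame, modelled here as a slice of
/// 8-byte slots starting at SP, given the live slot indices.
pub fn scan_frame(frame: &[u64], live: &[u16]) -> Vec<u64> {
    live.iter().map(|&i| frame[i as usize]).collect()
}
```

With the index list, `scan_frame` can consume the stored data directly; with the bitmap, the collector first walks the set bits as in `bitmap_live_slots`.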

fitzgen (he/him) (Oct 30 2024 at 16:28):

wasmtime/crates/cranelift/src/compiler.rs at a2025f428b5836b34618a5392bb8636c8c60ff40 · bytecodealliance/wasmtime

wasmtime/crates/environ/src/module_artifacts.rs at main · bytecodealliance/wasmtime

wasmtime/crates/environ/src/stack_map.rs at main · bytecodealliance/wasmtime

wasmtime/crates/wasmtime/src/runtime/store.rs at main · bytecodealliance/wasmtime

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:30):

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:36):

Another question. If user stackmaps already require Cranelift code to spill and reload pointers at safepoints, why shouldn't I spill them to a structure dedicated to holding pointers? That way, keeping track of stackmaps becomes unnecessary.

I think this is similar to the technique of using two stacks, one for values and another for pointers.

fitzgen (he/him) (Oct 30 2024 at 16:37):

it should be efficient, but if you want the absolute fastest stack-tracing possible, then you can avoid cranelift stack maps and just keep a custom shadow stack for GC refs that you push to / pop from at runtime and then your stack scanning is literally just iterating over flat memory with zero stack map interpretation or anything. this ofc has higher runtime overheads (classic mutator throughput vs GC latency style of trade off)

fitzgen (he/him) (Oct 30 2024 at 16:38):

this is basically what I just said above, replying to your "curious whether bitmaps would be efficient"

fitzgen (he/him) (Oct 30 2024 at 16:39):

it is more runtime overhead because of less locality, needing to pop, higher register pressure, etc... but it is a very valid implementation choice. also pretty easy to implement and debug.
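A rough model of the shadow-stack approach described above, written as plain Rust rather than generated code. `ShadowStack` and its methods are hypothetical names; in a real backend the pushes and pops would be emitted as loads and stores against a region whose pointer the compiled code keeps around, which is where the extra register pressure comes from.

```rust
/// A dedicated stack holding only GC references, separate from the
/// native stack. Scanning it needs no stack map interpretation.
#[derive(Debug, Default)]
pub struct ShadowStack {
    refs: Vec<usize>,
}

impl ShadowStack {
    pub fn new() -> Self {
        ShadowStack::default()
    }

    /// Called on function entry: remember where this frame's refs start.
    pub fn enter(&self) -> usize {
        self.refs.len()
    }

    /// Record a live GC reference for the current frame.
    pub fn push(&mut self, gc_ref: usize) {
        self.refs.push(gc_ref);
    }

    /// Called on function exit: drop every ref pushed since `mark`.
    pub fn leave(&mut self, mark: usize) {
        self.refs.truncate(mark);
    }

    /// Root scanning is just iterating over flat memory.
    pub fn roots(&self) -> &[usize] {
        &self.refs
    }

    /// A moving collector can update the roots in place.
    pub fn roots_mut(&mut self) -> &mut [usize] {
        &mut self.refs
    }
}
```

The `enter`/`leave` pair is the per-call bookkeeping that the stack map approach avoids, in exchange for a more involved scan.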

Diego Antonio Rosario Palomino (Oct 30 2024 at 16:45):

Locality could be preserved with a flat representation such as a vector. I think in principle this isn't slower than spilling and reloading to the canonical language stack (commonly called the C stack?). Would it be slower because Cranelift performs a wide array of optimizations on the language stack?

fitzgen (he/him) (Oct 30 2024 at 16:59):

it is another stack, different from the current one, which is going to hurt locality and increase cache pressure. might not be visible all the time (depends on the rest of the working set and its cache usage). you also have to keep around a pointer to the GC ref shadow stack, which increases register pressure. compare this to the native stack, which is always right there in SP (and also sometimes in FP). again, this might not be visible if the rest of the program is only using 4 live values in its hot loop or something, but might have a larger impact on others.

fitzgen (he/him) (Oct 30 2024 at 17:00):

honestly, I suggest just doing whatever you think will be easiest, and then coming back and revisiting if it is ever a perf issue. should be easy enough to switch from one approach to the other.

Diego Antonio Rosario Palomino (Oct 31 2024 at 02:05):

GRIN will need a GC fit for functional languages, which probably means generational collection. Plus, a longer-term goal is becoming a full Haskell backend. If we wanted to use the existing runtime, this would impose heavy restrictions.

Diego Antonio Rosario Palomino (Oct 31 2024 at 02:09):

To aid in targeting Cranelift from languages without traditional bindings, could the IR support deserialization from existing formats such as JSON and S-expressions?
That way, a custom pretty printer wouldn't be needed on the many systems that support automatic (de)serialization.

fitzgen (he/him) (Oct 31 2024 at 17:10):

Diego Antonio Rosario Palomino (Oct 31 2024 at 17:10):

fitzgen (he/him) (Oct 31 2024 at 17:11):

if you enable the serde feature of cranelift-codegen, then the IR all has derive(Serialize, Deserialize) which can be used with serde_json or bincode or etc...

fitzgen (he/him) (Oct 31 2024 at 17:12):

no guarantees that the serialization format won't change across cranelift versions, ofc, same as the .clif text format

Diego Antonio Rosario Palomino (Nov 01 2024 at 02:58):

In that case, an Idris 2 or Haskell Cranelift package would need integration testing to be practical.

bjorn3 (Nov 01 2024 at 06:48):

The usage of serde we have is incompatible with serde_json. In particular we use integers as map keys, which JSON doesn't allow. Also the encoding of the Layout is funky. And the header contains the Cranelift version string, which needs to match the consumer. I also think we are using integers in place of instruction names. It is mostly useful as a format for serializing IR you already created using Cranelift's API rather than for external producers. The text format is more stable.
