Stream: cranelift

Topic: Using stackmaps at runtime


view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 15:52):

Hello, I am considering using Cranelift for the new backend of GRIN, a whole-program compiler I am working on. I gather stack maps are the canonical way to scan the stack during GC, so I would like to know how they are supposed to be used at runtime.

E.g., at runtime, will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?
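To make the two options in the question concrete, here is a minimal Rust sketch of both encodings for a single frame. It assumes 64-bit targets and frames made of 8-byte slots. `BitmapStackMap`, `IndexStackMap` and `pointers_in_frame` are hypothetical names for illustration, not Cranelift or Wasmtime APIs.

```rust
// Hypothetical runtime encodings of one frame's stack map (not a Cranelift API).
// Assumes 64-bit targets: every stack slot is one 8-byte word.

/// Bitmap encoding: bit `i` is set if slot `i` holds a GC pointer.
#[derive(Debug, Clone, PartialEq)]
pub struct BitmapStackMap {
    pub num_slots: u32,
    pub bits: Vec<u64>,
}

/// Index encoding: the slot indices that hold GC pointers, in ascending order.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexStackMap {
    pub slots: Vec<u32>,
}

impl BitmapStackMap {
    /// Build a bitmap covering `num_slots` slots with the given pointer slots marked.
    pub fn from_slots(num_slots: u32, slots: &[u32]) -> Self {
        let mut bits = vec![0u64; (num_slots as usize + 63) / 64];
        for &s in slots {
            assert!(s < num_slots, "slot index out of range");
            bits[s as usize / 64] |= 1u64 << (s % 64);
        }
        BitmapStackMap { num_slots, bits }
    }

    /// True if `slot` is inside the map and marked as holding a pointer.
    pub fn is_pointer(&self, slot: u32) -> bool {
        slot < self.num_slots && self.bits[slot as usize / 64] & (1u64 << (slot % 64)) != 0
    }
}

impl IndexStackMap {
    /// Convert a bitmap into an index vector by testing every slot.
    pub fn from_bitmap(map: &BitmapStackMap) -> Self {
        let slots = (0..map.num_slots).filter(|&s| map.is_pointer(s)).collect();
        IndexStackMap { slots }
    }
}

/// Collect the GC pointers of a frame, given the frame's words as a slice.
pub fn pointers_in_frame(frame: &[u64], map: &IndexStackMap) -> Vec<u64> {
    map.slots.iter().map(|&s| frame[s as usize]).collect()
}
```

Either encoding can be derived from the other at load time, so the choice can be made per runtime.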

view this post on Zulip Chris Fallin (Oct 30 2024 at 15:55):

We actually removed stackmaps at the Cranelift level recently! See @fitzgen (he/him)'s post here: https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime

As part of implementing the WebAssembly garbage collection proposal in Wasmtime, which is an ongoing process, we’ve overhauled the stack map infrastructure in Cranelift. This post will explain what stack maps are, why we needed to change th...

view this post on Zulip Chris Fallin (Oct 30 2024 at 15:55):

the idea is that the embedder/user of Cranelift has a better idea of exactly what the invariants are and what they need, so it's better to push it up one level

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 15:58):

I was referring to that documentation. I gather the frontend to Cranelift is responsible for spilling pointers and requesting stack maps at safepoints, but Cranelift still generates them, presumably so that runtime code can embed them in a custom format such as the ones I mentioned.

view this post on Zulip Chris Fallin (Oct 30 2024 at 15:59):

No, Cranelift no longer does any of that -- it doesn't have the concept of safepoints or stackmaps anymore, all of that lives at the Wasmtime level

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 15:59):

it lives at the cranelift-frontend level

view this post on Zulip Chris Fallin (Oct 30 2024 at 16:00):

ah! that's right, sorry, my brain is too often scoped to "cranelift-codegen == universe"

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:00):

so as long as you are using cranelift-frontend to generate your clif, you can use declare_{value,var}_needs_stack_map methods and get them spilled around safepoints appropriately

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:01):

Chris Fallin said:

No, Cranelift no longer does any of that -- it doesn't have the concept of safepoints or stackmaps anymore, all of that lives at the Wasmtime level

;; NEW: stack map annotation on the safepoint.
call $f(), stack_map = [i64 @ ss0]

I thought this was Cranelift generating stack maps when asked to.

view this post on Zulip Chris Fallin (Oct 30 2024 at 16:02):

Yeah, sorry, ignore me, I (i) need more sleep and (ii) have already paged much of this out

view this post on Zulip Chris Fallin (Oct 30 2024 at 16:02):

@fitzgen (he/him) is authoritative here

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:03):

Diego Antonio Rosario Palomino said:

E.g., at runtime, will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?

you get the stack maps as part of the result of compilation from cranelift-codegen:

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:04):

it is a table of (code offset, number of words mapped, stack map) triples
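A runtime will typically look such a table up by code offset while walking frames. The sketch below assumes the table is sorted by offset and uses an exact-match binary search. `SafepointEntry` and `StackMapTable` are hypothetical stand-ins modeled on the triples described above, not the real `cranelift-codegen` types; whether the recorded offset matches the return address your stack walker computes is something to confirm against the Cranelift version you use.

```rust
// Hypothetical stand-ins for the (code offset, words mapped, stack map) triples.

#[derive(Debug, Clone, PartialEq)]
pub struct SafepointEntry {
    /// Offset of the safepoint from the start of the function's code.
    pub code_offset: u32,
    /// Number of stack words this map covers.
    pub words_mapped: u32,
    /// Word indices (relative to the mapped region) that hold GC pointers.
    pub pointer_slots: Vec<u32>,
}

pub struct StackMapTable {
    /// Kept sorted by `code_offset` so lookups can binary search.
    entries: Vec<SafepointEntry>,
}

impl StackMapTable {
    pub fn new(mut entries: Vec<SafepointEntry>) -> Self {
        entries.sort_by_key(|e| e.code_offset);
        StackMapTable { entries }
    }

    /// Return the entry recorded for exactly this code offset, if any.
    pub fn lookup(&self, code_offset: u32) -> Option<&SafepointEntry> {
        self.entries
            .binary_search_by_key(&code_offset, |e| e.code_offset)
            .ok()
            .map(|i| &self.entries[i])
    }
}
```

A miss (`None`) during a stack walk would usually indicate a frame stopped somewhere other than a safepoint, which a runtime would want to treat as a bug.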

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:05):

GRIN is written in Haskell and (soon) Idris 2. To avoid having to write bindings, can I access stack map generation at the textual representation level, i.e., generate textual Cranelift IR that requests stack maps? Or can that only be done through the crate API?

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:06):

let me double check

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:07):

yeah you should be able to specify them in the CLIF text format: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/filetests/filetests/parser/user_stack_maps.clif#L19-L20


view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:17):

fitzgen (he/him) said:

Diego Antonio Rosario Palomino said:

E.g., at runtime, will stack maps be stored as a vector of indices of the stack slots that hold pointers? Or as a bitmap that marks pointers?

you get the stack maps as part of the result of compilation from cranelift-codegen:

I see, but what are the possible runtime representations?
(Note: we are only targeting 64-bit platforms.)

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:19):

there is no stable serialized format defined by Cranelift, although the various bits do derive(serde::Serialize) so you could write a little bit of Rust code to take them and encode them to json/bincode/whatever format you want
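If serde is not convenient on the consuming side, a few lines of std-only Rust can flatten the table into a trivial binary layout that Haskell or Idris 2 code could parse without bindings. The layout below and the `Entry` tuple are hypothetical choices for illustration, not a format Cranelift defines; the fields would have to be copied out of the compilation result by your own glue code.

```rust
// Hypothetical flat encoding (not defined by Cranelift): every field is a
// little-endian u32. Layout: entry count, then for each entry:
// code_offset, words_mapped, pointer-slot count, pointer slots...
pub type Entry = (u32, u32, Vec<u32>);

pub fn encode(entries: &[Entry]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut put = |x: u32| out.extend_from_slice(&x.to_le_bytes());
    put(entries.len() as u32);
    for (code_offset, words_mapped, slots) in entries {
        put(*code_offset);
        put(*words_mapped);
        put(slots.len() as u32);
        for &s in slots {
            put(s);
        }
    }
    out
}

/// Inverse of `encode`; returns `None` on truncated or trailing data.
pub fn decode(bytes: &[u8]) -> Option<Vec<Entry>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    let mut fields = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]));
    let count = fields.next()?;
    let mut entries = Vec::new();
    for _ in 0..count {
        let code_offset = fields.next()?;
        let words_mapped = fields.next()?;
        let n = fields.next()?;
        let slots = (0..n).map(|_| fields.next()).collect::<Option<Vec<u32>>>()?;
        entries.push((code_offset, words_mapped, slots));
    }
    if fields.next().is_some() {
        return None; // trailing data
    }
    Some(entries)
}
```

Because every field is a fixed-width little-endian `u32`, the reader on the Haskell side only needs a word-at-a-time parser.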

https://docs.rs/cranelift-codegen/latest/src/cranelift_codegen/ir/user_stack_maps.rs.html#126-129

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:24):

I gather that when I use a runtime with a GC, I am supposed to convert the compile-time stack map representation into one that is efficient for the combination of runtime and platforms my project uses. If so, I would like to know which representations are reasonable options.

A vector of indices would be faster to access and create than a bitmap, but would take more space.
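Whether the index vector really takes more space depends on pointer density. A rough size model, assuming a bitmap stored as `u64` words (one bit per 8-byte slot) and indices stored as `u32` (ignoring any length prefix): a frame of n slots costs ceil(n/64) * 8 bytes as a bitmap, and k pointers cost 4 * k bytes as indices, so indices come out smaller roughly when fewer than 1 in 32 slots holds a pointer. The helper names are hypothetical.

```rust
/// Bytes for a bitmap over `num_slots` slots, stored as u64 words.
pub fn bitmap_bytes(num_slots: usize) -> usize {
    (num_slots + 63) / 64 * 8
}

/// Bytes for a vector of `num_pointers` u32 slot indices (no length prefix).
pub fn index_bytes(num_pointers: usize) -> usize {
    num_pointers * 4
}
```

For example, a 64-slot frame with 2 pointers costs 8 bytes either way and with 3 pointers the indices cost 12 bytes, while a 1000-slot frame with 10 pointers costs 128 bytes as a bitmap versus 40 bytes as indices.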

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:28):

Wasmtime doesn't really change the format much; it leaves it as a bitmap.

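On whether a bitmap is efficient: a scan does not have to test every slot one by one. The sketch below (an illustration, not Wasmtime's actual code) skips all-zero words and uses `u64::trailing_zeros` to jump straight to each set bit, so the cost is roughly one step per 64 slots plus one step per pointer.

```rust
/// Call `visit` with the index of every set bit in `bits`, in ascending order.
/// Zero words cost a single loop iteration; within a word, `trailing_zeros`
/// finds the next set bit and `w &= w - 1` clears it.
pub fn for_each_set_bit(bits: &[u64], mut visit: impl FnMut(usize)) {
    for (word_index, &word) in bits.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            let bit = w.trailing_zeros() as usize;
            visit(word_index * 64 + bit);
            w &= w - 1; // clear the lowest set bit
        }
    }
}
```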

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:30):

Thanks. I was curious whether bitmaps would be efficient.

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:36):

Another question: if user stack maps already require the Cranelift code to spill and reload pointers at safepoints, why shouldn't I spill them to a structure dedicated to holding pointers? That way, keeping track of stack maps becomes unnecessary.

I think this is similar to the technique of using two stacks, one for values and another for pointers.

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:37):

it should be efficient, but if you want the absolute fastest stack-tracing possible, then you can avoid cranelift stack maps and just keep a custom shadow stack for GC refs that you push to / pop from at runtime and then your stack scanning is literally just iterating over flat memory with zero stack map interpretation or anything. this ofc has higher runtime overheads (classic mutator throughput vs GC latency style of trade off)
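The shadow-stack alternative can be modeled roughly as below. This is a hypothetical sketch of the data structure only: in generated code the stack would be a raw memory region reached through a pointer kept in a register or a VM context (the register-pressure cost discussed in this thread), and pushes and pops would be plain stores and loads emitted in CLIF. GC refs are modeled as `usize` addresses, and `scan_roots` takes a `relocate` callback so a moving collector can update them in place.

```rust
/// Hypothetical shadow stack holding only GC references.
#[derive(Default)]
pub struct ShadowStack {
    refs: Vec<usize>,
}

impl ShadowStack {
    pub fn new() -> Self {
        Self::default()
    }

    /// Push a GC ref before a possible GC point; returns its slot so the
    /// mutator can reload the (possibly moved) ref afterwards.
    pub fn push(&mut self, r: usize) -> usize {
        self.refs.push(r);
        self.refs.len() - 1
    }

    /// Reload a ref after a GC point.
    pub fn get(&self, slot: usize) -> usize {
        self.refs[slot]
    }

    /// Current depth; a frame records this on entry.
    pub fn depth(&self) -> usize {
        self.refs.len()
    }

    /// Pop everything the frame pushed when it returns.
    pub fn truncate(&mut self, depth: usize) {
        self.refs.truncate(depth);
    }

    /// Root scan: flat iteration over memory, no stack map interpretation.
    pub fn scan_roots(&mut self, mut relocate: impl FnMut(usize) -> usize) {
        for r in self.refs.iter_mut() {
            *r = relocate(*r);
        }
    }
}
```

The scan is trivially simple, but every GC ref that lives across a call costs a push, a reload and a pop on the mutator side, which is the throughput-versus-latency trade-off described above.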

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:38):

Diego Antonio Rosario Palomino said:

Another question: if user stack maps already require the Cranelift code to spill and reload pointers at safepoints, why shouldn't I spill them to a structure dedicated to holding pointers? That way, keeping track of stack maps becomes unnecessary.

I think this is similar to the technique of using two stacks, one for values and another for pointers.

this is basically what I just said above, replying to your "curious whether bitmaps would be efficient"

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:39):

it is more runtime overhead because of less locality, needing to pop, higher register pressure, etc... but it is a very valid implementation choice. also pretty easy to implement and debug.

view this post on Zulip Diego Antonio Rosario Palomino (Oct 30 2024 at 16:45):

Locality could be preserved with a flat representation such as a vector. I think in principle this isn't slower than spilling to and reloading from the native language stack (commonly called the C stack?). Would it be slower because Cranelift performs a wide array of optimizations on the native stack?

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 16:59):

it is another stack, different from the current one, which is going to hurt locality and increase cache pressure. might not be visible all the time (depends on the rest of the working set and its cache usage). you also have to keep around a pointer to the GC ref shadow stack, which increases register pressure. compare this to the native stack, which is always right there in SP (and also sometimes in FP). again, this might not be visible if the rest of the program is only using 4 live values in its hot loop or something, but might have a larger impact on others.

view this post on Zulip fitzgen (he/him) (Oct 30 2024 at 17:00):

but using a shadow stack is definitely a valid approach

honestly, I suggest just doing whatever you think will be easiest, and then coming back and revisiting if it is ever a perf issue. should be easy enough to switch from one approach to the other.

view this post on Zulip Diego Antonio Rosario Palomino (Oct 31 2024 at 02:05):

GRIN will need a GC suited to functional languages, which probably means generational collection. Plus, a longer-term goal is to become a full Haskell backend. Using the existing runtime would impose heavy restrictions on that.

view this post on Zulip Diego Antonio Rosario Palomino (Oct 31 2024 at 02:09):

To make it easier to target Cranelift from languages without traditional bindings, could the IR support (de)serialization from existing formats such as JSON and S-expressions?
A pretty printer wouldn't be needed for the many systems that support automatic (de)serialization.

view this post on Zulip fitzgen (he/him) (Oct 31 2024 at 17:10):

you mean instead of the .clif text format?

view this post on Zulip Diego Antonio Rosario Palomino (Oct 31 2024 at 17:10):

As an alternative, yes

view this post on Zulip fitzgen (he/him) (Oct 31 2024 at 17:11):

if you enable the serde feature of cranelift-codegen, then the IR all has derive(Serialize, Deserialize) which can be used with serde_json or bincode or etc...

view this post on Zulip fitzgen (he/him) (Oct 31 2024 at 17:12):

no guarantees that the serialization format won't change across cranelift versions, ofc, same as the .clif text format

view this post on Zulip Diego Antonio Rosario Palomino (Nov 01 2024 at 02:58):

In that case, an Idris 2 or Haskell Cranelift package would need integration testing to be practical.

view this post on Zulip bjorn3 (Nov 01 2024 at 06:48):

The usage of serde we have is incompatible with serde_json. In particular, we use integers as map keys, which JSON doesn't allow. Also, the encoding of the Layout is funky. And the header contains the Cranelift version string, which needs to match the consumer's. I also think we are using integers in place of instruction names. It is mostly useful as a format for serializing IR you already created using Cranelift's API rather than for external producers. The text format is more stable.


Last updated: Jan 24 2025 at 00:11 UTC