wasmtime / PR #10404 Store stack maps in an ELF section · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #10404 Store stack maps in an ELF section

Wasmtime GitHub notifications bot (Mar 15 2025 at 15:42):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #10404.

Wasmtime GitHub notifications bot (Mar 15 2025 at 15:42):

alexcrichton requested abrown for a review on PR #10404.

Wasmtime GitHub notifications bot (Mar 15 2025 at 15:42):

alexcrichton opened PR #10404 from alexcrichton:stack-maps-out-of-line to bytecodealliance:main:

This commit moves the storage of stack maps from being embedded within serde-encoded information to instead being stored in a separate ELF section in the final executable. The motivation for this is to make this more easily debuggable with a wasmtime objdump command in the future but this additionally should have the nice side effect of making non-stack-maps modules have smaller encoded information (no need to encode an empty list) and additionally make stack-maps-using-modules faster to decode (no serde decoding, it's already "decoded").

This implements a scheme similar to the address map section where there's a "builder" for the section and then a separate half to decode the section. The same basic encoding, a bit map, is used. This is likely going to make accessing stack maps slightly slower, but if that's an issue we can tweak the representation and align things and/or use usize or such.

Wasmtime GitHub notifications bot (Mar 15 2025 at 15:42):

alexcrichton requested pchickey for a review on PR #10404.

Wasmtime GitHub notifications bot (Mar 15 2025 at 15:42):

alexcrichton requested wasmtime-core-reviewers for a review on PR #10404.

Wasmtime GitHub notifications bot (Mar 15 2025 at 16:45):

github-actions[bot] commented on PR #10404:

Subscribe to Label Action

cc @saulecabrera

<details>
This issue or pull request has been labeled: "cranelift", "wasmtime:api", "winch"

Thus the following users have been cc'd because of the following labels:

saulecabrera: winch

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Mar 17 2025 at 16:37):

pchickey submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 16:57):

abrown submitted PR review:

Makes sense to me! I just noted some doubts I had while reading this for those who are more familiar with stack maps and their usage.

Wasmtime GitHub notifications bot (Mar 17 2025 at 16:57):

abrown created PR review comment:

Are we sure we want to re-export U32Bytes in the public API? I suspect the answer is "yes" but just wanted to note this.

Wasmtime GitHub notifications bot (Mar 17 2025 at 16:57):

abrown created PR review comment:

/// * Stack map data as 4-byte little endian integers.
Without looking below to see what this should be... is this correct?

Wasmtime GitHub notifications bot (Mar 17 2025 at 18:45):

alexcrichton updated PR #10404.

Wasmtime GitHub notifications bot (Mar 17 2025 at 18:45):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 18:45):

alexcrichton created PR review comment:

Whoops, yes!

Wasmtime GitHub notifications bot (Mar 17 2025 at 18:46):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 18:46):

alexcrichton created PR review comment:

Yeah it's fine here in that wasmtime-environ is a "private crate" where we don't consider it part of Wasmtime's public API, just an internal implementation detail. We can probably do more to document that at the crate itself though.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen submitted PR review:

Overall, I really like this approach. Have opinions on some of the details tho.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

The idea here is that you want to use something of fixed size, rather than usize, for the bitsets serialized into the stack map section?

It shouldn't actually matter, since we can only ever run .cwasms that have the same size usize as the host (whether native or pulley) right? Given that, is it worth the complications that become necessary here to support non-usize scalars inside a compound bit set?

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Ah so the * Stack map data above is just a bag of bytes of the rest of the section, and when you get an offset into that bag of bytes, it is guaranteed to point at one of these? Could you clarify these binary format docs that the trailing data is a bunch of variably-sized data and that the initial PC and offset fields are essentially an index for searching through it without decoding every single stack map?

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

This is kind of a strange API for this data structure to support. I'm not sure what it really is supposed to "mean". It feels like, if you want this API, you should actually just be working with a Vec<ScalarBitSet<T>>, because the underlying scalars used by this data structure really are just an implementation and data-representation detail and don't have any meaning on which values are vs are not in the set itself.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Maybe avoid pushing to section if offsets is empty?

The name offsets was pretty confusing to me, since I was initially thinking of instruction offsets (i.e. the offset (ne code_offset) variable) and not frame offsets when reading this. Can we either keep the name code_offset for offset and/or rename offsets to {frame,stack}_offsets?

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Rename this BITS_PER_SCALAR?

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Something to consider for the future: if we frequently have multiple sequential entries for different PCs but which have the same stack slots, eg
...
0x1dc: offset of [8, 12]
0x124: offset of [8, 12, 24] copy 1
0x142: offset of [8, 12, 24] copy 2
0x15a: offset of [8, 12, 24] copy 3
0x166: offset of [8]
...
then it may make sense for each entry in the index to store non-overlapping PC ranges, rather than exact PCs, and we could effectively dedupe the index entries and the stack map data. That is, the previous example would become
...
0x1dc..0x1dd: offset of [8, 12]
0x124..0x15b: offset of [8, 12, 24] (only copy)
0x166..0x167: offset of [8]
...
The downsides are that

We would need to change Cranelift to actually emit empty stack maps for safepoints without any live GC refs, otherwise if we have (pc=0x1234, [8]); (pc=0x1238, []); (pc=0x123b, [8]) and we don't see that middle entry in this builder, then we risk using [8] as our stack map at pc 0x1238, which is extending a dead gc ref's lifetime at best and is giving the collector uninitialized data at worst.

Relatedly, we lose our ability to catch bugs where the return address PC we are tracing isn't an exact match for a stack map entry.

These are actually pretty scary, so maybe we don't want to do this, even if it would let us make these binary search indices much smaller.

All that said, we can actually already dedupe the stack map _data_ if we want to, and have multiple index entries point to the same stack map data (even if they aren't contiguous!) with the encoding scheme already in use in this PR. We just need to hash cons and cache stack-map-data to encoded offset in this builder. This doesn't have any of the downsides from above. Seems like it would be a pure win.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

In addition to what Andrew commented, how many integers? N doesn't seem right, it seems like we should logically have N compound bit sets, but each one can be variably sized. So something isn't clear here.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Suggest adding "because our GC ref's stack slots are always 4-byte aligned" somewhere in this sentence.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

So back to iter_words: the usage here makes me think we should instead just expose a method like
impl CompoundBitSet {
    pub fn as_raw(&self) -> &[u32] { ... }
}
method or something, have it always operate on u32, and maybe (I haven't looked at the updated usage yet) add something like the following to allow zero-copy usage:
impl CompoundBitSet {
    pub fn ref_from_raw<F, T>(raw: &[u32], f: F) -> T
    where
        F: for<'a> FnOnce(&'a CompoundBitSet) -> T
    {
        // unsafely create a CompoundBitSet wrapping the raw data,
        // pass a ref to the bitset into the closure,
        // mem::forget the bitset to avoid Vec dtors
    }
}
rather than making it generic over the internal storage and doing a whole iter_words thing.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Or alternatively, should we just fix the compound bit set's internal scalar bit set to u64 or u32 scalars, instead of usize?

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

The CompoundBitSet::ref_from_raw thing would help tidy up some of this stuff too, so that it doesn't have to have a copy of the bitset logic inline here.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:30):

fitzgen created PR review comment:

Since we have range.end handy, it probably makes sense to debug assert that range.start <= range.start + code_offset < range.end

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:36):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:36):

fitzgen created PR review comment:

Could avoid the branded lifetime (so that the thing could be stored in the usage site's iterator) by doing a std::cell::Ref-esque thing instead:

pub struct BorrowedCompoundBitSet<'a> {
    bitset: core::mem::ManuallyDrop<CompoundBitSet>,
    _borrow: core::marker::PhantomData<&'a [u32]>,
}

impl core::ops::Deref for BorrowedCompoundBitSet<'_> {
    type Target = CompoundBitSet;
    // ...
}

Wasmtime GitHub notifications bot (Mar 17 2025 at 19:37):

fitzgen edited PR review comment.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:51):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:51):

alexcrichton created PR review comment:

This is an attempt to multiplex the (IMO correct) default behavior of pointer-sized-by-default with the cross-compilation behavior of "always use fixed-size things as it's easier". I don't want to use usize in the compiled ELF as that makes cross-compilation and debugging cross-compiled artifacts a bit harder.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:53):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:53):

alexcrichton created PR review comment:

I'm not sure I understand your confusion, but I'm happy to rename this and/or document it better. IMO CompoundBitSet is by definition a Vec<ScalarBitSet<T>> and the point of it is to do all the vector indexing for you so you don't have to think about it. In that sense I wouldn't want to work with Vec<ScalarBitSet<T>> raw and it also makes sense to me to expose an iterator or slice accessor as that's the definition of the representation.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:54):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:54):

alexcrichton created PR review comment:

For (1) agreed and that happens within StackMapSection building. For (2) I'll add some assertions, good point!

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:57):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:57):

alexcrichton created PR review comment:

I like your latter idea, having a "pool of sets" and when we insert a stack map we shove it in a hash map and dedupe based on that. Agreed it's a pure win with no real degree of complexity overhead. Are you thinking that should be done here? (I'd sort of prefer to defer that to analyze a "real world module" with a lot of stack maps)

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:58):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:58):

alexcrichton created PR review comment:

I realize I can probably use ScalarBitSet::iter to simplify a lot of this. I'd prefer to avoid too wonky of an abstraction to use CompoundBitSet literally somehow, but do you think that's an acceptable complexity tradeoff?

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:59):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:59):

alexcrichton created PR review comment:

Er, sorry.

To answer your original thoughts on this comment, I'd sort of defer to my response above. My literal response just now above this one is actually a response to your thought below

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:59):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 21:59):

fitzgen created PR review comment:

Deduping the stack map data can definitely be a separate PR

Wasmtime GitHub notifications bot (Mar 17 2025 at 22:21):

alexcrichton updated PR #10404.

Wasmtime GitHub notifications bot (Mar 17 2025 at 22:21):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 22:21):

alexcrichton created PR review comment:

I've tried to document this better with some more art/boxes, but let me know if I can improve anything

Wasmtime GitHub notifications bot (Mar 17 2025 at 22:22):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Mar 17 2025 at 22:22):

fitzgen created PR review comment:

How about this:

we call this method iter_scalars

the iterator yields ScalarBitSet<T> items instead of T directly

Wasmtime GitHub notifications bot (Mar 17 2025 at 23:12):

alexcrichton updated PR #10404.

Wasmtime GitHub notifications bot (Mar 19 2025 at 19:17):

alexcrichton commented on PR #10404:

@fitzgen did you want to review this over again after the recent changes? Or have more thoughts?

Wasmtime GitHub notifications bot (Mar 20 2025 at 15:13):

fitzgen submitted PR review:

Thanks

Wasmtime GitHub notifications bot (Mar 20 2025 at 19:31):

alexcrichton updated PR #10404.

Wasmtime GitHub notifications bot (Mar 20 2025 at 19:31):

alexcrichton has enabled auto merge for PR #10404.

Wasmtime GitHub notifications bot (Mar 20 2025 at 20:08):

alexcrichton merged PR #10404.

Last updated: Apr 16 2025 at 22:03 UTC