Stream: git-wasmtime

Topic: wasmtime / PR #1832 externref: implement stack map-based ...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2020 at 23:20):

fitzgen opened PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 05 2020 at 23:20):

fitzgen requested alexcrichton for a review on PR #1832.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

Reminder to uncomment or remove.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

        self.infos.sort_unstable_by_key(|info| info.code_offset);

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

    /// The range of PCs that this module covers. Different modules must

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

Maybe panic in the drop implementation instead and add an unsafe finish function?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

            flag_builder.enable("enable_safepoints").unwrap();

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2020 at 09:31):

bjorn3 created PR Review Comment:

        if enable {
            self.flags.enable("enable_safepoints").unwrap();
        }

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

I think this line can now be removed

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

We've got a lot of various registries floating around wasmtime, would it be possible to insert this into an existing registry somehow? For example most of the code here looks like it's copying what we do for frame information.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

FWIW I've been trying to clean this up over time where Instance doesn't do much memory management of its internals. Is it possible to package up this and the stack map registry elsewhere? I'm not really sure entirely what that would look like but I think it would be best if we could keep Instance having a pretty "raw" feeling rather than holding onto various items.

(e.g. shoving this into Store and passing it through for when we call wasm)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

Also, to keep adding to this, I'd like to get around to cleaning up the interrupts part eventually too

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

Reading over this, I also don't think that this needs to hold onto this information? It looks like the store can hold onto it for all instances?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

Sort of continuing my comment from earlier, but this is an example where it would be great to not have this duplication. Ideally there'd be one "register stuff" call, although I'm not sure yet if this is fundamentally required to be two separate steps.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

Looks like this file become executable?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 08 2020 at 16:41):

alexcrichton created PR Review Comment:

I think I may be missing something, but how come this is stored in an Engine? (vs a Store)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 09 2020 at 16:48):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 09 2020 at 16:48):

fitzgen created PR Review Comment:

I really need to figure out why the heck this is happening...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 11 2020 at 18:24):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 11 2020 at 21:48):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 11 2020 at 21:50):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 11 2020 at 21:51):

fitzgen requested alexcrichton for a review on PR #1832.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 11 2020 at 22:17):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Perhaps a merge conflict gone wrong?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Could the canary business here be moved into catch_traps to avoid some duplication?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

I'd personally still prefer to avoid having this here if it's sole purpose is to keep the field alive. I think that memory management should be deferred to Store

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

This might be a bit more compact as (0..size).map(|_| ...).collect()

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

(same with externref_activations_table above too)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

I don't actually see this end up getting used anywhere, was this perhaps from a previous iteration?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

How is this field used from wasm code? I tried to read over but couldn't find much... It also looked like VMOffsets contained information about the end field below too, but it's not an UnsafeCell?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Since this registry only needs to be within a Store, how come this is an RwLock instead of RefCell?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Could this assert that sp is not null for wasm frames? (help catch bugs in backtrace)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Ah ok I confirmed on Zulip, but it sounds like this is planned to get used in the future. I presume that the idea is that with table.get you'd have a fast path for using these two fields to inline the try_insert method, and you'd fallback to an intrinsic for out-of-line growth and such?

Reading over all this dealing with chunks and such feels a bit overly complicated. Could this perhaps be simplified a bit? I'm thinking that this could perhaps be restructured to something that looks like:

 pub struct VMExternRefActivationsTable {
    // finter into `fast_path_storage`
    next: UnsafeCell<NonNull<TableElem>>,
    // never changes after creation
    end: NonNull<TableElem>,
    // used by `try_insert` and stored via `next` in jit code
    fast_path_storage: Box<[TableElem]>,

    imprecise_roots: HashSet<VMExternRef>,
    precise_roots: HashSet<VMExternRef>,
}

where here fast_path_storage is allocated once and never changes. When an overflow of that table happens it's drained and everything is moved into imprecise_roots. That way imprecise_roots is a growing set which deduplicates things (in case you table.get the same thing a bunch of times) and also handles reallocation for us (we don't need to manually double capacities and such).

On a GC we'd fill in precise_roots and then we'd mem::swap the precise/imprecise sets and then clear the imprecise one. Maybe with some other trickery around optimizing reference counts or something like that.

I'm mostly hoping that we can simplifiy the chunks list because:

With a persistent hash set I think it'd solve the duplication/capacity issues? Anyway curious what you think about this!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 18:37):

alexcrichton created PR Review Comment:

Because of this, could the activation table have a Drop which asserts that the precise_stack_roots set is empty? Otherwise we'd leak data accidentally b/c there's no dtor which drains the map.

(or maybe we should also assert that all chunks are None since they don't get frobbed on Drop either?)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:12):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:12):

fitzgen created PR Review Comment:

Yeah, when I get merge conflicts in Cargo.lock I generally just blow it away and let the next build recreate it, but this pulled in that problematic dep update we all talked about a couple days ago. Bleh.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:14):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:14):

fitzgen created PR Review Comment:

It probably doesn't matter in practice, but that would mean the canary is taken from a stack frame that has since been popped

/me shrugs

I guess I could switch the API to table.with_stack_canary(|| { ... }) instead of an RAII thing? Would be harder to misuse this way too. I think I'll do that.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:32):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 20:32):

fitzgen created PR Review Comment:

This is a good idea, however

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:09):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:09):

alexcrichton created PR Review Comment:

Either way's fine by me, I was mostly just hoping that we can encapsulate everything necessary to enter wasm in one function in wasmtime, which currently is catch_traps. I think the with_stack_canary call can be embedded in there too? (we can perhaps rename catch_traps to invoke_wasm or something like that)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:11):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:11):

alexcrichton created PR Review Comment:

Switching fast_path_storage to a Vec<T> would be an easy way to make the fast-path-table growable (that's a good point, and one thing I was curious how often it would come into play). For shrinking we could periodically call HashSet::shrink_to_fit I think?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:24):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 22:24):

fitzgen created PR Review Comment:

Yeah. Would be nice if HashSet::shrink_to was stable, so we could shrink capacity by half when the length is a quarter of the capacity, and keep the amortized O(1). We could do shrink_to_fit followed by reserve but this is suboptimal.

Anyways, I'm just gonna do the simplest thing here for now, and leave that stuff to when we actually see it in profiles.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:33):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:33):

fitzgen created PR Review Comment:

Also, the backtrace update needs these dep updates.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:44):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:44):

fitzgen created PR Review Comment:

This runs into the weirdness at the wasmtime and wasmtime-runtime boundary again: this is inside wasmtime-runtime and so it has "no" knowledge of Store and whether anything else is holding the table alive for it.

Would you prefer that InstanceHandle::new took a *mut VMExternRefActivationsTable instead of an Rc and we added this to the safetry invariants required to be maintained for InstanceHandle::new?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:44):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:44):

fitzgen created PR Review Comment:

Same question as above regarding *mut vs Arc: https://github.com/bytecodealliance/wasmtime/pull/1832/files?file-filters%5B%5D=.rs#r439681507

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:49):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:49):

fitzgen created PR Review Comment:

Ah I misunderstood what you were asking for in the original comment. This is done now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:52):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2020 at 23:52):

fitzgen created PR Review Comment:

This doesn't work for disabling reference types, and there is no disable function to go along with enable either.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 13 2020 at 00:23):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:27):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:27):

fitzgen created PR Review Comment:

It was unnecessary, and is now removed.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:31):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:31):

fitzgen created PR Review Comment:

Otherwise Module is not Send:

error[E0277]: `std::cell::RefCell<wasmtime_runtime::externref::StackMapRegistryInner>` cannot be shared between threads safely
   --> crates/wasmtime/src/module.rs:579:5
    |
578 |     fn _assert<T: Send + Sync>() {}
    |                   ---- required by this bound in `module::_assert_send_sync::_assert`
579 |     _assert::<Module>();
    |     ^^^^^^^^^^^^^^^^^ `std::cell::RefCell<wasmtime_runtime::externref::StackMapRegistryInner>` cannot be shared between threads safely
    |
    = help: within `wasmtime_runtime::externref::StackMapRegistry`, the trait `std::marker::Sync` is not implemented for `std::cell::RefCell<wasmtime_runtime::externref::StackMapRegistryInner>`
    = note: required because it appears within the type `wasmtime_runtime::externref::StackMapRegistry`
    = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<wasmtime_runtime::externref::StackMapRegistry>`
    = note: required because it appears within the type `wasmtime_runtime::externref::StackMapRegistration`
    = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<wasmtime_runtime::externref::StackMapRegistration>`
    = note: required because it appears within the type `std::option::Option<std::sync::Arc<wasmtime_runtime::externref::StackMapRegistration>>`
    = note: required because it appears within the type `std::option::Option<std::option::Option<std::sync::Arc<wasmtime_runtime::externref::StackMapRegistration>>>`
    = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Mutex<std::option::Option<std::option::Option<std::sync::Arc<wasmtime_runtime::externref::StackMapRegistration>>>>`
    = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<std::sync::Mutex<std::option::Option<std::option::Option<std::sync::Arc<wasmtime_runtime::externref::StackMapRegistration>>>>>`
    = note: required because it appears within the type `module::Module`

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:32):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:46):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 16:50):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

This seems to have gone backwards?

(along with a number of other crates?)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

FWIW this is likely to break CI until this is fixed

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Published now!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

I think I commented on this last time too, so sorry if I missed your response in the meantime, but can there be a destructor for this type which debug-asserts that all slots in this chunk are None?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Here on insert_slow_path it always inserts in the hash set, but presumably after gc the chunk is empty?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Also, should this idiom perhaps be encapsulated in a method on the activation table?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Ah right yeah, I think we probably need to clean up that storge of the registrations in Module, but for now seems fine.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Instead of #[cfg] could this conditionally panic if if cfg! in the implementation of Deref and such?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Ah sorry missed this earlier, but yeah let's try passing around *mut here instead of an owned pointer.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Perhaps we should take a leaf out of Rc's book and use strong_count for forwards-compatibility with weak references if we ever add them?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

To confirm, this for sure works, right? One thing I'd be worried about is rustc's rvalue-promotion where the 0 gets promoted into static data. If you did let canary = &0, for example, I think that'd point into static data rather than on the stack. I'd be a bit worried that eventually if rustc did more mir inlining or something like that it'd inline the definition of canary to here and do the rvalue promotion anyway.

In any case so long as this works today I'm fine, and I would imagine that if rustc ever changed it would cause tests to fail here too.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

I think this'll need to look like register_jit_code ish where we register a compiled module with a Store and metadata about that is stored in Store rather than the Module.

Honestly we should do this with register_frame_info too, but I can tackle that some other time.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

you can probably avoid passing these two parameters explicitly and read them from the passed-in store argument inside the method.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Out of curiosity, could we always enable this? Presumably once reference types ships we'll enable this unconditionally anyway.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

I think that this API may actually be incorrect for GC, because you can register a module into two stores. The second store won't actually get registered in its registry which I think means GC won't be precise?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 19:13):

alexcrichton created PR Review Comment:

Answering my own question, if my thought earlier about the stack canary and rvalue promotion breaks things this assertion will break. If our stack canary never works then nothing will get gc'd from the above loop.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:06):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:06):

fitzgen created PR Review Comment:

Sorry I didn't mention this, but since switching from the list-of-chunks representation to your new suggested one, the table automatically runs the Drop implementations, so dropping the table will now properly avoid leaks.

Therefore, I didn't think it was necessary to insert these debug assertions anymore. (Also, inserting them would require that we gc before dropping the table, to ensure that it gets swept. Everything is easier when we can rely on automatic Drops!)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:07):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:07):

fitzgen created PR Review Comment:

I figured since it is already a slow path, it might as well do de-duplication, rather than use the bump chunk. I can leave a comment to this effect, and also make a method for this idiom.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:10):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:10):

alexcrichton created PR Review Comment:

Whoops missed that, nice!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:16):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:16):

fitzgen created PR Review Comment:

It does work today (the tests would fail if it didn't, since they are checking the reference counts and asserting that they go down after GC, which means we did sweep the table, which means we saw the canary).

I agree that it is slightly fragile. I don't know what we could do to avoid it without something like test::blackbox or calling getcontext to get the SP (eww out-of-line call).

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:17):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:17):

fitzgen created PR Review Comment:

Then we would need to always have the inner: T member and rely on LLVM to optimize it away when not used in non-cfg(debug_assertions). Does that sound OK to you?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:23):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:23):

alexcrichton created PR Review Comment:

Yeah given the limited usage of this I trust LLVM enough to figure everything out.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:51):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 20:51):

fitzgen created PR Review Comment:

Good eye! I didn't realize that a single module could be registered with multiple stores!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:19):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:19):

fitzgen created PR Review Comment:

We could, but it implies an additional pass over the IR by cranelift, and if we know there aren't any being used, then we can skip that unnecessary work.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:19):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:19):

fitzgen created PR Review Comment:

Ah, yep, I see you figured it out :)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:20):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:24):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:25):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:25):

fitzgen created PR Review Comment:

Okay the new version fixes this by registering with the store, rather than with the module. This also means that we can get rid of StackMapRegistration entirely, and can instead rely on the store to keep the registry itself alive!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:25):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:25):

alexcrichton created PR Review Comment:

This seems fine for now, but I wanted to point this out because that optimization is something we need to likely implement in short order in that case. In theory there should be a fast path for functions which don't use anyref?

Put another way, if this is scoped to only reference types for performance reasons, then we need to file an issue about that and get it fixed before fully shipping reference types.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:35):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 21:35):

fitzgen created PR Review Comment:

Filed https://github.com/bytecodealliance/wasmtime/issues/1883

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

I think at this point this can be RefCell?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

Since these strong references are dropped here, could this either take &... or plumb through the *mut business?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

To confirm, is every function guaranteed to have a stack map, even if it doesn't use reference types?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

I think this can be RefCell now?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

Mind throwing a cfg! here for the target pointer width? (and below too)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

Oh dear this is nasty, can we load flags from the Store instead of trying to recreate them here though?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:05):

alexcrichton created PR Review Comment:

(in that I think we have a Box<dyn TargetIsa> shoved somewhere in there I think

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:31):

fitzgen created PR Review Comment:

Good eye, this is a bug introduced during all the refactoring today. I'm going to just make StackMapRegistry::register_stack_maps idempotent, rather than trying to check if its already registered here. This is a nice little clean up, since the registry is in the best position to answer this question anyways.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:31):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 22:39):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 23:06):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 23:07):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 23:38):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 15 2020 at 23:43):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 00:53):

fitzgen updated PR #1832 from externref-stack-maps to master:

For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a VMExternRef avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the VMExternRef is cloned.

When passing a VMExternRef into compiled Wasm code, we don't want to do reference count mutations for every compiled local.{get,set}, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing VMExternRefs somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of VMExternRefs that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of VMExternRefs inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the VMExternRefs that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set.

The VMExternRefActivationsTable implements the over-approximized set of VMExternRefs referenced by Wasm activations. Calling a Wasm function and passing it a VMExternRef moves the VMExternRef into the table, and the compiled Wasm function logically "borrows" the VMExternRef from the table. Similarly, global.get and table.get operations clone the gotten
VMExternRef into the VMExternRefActivationsTable and then "borrow" the reference out of the table.

When a VMExternRef is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the VMExternRefActivationsTable and the reference count from the table will be dropped at the next GC).

For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf

cc #929

Fixes #1804

Depends on https://github.com/rust-lang/backtrace-rs/pull/341

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 01:26):

fitzgen merged PR #1832.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 13:56):

alexcrichton created PR Review Comment:

For future instnaces of this, mind tagging this with a FIXME and an issue number so when we get around to aarch64 reference types we can make sure we run all the tests?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 13:56):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 13:56):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2020 at 13:56):

alexcrichton created PR Review Comment:

Same comment here for the aarch64 testing


Last updated: Jan 24 2025 at 00:11 UTC