Stream: git-wasmtime

Topic: wasmtime / PR #4551 Implement an incremental compilation ...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 17:38):

bnjbvr opened PR #4551 from wip-05-09_wasm_cache to main:

[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]

This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Fixes #4155.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 17:38):

bnjbvr edited PR #4551 from wip-05-09_wasm_cache to main:

[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]

This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Fixes #4155.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:30):

bjorn3 created PR review comment:

A 64bit hash may be too small to be safely used in adversarial cases.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:30):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:34):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:34):

bjorn3 created PR review comment:

Won't this cause miscompilations when you at first call functions a, a, b and the next time a, b, b as there is no way to distinguish them based on the cache key?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:36):

bjorn3 created PR review comment:

Shouldn't this use a cryptographically secure hash function? FxHasher is known to have hash collisions. It is a quick and dirty hash function only suitable when you can handle hash collisions, like in a hash table.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:36):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:37):

bjorn3 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:39):

bjorn3 created PR review comment:

I believe you can access all these through the public api's. I did prefer that over direct accesses as it is less likely to violate invariants.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:39):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:46):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:46):

afonso360 created PR review comment:

        for _ in 0..self.param(&self.config.funcrefs_per_function)? {

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:55):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 18:55):

afonso360 created PR review comment:

Can we refactor this to use generate_func above?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 19:18):

afonso360 deleted PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 19:21):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 28 2022 at 19:21):

afonso360 created PR review comment:

Can we refactor this to use generate_func above?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2022 at 16:58):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2022 at 17:11):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:05):

bnjbvr created PR review comment:

That's the thing: there's a perfect bijection between Func's FuncRef and ExternalName, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating different MachReloc later.

I think Chris' suggestion to split the Function/MachCompileResult into the "core" and parameters ought to make this clearer and easier to understand.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:05):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:05):

bnjbvr created PR review comment:

See other answer below.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:05):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:07):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:07):

bnjbvr created PR review comment:

It doesn't have to be, because right now the incremental cache first looks up using the CacheKeyHash, and then compares against the CacheKey, exactly as HashMaps do first the hashing and then use the PartialEq to check for equality.

This was a very safe first way to implement this, and it's also what limits performance of reloads. An alternative way would be to use a safe, longer hash here (as you suggest in this and the other comment above), and see if we can get rid of the equality check in that case. I'd like to experiment with this, likely later.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:07):

bnjbvr created PR review comment:

Good catch, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:07):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 13:48):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:27):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:27):

bjorn3 created PR review comment:

That's the thing: there's a perfect bijection between Func's FuncRef and ExternalName, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating different MachReloc later.

I don't see how a bijection helps.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:31):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:31):

bnjbvr created PR review comment:

Right now, the MachRelocs contain some ExternalNames, when the MachCompileResult is cached. When we load from the cache, and have a function with different FuncRefs -> ExternalNames, we can:

  1. first lookup what FuncRef in the original function caused the ExternalName to pop up in the MachReloc, since it's a bijection
  2. use the FuncRef -> ExternalName from the function-to-compile to rewrite the MachReloc on the fly

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:35):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 15:35):

bnjbvr created PR review comment:

(... so we might as well just use the FuncRef everywhere directly, to avoid maintaining the invariant that we want this bijection; and this is in line with what Chris suggested in his comment, so I'm doing that now.)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 16:50):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 16:50):

bjorn3 created PR review comment:

When we load from the cache, and have a function with different FuncRefs -> ExternalNames, we can:

But what if you have a function with the same FuncRef -> ExternalName, but where the body eg calls functions in a different order?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 01 2022 at 16:50):

bjorn3 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 07:43):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 07:43):

bnjbvr created PR review comment:

Each different function call (ir::Opcode::Call) references the FuncRef it's calling, so this is entirely determined by the CLIF IR. If two functions have the same body and each call two functions, but two different functions, then two different FuncRef are involved, leading to different lookups in the FuncRef -> ExternalName mapping, thus different ExternalNames. (Well, that's assuming that the ExternalNames are all different from each other, which would be important if we stayed down that path but that's not the case.)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 13:12):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 13:12):

bjorn3 created PR review comment:

You are normalizing away the difference between different ExternalName's away here, right? CachedExternalName::User doesn't have any field to distinguish one from another.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 13:15):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 02 2022 at 13:15):

bnjbvr created PR review comment:

Indeed, because they're "spurious", at this point; only the FuncRef really matters, and there's a way to go from ExternalName back to FuncRef (in the _source_ function that filled the cache), and then from the FuncRef to ExternalName (in the function for which we're reusing the cache).

But don't get too attached to this, it's going to be simplified by Chris' suggestion below :)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2022 at 19:24):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 14:51):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 15:48):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 16:04):

bnjbvr edited PR #4551 from wip-05-09_wasm_cache to main:

This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.

After the suggestion of Chris, Function has been split into mostly two parts:

Most changes are here to accomodate those requirements, in particular that FunctionStencil should be Hashable to be used as a key in the cache:

The cache computes a sha256 hash of the FunctionStencil, and uses this as the cache key. No equality check (using PartialEq) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many FunctionStencils are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.

Fixes #4155.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 16:14):

bnjbvr has marked PR #4551 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 16:14):

bnjbvr requested cfallin for a review on PR #4551.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 16:48):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2022 at 16:53):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 17:09):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 17:20):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 18:27):

cfallin updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:12):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:12):

bjorn3 created PR review comment:

This error message is confusing if the consumer of cranelift is not wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:14):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:14):

bjorn3 created PR review comment:

Maybe use SecondaryMap instead? BTreeMap results in a lot of code bloat (significantly more than HashMap) and is not very efficient for dense maps. It is reasonably efficient for sparse maps though.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:17):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:17):

bjorn3 created PR review comment:

Does using RelSourceLoc the same way as SourceLoc give correct results? In cg_clif SourceLoc is already function local. (It is backed by an IndexSet with mir::SourceLoc as entries)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:23):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:23):

bjorn3 created PR review comment:

What are calls to these supposed to be replaced with?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:23):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:23):

bjorn3 created PR review comment:

???? Why do you need a Function to define a data object?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 11 2022 at 19:24):

bjorn3 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:35):

bnjbvr created PR review comment:

I've ran into bad perf regressions b/o SecondaryMap auto resize behaviors, so I'm a bit dubious about this. We could reevaluate later.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:35):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:37):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:37):

bnjbvr created PR review comment:

Yes, it's mostly used internally so I wouldn't even expect embedders to interact with it.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:42):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 15:42):

bnjbvr created PR review comment:

It was to process relocations (to extract the , but now I can see that DataDescription abuses the MachReloc and it should be its own type that doesn't use ir::ExternalName at all, now that the latter uses references into an external table for lookup. I'll try doing this quickly, but if it takes too long I'd prefer that somebody who understands this code better take a look.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:02):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:02):

bnjbvr created PR review comment:

I managed to put them back eventually.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:03):

bnjbvr created PR review comment:

That wasn't too hard, eventually; there's now a ModuleReloc that is a MachReloc but using ModuleExtName (which embeds user-defined name) instead of ir::ExternalName for names.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:03):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:14):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:15):

bnjbvr has enabled auto merge for PR #4551.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:15):

bnjbvr has disabled auto merge for PR #4551.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:15):

bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:16):

bnjbvr has enabled auto merge for PR #4551.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:22):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:22):

bjorn3 created PR review comment:

These haven't yet been re-added.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 16:47):

bnjbvr merged PR #4551.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 17:42):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 12 2022 at 17:42):

bjorn3 created PR review comment:

@bnjbvr?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 08:37):

bnjbvr created PR review comment:

Ah yeah, seems I added it back in one implementation, but not in the trait. Is that something that you were using, @bjorn3? I could add it back later, if required

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 08:37):

bnjbvr submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 09:07):

bjorn3 created PR review comment:

This change means that declare_func_in_data and declare_data_in_data are no longer forwarded to FooModule when you call them on a &mut FooModule, which is incorrect if those functions have been overwritten by FooModule.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 09:07):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 09:13):

bnjbvr created PR review comment:

I see; it's likely subpar design if the default implementation is incorrect for some implementations by default, but that's a different discussion to have :) Will post a small patch for this chunk.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2022 at 09:13):

bnjbvr submitted PR review.


Last updated: Dec 23 2024 at 12:05 UTC