bnjbvr opened PR #4551 from wip-05-09_wasm_cache
to main
:
[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Fixes #4155.
bnjbvr edited PR #4551 from wip-05-09_wasm_cache
to main
:
[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Fixes #4155.
bjorn3 created PR review comment:
A 64bit hash may be too small to be safely used in adversarial cases.
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Won't this cause miscompilations when you at first call functions a, a, b and the next time a, b, b as there is no way to distinguish them based on the cache key?
bjorn3 created PR review comment:
Shouldn't this use a cryptographically secure hash function? FxHasher is known to have hash collisions. It is a quick and dirty hash function only suitable when you can handle hash collisions, like in a hash table.
bjorn3 submitted PR review.
bjorn3 edited PR review comment.
bjorn3 created PR review comment:
I believe you can access all these through the public api's. I did prefer that over direct accesses as it is less likely to violate invariants.
bjorn3 submitted PR review.
afonso360 submitted PR review.
afonso360 created PR review comment:
for _ in 0..self.param(&self.config.funcrefs_per_function)? {
afonso360 submitted PR review.
afonso360 created PR review comment:
Can we refactor this to use
generate_func
above?
afonso360 deleted PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
Can we refactor this to use
generate_func
above?
cfallin submitted PR review.
cfallin submitted PR review.
bnjbvr created PR review comment:
That's the thing: there's a perfect bijection between Func's
FuncRef
andExternalName
, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating differentMachReloc
later.I think Chris' suggestion to split the
Function
/MachCompileResult
into the "core" and parameters ought to make this clearer and easier to understand.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
See other answer below.
bnjbvr submitted PR review.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
It doesn't have to be, because right now the incremental cache first looks up using the
CacheKeyHash
, and then compares against theCacheKey
, exactly asHashMap
s do first the hashing and then use thePartialEq
to check for equality.This was a very safe first way to implement this, and it's also what limits performance of reloads. An alternative way would be to use a safe, longer hash here (as you suggest in this and the other comment above), and see if we can get rid of the equality check in that case. I'd like to experiment with this, likely later.
bnjbvr created PR review comment:
Good catch, thanks!
bnjbvr submitted PR review.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
That's the thing: there's a perfect bijection between Func's FuncRef and ExternalName, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating different MachReloc later.
I don't see how a bijection helps.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Right now, the
MachReloc
s contain someExternalName
s, when theMachCompileResult
is cached. When we load from the cache, and have a function with differentFuncRef
s ->ExternalName
s, we can:
- first lookup what
FuncRef
in the original function caused theExternalName
to pop up in theMachReloc
, since it's a bijection- use the
FuncRef
->ExternalName
from the function-to-compile to rewrite theMachReloc
on the fly
bnjbvr submitted PR review.
bnjbvr created PR review comment:
(... so we might as well just use the
FuncRef
everywhere directly, to avoid maintaining the invariant that we want this bijection; and this is in line with what Chris suggested in his comment, so I'm doing that now.)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
When we load from the cache, and have a function with different FuncRefs -> ExternalNames, we can:
But what if you have a function with the same FuncRef -> ExternalName, but where the body eg calls functions in a different order?
bjorn3 edited PR review comment.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Each different function call (
ir::Opcode::Call
) references theFuncRef
it's calling, so this is entirely determined by the CLIF IR. If two functions have the same body and each call two functions, but two different functions, then two differentFuncRef
are involved, leading to different lookups in theFuncRef -> ExternalName
mapping, thus differentExternalName
s. (Well, that's assuming that theExternalName
s are all different from each other, which would be important if we stayed down that path but that's not the case.)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
You are normalizing away the difference between different ExternalName's away here, right?
CachedExternalName::User
doesn't have any field to distinguish one from another.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Indeed, because they're "spurious", at this point; only the
FuncRef
really matters, and there's a way to go fromExternalName
back toFuncRef
(in the _source_ function that filled the cache), and then from theFuncRef
toExternalName
(in the function for which we're reusing the cache).But don't get too attached to this, it's going to be simplified by Chris' suggestion below :)
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr edited PR #4551 from wip-05-09_wasm_cache
to main
:
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris,
Function
has been split into mostly two parts:
- on the one hand,
FunctionStencil
contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (CompiledCodeBase<Stencil>
) will be the same. This makes caching trivial, as the only thing to cache is theFunctionStencil
.- on the other hand,
FunctionParameters
contain the... function parameters that are required to finalize the result of compilation into aCompiledCode
(akaCompiledCodeBase<Final>
) with proper final relocations etc., by applying fixups and so on.Most changes are here to accomodate those requirements, in particular that
FunctionStencil
should beHash
able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as
RelSourceLoc
in theFunctionStencil
. This required changes so that there's no need to explicitly mark aSourceLoc
as the base source location, it's automatically detected instead the first time a non-defaultSourceLoc
is set.- user-defined external names in the
FunctionStencil
(aka before this patchExternalName::User { namespace, index }
) are now references into an external table ofUserExternalNameRef -> UserExternalName
, present in theFunctionParameters
, and must be explicitly declared usingFunction::declare_imported_user_function
.- some refactorings have been made for function names:
ExternalName
was used as the type for aFunction
's name; while it thus allowedExternalName::Libcall
in this place, this would have been quite confusing to use it there. Instead, a new enumUserFuncName
is introduced for this name, that's either a user-defined function name (the aboveUserExternalName
) or a test case name.- The future of
ExternalName
is likely to become a full reference into theFunctionParameters
's mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.The cache computes a sha256 hash of the
FunctionStencil
, and uses this as the cache key. No equality check (usingPartialEq
) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of
UserExternalNameRef -> UserExternalName
hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many
FunctionStencil
s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.Fixes #4155.
bnjbvr has marked PR #4551 as ready for review.
bnjbvr requested cfallin for a review on PR #4551.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
cfallin submitted PR review.
cfallin updated PR #4551 from wip-05-09_wasm_cache
to main
.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This error message is confusing if the consumer of cranelift is not wasmtime.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Maybe use SecondaryMap instead? BTreeMap results in a lot of code bloat (significantly more than HashMap) and is not very efficient for dense maps. It is reasonably efficient for sparse maps though.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Does using RelSourceLoc the same way as SourceLoc give correct results? In cg_clif SourceLoc is already function local. (It is backed by an IndexSet with mir::SourceLoc as entries)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
What are calls to these supposed to be replaced with?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
???? Why do you need a Function to define a data object?
bjorn3 edited PR review comment.
bnjbvr created PR review comment:
I've ran into bad perf regressions b/o SecondaryMap auto resize behaviors, so I'm a bit dubious about this. We could reevaluate later.
bnjbvr submitted PR review.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Yes, it's mostly used internally so I wouldn't even expect embedders to interact with it.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
It was to process relocations (to extract the , but now I can see that
DataDescription
abuses theMachReloc
and it should be its own type that doesn't useir::ExternalName
at all, now that the latter uses references into an external table for lookup. I'll try doing this quickly, but if it takes too long I'd prefer that somebody who understands this code better take a look.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
I managed to put them back eventually.
bnjbvr created PR review comment:
That wasn't too hard, eventually; there's now a
ModuleReloc
that is aMachReloc
but usingModuleExtName
(which embeds user-defined name) instead ofir::ExternalName
for names.
bnjbvr submitted PR review.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr has enabled auto merge for PR #4551.
bnjbvr has disabled auto merge for PR #4551.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache
to main
.
bnjbvr has enabled auto merge for PR #4551.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
These haven't yet been re-added.
bnjbvr merged PR #4551.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
@bnjbvr?
bnjbvr created PR review comment:
Ah yeah, seems I added it back in one implementation, but not in the trait. Is that something that you were using, @bjorn3? I could add it back later, if required
bnjbvr submitted PR review.
bjorn3 created PR review comment:
This change means that
declare_func_in_data
anddeclare_data_in_data
are no longer forwarded toFooModule
when you call them on a&mut FooModule
, which is incorrect if those functions have been overwritten byFooModule
.
bjorn3 submitted PR review.
bnjbvr created PR review comment:
I see; it's likely subpar design if the default implementation is incorrect for some implementations by default, but that's a different discussion to have :) Will post a small patch for this chunk.
bnjbvr submitted PR review.
Last updated: Jan 24 2025 at 00:11 UTC