bnjbvr opened PR #4551 from wip-05-09_wasm_cache to main:
[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Fixes #4155.
bnjbvr edited PR #4551 from wip-05-09_wasm_cache to main:
[Opening a draft for discussions; there are a remaining TODOs i'll need to address, and I hope to write a better PR description here.]
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Fixes #4155.
bjorn3 created PR review comment:
A 64bit hash may be too small to be safely used in adversarial cases.
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Won't this cause miscompilations when you at first call functions a, a, b and the next time a, b, b as there is no way to distinguish them based on the cache key?
bjorn3 created PR review comment:
Shouldn't this use a cryptographically secure hash function? FxHasher is known to have hash collisions. It is a quick and dirty hash function only suitable when you can handle hash collisions, like in a hash table.
bjorn3 submitted PR review.
bjorn3 edited PR review comment.
bjorn3 created PR review comment:
I believe you can access all these through the public api's. I did prefer that over direct accesses as it is less likely to violate invariants.
bjorn3 submitted PR review.
afonso360 submitted PR review.
afonso360 created PR review comment:
for _ in 0..self.param(&self.config.funcrefs_per_function)? {
afonso360 submitted PR review.
afonso360 created PR review comment:
Can we refactor this to use
generate_funcabove?
afonso360 deleted PR review comment.
afonso360 submitted PR review.
afonso360 created PR review comment:
Can we refactor this to use
generate_funcabove?
cfallin submitted PR review.
cfallin submitted PR review.
bnjbvr created PR review comment:
That's the thing: there's a perfect bijection between Func's
FuncRefandExternalName, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating differentMachReloclater.I think Chris' suggestion to split the
Function/MachCompileResultinto the "core" and parameters ought to make this clearer and easier to understand.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
See other answer below.
bnjbvr submitted PR review.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
It doesn't have to be, because right now the incremental cache first looks up using the
CacheKeyHash, and then compares against theCacheKey, exactly asHashMaps do first the hashing and then use thePartialEqto check for equality.This was a very safe first way to implement this, and it's also what limits performance of reloads. An alternative way would be to use a safe, longer hash here (as you suggest in this and the other comment above), and see if we can get rid of the equality check in that case. I'd like to experiment with this, likely later.
bnjbvr created PR review comment:
Good catch, thanks!
bnjbvr submitted PR review.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
That's the thing: there's a perfect bijection between Func's FuncRef and ExternalName, so we don't need to record the external names, and we can just be "resilient" with them, meaning that it's fine to the actual callees change. This will end up generating different MachReloc later.
I don't see how a bijection helps.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Right now, the
MachRelocs contain someExternalNames, when theMachCompileResultis cached. When we load from the cache, and have a function with differentFuncRefs ->ExternalNames, we can:
- first lookup what
FuncRefin the original function caused theExternalNameto pop up in theMachReloc, since it's a bijection- use the
FuncRef->ExternalNamefrom the function-to-compile to rewrite theMachRelocon the fly
bnjbvr submitted PR review.
bnjbvr created PR review comment:
(... so we might as well just use the
FuncRefeverywhere directly, to avoid maintaining the invariant that we want this bijection; and this is in line with what Chris suggested in his comment, so I'm doing that now.)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
When we load from the cache, and have a function with different FuncRefs -> ExternalNames, we can:
But what if you have a function with the same FuncRef -> ExternalName, but where the body eg calls functions in a different order?
bjorn3 edited PR review comment.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Each different function call (
ir::Opcode::Call) references theFuncRefit's calling, so this is entirely determined by the CLIF IR. If two functions have the same body and each call two functions, but two different functions, then two differentFuncRefare involved, leading to different lookups in theFuncRef -> ExternalNamemapping, thus differentExternalNames. (Well, that's assuming that theExternalNames are all different from each other, which would be important if we stayed down that path but that's not the case.)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
You are normalizing away the difference between different ExternalName's away here, right?
CachedExternalName::Userdoesn't have any field to distinguish one from another.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Indeed, because they're "spurious", at this point; only the
FuncRefreally matters, and there's a way to go fromExternalNameback toFuncRef(in the _source_ function that filled the cache), and then from theFuncReftoExternalName(in the function for which we're reusing the cache).But don't get too attached to this, it's going to be simplified by Chris' suggestion below :)
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr edited PR #4551 from wip-05-09_wasm_cache to main:
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris,
Functionhas been split into mostly two parts:
- on the one hand,
FunctionStencilcontains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (CompiledCodeBase<Stencil>) will be the same. This makes caching trivial, as the only thing to cache is theFunctionStencil.- on the other hand,
FunctionParameterscontain the... function parameters that are required to finalize the result of compilation into aCompiledCode(akaCompiledCodeBase<Final>) with proper final relocations etc., by applying fixups and so on.Most changes are here to accomodate those requirements, in particular that
FunctionStencilshould beHashable to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as
RelSourceLocin theFunctionStencil. This required changes so that there's no need to explicitly mark aSourceLocas the base source location, it's automatically detected instead the first time a non-defaultSourceLocis set.- user-defined external names in the
FunctionStencil(aka before this patchExternalName::User { namespace, index }) are now references into an external table ofUserExternalNameRef -> UserExternalName, present in theFunctionParameters, and must be explicitly declared usingFunction::declare_imported_user_function.- some refactorings have been made for function names:
ExternalNamewas used as the type for aFunction's name; while it thus allowedExternalName::Libcallin this place, this would have been quite confusing to use it there. Instead, a new enumUserFuncNameis introduced for this name, that's either a user-defined function name (the aboveUserExternalName) or a test case name.- The future of
ExternalNameis likely to become a full reference into theFunctionParameters's mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.The cache computes a sha256 hash of the
FunctionStencil, and uses this as the cache key. No equality check (usingPartialEq) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of
UserExternalNameRef -> UserExternalNamehits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many
FunctionStencils are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.Fixes #4155.
bnjbvr has marked PR #4551 as ready for review.
bnjbvr requested cfallin for a review on PR #4551.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
cfallin submitted PR review.
cfallin updated PR #4551 from wip-05-09_wasm_cache to main.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This error message is confusing if the consumer of cranelift is not wasmtime.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Maybe use SecondaryMap instead? BTreeMap results in a lot of code bloat (significantly more than HashMap) and is not very efficient for dense maps. It is reasonably efficient for sparse maps though.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Does using RelSourceLoc the same way as SourceLoc give correct results? In cg_clif SourceLoc is already function local. (It is backed by an IndexSet with mir::SourceLoc as entries)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
What are calls to these supposed to be replaced with?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
???? Why do you need a Function to define a data object?
bjorn3 edited PR review comment.
bnjbvr created PR review comment:
I've ran into bad perf regressions b/o SecondaryMap auto resize behaviors, so I'm a bit dubious about this. We could reevaluate later.
bnjbvr submitted PR review.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Yes, it's mostly used internally so I wouldn't even expect embedders to interact with it.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
It was to process relocations (to extract the , but now I can see that
DataDescriptionabuses theMachRelocand it should be its own type that doesn't useir::ExternalNameat all, now that the latter uses references into an external table for lookup. I'll try doing this quickly, but if it takes too long I'd prefer that somebody who understands this code better take a look.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
I managed to put them back eventually.
bnjbvr created PR review comment:
That wasn't too hard, eventually; there's now a
ModuleRelocthat is aMachRelocbut usingModuleExtName(which embeds user-defined name) instead ofir::ExternalNamefor names.
bnjbvr submitted PR review.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr has enabled auto merge for PR #4551.
bnjbvr has disabled auto merge for PR #4551.
bnjbvr updated PR #4551 from wip-05-09_wasm_cache to main.
bnjbvr has enabled auto merge for PR #4551.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
These haven't yet been re-added.
bnjbvr merged PR #4551.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
@bnjbvr?
bnjbvr created PR review comment:
Ah yeah, seems I added it back in one implementation, but not in the trait. Is that something that you were using, @bjorn3? I could add it back later, if required
bnjbvr submitted PR review.
bjorn3 created PR review comment:
This change means that
declare_func_in_dataanddeclare_data_in_dataare no longer forwarded toFooModulewhen you call them on a&mut FooModule, which is incorrect if those functions have been overwritten byFooModule.
bjorn3 submitted PR review.
bnjbvr created PR review comment:
I see; it's likely subpar design if the default implementation is incorrect for some implementations by default, but that's a different discussion to have :) Will post a small patch for this chunk.
bnjbvr submitted PR review.
Last updated: Dec 13 2025 at 19:03 UTC