bnjbvr opened issue #4155:
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
- Speed up recompilation of wasm modules that have already been compiled once, but for which only a very small subset of code changed. Quite impactful for hot-reload use cases and incremental workflows, in particular.
- Speed up compilation of wasm modules sharing lots of inner code, creating a global compilation cache (e.g. when reusing the same libraries in different modules, it's possible the lib code may be present and the same in all the modules).
Implementation
- Since this will require moar dependencies in Cranelift (namely
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).- In Cranelift, simple memoization strategy when compiling:
- Use the function's _meaningful_ CLIF as the input (cache key).
- Use the
MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to _non-meaningful_ changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:
- static typing, i.e. use new "cacheable" types for
MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.- fuzzing. Credit goes to @cfallin for the idea: generate a random CLIF/wasm function F, compile it with the incremental compilation cache, slightly alter the function to obtain another function F", recompile it; if the cache keys for F and F" are the same, compare compiling F" from scratch with the result of loading from the cache + applying the fixups: the results should be exactly the same.
(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
- I hoped to be able to reuse wasmtime-cache's mechanism in there, but most of the metadata doesn't apply cleanly when translated to Cranelift. So I am planning to either split the metadata struct in the bits that apply to Cranelift and those that don't, or to create a new similar data structure.
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.
- Wasmtime would expose a new public API to make this configurable, when enabling the incremental compilation cache. (It could be in the same public API that allows to enable the cache at all, in fact.)
- Is that fine to use a Rust trait here, or will it cause hardships for the C bindings?
Alternatives
- Speed up compilation in general using algorithmic/micro-architectural/memory optimizations, which is a fine but orthogonal goal.
- Have a compilation cache at the wasm level (that is, use the wasm function as the input). Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
- [OTHER SMART ALTERNATIVES GO HERE]
bnjbvr labeled issue #4155:
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
- Speed up recompilation of wasm modules that have already been compiled once, but for which only a very small subset of code changed. Quite impactful for hot-reload use cases and incremental workflows, in particular.
- Speed up compilation of wasm modules sharing lots of inner code, creating a global compilation cache (e.g. when reusing the same libraries in different modules, it's possible the lib code may be present and the same in all the modules).
Implementation
- Since this will require moar dependencies in Cranelift (namely
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).- In Cranelift, simple memoization strategy when compiling:
- Use the function's _meaningful_ CLIF as the input (cache key).
- Use the
MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to _non-meaningful_ changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:
- static typing, i.e. use new "cacheable" types for
MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.- fuzzing. Credit goes to @cfallin for the idea: generate a random CLIF/wasm function F, compile it with the incremental compilation cache, slightly alter the function to obtain another function F", recompile it; if the cache keys for F and F" are the same, compare compiling F" from scratch with the result of loading from the cache + applying the fixups: the results should be exactly the same.
(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
- I hoped to be able to reuse wasmtime-cache's mechanism in there, but most of the metadata doesn't apply cleanly when translated to Cranelift. So I am planning to either split the metadata struct in the bits that apply to Cranelift and those that don't, or to create a new similar data structure.
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.
- Wasmtime would expose a new public API to make this configurable, when enabling the incremental compilation cache. (It could be in the same public API that allows to enable the cache at all, in fact.)
- Is that fine to use a Rust trait here, or will it cause hardships for the C bindings?
Alternatives
- Speed up compilation in general using algorithmic/micro-architectural/memory optimizations, which is a fine but orthogonal goal.
- Have a compilation cache at the wasm level (that is, use the wasm function as the input). Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
- [OTHER SMART ALTERNATIVES GO HERE]
bnjbvr labeled issue #4155:
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
- Speed up recompilation of wasm modules that have already been compiled once, but for which only a very small subset of code changed. Quite impactful for hot-reload use cases and incremental workflows, in particular.
- Speed up compilation of wasm modules sharing lots of inner code, creating a global compilation cache (e.g. when reusing the same libraries in different modules, it's possible the lib code may be present and the same in all the modules).
Implementation
- Since this will require moar dependencies in Cranelift (namely
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).- In Cranelift, simple memoization strategy when compiling:
- Use the function's _meaningful_ CLIF as the input (cache key).
- Use the
MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to _non-meaningful_ changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:
- static typing, i.e. use new "cacheable" types for
MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.- fuzzing. Credit goes to @cfallin for the idea: generate a random CLIF/wasm function F, compile it with the incremental compilation cache, slightly alter the function to obtain another function F", recompile it; if the cache keys for F and F" are the same, compare compiling F" from scratch with the result of loading from the cache + applying the fixups: the results should be exactly the same.
(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
- I hoped to be able to reuse wasmtime-cache's mechanism in there, but most of the metadata doesn't apply cleanly when translated to Cranelift. So I am planning to either split the metadata struct in the bits that apply to Cranelift and those that don't, or to create a new similar data structure.
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.
- Wasmtime would expose a new public API to make this configurable, when enabling the incremental compilation cache. (It could be in the same public API that allows to enable the cache at all, in fact.)
- Is that fine to use a Rust trait here, or will it cause hardships for the C bindings?
Alternatives
- Speed up compilation in general using algorithmic/micro-architectural/memory optimizations, which is a fine but orthogonal goal.
- Have a compilation cache at the wasm level (that is, use the wasm function as the input). Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
- [OTHER SMART ALTERNATIVES GO HERE]
bnjbvr assigned issue #4155 (assigned to bnjbvr):
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
- Speed up recompilation of wasm modules that have already been compiled once, but for which only a very small subset of code changed. Quite impactful for hot-reload use cases and incremental workflows, in particular.
- Speed up compilation of wasm modules sharing lots of inner code, creating a global compilation cache (e.g. when reusing the same libraries in different modules, it's possible the lib code may be present and the same in all the modules).
Implementation
- Since this will require moar dependencies in Cranelift (namely
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).- In Cranelift, simple memoization strategy when compiling:
- Use the function's _meaningful_ CLIF as the input (cache key).
- Use the
MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to _non-meaningful_ changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:
- static typing, i.e. use new "cacheable" types for
MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.- fuzzing. Credit goes to @cfallin for the idea: generate a random CLIF/wasm function F, compile it with the incremental compilation cache, slightly alter the function to obtain another function F", recompile it; if the cache keys for F and F" are the same, compare compiling F" from scratch with the result of loading from the cache + applying the fixups: the results should be exactly the same.
(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
- I hoped to be able to reuse wasmtime-cache's mechanism in there, but most of the metadata doesn't apply cleanly when translated to Cranelift. So I am planning to either split the metadata struct in the bits that apply to Cranelift and those that don't, or to create a new similar data structure.
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.
- Wasmtime would expose a new public API to make this configurable, when enabling the incremental compilation cache. (It could be in the same public API that allows to enable the cache at all, in fact.)
- Is that fine to use a Rust trait here, or will it cause hardships for the C bindings?
Alternatives
- Speed up compilation in general using algorithmic/micro-architectural/memory optimizations, which is a fine but orthogonal goal.
- Have a compilation cache at the wasm level (that is, use the wasm function as the input). Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
- [OTHER SMART ALTERNATIVES GO HERE]
bnjbvr commented on issue #4155:
I should add: I've started working on this. I already have something that works for the simplest case, that is, changing a single statement in the source code and recompiling only recompiles the one function that changed. I'm now looking into making it resilient when adding/removing functions internally-defined in the same wasm module (viz. not imported functions yet).
bjorn3 commented on issue #4155:
Can you hash the entirety of
Function
as cache key and instead ensure that theFunction
only changes when there are meaningful changes to the input wasm? For example you could renumber allFuncRef
starting from 0 in a deterministic way. So for example a function calling functions 5 and 3 could getFuncRef(0)
andFuncRef(1)
to reference functions 5 and 3 respectively. If a new function shifts said functions to offset 6 and 4, you would still getFuncRef(0)
andFuncRef(1)
, but a different mapping table. This should even be resilient against a completely different function with the same signature being called as theFuncRef
's wouldn't change. ForVMContext
offsets you could introduce a newGlobalValueData
variant containing a constant allowing relocating ofVMContext
offsets after compilation. For source locations you could tell cranelift-codegen the source locations relative to the start of the function and then add the function offset when actually using them.Also I think this caching implementation can be kept in a separate crate. There is already a cranelift-codegen feature to enable serde serialization and deserialization of
Function
, which should be enough.Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
Cg_clif benefits from rustc's incremental compilation framework, but that one only works at the level of codegen units for codegen which is less fine grained than individual functions.
cfallin commented on issue #4155:
Thanks for writing all of this up, @bnjbvr! I am very excited to see how this works out, and early results already show that it should be a huge improvement for users with tight edit-compile-reload loops.
Two things I wanted to add:
A little more detail on the "relocations" idea that we came up with, both to document it here and to see if others can think of related ideas or any issues. The basic idea is that parts of input (CLIF) to Cranelift may vary slightly in unimportant ways when other parts of a Wasm module are changed. These are "renumbering" kinds of problems: the shape of the CLIF is the same, but particular values change.
Two particular examples are:
SourceLoc
s (e.g. all the source code for the slightly-edited input Wasm module shifts down by one line) and offsets in Wasmtime'svmctx
struct at which imported function structs are loaded (e.g. adding an import shifts all of these after the import down by 16 bytes).The basic idea is to make these constants "symbolic" somehow and feed them into the code as fixups after the point at which it's cached. (We called them "relocations" in our discussion but actually I like something like "constant parameters" a bit better now...) Inserting concrete values for constant parameters may involve fixups in the
MachCompileResult
's metadata (e.g. for line-number info) but also maybe in the machine code itself (e.g. a constant offset for a load).The latter case makes things interesting too: e.g. a load may go out-of-range (for example with the 12-bit offset for a reg+imm addressing mode on AArch64). In this case I think we can say that parameter insertion may "fail", and we just fall back to a compilation from scratch. So adding imports might hit a cliff at some point at which all the generated code needs to start using a more verbose access sequence to a huge
vmctx
, but that's OK.Regarding the caching API itself: I think I might actually slightly prefer an "inverted control" scheme here where the embedder controls all access to the KV store, rather than try to reuse something from Wasmtime. In particular, if the CLIF lets us get a "cache key", and if the compiler context exposes the two-step process "compile to cacheable object" and "finalize cacheable object given concrete params/actual CLIF/..." then one could write a wrapper around this that does the right thing with a KV store, but it's more flexible.
The specific reason I favor this is that it keeps Cranelift agnostic to any notions of storage or IO; a compiler really should live as much as possible in the "pure function from data to data" space. And there may be other ways a user would want to use this functionality than we initially imagine and bake into a KV-store trait. Two particular cases that are hard are the C API, as you mention; but also e.g. async (what if the KV store is async? Do large parts of Cranelift become async too?).
cfallin edited a comment on issue #4155:
Thanks for writing all of this up, @bnjbvr! I am very excited to see how this works out, and early results already show that it should be a huge improvement for users with tight edit-compile-reload loops.
Two things I wanted to add:
A little more detail on the "relocations" idea that we came up with, both to document it here and to see if others can think of related ideas or any issues. The basic idea is that parts of input (CLIF) to Cranelift may vary slightly in unimportant ways when other parts of a Wasm module are changed. These are "renumbering" kinds of problems: the shape of the CLIF is the same, but particular values change.
Two particular examples are:
SourceLoc
s (e.g. all the source code for the slightly-edited input Wasm module shifts down by one line) and offsets in Wasmtime'svmctx
struct at which imported function structs are loaded (e.g. adding an import shifts all of these after the import down by 16 bytes).The basic idea is to make these constants "symbolic" somehow and feed them into the code as fixups after the point at which it's cached. (We called them "relocations" in our discussion but actually I like something like "constant parameters" a bit better now...) Inserting concrete values for constant parameters may involve fixups in the
MachCompileResult
's metadata (e.g. for line-number info) but also maybe in the machine code itself (e.g. a constant offset for a load).The latter case makes things interesting too: e.g. a load may go out-of-range (for example with the 12-bit offset for a reg+imm addressing mode on AArch64). In this case I think we can say that parameter insertion may "fail", and we just fall back to a compilation from scratch. So adding imports might hit a cliff at some point at which all the generated code needs to start using a more verbose access sequence to a huge
vmctx
, but that's OK.Regarding the caching API itself: I think I might actually slightly prefer an "inverted control" scheme here where the embedder controls all access to the KV store, rather than try to reuse something from Wasmtime. In particular, if the CLIF lets us get a "cache key", and if the compiler context exposes the two-step process "compile to cacheable object" and "finalize cacheable object given concrete params/actual CLIF/..." then one could write a wrapper around this that does the right thing with a KV store, but it's more flexible.
The specific reason I favor this is that it keeps Cranelift agnostic to any notions of storage or IO; a compiler really should live as much as possible in the "pure function from data to data" space. And there may be other ways a user would want to use this functionality than we initially imagine and bake into a KV-store trait. Two particular cases that are hard are the C API, as you mention; but also e.g. async (what if the KV store is async? Do large parts of Cranelift become async too?).
alexcrichton commented on issue #4155:
From an internal design perspective I would recommend following what
wasmtime-cache
is already doing which is to have a function that looks like:pub fn get_data_raw<T, U, E>( &self, state: &T, // NOTE: These are function pointers instead of closures so that they // don't accidentally close over something not accounted in the cache. compute: fn(&T) -> Result<U, E>, serialize: fn(&T, &U) -> Option<Vec<u8>>, deserialize: fn(&T, Vec<u8>) -> Option<U>, ) -> Result<U, E> where T: Hash, { // ... }
Here
state
is explicitlyHash
to generate a sha or something like that and thecompute
,serialize
, anddeserialize
functions are all explicitly function pointers that cannot close over state. This means that the computation ofU
, the memoization of thecompute
computation overT
, is guaranteed to reasonably be a function ofT
.Ideally the
wasmtime-cache
bits could either be reused for this or somehow refactored to not duplicate this. Historically I have found that stale caches, especially with incremental compilation, are so difficult to debug and diagnose that getting the interface right so that it's practically never broken is extremely important.
bnjbvr commented on issue #4155:
(Will use ICC for incremental compilation cache from now on :))
Can you hash the entirety of Function as cache key and instead ensure that the Function only changes when there are meaningful changes to the input wasm? For example you could renumber all FuncRef starting from 0 in a deterministic way. So for example a function calling functions 5 and 3 could get FuncRef(0) and FuncRef(1) to reference functions 5 and 3 respectively. If a new function shifts said functions to offset 6 and 4, you would still get FuncRef(0) and FuncRef(1), but a different mapping table. This should even be resilient against a completely different function with the same signature being called as the FuncRef's wouldn't change. For VMContext offsets you could introduce a new GlobalValueData variant containing a constant allowing relocating of VMContext offsets after compilation. For source locations you could tell cranelift-codegen the source locations relative to the start of the function and then add the function offset when actually using them.
I think that's a fine idea, but I wonder if this wouldn't bias the code towards the incremental cache a bit too much, making it non "zero-cost" for users who disabled the incremental compilation cache (ICC) feature? Say, when it comes to source locations, this means that first we'd transform all source locations into relative (from start) ones, then if the ICC isn't used, re-transform them back later into absolute source locations (for the
MachReloc
s to be correct) -- or carry different meanings for these fields whether the ICC is enabled or not. This back-and-forth wouldn't add any value to users not using the ICC, so I tend to slightly prefer the opposite choice which is to have bespoke "mirror" data structures for the compilation cache (which could contain plain references toFunction
's fields, when they don't need any change, and only introduce new ones when they do need internal changes).The strategies you mention here can still be used, though. In fact the relative-from-start source location was what I implemented for that specific problem indeed, and I had a similar idea of a scheme for function refs.
Also I think this caching implementation can be kept in a separate crate. There is already a cranelift-codegen feature to enable serde serialization and deserialization of Function, which should be enough.
That would be really nice if it could be a separate crate, yet I am unclear if it's doable because currently the ICC uses lots of cranelift-codegen internals. Will keep this in mind, and we can reconsider this later when closer to the "final" design and implementation.
bnjbvr commented on issue #4155:
@alexcrichton @cfallin I am trying to consolidate the two design suggestions, so rephrasing to make sure I've understood clearly:
- Alex is proposing to reuse the backing storage that's used for Wasmtime caching of modules, since there's already a nice API that handles that, with safety backed in the function signature and generics.
- If I have understood clearly: this is something I would like to avoid, since our use case implies having control over the storage itself, and not reuse the on-disk cache directory that wasmtime uses. (In fact, we're not using wasmtime-cache because of that need for control over the backing storage for the cache.) Also, I think that the
compute
in this case would be one of the methods exposed incranelift_codegen
and proposed by Chris, so it wouldn't really need to be provided by a wasmtime user, then, right?- Chris is proposing that cranelift-codegen exposes 2 methods: one to perform the compile step with constants, and then another one to finish the compilation by fixing up the constants.
- This makes sense to me, and indeed makes it possible to have all sorts of IO providers on top of that. Then from looking at the callers of
cranelift_codegen::compile
, I think that we'd need a way to specify _another_wasmtime_environ::compilation::Compiler
impl inModule::build_artifacts
, so we can use the ICC methods provided bycranelift-codegen
there. Am I understanding this correctly? If so, would refactoring theEngine
's code to use aBox<dyn Compiler>
instead of the concretewasmtime_environ::Compiler
impl make sense, as well as providing a public API to set adyn Compiler
impl that should be used at runtime? (If not, I'm happy to chat on Zulip!)
alexcrichton commented on issue #4155:
To clarify I'm talking more about methods of caching moreso than using specific code exactly as-is without any modifications. The strategy used by
wasmtime-cache
I feel is pretty solid and we should be quite deliberate about migrating away from that if we do.I do think it would be a not-great state to be in to have two caching systems at the end of this, so I think in the long run we don't want two entirely separate systems for caching things that know nothing of each other. In the short-to-mid term though it's fine to develop whatever and refactor things later to share more code.
bjorn3 commented on issue #4155:
Say, when it comes to source locations, this means that first we'd transform all source locations into relative (from start) ones, then if the ICC isn't used, re-transform them back later into absolute source locations (for the MachRelocs to be correct)
We already need transforming between the output format of cranelift and the serialized format of wasmtime. It is just a cheap addition which can be folded into this transformation. Basically just add the addition here: https://github.com/bytecodealliance/wasmtime/blob/cb13175c429d6b3e63c8122ce6427b4529a2873d/crates/cranelift/src/compiler.rs#L652
so I tend to slightly prefer the opposite choice which is to have bespoke "mirror" data structures for the compilation cache (which could contain plain references to Function's fields, when they don't need any change, and only introduce new ones when they do need internal changes).
That would make the incremental compilation cache depend on the original version of the function, right? I'm afraid that would turn hard to debug bugs like is normally the case for incremental compilation related bugs into straight up nightmares.
bnjbvr commented on issue #4155:
It is just a cheap addition which can be folded into this transformation. Basically just add the addition here:
I guess it's not "just" that :-). In practice every transitive field that contains a
SourceLoc
inMachCompileResult
should get the same treatment, so that also means tweakingmach_reloc_to_reloc
,mach_trap_to_trap
, etc. That also means moving the responsibility of doing those fixups from Cranelift to the consumer of the compiled artifact, when the initial intent was to make it as easy as possible for every Cranelift user to have something that works out of the box without requiring them to do extra work. (This argument has limited applicability as the two main users are wasmtime and cg_clif, but still hold in theory at least.)I do hear your concerns about running into incremental compilation related bugs, though, which would be mitigated by the mix of static typing sanity + fuzzing; any other idea to make it more robust is appreciated here too.
cfallin commented on issue #4155:
To unify things a bit: I think that we could, and probably should, use the same caching mechanism for both the Wasmtime module-level cache and any Cranelift function-level cache. I was mainly concerned above about flexibility (unsure with a synchronous API like wasmtime-cache: how would one use an async external storage provider, e.g. a key-value store across a network?).
I agree with @alexcrichton that we should design to make it impossible to get cache keys wrong. To that end if we have a "put together the pieces yourself" option, I think that we can still make it completely safe by storing the expected CLIF hash in the cached value as well, and checking that on return. In essence the interface that I think provides maximum flexibility is something like:
Context::compile_to_cacheable() -> (CacheKey, Vec<u8>)
: from a Cranelift function, get bytes and a cache keyContext::get_expected_cache_key() -> CacheKey
: from a Cranelift function, get the cache key for the cached compilation result we need, if anyContext::compile_maybe_with_cache(bytes: Option<&[u8]>) -> MachCompileResult
: from a Cranelift function, compile to machine code. If the given bytes are a valid compilation result fromcompile_to_cacheable()
with a matching hash to what's embedded in the bytes, then use them; otherwise fall back to from-scratch compilation. Also fall back to from-scratch if cached result isn't usable due to e.g. an offset going out-of-bounds with vmctx shifts.This is fully "dynamically safe" in that the cached data is only used if the cache key matches, and the cache key production is internal to Cranelift and opaque to outside. We can then use the type-safety ideas that Ben mentions (wrapper types on fields) to make that CacheKey correct.
This together with the fuzzing to verify correctness of incremental compilation I suspect is enough, and this whole approach grants the flexibility that Ben's case needs.
Then tying this back to Wasmtime:
wasmtime-cache
could potentially use these building blocks to cache functions as well as whole modules; or an advanced user of Cranelift could use them directly. Thoughts?
bjorn3 commented on issue #4155:
I guess it's not "just" that :-). In practice every transitive field that contains a SourceLoc in MachCompileResult should get the same treatment, so that also means tweaking mach_reloc_to_reloc, mach_trap_to_trap, etc.
mach_reloc_to_reloc ignores srcloc and so does mach_trap_to_trap. collect_address_maps seems to be the only that actually reads srcloc.
bnjbvr commented on issue #4155:
mach_reloc_to_reloc ignores srcloc and so does mach_trap_to_trap. collect_address_maps seems to be the only that actually reads srcloc.
Indeed, I even started a patch to remove those as they're really unused in the code base,... and then I recalled that Spidermonkey would use those (if it integrated with the new backend). Anyhow, I don't still think it would make for a good API to request any cranelift embedder to do this kind of transform themselves, when it can just be done in Cranelift directly, so I'm going to stick with my plan (which is more aligned with what Chris suggests too), incorporating the good ideas from yours :)
bjorn3 commented on issue #4155:
Indeed, I even started a patch to remove those as they're really unused in the code base,... and then I recalled that Spidermonkey would use those (if it integrated with the new backend).
mach_reloc_to_reloc and friends are in wasmtime-cranelift so spidermonkey can't use them even if it wanted to. They are the translation between Cranelift's representation and Wasmtime's serializable representation of the compilation results.
bnjbvr commented on issue #4155:
I was talking about the
srcloc
fields inMachTrap
et al. In fact I've received confirmation from the Spidermonkey folks that they might not need it, so will open a PR to remove these, as an incremental step towards this.(Sorry all for the noise caused by this sidetracked discussion, and @bjorn3 happy to follow up in Zulip if you'd like to chat more about this.)
bnjbvr closed issue #4155 (assigned to bnjbvr):
Feature
Allow Wasmtime/Cranelift to store compiled artifacts of single functions in a global cache, and when trying to compile a function with the same IR, reload the compiled artifact from the cache instead of recompiling it from scratch.
Benefit
- Speed up recompilation of wasm modules that have already been compiled once, but for which only a very small subset of code changed. Quite impactful for hot-reload use cases and incremental workflows, in particular.
- Speed up compilation of wasm modules sharing lots of inner code, creating a global compilation cache (e.g. when reusing the same libraries in different modules, it's possible the lib code may be present and the same in all the modules).
Implementation
- Since this will require moar dependencies in Cranelift (namely
serde
andbincode
) and some users might not need the compilation cache, I plan to implement this behind a Cargo feature (and behind a dynamic runtime config in Cranelift too, to enable/disable without having to recompile it all).- In Cranelift, simple memoization strategy when compiling:
- Use the function's _meaningful_ CLIF as the input (cache key).
- Use the
MachCompileResult
as the output (cache value)To make this really efficient, the caching system should be resilient to _non-meaningful_ changes in the function's input. For instance, it should be independent of the source locations, but also to adding/removing functions (which would shift function references indices, and thus cause mostly only cache misses). This may involve relocations or other fixups to make this work, just after reloading from the cache.
I also want to make sure that we don't accidentally miss a new field in the
ir::Function
being meaningful in the cache key, so here I think of using a mix of strategies:
- static typing, i.e. use new "cacheable" types for
MachCompileResult
and its fields, that require explicit conversion from/to the non-cacheable types. So when we developers add new fields toMachCompileResult
we think about their being needed in the key or not.- fuzzing. Credit goes to @cfallin for the idea: generate a random CLIF/wasm function F, compile it with the incremental compilation cache, slightly alter the function to obtain another function F", recompile it; if the cache keys for F and F" are the same, compare compiling F" from scratch with the result of loading from the cache + applying the fixups: the results should be exactly the same.
(more implementations strategies / design choices below)
The caching mechanism should also check the compilation configuration (for which target are we compiling, with which cranelift compile flags).
- I hoped to be able to reuse wasmtime-cache's mechanism in there, but most of the metadata doesn't apply cleanly when translated to Cranelift. So I am planning to either split the metadata struct in the bits that apply to Cranelift and those that don't, or to create a new similar data structure.
Provide a way to configure how to store/retrieve the cached bytes (basically a key/value store for which keys and values are
Vec<u8>
or similar). That allows user to implement their cache policy on top of that, and use different backing storage (in memory, on disk), etc.
- Wasmtime would expose a new public API to make this configurable, when enabling the incremental compilation cache. (It could be in the same public API that allows to enable the cache at all, in fact.)
- Is that fine to use a Rust trait here, or will it cause hardships for the C bindings?
Alternatives
- Speed up compilation in general using algorithmic/micro-architectural/memory optimizations, which is a fine but orthogonal goal.
- Have a compilation cache at the wasm level (that is, use the wasm function as the input). Probably a bit simpler, but would not benefit non-wasmtime consumers like cg_clif (... which may not need it, because it might benefit from rustc's incremental compilation framework, likely?)
- [OTHER SMART ALTERNATIVES GO HERE]
Last updated: Jan 24 2025 at 00:11 UTC