Hi! Some of the hashmaps used internally seem to show up in flamegraphs for my program that creates a lot of smaller wasm modules. e.g., https://github.com/bytecodealliance/wasmtime/blob/cd97c9f14713d81de5453fb9861319e78a761432/crates/wasmtime/src/linker.rs#L88 is one of them. Is there any interest in being able to pass custom hashers (like rustc-hash) to help with hashing performance?
Would another option be to change the HashMap implementation used here? This doesn't seem like a DoS attack vector to me, so perhaps a faster HashMap that's less robust against those attacks would work here? @Alex Crichton, thoughts?
Ah yeah, we likely don't need DoS protection for most of our usage of hash maps, so rather than letting embedders specify custom hashers it should be fine to hardcode many hash maps to a faster hasher.
That being said, I'd probably first go the route of trying to reduce the usage of APIs that trigger hashing. That's not always possible, though: for example, if you have a single Linker and instantiate lots of small modules through it, there's not much we can do to reduce the hashing any more.
Yeah, I currently use separate Linkers per module but still spend a significant amount of time hashing during the initial inserts. I can try to submit a PR for some of the spots I notice; would you prefer any specific fast hasher (rustc-hash, ahash, something else)?
we have ahash in our audits, so that'd probably be good to use. Though I think rustc-hash should also just work in that regard
Isn't ahash the default Rust hasher now (via being hashbrown's default)? Ah, no, I see libstd opts out of ahash.
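For anyone following along, swapping a `HashMap`'s hasher in Rust is just a change to the map's third type parameter. This is a minimal std-only sketch: the `FxStyleHasher` below is a simplified multiply/rotate/xor hasher written here purely for illustration, whereas real code would use the `rustc-hash` or `ahash` crates instead.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Illustrative non-cryptographic hasher (simplified, NOT the real
/// rustc-hash implementation). Fast, but no DoS resistance.
#[derive(Default)]
struct FxStyleHasher {
    hash: u64,
}

impl Hasher for FxStyleHasher {
    fn write(&mut self, bytes: &[u8]) {
        const SEED: u64 = 0x517c_c1b7_2722_0a95;
        for &b in bytes {
            self.hash = (self.hash.rotate_left(5) ^ b as u64).wrapping_mul(SEED);
        }
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}

// Only the type alias changes; all insert/get call sites stay the same.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FxStyleHasher>>;

fn main() {
    let mut map: FastMap<String, u32> = FastMap::default();
    map.insert("memory".to_string(), 0);
    map.insert("table".to_string(), 1);
    assert_eq!(map.get("memory"), Some(&0));
}
```

Note that `FastMap::default()` is used instead of `HashMap::new()`, since `new()` is only defined for the default `RandomState` hasher.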
Is it not possible to use one Linker for all modules?
(that'd be best to reduce all the insertions and clones involved there too)
Yeah I also want to try that soon, although I might have the opposite problem of having a large hashmap to search afterwards. I think a faster hasher would be beneficial either way
indeed!
Also, are you doing repeated instantiations of the same module? Or instantiations of new modules primarily?
I ask b/c InstancePre can help with the former, repeated instantiations of the same module, and skips hash maps entirely
That's also a good idea; it's mostly new modules right now, but I think I could probably reuse many of them. I need to spend some more time understanding the implications of reusing modules and linkers, especially the trade-offs (e.g., inlining different constants into a module from the host at compilation time with separate modules, vs. reusing one module but using different environments or something)
FWIW if it helps with context, my use case is analogous to Excel/Google Sheets spreadsheets where each formula becomes its own tiny wasm module (using its own linker) in my current design
So it's easy to end up with tens of thousands-ish modules if they're all completely unique, but a lot of the time the formulas are the exact same but just need slightly different environments passed to them
ah ok makes sense, well in any case switching hashers definitely makes sense, and any other optimizations you'd like to apply to Linker or the compilation process in general I think would make sense too
most of Wasmtime's optimizations have been around runtime and InstancePre-to-Instance, so this part isn't quite so well optimized
Josh Groves said:
e.g., https://github.com/bytecodealliance/wasmtime/blob/cd97c9f14713d81de5453fb9861319e78a761432/crates/wasmtime/src/linker.rs#L88 is one of them.
How much is it specifically that one? It looks like ImportKey is two indexes into an interned-strings list? If so, that's a great candidate for some type-specific hashing logic. Or maybe even making a struct InternKey(u32); instead of using raw usize to shrink ImportKey from 16 bytes down to 8 on 64-bit, which saves a bunch of hashing work.
(This is totally a peanut-gallery comment because zulip put this thread in a digest and I've been looking recently at hashtables where the key is two integers. Feel free to ignore if I'm off-base.)
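The suggestion above could look roughly like this. This is a hypothetical sketch, not wasmtime's actual `ImportKey` (which uses raw `usize` fields): two `u32` index newtypes pack the key into 8 bytes, and a manual `Hash` impl feeds the hasher a single `u64` write instead of two field writes.

```rust
use std::hash::{Hash, Hasher};
use std::mem::size_of;

/// Hypothetical newtype over a u32 index into an interned-strings list.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct InternKey(u32);

/// Hypothetical 8-byte key (vs 16 bytes with two usizes on 64-bit targets).
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct ImportKey {
    module: InternKey,
    name: InternKey,
}

impl Hash for ImportKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Pack both 32-bit indexes into one u64 so the hasher sees a single
        // fixed-width write. Packing is injective, so this stays consistent
        // with the derived PartialEq.
        state.write_u64(((self.module.0 as u64) << 32) | self.name.0 as u64);
    }
}

fn main() {
    assert_eq!(size_of::<ImportKey>(), 8);
    let key = ImportKey { module: InternKey(1), name: InternKey(2) };
    assert_eq!(key, key);
}
```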
Unfortunately I don't have great stats right now; the sampling profiler in VTune makes it difficult to get detailed statistics for sub-millisecond code without setting up artificial microbenchmarks. I just noticed SipHash show up in a few hot paths for wasmtime, so I figured it might be worth raising here. That PartialEq optimization looks neat!
It's a fun one, though I'm hopeful that between Rust and LLVM we can make it unnecessary in future (<https://github.com/rust-lang/rust/issues/117800>).
Entity in bevy shows up everywhere, so it's worth some uglification for even tiny speedups. I don't know nearly enough about wasmtime to know if ImportKey is anywhere close to as critical.
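The "uglification" referenced here is essentially a manual `PartialEq` that compares the whole key as one packed integer instead of field by field. A minimal sketch, using a hypothetical two-field `Entity`-like layout rather than bevy's real type:

```rust
#[derive(Copy, Clone, Eq, Debug)]
struct Entity {
    index: u32,
    generation: u32,
}

impl Entity {
    // Pack both fields into a single u64 so comparison (and hashing, if
    // desired) can operate on one machine word.
    fn to_bits(self) -> u64 {
        ((self.generation as u64) << 32) | self.index as u64
    }
}

impl PartialEq for Entity {
    fn eq(&self, other: &Self) -> bool {
        // One 64-bit compare instead of two 32-bit compares joined by a
        // short-circuiting branch (the codegen issue rust-lang/rust#117800
        // above is about making the derived impl this good automatically).
        self.to_bits() == other.to_bits()
    }
}

fn main() {
    let a = Entity { index: 1, generation: 0 };
    let b = Entity { index: 1, generation: 0 };
    assert!(a == b);
}
```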
Opened https://github.com/bytecodealliance/wasmtime/pull/7828 as a follow-up to this conversation