wasmtime / issue #13355 Switching to a faster hashmap imp... · git-wasmtime


Wasmtime GitHub notifications bot (May 13 2026 at 20:45):

jeffcharles opened issue #13355:

Feature

Would it be possible to have Wasmtime use a faster implementation for hashmaps and hashsets?

Benefit

When benchmarking Wasmtime 42.0.2 compared to Wasmtime 41.0.4 for some of our workloads, I noticed a ~14% regression in terms of wall clock time and estimated cycles. When I update crates/environ/src/collections.rs to use hashbrown instead of the standard library hashmap and hashset based off undoing part of #12509, that regression drops to ~7%. Our use-case for Wasmtime is in a latency sensitive environment.

Implementation

Naively, would it be possible/advisable to use hashbrown or fxhash as the hashmap and hashset implementations when std is enabled, until collections which are able to handle OOMs are in place? They perform faster than the standard library hashmaps and hashsets, and I don't think we need the cryptographic security in the standard library's implementations.

Alternatives

Maybe a configurable hash implementation? We're willing to tolerate OOMs hypothetically causing a process abort if it reduces the amount of the performance regression we're seeing.

Wasmtime GitHub notifications bot (May 13 2026 at 21:02):

alexcrichton commented on issue #13355:

Would you be able to share and/or assemble some workloads that regressed? Changing hash maps and algorithms is totally on the table and reasonable to do, but depending on where the slowdown is coming from we might be able to remove the hash maps entirely and/or get some other larger win.

Wasmtime GitHub notifications bot (May 13 2026 at 21:46):

bjorn3 commented on issue #13355:

FWIW libstd's hashmap already uses hashbrown internally. The performance difference is almost certainly caused by libstd using a HashDoS-resistant hasher by default. You can override the hasher for libstd's hashmap too.
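A minimal sketch of what overriding the hasher looks like with only the standard library. The `Fnv1a` hasher, the `FastMap` alias, and the `index_names` helper are illustrative names, not anything Wasmtime ships; FNV-1a is used only as a stand-in for "some fast, non-DoS-resistant hash".

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Illustrative FNV-1a (64-bit) hasher. It is NOT DoS resistant; it only
/// stands in for a faster hash than the standard library's default.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Still `std::collections::HashMap`; only the hasher parameter changes.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hypothetical helper: map each name to its position in `names`.
pub fn index_names(names: &[&str]) -> FastMap<String, usize> {
    let mut map = FastMap::default();
    for (i, name) in names.iter().enumerate() {
        map.insert((*name).to_string(), i);
    }
    map
}
```

In practice one would more likely pull a hasher from an existing crate than hand-roll one; the point is only that the third type parameter of `HashMap` is where the swap happens.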

Wasmtime GitHub notifications bot (May 15 2026 at 19:57):

jeffcharles commented on issue #13355:

Would you be able to share and/or assemble some workloads that regressed?

Take a look at https://github.com/jeffcharles/wasmtime-42-perf-analysis. The repo includes a ./run-callgrind.sh script to run a callgrind benchmark in an x86 OCI container. The main branch uses Wasmtime 41, the wasmtime-42 branch uses Wasmtime 42, and the wasmtime-42-hashbrown branch uses a fork of Wasmtime 42 that uses hashbrown instead.

The results when comparing main (Wasmtime 41) and wasmtime-42:

  Instructions:                      157275|136774               (+14.9890%) [+1.14989x]
  L1 Hits:                           205114|181013               (+13.3145%) [+1.13315x]
  LL Hits:                             2600|2913                 (-10.7449%) [-1.12038x]
  RAM Hits:                            2238|2304                 (-2.86458%) [-1.02949x]
  Total read+write:                  209952|186230               (+12.7380%) [+1.12738x]
  Estimated Cycles:                  296444|276218               (+7.32248%) [+1.07322x]

The results when comparing main (Wasmtime 41) and wasmtime-42-hashbrown:

  Instructions:                      128474|135948               (-5.49769%) [-1.05818x]
  L1 Hits:                           170470|179969               (-5.27813%) [-1.05572x]
  LL Hits:                             2509|2836                 (-11.5303%) [-1.13033x]
  RAM Hits:                            2272|2279                 (-0.30715%) [-1.00308x]
  Total read+write:                  175251|185084               (-5.31272%) [-1.05611x]
  Estimated Cycles:                  262535|273914               (-4.15422%) [-1.04334x]

The code being benchmarked is in raw_run_module.
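For anyone reading the tables above: in each row the percentage is the left figure relative to the right figure, and the bracketed factor is the same ratio with a sign. The "Estimated Cycles" rows are consistent with the weighting L1 + 5 * LL + 35 * RAM. A small sketch that reproduces the arithmetic (function names are illustrative, and the weighting is inferred from the numbers rather than taken from the benchmark tool's documentation):

```rust
/// Percentage change of `new` relative to `old`.
pub fn percent_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Signed factor as in the bracketed column: new/old for an increase,
/// -(old/new) for a decrease.
pub fn signed_factor(new: u64, old: u64) -> f64 {
    if new >= old {
        new as f64 / old as f64
    } else {
        -(old as f64 / new as f64)
    }
}

/// Cycle estimate consistent with the tables: L1 + 5 * LL + 35 * RAM.
pub fn estimated_cycles(l1_hits: u64, ll_hits: u64, ram_hits: u64) -> u64 {
    l1_hits + 5 * ll_hits + 35 * ram_hits
}
```

For example, `estimated_cycles(205114, 2600, 2238)` gives 296444, matching the wasmtime-42 column of the first table.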

Wasmtime GitHub notifications bot (May 16 2026 at 00:31):

alexcrichton commented on issue #13355:

Thanks! Without going too deep down the docker/callgrind hole, I lightly edited it to just be raw criterion and I'm showing a 4% regression in wall time from Wasmtime 41 to Wasmtime 42. A samply-based profile looks like this.

From this it looks like the main hash map related location is the Linker, and that's what would in theory need to change. There's a few things about this worth pointing out:

We generally consider the Linker a create-once primitive where it's not designed for clone/insertion to be on the hot path, as it is here. The natural implementation of this use case sort of requires it though due to this leveraging runtime linking of one instance to another. This is something I'd consider a bit of a gap in Wasmtime's embedder API where you're unable to leverage InstancePre, for example, without further changes. The absolute ideal performance here will come about if you're able to link these modules together statically and have that compiled by Wasmtime. That way you'd be able to avoid hash maps entirely on the hot path.

Using a faster hash algorithm here is a bit tricky because the Linker is sometimes user-controlled and sometimes host-controlled. It's neither obvious that a DoS resistant hash is needed nor that it's specifically not needed. One theoretical option here would be that we could add a type parameter to Linker to allow embeddings to control this, and you'd be able to configure it in this use case to something faster.

So, on one hand, yes, I think we could either just switch Linker or expose a type parameter to use a non-DoS-resistant hash. I haven't tested locally the perf impact of that, however. On the other hand, though, if you're interested in the fastest possible execution time, that will come from side-stepping this entirely. The "easiest" option would be to use walrus or something similar to combine the two core wasm modules here into one. You'd resolve imports of one to another and the final module would only have the resulting imports. This would be a relatively invasive change, however, and I understand if you don't have appetite for such a change.

Nevertheless I wanted to at least write this all down. I wasn't able to repro a 7% or a 14% regression, but that could just be a difference in hardware perhaps.
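To make the type-parameter idea above concrete, here is a rough sketch of the shape it could take. `Registry` is a hypothetical stand-in and is not `wasmtime::Linker`; it only shows a hasher parameter that defaults to the standard library's DoS-resistant `RandomState`, so existing users see no change while an embedder can opt into something else.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Hypothetical linker-like registry keyed by (module, name).
/// The hasher `S` defaults to the DoS-resistant `RandomState`.
pub struct Registry<V, S = RandomState> {
    items: HashMap<(String, String), V, S>,
}

impl<V> Registry<V, RandomState> {
    /// Default construction keeps the standard hasher.
    pub fn new() -> Self {
        Self::with_hasher(RandomState::new())
    }
}

impl<V, S: BuildHasher> Registry<V, S> {
    /// Embedders that trust their keys can pass a faster `BuildHasher`.
    pub fn with_hasher(hasher: S) -> Self {
        Registry {
            items: HashMap::with_hasher(hasher),
        }
    }

    /// Insert a definition, returning any value it replaced.
    pub fn define(&mut self, module: &str, name: &str, value: V) -> Option<V> {
        self.items
            .insert((module.to_string(), name.to_string()), value)
    }

    /// Look up a definition. Allocates the key for simplicity; a real
    /// implementation would want to avoid that on a hot path.
    pub fn get(&self, module: &str, name: &str) -> Option<&V> {
        self.items.get(&(module.to_string(), name.to_string()))
    }
}
```

The default type parameter is what would keep such a change source-compatible for callers that never name the hasher.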

Wasmtime GitHub notifications bot (May 19 2026 at 14:45):

jeffcharles commented on issue #13355:

Thanks for writing that up! Having an API like InstancePre except stateless (that is, it would define hashmap entries with stubbed values for the imports that would get replaced at instantiation time) would likely help without having to give up dynamic linking. But I can understand a reluctance to add something like this just for us. And point taken on us having an option to statically link with an additional ahead of time transformation of the guest Wasm.

And yes, I did notice a difference in the performance change between x86 and AArch64. Tried to use callgrind on x86 to ensure some degree of consistency with the numbers.

Wasmtime GitHub notifications bot (May 19 2026 at 15:17):

jeffcharles commented on issue #13355:

Thinking about this a little more, we require some approach that can enable dynamic linking between fresh instances in the hot path. We have extremely aggressive upper limits on the size of final Wasm modules to minimize memory use and minimize latency fetching them so statically linking them to dynamic libraries ahead-of-time is not feasible.

Wasmtime GitHub notifications bot (May 26 2026 at 22:28):

alexcrichton commented on issue #13355:

Perhaps the fastest option for you in the meantime would be to invoke Instance::new directly? You could precompute ahead of time what exports to extract and pass in to various places (e.g. via introspection, similar to what Linker does). That still won't be the fastest path since it'll re-type-check everything on all instantiations, but it'll avoid needing to clone a Linker and/or manage items within it, bypassing hash maps entirely. Would that be feasible for you?
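As a rough model of the "precompute ahead of time" part, using only the standard library: name resolution happens once and produces a plan of indices, and the per-instantiation path just indexes into vectors. `Slot`, `plan_imports`, and `gather` are hypothetical names, and the generic `T` stands in for whatever extern values the embedding extracts; in a real embedding the gathered values would be what gets passed as the imports slice to `Instance::new`.

```rust
use std::collections::HashMap;

/// Where one import comes from: which provider, and which of its exports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Slot {
    pub provider: usize,
    pub export: usize,
}

/// Built once, off the hot path. `imports` lists (module, name) pairs in
/// the order the importing module declares them; `providers[i]` is the
/// module name and ordered export names of the i-th provider.
/// Returns `None` if any import cannot be resolved.
pub fn plan_imports(
    imports: &[(&str, &str)],
    providers: &[(&str, Vec<&str>)],
) -> Option<Vec<Slot>> {
    let mut by_name = HashMap::new();
    for (p, (module, exports)) in providers.iter().enumerate() {
        for (e, name) in exports.iter().enumerate() {
            by_name.insert((*module, *name), Slot { provider: p, export: e });
        }
    }
    imports.iter().map(|key| by_name.get(key).copied()).collect()
}

/// Hot path: collect import values by index only, with no hashing.
/// `instances[p][e]` is export `e` of the freshly created provider `p`.
pub fn gather<T: Clone>(plan: &[Slot], instances: &[Vec<T>]) -> Vec<T> {
    plan.iter()
        .map(|slot| instances[slot.provider][slot.export].clone())
        .collect()
}
```

This only models the bookkeeping; it does not address the repeated type-checking cost mentioned above.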

Wasmtime GitHub notifications bot (May 27 2026 at 14:23):

jeffcharles commented on issue #13355:

Thank you for the suggestion! I can explore that and see if that works for us.

Last updated: Jul 29 2026 at 05:03 UTC