grovesNL opened PR #7828 from grovesNL:rustc-hash
to bytecodealliance:main
:
From a Zulip discussion a few months ago (https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Custom.20hashers) it was mentioned that custom hashers like
rustc-hash
should be ok for most usage of hashmaps/indexmaps because they don't need DoS protection.
exports
showed up in a recent profile (e.g., when_get_export
doesinstance.module().exports.get_full(name)?;
) so I thought it might be good to start by addingrustc-hash
here. Because this is used inget_export
, this optimization should be useful generally.
grovesNL requested alexcrichton for a review on PR #7828.
grovesNL requested wasmtime-core-reviewers for a review on PR #7828.
grovesNL requested wasmtime-default-reviewers for a review on PR #7828.
alexcrichton commented on PR #7828:
Thanks for this! This is a bit of an interesting question to me though because Wasmtime is in a different position than rustc and is exposed to untrusted modules and this could in theory be an issue for embedders. That being said I think it'd still be good to solve this. One idea would be to expose the underlying indices that Wasmtime uses. For example instead of looking up via name on instances each time you could look up the name on a
Module
once, get an index, and then use that index with instantiations of all future modules. That should in theory be even faster than a rustc-hash based approach.Would a scheme like that work for your use case? If so would you be interested in adapting this PR to such an approach?
grovesNL commented on PR #7828:
Yeah that makes sense for embedders. In my case I'm both the producer and consumer of the module (e.g., I'm writing the export names during encoding, and then immediately looking them up from the instance) so DoS isn't important. It would be useful to override the hasher throughout most types in cases like this but I can appreciate not wanting to complicate type signatures too.
That scheme sounds ok, but I'm not sure how to expose the index we'd need without mapping bidirectionally between
EntityIndex
andusize
somewhere. For example,exports
isIndexMap<String, EntityIndex>
so it doesn't seem like we have a great way to expose eitherEntityIndex
orusize
(the actual index for the map):
If we expose
EntityIndex
(the value for the index map) on the originalwasmtime::Module
, then we don't have a clear way to convert back from anEntityIndex
to theusize
needed to look it up on the instance later on. e.g., theEntityIndex
tousize
mapping isn't preserved duringtranslate_payload
so we can't recover it afterwards except by iterating all exports to find it.If we expose
usize
, then we'd still have to do some kind of name tousize
lookup after we have the instance, but that would have the same overhead as the originalget_export
name lookup.Ideally I could get the
EntityIndex
from the originalwasmtime::Module
or query theusize
by name fromInstancePre
somehow (so I could share it across all instances created from it).
alexcrichton commented on PR #7828:
Oh the raw value here that
get_export
is interested in is theExportIndex
. There's no need to recomputed theusize
index of the name in the export map. For that either returningExportIndex
raw or perhaps a wrapper around it should be sufficient (along with a new method likeget_export
that takes this as an argument)
grovesNL commented on PR #7828:
If we add a function like this to
wasmtime::Module
:pub fn get_export_index(&self, name: &str) -> Option<EntityIndex> { let module = self.compiled_module().module(); module.exports.get(name).copied() }
Then on
Instance
we might want want something like:pub fn get_export_by_index(&self, mut store: impl AsContextMut, index: EntityIndex) -> Option<Extern> { // ... same boilerplate as `get_export` ... let item = unsafe { Extern::from_wasmtime_export(instance.get_export_by_index(index), store) }; let data = &mut store[self.0]; data.exports[i] = Some(item.clone()); Some(item) }
The part I'm not sure about is setting the lazily populated
data.exports[i]
with the export before we return it. We don't seem to be able to determine the index (i
) fromExportIndex
without searching theexports
by value, because theexports
indexmap is keyed by the export name, not itsEntityIndex
.
alexcrichton commented on PR #7828:
Aha I apologize, I had forgotten about that. That was why the
usize
you were talking about is important!How about this:
impl Module { fn get_export(&self, name: &str) -> Option<ModuleExport>; } #[derive(Copy, Clone)] pub struct ModuleExport { module: CompiledModuleId, // make sure this isn't mixed up with different modules entity: EntityIndex, // raw index into the wasm module export_name_index: usize, // the index within the `IndexMap` } impl Instance { pub fn get_module_export(&self, ..., export: &ModuleExport) -> Option<Export>; }
(please bikeshed the names)
Basically package everything up in a new struct in the
wasmtime
crate
alexcrichton commented on PR #7828:
I suppose stepping back a little bit for a moment, one thing I would say is that in Wasmtime we've tried to basically avoid hash maps wherever we can. Where possible we use indexes into lists rather than strings to entries. While hash maps are necessary in the API somewhere it's often possible to lift the hash lookups outside of loops. Or at least that's what's worked out for us so far, not that we can't make the hash maps faster here just that we'd prefer to explore other avenues first (and also because even a faster hash map is probably much slower than a simple bounds check)
grovesNL updated PR #7828.
grovesNL updated PR #7828.
grovesNL commented on PR #7828:
Thanks for clarifying! I updated the PR to match the scheme you described.
Is the purpose of including
CompiledModuleId
onModuleExport
just out of convenience for user-code, or should we compare these somewhere duringget_module_export
?
grovesNL updated PR #7828.
grovesNL updated PR #7828.
Last updated: Nov 22 2024 at 16:03 UTC