Stream: git-wasmtime

Topic: wasmtime / PR #7828 Use `rustc-hash` for module exports


view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 16:10):

grovesNL opened PR #7828 from grovesNL:rustc-hash to bytecodealliance:main:

From a Zulip discussion a few months ago (https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Custom.20hashers) it was mentioned that custom hashers like rustc-hash should be ok for most usage of hashmaps/indexmaps because they don't need DoS protection.

exports showed up in a recent profile (e.g., when _get_export does instance.module().exports.get_full(name)?;) so I thought it might be good to start by adding rustc-hash here. Because this is used in get_export, this optimization should be useful generally.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 16:10):

grovesNL requested alexcrichton for a review on PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 16:10):

grovesNL requested wasmtime-core-reviewers for a review on PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 16:10):

grovesNL requested wasmtime-default-reviewers for a review on PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 19:02):

alexcrichton commented on PR #7828:

Thanks for this! This is a bit of an interesting question to me though because Wasmtime is in a different position than rustc and is exposed to untrusted modules and this could in theory be an issue for embedders. That being said I think it'd still be good to solve this. One idea would be to expose the underlying indices that Wasmtime uses. For example instead of looking up via name on instances each time you could look up the name on a Module once, get an index, and then use that index with instantiations of all future modules. That should in theory be even faster than a rustc-hash based approach.

Would a scheme like that work for your use case? If so would you be interested in adapting this PR to such an approach?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 20:31):

grovesNL commented on PR #7828:

Yeah that makes sense for embedders. In my case I'm both the producer and consumer of the module (e.g., I'm writing the export names during encoding, and then immediately looking them up from the instance) so DoS isn't important. It would be useful to override the hasher throughout most types in cases like this but I can appreciate not wanting to complicate type signatures too.

That scheme sounds ok, but I'm not sure how to expose the index we'd need without mapping bidirectionally between EntityIndex and usize somewhere. For example, exports is IndexMap<String, EntityIndex> so it doesn't seem like we have a great way to expose either EntityIndex or usize (the actual index for the map):

Ideally I could get the EntityIndex from the original wasmtime::Module or query the usize by name from InstancePre somehow (so I could share it across all instances created from it).

view this post on Zulip Wasmtime GitHub notifications bot (Jan 26 2024 at 21:58):

alexcrichton commented on PR #7828:

Oh the raw value here that get_export is interested in is the ExportIndex. There's no need to recomputed the usize index of the name in the export map. For that either returning ExportIndex raw or perhaps a wrapper around it should be sufficient (along with a new method like get_export that takes this as an argument)

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 00:05):

grovesNL commented on PR #7828:

If we add a function like this to wasmtime::Module:

pub fn get_export_index(&self, name: &str) -> Option<EntityIndex> {
    let module = self.compiled_module().module();
    module.exports.get(name).copied()
}

Then on Instance we might want want something like:

pub fn get_export_by_index(&self, mut store: impl AsContextMut, index: EntityIndex) -> Option<Extern> {
    // ... same boilerplate as `get_export` ...
    let item = unsafe { Extern::from_wasmtime_export(instance.get_export_by_index(index), store) };
    let data = &mut store[self.0];
    data.exports[i] = Some(item.clone());
    Some(item)
}

The part I'm not sure about is setting the lazily populated data.exports[i] with the export before we return it. We don't seem to be able to determine the index (i) from ExportIndex without searching the exports by value, because the exports indexmap is keyed by the export name, not its EntityIndex.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 00:23):

alexcrichton commented on PR #7828:

Aha I apologize, I had forgotten about that. That was why the usize you were talking about is important!

How about this:

impl Module {
    fn get_export(&self, name: &str) -> Option<ModuleExport>;
}

#[derive(Copy, Clone)]
pub struct ModuleExport {
    module: CompiledModuleId, // make sure this isn't mixed up with different modules
    entity: EntityIndex, // raw index into the wasm module
    export_name_index: usize, // the index within the `IndexMap`
}

impl Instance {
    pub fn get_module_export(&self, ..., export: &ModuleExport) -> Option<Export>;
}

(please bikeshed the names)

Basically package everything up in a new struct in the wasmtime crate

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 00:27):

alexcrichton commented on PR #7828:

I suppose stepping back a little bit for a moment, one thing I would say is that in Wasmtime we've tried to basically avoid hash maps wherever we can. Where possible we use indexes into lists rather than strings to entries. While hash maps are necessary in the API somewhere it's often possible to lift the hash lookups outside of loops. Or at least that's what's worked out for us so far, not that we can't make the hash maps faster here just that we'd prefer to explore other avenues first (and also because even a faster hash map is probably much slower than a simple bounds check)

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 02:06):

grovesNL updated PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 02:07):

grovesNL updated PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 02:11):

grovesNL commented on PR #7828:

Thanks for clarifying! I updated the PR to match the scheme you described.

Is the purpose of including CompiledModuleId on ModuleExport just out of convenience for user-code, or should we compare these somewhere during get_module_export?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 02:15):

grovesNL updated PR #7828.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2024 at 02:16):

grovesNL updated PR #7828.


Last updated: Dec 23 2024 at 12:05 UTC