Stream: git-wasmtime

Topic: wasmtime / PR #5035 Implement loading modules from piped ...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:13):

cr0sh edited PR #5035 from run-piped-modules to main:

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

This is not discussed in the issue tracker, but on this Zulip thread.

Previously load_module opened a file on path to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.

So, this set of commits change the procedure to 1) try to mmap the whole file from the path. 2) If it fails(implying the file is something special), read the whole file into the buffer so we can safely read the header and reuse the buffer again on the subsequent Module::new.

The Zulip thread contains an additional concern by me, about implicitly allowing accsesing /dev/tty on - inputs, which is not considered here.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:22):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:22):

bjorn3 created PR review comment:

This introduces new copies of the data, slowing down startup. Previously Module::from_file would mmap the file once and then use this mmap'ed version directly. Now it will mmap it once (or read to a vec and then copy it to a new memory region in MmapVec::from_slice) and then copy it to a new memory region again in Module::new (through MmapVec::from_slice).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:22):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:28):

cr0sh created PR review comment:

Definitely right. To be honest, I thought directly writing Mmap/MmapVec with looped f.read() is more efficient but feeled like a premature optimization, so chose this implementation for brevity. (The main reason for this is the File::read_to_end is not compatible with MmapVec.)

If startup performance matters for piped inputs to, I'll fix this into like

loop {
    let buf = [0u8; 4<<10];
    match f.read(&mut buf) {
        Ok(0) => break,
        Ok(n) => /* copy `n` bytes to Mmap and increment the offset */ todo!(),
        Err(e) => return Err(e),
    }
}

But again I'm not sure this is worth it.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:28):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 11:29):

cr0sh edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 14:44):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 14:54):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 14:54):

bjorn3 created PR review comment:

This will still have an extra copy in case the mmap succeeded.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:45):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:45):

cr0sh created PR review comment:

OK I didn't understand what you were saying. :sweat_smile: Now I see.
I'm afraid that this is not able to be trivially fixed, as there's no public API on Module to pass MmapVec directly into a constructor?

from_binary seems to be the 'generic' catch-all for many Module constructors but I don't have an idea what it does...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:45):

cr0sh edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:50):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:52):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:52):

cr0sh created PR review comment:

Nevermind, you said about the case when the mmap suceeded, so trying to mmap and if it succeeds then discarding the mmap and use Module::from_path is OK. Is that you're thinking?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 09 2022 at 23:52):

cr0sh edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 11:43):

bjorn3 created PR review comment:

That may work.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 11:43):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 16:01):

sunfishcode created PR review comment:

The Wasmtime CLI serves as an example of how to use Wasmtime's API, and in general we don't want users of the API depending on internal crates like wasmtime-runtime. Would it be possible to factor out this code for opening and loading a module either from deserialized form or a plain module into a utility function in the wasmtime crate, so that the CLI code can call that rather than doing this work itself?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 16:01):

sunfishcode created PR review comment:

Arbitrary limitations and environment variables are something we'd like to avoid if possible. The users of mmap_vec don't care if they have an MMapVec or a plain Vec<u8>, so in the case where the mmap fails, would it work to just call std::fs::read to read the file? That would require reorganizing this code to avoid using MmapVec as the type to unify the two options. Something like this:

    let mmap_vec;
    let read;
    let bytes: &[u8] = if let Ok(mapped) = MmapVec::from_file(Path::new(path)) {
        mmap_vec = mapped;
        mmap_vec.as_ref()
    } else {
        read = std::fs::read("hello.txt")?;
        read.as_ref()
    };

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 16:01):

sunfishcode submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 16:01):

sunfishcode submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 10 2022 at 16:05):

sunfishcode edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 12:00):

cr0sh created PR review comment:

Yeah I misinterpreted @bjorn3 's first review so I'll roll back into previous implementation which just uses Read::read_to_end and Vec<u8>, and prevent redundant allocation when a module is from a regular file. This may be sub-optimal for irregular file case but I think it's OK because no one wants to do hyperfine 'cat foo.wasm | wasmtime -'?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 12:00):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 12:01):

cr0sh created PR review comment:

This was only needed for MmapVec so I'll remove on the next commit. (see below.)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2022 at 12:01):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 15:22):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 15:22):

cr0sh created PR review comment:

After re-reading my previous commits and review comments here, I realized this reply is not possible at all. MmapVec cannot be avoided because load_module should do both "header check" and "copyless mmap" in one path. I think your suggestion is truly valid but please understand that I don't know much about this project (yet) to implement it rapidly so I would come back with results after some time with some investigation and tryouts.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 15:23):

cr0sh edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 16:57):

sunfishcode submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 16:57):

sunfishcode created PR review comment:

Sounds good. And thanks for working on this! I agree that running modules from stdin is an interesting idea.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 14:28):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 14:34):

cr0sh edited PR #5035 from run-piped-modules to main:

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

This is not discussed in the issue tracker, but on this Zulip thread.

Previously load_module opened a file on path to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.

So, this set of commits inserts a new procedure, before pre-opening to read the header, to detect if the file pointed by the path is a pipe(is_fifo of FileTypeExt), then directly loads the module from the path without checking if the modules is precompiled.

This change is unix-only, and intentionally does not support loading precompiled modules from pipes, because it should not be used that way. An additional warning message is printed in such situations to prevent users from being confused.

The Zulip thread contains an additional concern by me, about implicitly allowing accsesing /dev/tty on - inputs, which is not considered here.

*Edited to describe the current procedure to detect piped files.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 12:49):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 12:55):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 12:55):

bjorn3 created PR review comment:

        #[cfg(unix)]
        {

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 13:00):

cr0sh submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 13:00):

cr0sh created PR review comment:

What a genius! :rolling_on_the_floor_laughing:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 13:01):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 29 2022 at 16:09):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 29 2022 at 16:18):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 29 2022 at 16:23):

cr0sh updated PR #5035 from run-piped-modules to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 29 2022 at 16:24):

cr0sh closed without merge PR #5035.


Last updated: Nov 22 2024 at 16:03 UTC