cr0sh edited PR #5035 from run-piped-modules
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[v] This has been discussed in issue #..., or if not, please tell us why
here.[v] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[v] This PR contains test cases, if meaningful.
- [v] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->This is not discussed in the issue tracker, but on this Zulip thread.
Previously load_module opened a file on
path
to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.So, this set of commits change the procedure to 1) try to mmap the whole file from the
path
. 2) If it fails(implying the file is something special), read the whole file into the buffer so we can safely read the header and reuse the buffer again on the subsequent Module::new.The Zulip thread contains an additional concern by me, about implicitly allowing accsesing
/dev/tty
on-
inputs, which is not considered here.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This introduces new copies of the data, slowing down startup. Previously
Module::from_file
would mmap the file once and then use this mmap'ed version directly. Now it will mmap it once (or read to a vec and then copy it to a new memory region inMmapVec::from_slice
) and then copy it to a new memory region again inModule::new
(throughMmapVec::from_slice
).
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh created PR review comment:
Definitely right. To be honest, I thought directly writing Mmap/MmapVec with looped
f.read()
is more efficient but feeled like a premature optimization, so chose this implementation for brevity. (The main reason for this is the File::read_to_end is not compatible with MmapVec.)If startup performance matters for piped inputs to, I'll fix this into like
loop { let buf = [0u8; 4<<10]; match f.read(&mut buf) { Ok(0) => break, Ok(n) => /* copy `n` bytes to Mmap and increment the offset */ todo!(), Err(e) => return Err(e), } }
But again I'm not sure this is worth it.
cr0sh submitted PR review.
cr0sh edited PR review comment.
cr0sh updated PR #5035 from run-piped-modules
to main
.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This will still have an extra copy in case the mmap succeeded.
cr0sh submitted PR review.
cr0sh created PR review comment:
OK I didn't understand what you were saying. :sweat_smile: Now I see.
I'm afraid that this is not able to be trivially fixed, as there's no public API onModule
to pass MmapVec directly into a constructor?
from_binary
seems to be the 'generic' catch-all for many Module constructors but I don't have an idea what it does...
cr0sh edited PR review comment.
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh submitted PR review.
cr0sh created PR review comment:
Nevermind, you said about the case when the mmap suceeded, so trying to mmap and if it succeeds then discarding the mmap and use Module::from_path is OK. Is that you're thinking?
cr0sh edited PR review comment.
bjorn3 created PR review comment:
That may work.
bjorn3 submitted PR review.
sunfishcode created PR review comment:
The Wasmtime CLI serves as an example of how to use Wasmtime's API, and in general we don't want users of the API depending on internal crates like wasmtime-runtime. Would it be possible to factor out this code for opening and loading a module either from deserialized form or a plain module into a utility function in the wasmtime crate, so that the CLI code can call that rather than doing this work itself?
sunfishcode created PR review comment:
Arbitrary limitations and environment variables are something we'd like to avoid if possible. The users of
mmap_vec
don't care if they have anMMapVec
or a plainVec<u8>
, so in the case where the mmap fails, would it work to just callstd::fs::read
to read the file? That would require reorganizing this code to avoid usingMmapVec
as the type to unify the two options. Something like this:let mmap_vec; let read; let bytes: &[u8] = if let Ok(mapped) = MmapVec::from_file(Path::new(path)) { mmap_vec = mapped; mmap_vec.as_ref() } else { read = std::fs::read("hello.txt")?; read.as_ref() };
sunfishcode submitted PR review.
sunfishcode submitted PR review.
sunfishcode edited PR review comment.
cr0sh created PR review comment:
Yeah I misinterpreted @bjorn3 's first review so I'll roll back into previous implementation which just uses
Read::read_to_end
andVec<u8>
, and prevent redundant allocation when a module is from a regular file. This may be sub-optimal for irregular file case but I think it's OK because no one wants to dohyperfine 'cat foo.wasm | wasmtime -'
?
cr0sh submitted PR review.
cr0sh created PR review comment:
This was only needed for MmapVec so I'll remove on the next commit. (see below.)
cr0sh submitted PR review.
cr0sh submitted PR review.
cr0sh created PR review comment:
After re-reading my previous commits and review comments here, I realized this reply is not possible at all. MmapVec cannot be avoided because
load_module
should do both "header check" and "copyless mmap" in one path. I think your suggestion is truly valid but please understand that I don't know much about this project (yet) to implement it rapidly so I would come back with results after some time with some investigation and tryouts.
cr0sh edited PR review comment.
sunfishcode submitted PR review.
sunfishcode created PR review comment:
Sounds good. And thanks for working on this! I agree that running modules from stdin is an interesting idea.
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh edited PR #5035 from run-piped-modules
to main
:
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[v] This has been discussed in issue #..., or if not, please tell us why
here.[v] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[v] This PR contains test cases, if meaningful.
- [v] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->This is not discussed in the issue tracker, but on this Zulip thread.
Previously load_module opened a file on
path
to check if the module is precompiled, then closes before constructing the Module from it. This is fine on normal files, but on piped files(like /dev/stdin), is problematic as the header read is not visible on the next open.So, this set of commits inserts a new procedure, before pre-opening to read the header, to detect if the file pointed by the
path
is a pipe(is_fifo
of FileTypeExt), then directly loads the module from the path without checking if the modules is precompiled.This change is unix-only, and intentionally does not support loading precompiled modules from pipes, because it should not be used that way. An additional warning message is printed in such situations to prevent users from being confused.
The Zulip thread contains an additional concern by me, about implicitly allowing accsesing
/dev/tty
on-
inputs, which is not considered here.*Edited to describe the current procedure to detect piped files.
cr0sh updated PR #5035 from run-piped-modules
to main
.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
#[cfg(unix)] {
cr0sh submitted PR review.
cr0sh created PR review comment:
What a genius! :rolling_on_the_floor_laughing:
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh updated PR #5035 from run-piped-modules
to main
.
cr0sh closed without merge PR #5035.
Last updated: Jan 24 2025 at 00:11 UTC