hiddenbit opened PR #13256 from hiddenbit:pass-stdin-size-hint to bytecodealliance:main:
While working on a program that reads a large amount of data through stdin, I was surprised by slow stdin throughput under wasmtime. Piping 1 GiB into a simple WASI program that reads in 64 KiB chunks took about 38 seconds (~28 MiB/s). I expected something in the gigabytes-per-second order of magnitude.
To demonstrate the issue, below is a minimal WASI program that reads stdin in configurable chunks.
Piping 1 GiB of zeroes and reading it in 65536-byte chunks:
    # Before the changes in this PR: takes ~38s -> ~28 MiB/s
    time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536

    # With the changes in this PR: takes ~1.2s -> ~900 MiB/s
    time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536

Benchmark application code:
    // build with: cargo build --release --target wasm32-wasip1
    use std::env;
    use std::io::{self, Read};
    use std::process;

    fn main() {
        let chunk_size: usize = env::args()
            .nth(1)
            .and_then(|s| s.parse().ok())
            .unwrap_or_else(|| {
                eprintln!("Usage: stdin_bench <chunk-size-bytes>");
                process::exit(1);
            });

        let stdin = io::stdin();
        let mut handle = stdin.lock();
        let mut buf = vec![0u8; chunk_size];
        let mut total: u64 = 0;

        loop {
            match handle.read(&mut buf) {
                Ok(0) => break,
                Ok(n) => total += n as u64,
                Err(e) => {
                    eprintln!("read error: {e}");
                    process::exit(1);
                }
            }
        }

        eprintln!("{total} bytes read");
    }

Root cause
Regardless of how many bytes were requested (e.g. 65536), the worker thread read at most 1024 bytes (a hard-coded value) per round trip, so a single 64 KiB read needed at least 64 round trips. This is what made stdin throughput so low.
See worker_thread_stdin.rs:108.

Fix
StdinState::ReadRequested now has a usize size hint from the caller. The worker thread uses this hint to size its read buffer, clamped to [1024, MAX_READ_SIZE_ALLOC] to avoid guest-controlled unbounded allocation while still enabling efficient bulk reads.

Callers now store a size hint when transitioning to ReadRequested; Pollable::ready uses MAX_READ_SIZE_ALLOC because it does not know the size of the following read.

When reading 1 GiB in 64 KiB chunks, I now get:
| Before | After | Speedup |
|---|---|---|
| ~28 MiB/s | ~900 MiB/s | 32x |
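
For readers who want to see the clamping rule in isolation, here is a minimal sketch. It is not the code from this PR: the helper names `read_buffer_size` and `round_trips`, the constant `MIN_READ_SIZE`, and the 64 KiB value given to `MAX_READ_SIZE_ALLOC` are all assumptions made for illustration.

```rust
/// Assumed value for illustration only; the real constant is defined in wasmtime.
const MAX_READ_SIZE_ALLOC: usize = 64 * 1024;
/// The previously hard-coded per-read size, used here as the lower bound.
const MIN_READ_SIZE: usize = 1024;

/// Size the worker's read buffer from the caller's hint,
/// clamped to [MIN_READ_SIZE, MAX_READ_SIZE_ALLOC].
fn read_buffer_size(size_hint: usize) -> usize {
    size_hint.clamp(MIN_READ_SIZE, MAX_READ_SIZE_ALLOC)
}

/// Number of worker round trips needed to deliver `requested` bytes
/// when each trip yields at most `per_trip` bytes (ceiling division).
fn round_trips(requested: usize, per_trip: usize) -> usize {
    (requested + per_trip - 1) / per_trip
}

fn main() {
    let requested = 65536;
    // Old behavior: every round trip is capped at 1024 bytes.
    println!("old: {} round trips", round_trips(requested, MIN_READ_SIZE));
    // New behavior: the buffer follows the (clamped) size hint.
    println!(
        "new: {} round trips",
        round_trips(requested, read_buffer_size(requested))
    );
}
```

Under the assumed 64 KiB cap, a 64 KiB request drops from 64 round trips to 1. A zero or tiny hint still yields a 1024-byte buffer, and an oversized hint cannot force an allocation above the cap.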
hiddenbit requested wasmtime-wasi-reviewers for a review on PR #13256.
github-actions[bot] added the label wasi on PR #13256.
Last updated: May 03 2026 at 22:13 UTC