wasmtime / PR #13256 Fix slow WASI stdin reads by passing... · git-wasmtime

Wasmtime GitHub notifications bot (May 02 2026 at 15:08):

hiddenbit opened PR #13256 from hiddenbit:pass-stdin-size-hint to bytecodealliance:main:

While working on a program that reads a large amount of data through stdin, I was surprised by slow stdin throughput under wasmtime. Piping 1 GiB into a simple WASI program that reads in 64 KiB chunks took about 38 seconds (~28 MiB/s). I expected throughput on the order of gigabytes per second.

To demonstrate the issue, below is a minimal WASI program that reads stdin in configurable chunks.

Piping 1 GiB of zeroes and reading it in 65536-byte chunks:
# Before the changes in this PR: takes ~38s -> ~28 MiB/s
time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536

# With the changes in this PR: takes ~1.2s -> ~900 MiB/s
time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536
Benchmark application code:
// build with: cargo build --release --target wasm32-wasip1
use std::env;
use std::io::{self, Read};
use std::process;

fn main() {
    let chunk_size: usize = env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(|| {
            eprintln!("Usage: stdin_bench <chunk-size-bytes>");
            process::exit(1);
        });

    let stdin = io::stdin();
    let mut handle = stdin.lock();
    let mut buf = vec![0u8; chunk_size];
    let mut total: u64 = 0;

    loop {
        match handle.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => total += n as u64,
            Err(e) => {
                eprintln!("read error: {e}");
                process::exit(1);
            }
        }
    }

    eprintln!("{total} bytes read");
}
Root cause

Regardless of how many bytes were requested (e.g. 65536), the worker thread read at most 1024 bytes (a hard-coded buffer size) per round-trip, which throttles throughput.
See worker_thread_stdin.rs:108.
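
To put the 1024-byte cap in numbers, here is a back-of-the-envelope sketch that counts how many worker round-trips a 1 GiB transfer needs at a given per-trip limit. The helper name round_trips is made up for illustration and is not part of wasmtime; the only inputs are the sizes quoted above.

```rust
/// Round-trips needed to move `total` bytes when each round-trip
/// delivers at most `per_trip` bytes (ceiling division).
fn round_trips(total: u64, per_trip: u64) -> u64 {
    total.div_ceil(per_trip)
}

fn main() {
    const GIB: u64 = 1 << 30;
    // Hard-coded 1024-byte buffer: 1048576 round-trips.
    println!("{}", round_trips(GIB, 1024));
    // Buffer matching the 64 KiB request: 16384 round-trips.
    println!("{}", round_trips(GIB, 65536));
}
```

The ratio between the two is 64 in the best case where every read fills its buffer; the measured gain also depends on other per-read costs.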

Fix

StdinState::ReadRequested now has a usize size hint from the caller. The worker thread uses this hint to size its read buffer, clamped to [1024, MAX_READ_SIZE_ALLOC] to avoid guest-controlled unbounded allocation while still enabling efficient bulk reads.

Callers now store a size hint when transitioning to ReadRequested; Pollable::ready uses MAX_READ_SIZE_ALLOC because it does not know the size of the following read.
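
As a rough illustration of the clamping described above, the sketch below maps a caller-supplied hint to the buffer length the worker would allocate. The value chosen for MAX_READ_SIZE_ALLOC is an assumption (the thread does not state it), and worker_buffer_len is a hypothetical helper, not the function in the PR.

```rust
/// Assumed placeholder: the real value of MAX_READ_SIZE_ALLOC is not given in this thread.
const MAX_READ_SIZE_ALLOC: usize = 64 * 1024;

/// Buffer length the worker would allocate for a caller-supplied hint,
/// using the [1024, MAX_READ_SIZE_ALLOC] bounds described above.
fn worker_buffer_len(size_hint: usize) -> usize {
    size_hint.clamp(1024, MAX_READ_SIZE_ALLOC)
}

fn main() {
    // Small hints are raised to the floor; huge hints are capped so a
    // guest cannot force an arbitrarily large host allocation.
    for hint in [0, 512, 4096, 65536, usize::MAX] {
        println!("hint {hint} -> buffer {}", worker_buffer_len(hint));
    }
}
```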

When reading 1 GiB in 64 KiB chunks, I now get:

Before	After	Speedup
~28 MiB/s	~900 MiB/s	32x

Wasmtime GitHub notifications bot (May 02 2026 at 15:08):

hiddenbit requested wasmtime-wasi-reviewers for a review on PR #13256.

Wasmtime GitHub notifications bot (May 02 2026 at 17:59):

github-actions[bot] added the label wasi on PR #13256.

Wasmtime GitHub notifications bot (May 04 2026 at 16:07):

:thumbs_up: alexcrichton submitted PR review:

Thanks! One small comment but otherwise looks good to me :+1:

Wasmtime GitHub notifications bot (May 04 2026 at 17:18):

hiddenbit updated PR #13256.

Wasmtime GitHub notifications bot (May 04 2026 at 17:22):

hiddenbit commented on PR #13256:

Thanks for the suggestion! You're right that .max(1024) is a holdover from the old hardcoded buffer size and doesn't belong here conceptually.

However, removing max(...) entirely introduces a subtle issue: a WASI guest can call read(0). If that propagates to the worker thread, it allocates BytesMut::zeroed(0) and calls stdin().read(&mut []), which returns Ok(0). The worker then interprets Ok(0) as EOF and transitions to StdinState::Closed, which permanently closes stdin. A subsequent read(42) would then fail with StreamError::Closed. (If I didn't miss anything :sweat_smile:)

I've addressed it like this:

Short-circuit at the call site: InputStream::read now returns Ok(Bytes::new()) immediately when size == 0, avoiding an unnecessary worker wake-up entirely.

Safety floor in the worker: Replaced .max(1024) with .max(1) so that even if a zero somehow reaches the worker (e.g. from a future code path), it can never be misinterpreted as EOF.

This removes the arbitrary 1024 floor while guarding against the edge case.
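
The edge case can be reproduced with the standard library alone. The sketch below uses a byte slice as a stand-in for stdin; looks_like_eof and floored_len are hypothetical helpers written for this illustration, not code from the PR.

```rust
use std::io::Read;

/// Reads once into a buffer of `len` bytes and reports whether the result
/// would be taken for EOF by a loop that treats Ok(0) as end of input.
fn looks_like_eof<R: Read>(source: &mut R, len: usize) -> bool {
    let mut buf = vec![0u8; len];
    matches!(source.read(&mut buf), Ok(0))
}

/// Hypothetical worker-side floor: never allocate a zero-length buffer.
fn floored_len(size_hint: usize) -> usize {
    size_hint.max(1)
}

fn main() {
    let mut source: &[u8] = b"data is still pending";
    // Zero-length buffer: Ok(0) even though the source is not exhausted.
    assert!(looks_like_eof(&mut source, 0));
    // With the floor applied, the same hint yields a real one-byte read.
    assert!(!looks_like_eof(&mut source, floored_len(0)));
}
```

A one-byte read can only return Ok(0) when the source really is at EOF, which is why the floor keeps the EOF check unambiguous.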

Wasmtime GitHub notifications bot (May 04 2026 at 17:23):

hiddenbit edited PR #13256:

While working on a program that reads a large amount of data through stdin, I was surprised by slow stdin throughput under wasmtime. Piping 1 GiB into a simple WASI program that reads in 64 KiB chunks took about 38 seconds (~28 MiB/s). I expected throughput on the order of gigabytes per second.

To demonstrate the issue, below is a minimal WASI program that reads stdin in configurable chunks.

Piping 1 GiB of zeroes and reading it in 65536-byte chunks:
# Before the changes in this PR: takes ~38s -> ~28 MiB/s
time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536

# With the changes in this PR: takes ~1.2s -> ~900 MiB/s
time dd if=/dev/zero bs=65536 count=16384 2>/dev/null | /path/to/wasmtime stdio_bench.wasm 65536
Benchmark application code:
// build with: cargo build --release --target wasm32-wasip1
use std::env;
use std::io::{self, Read};
use std::process;

fn main() {
    let chunk_size: usize = env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(|| {
            eprintln!("Usage: stdin_bench <chunk-size-bytes>");
            process::exit(1);
        });

    let stdin = io::stdin();
    let mut handle = stdin.lock();
    let mut buf = vec![0u8; chunk_size];
    let mut total: u64 = 0;

    loop {
        match handle.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => total += n as u64,
            Err(e) => {
                eprintln!("read error: {e}");
                process::exit(1);
            }
        }
    }

    eprintln!("{total} bytes read");
}
Root cause

Regardless of how many bytes were requested (e.g. 65536), the worker thread read at most 1024 bytes (a hard-coded buffer size) per round-trip, which throttles throughput.
See worker_thread_stdin.rs:108.

Fix

StdinState::ReadRequested now has a usize size hint from the caller. The worker thread uses this hint to size its read buffer, clamped to [1, MAX_READ_SIZE_ALLOC] to avoid guest-controlled unbounded allocation while still enabling efficient bulk reads.

Callers now store a size hint when transitioning to ReadRequested; Pollable::ready uses MAX_READ_SIZE_ALLOC because it does not know the size of the following read.
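
A minimal, single-threaded sketch of how a size hint carried in the state could drive one worker iteration, under stated assumptions: the enum below is a simplified stand-in for wasmtime's StdinState (only the variants needed here, with Vec<u8> instead of BytesMut), worker_step is a hypothetical function, and the value of MAX_READ_SIZE_ALLOC is assumed.

```rust
use std::io::Read;

/// Assumed placeholder: the real value is not given in this thread.
const MAX_READ_SIZE_ALLOC: usize = 64 * 1024;

/// Simplified stand-in for the real state enum.
#[derive(Debug, PartialEq)]
enum StdinState {
    ReadRequested(usize), // size hint from the caller
    Data(Vec<u8>),
    Error(std::io::ErrorKind),
    Closed,
}

/// One worker iteration: size the buffer from the hint, clamped to
/// [1, MAX_READ_SIZE_ALLOC], then perform a single read.
fn worker_step<R: Read>(state: StdinState, source: &mut R) -> StdinState {
    match state {
        StdinState::ReadRequested(size_hint) => {
            let mut buf = vec![0u8; size_hint.clamp(1, MAX_READ_SIZE_ALLOC)];
            match source.read(&mut buf) {
                Ok(0) => StdinState::Closed, // genuine EOF: buffer is never empty
                Ok(n) => {
                    buf.truncate(n);
                    StdinState::Data(buf)
                }
                Err(e) => StdinState::Error(e.kind()),
            }
        }
        other => other,
    }
}

fn main() {
    let mut source: &[u8] = &[0u8; 100_000];
    // A 64 KiB read request is served in one round-trip.
    let first = worker_step(StdinState::ReadRequested(65536), &mut source);
    assert_eq!(first, StdinState::Data(vec![0u8; 65536]));
    // A readiness check with no known size asks for the maximum.
    let second = worker_step(StdinState::ReadRequested(MAX_READ_SIZE_ALLOC), &mut source);
    assert_eq!(second, StdinState::Data(vec![0u8; 100_000 - 65536]));
    // The source is now exhausted, so the next read reports EOF.
    let third = worker_step(StdinState::ReadRequested(1), &mut source);
    assert_eq!(third, StdinState::Closed);
}
```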

When reading 1 GiB in 64 KiB chunks, I now get:

Before	After	Speedup
~28 MiB/s	~900 MiB/s	32x

Wasmtime GitHub notifications bot (May 04 2026 at 17:44):

alexcrichton added PR #13256 Fix slow WASI stdin reads by passing size hint to worker thread to the merge queue.

Wasmtime GitHub notifications bot (May 04 2026 at 17:44):

alexcrichton commented on PR #13256:

Thanks!

Wasmtime GitHub notifications bot (May 04 2026 at 18:18):

:check: alexcrichton merged PR #13256.

Wasmtime GitHub notifications bot (May 04 2026 at 18:19):

alexcrichton removed PR #13256 Fix slow WASI stdin reads by passing size hint to worker thread from the merge queue.

Last updated: Jul 29 2026 at 05:03 UTC