Stream: git-wasmtime

Topic: wasmtime / issue #9570 OutOfMemory Error when Loading a 4...


Wasmtime GitHub notifications bot (Nov 06 2024 at 09:04):

maochenxi added the bug label to Issue #9570.

Wasmtime GitHub notifications bot (Nov 06 2024 at 09:04):

maochenxi opened issue #9570:

Rust code for loading models

use std::convert::TryInto;
use std::fs;

pub fn main() {
    let xml = fs::read_to_string("fixture/model.xml").unwrap();
    println!("Read graph XML, first 50 characters: {}", &xml[..50]);

    let weights = fs::read("fixture/model.bin").unwrap();
    println!("Read graph weights, size in bytes: {}", weights.len());

    let graph = unsafe {
        wasi_nn::load(
            &[&xml.into_bytes(), &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap()
    };
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
    println!("Created wasi-nn execution context with ID: {}", context);

    // Input prompt (Chinese for "Hello, how is the weather today?").
    let input_text = "你好,今天的天气怎么样?";

    // `tokenize` already returns Vec<i32>, so no further conversion is needed.
    let indexed_tokens: Vec<i32> = tokenize(input_text);

    let tensor_a = wasi_nn::Tensor {
        dimensions: &[1, indexed_tokens.len() as u32],
        r#type: wasi_nn::TENSOR_TYPE_I32,
        data: bytemuck::cast_slice(&indexed_tokens),
    };

    unsafe {
        wasi_nn::set_input(context, 0, tensor_a).unwrap();
    }

    unsafe {
        wasi_nn::compute(context).unwrap();
    }
    println!("Executed graph inference");

    let mut output_buffer = vec![0i32; 1];
    unsafe {
        wasi_nn::get_output(
            context,
            0,
            &mut output_buffer[..] as *mut [i32] as *mut u8,
            (output_buffer.len() * 4).try_into().unwrap(),
        )
        .unwrap();
    }
    println!("output: {:?}", output_buffer);
}

// Placeholder tokenizer: maps each character to its Unicode code point.
fn tokenize(input: &str) -> Vec<i32> {
    input.chars().map(|c| c as i32).collect()
}

Steps to Reproduce

I encountered an OutOfMemory error when trying to load a TinyLlama model (approximately 4GB of weights) using the wasi-nn interface in Wasmtime. The model is in OpenVINO format. The TinyLlama model is available at https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.

Here is the command I used:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm

Actual Results

However, it throws the following error:
![image](https://github.com/user-attachments/assets/22d16c1f-70dd-4a9f-89b1-2c1d8b219bd1)

The error message suggests that the model might be exceeding Wasmtime's memory allocation limits, even though I set max-memory-size to a larger value, for example:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -W max-memory-size=10240000000 -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm

Versions and Environment

Wasmtime version or commit: 24.0.0

Operating system: Arch Linux

Questions

  1. Is there a specific parameter in Wasmtime that can further increase memory allocation or better manage memory for large models?
  2. Are there any other workarounds or configurations within Wasmtime or wasi-nn that could help with loading models of this size?

Wasmtime GitHub notifications bot (Nov 06 2024 at 09:17):

bjorn3 commented on issue #9570:

Wasm32 is limited to 4GB of linear memory. Subtract static data and the emulated stack from that, and you have less than 4GB in which to fit the weights and all other memory allocations. Wasm64 allows significantly more memory to be used, but I'm not sure if wasi-nn works with wasm64. You can try compiling for the wasm64-wasip1 rustc target. Make sure to also pass the right flags to wasmtime to enable the memory64 proposal.
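
The 4GB ceiling follows from 32-bit addressing: at most 65,536 pages of 64 KiB each. The sketch below works through that arithmetic and compares it with the max-memory-size value from the report. The function names are hypothetical, chosen for illustration only; a cap above the ceiling cannot raise it.

```rust
/// Size of one WebAssembly page in bytes (64 KiB).
const WASM_PAGE_SIZE: u64 = 64 * 1024;
/// Maximum number of pages a 32-bit linear memory can hold.
const WASM32_MAX_PAGES: u64 = 65_536;

/// Upper bound on a 32-bit linear memory, in bytes.
fn wasm32_max_memory_bytes() -> u64 {
    WASM_PAGE_SIZE * WASM32_MAX_PAGES
}

/// Returns true if a payload plus some overhead (static data, stack,
/// other allocations) fits under the 32-bit ceiling.
/// The overhead figure is an assumption supplied by the caller.
fn fits_in_wasm32(payload_bytes: u64, overhead_bytes: u64) -> bool {
    payload_bytes.saturating_add(overhead_bytes) <= wasm32_max_memory_bytes()
}

fn main() {
    let limit = wasm32_max_memory_bytes();
    println!("wasm32 linear memory ceiling: {} bytes", limit); // 4294967296

    // Value passed to `-W max-memory-size` in the report.
    let requested_cap: u64 = 10_240_000_000;
    println!("requested cap is above the ceiling: {}", requested_cap > limit);

    // A 4 GiB payload leaves no room for anything else.
    println!("4 GiB payload fits: {}", fits_in_wasm32(limit, 1));
}
```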

Wasmtime GitHub notifications bot (Nov 07 2024 at 07:44):

maochenxi commented on issue #9570:

Thank you for your response! I tried wasm64, and it does seem that wasi-nn does not support wasm64. Therefore, I’ll have to try switching to a smaller model.
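
As a rough guide to what "smaller" means here, the sketch below estimates weight size from parameter count and precision. It assumes about 1.1 billion parameters (taken from the model name) and ignores everything other than raw weights; the precisions listed are illustrative, and nothing in this thread confirms which of them the OpenVINO backend of wasi-nn accepts.

```rust
/// 32-bit linear memory ceiling in bytes (4 GiB).
const WASM32_LIMIT_BYTES: u64 = 4 * 1024 * 1024 * 1024;

/// Illustrative weight precisions (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy)]
enum Precision {
    F32,
    F16,
    I8,
}

impl Precision {
    /// Bytes needed to store one parameter at this precision.
    fn bytes_per_param(self) -> u64 {
        match self {
            Precision::F32 => 4,
            Precision::F16 => 2,
            Precision::I8 => 1,
        }
    }
}

/// Raw weight size in bytes, ignoring graph metadata and runtime buffers.
fn weight_bytes(params: u64, precision: Precision) -> u64 {
    params * precision.bytes_per_param()
}

fn main() {
    // Assumption: parameter count rounded from the "1.1B" in the model name.
    let params: u64 = 1_100_000_000;
    for precision in [Precision::F32, Precision::F16, Precision::I8] {
        let bytes = weight_bytes(params, precision);
        println!(
            "{:?}: {} bytes, below the wasm32 ceiling: {}",
            precision,
            bytes,
            bytes < WASM32_LIMIT_BYTES
        );
    }
}
```

Even a size below the ceiling must still share linear memory with the stack, static data, and any other buffers the guest allocates.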

Wasmtime GitHub notifications bot (Nov 07 2024 at 15:41):

alexcrichton commented on issue #9570:

Yes, unfortunately no 64-bit WASI targets currently exist. I'm not personally aware of any implementation of 64-bit WASI.

Otherwise, though, an OOM is expected here, as 32-bit linear memories are limited to 4GiB. In that sense this is expected behavior and I'm not sure there's much we can do in Wasmtime about it, so I'm going to close this issue. If you're interested in a 64-bit WASI target, it might make sense to open a dedicated tracking issue for that on the WASI repo (or one probably already exists).
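
Since the OOM is expected, a guest could check the weights file size before reading it and report a clear error instead. This is a minimal sketch using only the Rust standard library; the 3 GiB budget and the function names are hypothetical, and the path is the one from the report.

```rust
use std::fs;
use std::io;

/// Hypothetical budget: leaves headroom below the 4 GiB wasm32 ceiling
/// for the stack, static data, and other allocations.
const WEIGHTS_BUDGET_BYTES: u64 = 3 * 1024 * 1024 * 1024;

/// Rejects a weights size that exceeds the given budget.
fn check_weights_size(len: u64, budget: u64) -> Result<(), String> {
    if len > budget {
        Err(format!(
            "weights are {} bytes, over the {}-byte budget",
            len, budget
        ))
    } else {
        Ok(())
    }
}

/// Reads the weights only if the file size is within the budget.
fn read_weights_checked(path: &str) -> io::Result<Vec<u8>> {
    let len = fs::metadata(path)?.len();
    check_weights_size(len, WEIGHTS_BUDGET_BYTES)
        .map_err(|msg| io::Error::new(io::ErrorKind::Other, msg))?;
    fs::read(path)
}

fn main() {
    match read_weights_checked("fixture/model.bin") {
        Ok(weights) => println!("Read graph weights, size in bytes: {}", weights.len()),
        Err(err) => eprintln!("Refusing to load weights: {}", err),
    }
}
```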

Wasmtime GitHub notifications bot (Nov 07 2024 at 15:41):

alexcrichton closed issue #9570.

Last updated: Jan 24 2025 at 00:11 UTC