maochenxi added the bug label to Issue #9570.
maochenxi opened issue #9570:
Rust code for loading models
```rust
use std::convert::TryInto;
use std::fs;

use bytemuck::cast_slice;

pub fn main() {
    // Read the OpenVINO model description (XML) and weights.
    let xml = fs::read_to_string("fixture/model.xml").unwrap();
    println!("Read graph XML, first 50 characters: {}", &xml[..50]);
    let weights = fs::read("fixture/model.bin").unwrap();
    println!("Read graph weights, size in bytes: {}", weights.len());

    // Load the graph into wasi-nn. Both buffers live in the guest's
    // linear memory at this point.
    let graph = unsafe {
        wasi_nn::load(
            &[&xml.into_bytes(), &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap()
    };
    println!("Loaded graph into wasi-nn with ID: {}", graph);

    let context = unsafe { wasi_nn::init_execution_context(graph).unwrap() };
    println!("Created wasi-nn execution context with ID: {}", context);

    // "你好,今天的天气怎么样?" -- "Hello, how is the weather today?"
    let input_text = "你好,今天的天气怎么样?";
    let indexed_tokens: Vec<i32> = tokenize(input_text);
    let tensor_a = wasi_nn::Tensor {
        dimensions: &[1, indexed_tokens.len() as u32],
        r#type: wasi_nn::TENSOR_TYPE_I32,
        data: cast_slice(&indexed_tokens),
    };
    unsafe {
        wasi_nn::set_input(context, 0, tensor_a).unwrap();
    }

    unsafe {
        wasi_nn::compute(context).unwrap();
    }
    println!("Executed graph inference");

    // Copy the first output tensor back into a byte view of the buffer.
    let mut output_buffer = vec![0i32; 1];
    unsafe {
        wasi_nn::get_output(
            context,
            0,
            output_buffer.as_mut_ptr() as *mut u8,
            (output_buffer.len() * std::mem::size_of::<i32>())
                .try_into()
                .unwrap(),
        )
        .unwrap();
    }
    println!("output: {:?}", output_buffer);
}

// Placeholder tokenizer: maps each character to its Unicode code point.
fn tokenize(input: &str) -> Vec<i32> {
    input.chars().map(|c| c as i32).collect()
}
```
Steps to Reproduce
I encountered an OutOfMemory error when trying to load a TinyLlama model (with approximately 4 GB of parameter data) using the wasi-nn interface in Wasmtime. The model is in OpenVINO format. This is the URL of TinyLlama: https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.
Here is the command I used:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm
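The guest module path above implies a build step roughly like the following (the build command is not shown in the original report):
cargo build --release --target wasm32-wasip1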
Actual Results
However, it throws the following error:
![image](https://github.com/user-attachments/assets/22d16c1f-70dd-4a9f-89b1-2c1d8b219bd1)

The error message suggests that the model is exceeding Wasmtime's memory allocation limits, even though I set max-memory-size to a larger value, such as:
/home/maochenxi/wasm/runtime/wasmtime-v24.0.0-x86_64-linux/wasmtime run -W max-memory-size=10240000000 -S nn --dir=fixture::fixture target/wasm32-wasip1/release/wasi-nn-example.wasm
Versions and Environment
Wasmtime version or commit: 24.0.0
Operating system: Arch Linux
Questions
- Is there a specific parameter in Wasmtime that can further increase memory allocation or better manage memory for large models?
- Are there any other workarounds or configurations within Wasmtime or wasi-nn that could help with loading models of this size?
bjorn3 commented on issue #9570:
Wasm32 is limited to 4GB of linear memory. Subtract static data and the emulated stack from that, and you have less than 4GB left in which to fit the weights and all other memory allocations. Wasm64 allows significantly more memory to be used, but I'm not sure if wasi-nn works with wasm64. You can try compiling for the `wasm64-wasip1` rustc target. Make sure to also pass the right flags to wasmtime to enable the memory64 proposal.
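A sketch of what that might look like (the exact flags are assumptions: wasm64-wasip1 is a tier-3 Rust target, so it needs a nightly toolchain with build-std, and Wasmtime gates memory64 behind a CLI feature flag):
cargo +nightly build --release -Z build-std=std,panic_abort --target wasm64-wasip1
wasmtime run -W memory64=y -S nn --dir=fixture::fixture target/wasm64-wasip1/release/wasi-nn-example.wasm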
maochenxi commented on issue #9570:
Thank you for your response! I tried wasm64, and it does seem that wasi-nn does not support wasm64. Therefore, I’ll have to try switching to a smaller model.
alexcrichton commented on issue #9570:
Yes, unfortunately no 64-bit WASI targets currently exist; I'm not personally aware of any implementation of 64-bit WASI.
Otherwise, though, an OOM is expected here, as 32-bit linear memories are limited to 4 GiB. In that sense this is expected behavior, and I'm not sure there's much we can do about it in Wasmtime, so I'm going to close this issue. If you're interested in a 64-bit WASI target, it might make sense to open a dedicated tracking issue for that on the WASI repo (or one probably already exists).
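To make the 4 GiB ceiling concrete, a back-of-the-envelope sketch (the ~4 GB weight size is an assumption taken from the report):

```rust
fn main() {
    // wasm32 linear memory tops out at 65,536 pages x 64 KiB/page = 4 GiB.
    let wasm32_ceiling: u64 = 65_536 * 65_536;
    // model.bin (~4 GB, per the report) is read into a Vec<u8> up front.
    let weights: u64 = 4_000_000_000;
    // Everything else -- the XML string, stack, heap bookkeeping, and any
    // copies made during wasi_nn::load -- must fit in what remains; raising
    // -W max-memory-size cannot lift this architectural cap.
    println!("headroom: {} MiB", (wasm32_ceiling - weights) / (1024 * 1024));
}
```

This prints roughly 281 MiB of headroom, which the XML buffer and allocator overhead can easily exhaust before inference even starts.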
alexcrichton closed issue #9570.