Stream: general

Topic: wasi-nn and large language models
view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 11:54):

Hiya! So I know next to nothing about how AI tech works, but I'm expected to understand the intersection between WASI and AI well enough to explain it at a high level to my colleagues.

Something I'm not really clear about is the limits of the wasi-nn interface. I know right now it's mostly intended for (smaller?) inference-based workloads. But I'm wondering what the theoretical limits are. If you had a powerful enough backend, could you use it for large language models too? Or is that a different kind of workload that would require a different kind of interface?

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 11:55):

(I might be confusing terms here, I'm very sorry if I have. Like I said: I don't know much about this space, and how different things relate to each other.)

view this post on Zulip Alex Crichton (Dec 01 2023 at 15:05):

cc @Andrew Brown

(also I'll point out there's a #wasi-nn stream; I can't move this thread there myself, but others might be able to)

view this post on Zulip Ralph (Dec 01 2023 at 16:26):

Hey @Yoshua Wuyts wasi-nn is great for "utility inferencing" because you pass the model into the function, but that's clearly a caching/memory-size issue for large models. There's a "named model" proposal that lets you consume a handle to a larger model, so you get caching and guest modules can still stay "simple", ish. Andrew can provide pointers to it
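
To make that concrete, the two loading styles look roughly like this in the Rust wasi-nn bindings (a sketch only: the encodings, targets, and the model name are placeholders, and the builder API varies across crate versions and runtimes):

```rust
// Two ways a guest can get a graph handle. Sketch only; encoding, target,
// and model name are placeholders.
use wasi_nn::{ExecutionTarget, Graph, GraphBuilder, GraphEncoding};

// "Utility inferencing": ship the whole model across the boundary as bytes.
// Fine for small models; a caching/memory problem for multi-GB ones.
fn from_bytes(model: &[u8]) -> Result<Graph, wasi_nn::Error> {
    GraphBuilder::new(GraphEncoding::Onnx, ExecutionTarget::CPU)
        .build_from_bytes([model])
}

// "Named model": the host has already loaded and cached the model under a
// name; the guest only receives a handle to it.
fn from_name(name: &str) -> Result<Graph, wasi_nn::Error> {
    GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache(name)
}
```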

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:29):

Ohh, I see

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:30):

So to try and explain it back to you (to make sure I understand): large models are so big that you can't just pass them as arguments into functions; you need some other way to refer to them (like by name). wasi-nn currently requires that you pass the model into the function, which is why it's best suited to smaller models right now

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:30):

Is that... roughly right?

view this post on Zulip Ralph (Dec 01 2023 at 16:33):

at a high level, yeah -- no one is going to re-pass substantial bytes on each call. It's engineering insanity. So the named-models thing is the way to keep the models in a local cache but still use them from the module.
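
Sketched in the Rust bindings, the guest side of a named-model call might look something like this (the model name, the ggml-style prompt-as-bytes input convention, and the buffer size are all assumptions that depend on the backend, not spec):

```rust
// Sketch of a guest using a host-cached ("named") model end to end,
// loosely following the ggml backend's prompt-in / text-out convention.
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn generate(prompt: &str) -> Result<String, wasi_nn::Error> {
    // The multi-gigabyte model never crosses the guest/host boundary;
    // the guest just names a graph the host has already loaded and cached.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("llama-2-7b-chat")?; // hypothetical registered name

    let mut ctx = graph.init_execution_context()?;
    ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())?;
    ctx.compute()?;

    // Copy the generated text back out of the output tensor.
    let mut out = vec![0u8; 4096]; // arbitrary buffer size
    let n = ctx.get_output(0, &mut out)?;
    out.truncate(n);
    Ok(String::from_utf8_lossy(&out).into_owned())
}
```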

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:34):

Okay yeah that makes sense! Thank you!

view this post on Zulip Ralph (Dec 01 2023 at 16:34):

only other thing I'll say is that wasi-nn sorta does both now, which means no one's totes happy with it -- so this area will as likely as not still go through some evolution, but it works "for now". We need to pound on it a bit.

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:37):

Yeah, that makes sense. This all sounds like wasi-nn has a forward path to supporting more complex / intensive models in the future as well (via "named models"), rather than that being completely out of scope or categorically different in some way.

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:38):

It feels like the answer is: "we'll figure out a way to support that eventually", and you already have a sense of what it would take.

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:38):

With inference workloads being the niche we're starting off with (which makes sense to me!)

view this post on Zulip Ralph (Dec 01 2023 at 16:40):

named models will support that today (I think that has been merged, but Andrew will know). The larger forward motion is whether we should revamp and/or separate wasi-nn into different capabilities. So it'll work for both cases for now; evolution is another issue.

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:40):

Ahhh, okay! I see!

view this post on Zulip Yoshua Wuyts (Dec 01 2023 at 16:40):

Thank you!

view this post on Zulip Mingqiu Sun (Dec 13 2023 at 01:14):

Hi Yoshua, I did not see this thread earlier. In theory, the current spec of wasi-nn supports all neural network models, including LLMs, since eventually the inputs to those models take the tensor form [batch, sequence, features]. However, usability may be an issue depending on your use case. Some implementations of wasi-nn provide additional SDK utilities to help with data preprocessing, such as converting images or natural language into tensors. As for model size, loading the model needs to happen only once, either via caching or via named models on the server.
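
For illustration, a minimal sketch of packing input into that [batch, sequence, features] form (the embedding input and shapes are made up; real preprocessing depends on the model and whatever SDK utilities the implementation ships):

```rust
// Whatever the model, the input eventually becomes a flat buffer plus a
// shape like [batch, sequence, features]. Everything here is illustrative.
fn to_input_tensor(token_embeddings: &[Vec<f32>]) -> (Vec<usize>, Vec<f32>) {
    let batch = 1;
    let sequence = token_embeddings.len();
    let features = token_embeddings.first().map_or(0, Vec::len);

    // Flatten row-major into the flat buffer that set_input expects,
    // e.g. ctx.set_input(0, TensorType::F32, &dims, &data).
    let data: Vec<f32> = token_embeddings.iter().flatten().copied().collect();
    (vec![batch, sequence, features], data)
}
```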

