Hiya! So I know next to nothing about how AI tech works, but I'm expected to understand the intersection between WASI and AI well enough to explain it at a high level to my colleagues.
Something I'm not really clear about is the limits of the WASI-NN interface. I know right now it's mostly intended for (smaller?) inference-based workloads. But I'm wondering what the theoretical limits are. If you had a powerful enough backend, could you use it for large language models too? Or is that a different kind of workload that would require a different kind of interface?
(I might be confusing terms here, I'm very sorry if I have. Like I said: I don't know much about this space, and how different things relate to each other.)
cc @Andrew Brown
(also I'll point out there's a #wasi-nn stream -- I can't move this thread there myself, but others might be able to)
Hey @Yoshua Wuyts -- wasi-nn is great for "utility inferencing," where you pass the model bytes directly into the load call, but for large models that clearly becomes a caching/memory-size problem. There's a "named model" proposal that lets you instead consume a handle to a larger model the host already has, which enables caching while guest modules can stay "simple." ish. Andrew can provide pointers to it.
Ohh, I see
So to try and explain it back to you (to make sure I understand): large models are so big that you can't just pass them as arguments into functions -- you need some other way to refer to them (like by name). wasi-nn currently requires that you pass the model into the function, so that's why it's best suited to smaller models right now.
Is that... roughly right?
at a high level, yeah -- no one is going to re-pass that many bytes on each call. It's engineering insanity. So the named-models thing is the way to keep the models in a local cache while still using them from the module.
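To make that difference concrete, here's a rough sketch of the two load paths as seen from a guest module, loosely modeled on the high-level Rust wasi-nn bindings. Treat the specifics as assumptions: the model file name, the ONNX encoding, the output buffer size, and the `build_from_cache` helper name are all illustrative and vary by wasi-nn version.

```rust
// Sketch only: loosely follows the high-level Rust `wasi-nn` bindings;
// exact method names, enum variants, and error types differ across versions.
use wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn infer(input: &[f32]) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Path 1: "utility inferencing" -- the guest ships the whole model as
    // bytes on every load. Fine for small models, painful for large ones.
    let model_bytes = std::fs::read("model.onnx")?; // hypothetical model file
    let graph = GraphBuilder::new(GraphEncoding::Onnx, ExecutionTarget::CPU)
        .build_from_bytes([&model_bytes])?;

    // Path 2: "named models" -- the guest asks the host for a model it
    // already has cached, by name, and never ships the bytes itself.
    // (Helper name is an assumption; in the WIT spec this is `load-by-name`.)
    // let graph = GraphBuilder::new(GraphEncoding::Onnx, ExecutionTarget::CPU)
    //     .build_from_cache("my-large-model")?;

    // Either way, inference looks the same once the graph is loaded.
    let mut ctx = graph.init_execution_context()?;
    ctx.set_input(0, TensorType::F32, &[1, input.len()], input)?;
    ctx.compute()?;
    let mut output = vec![0f32; 1000]; // placeholder output size
    ctx.get_output(0, &mut output)?;
    Ok(output)
}
```

Either path ends with the same execution-context calls; the only thing that changes is how the graph handle is obtained.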
Okay yeah that makes sense! Thank you!
only other thing I'll say is that since wasi-nn sorta does both now, no one's totally happy with it -- so this area will likely still go through some evolution, but it works "for now". We need to pound on it a bit.
Yeah, that makes sense. This all sounds like wasi-nn has a forward path to supporting more complex / intensive models in the future as well (via "named models"), rather than that being completely out of scope or categorically different in some way.
It feels like the answer is: "we'll figure out a way to support that eventually", and you already have a sense of what it would take.
With smaller inference workloads being the niche we're starting off with (which makes sense to me!)
named models will support that today (I think that has been merged, but Andrew will know). The larger question going forward is whether we should revamp and/or split wasi-nn into different capabilities. So it'll work for both cases for now; evolution is a separate issue.
Ahhh, okay! I see!
Thank you!
Hi Yoshua, I did not see this thread earlier. In theory, the current spec of wasi-nn supports all neural network models, including LLMs, since eventually the inputs to those models are in the tensor form of [batch, sequence, features]. However, usability may be an issue depending on your use case. Some implementations of wasi-nn provide additional SDK utilities to help with data preprocessing, such as converting images or natural language into tensors. As for model size, loading the model needs to happen only once, either via caching or via named models on the server.
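As a tiny illustration of that "everything is a tensor" point: an LLM prompt ends up as token IDs packed into a [batch, sequence] (or [batch, sequence, features]) tensor before it ever reaches set_input. The sketch below uses a made-up stand-in tokenizer purely for illustration; the real preprocessing would come from the SDK utilities mentioned above or a proper tokenizer library.

```rust
// Sketch: how an LLM prompt could become the [batch, sequence] tensor that
// wasi-nn's set_input expects. The "tokenizer" here is a stand-in, not a
// real vocabulary lookup.
fn prompt_to_tensor(prompt: &str) -> (Vec<usize>, Vec<i64>) {
    // Hypothetical whitespace "tokenizer" purely for illustration.
    let token_ids: Vec<i64> = prompt
        .split_whitespace()
        .map(|word| word.len() as i64) // stand-in for a vocabulary lookup
        .collect();

    // batch = 1, sequence = number of tokens; features are implicit here.
    let dimensions = vec![1, token_ids.len()];
    (dimensions, token_ids)
}

fn main() {
    let (dims, data) = prompt_to_tensor("hello wasi nn world");
    println!("tensor dims {:?}, data {:?}", dims, data);
    // These would then be handed to set_input on an execution context for a
    // graph loaded by name, using whichever integer TensorType the spec
    // version in use provides.
}
```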