Named models in the Wasmtime CLI · wasi-nn

Andrew Brown (Aug 09 2023 at 16:58):

I intend to merge a change to wasi-nn to add "named models," which adds a way to refer to ML models by name instead of having to pass in all of the bytes (see @Matthew Tamayo-Rios's PR here: https://github.com/WebAssembly/wasi-nn/pull/38). I'm looking for input on whether this functionality should be bubbled up to the Wasmtime CLI flags.

Support loading models by name by geekbeast · Pull Request #38 · WebAssembly/wasi-nn

This change has been scoped down to only adding support for getting a handle to a named model for inference and updating the WIT bindings to the latest spec.

Andrew Brown (Aug 09 2023 at 17:02):

The way I'm thinking about it, it would look and feel approximately like preopened directories. Maybe the user passes in --preloaded-models=<format>:<directory> and the wasmtime-wasi-nn crate figures out the necessary bits to load those models and make them available with the directory name. Maybe we add a :<name> suffix on there so users have more control of the name.
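(Editorial sketch) To make the proposed flag shape concrete, here is a minimal Rust sketch of how a `<format>:<directory>[:<name>]` value could be parsed, with the name defaulting to the last component of the directory. This is only an illustration of the idea in the message above: `PreloadSpec` and `parse_preload_spec` are hypothetical names, not part of Wasmtime or wasmtime-wasi-nn, and the flag that was eventually implemented may look different.

```rust
use std::path::PathBuf;

/// One parsed `<format>:<directory>[:<name>]` value.
/// Hypothetical type for illustration; not a Wasmtime API.
#[derive(Debug, PartialEq)]
struct PreloadSpec {
    format: String,
    directory: PathBuf,
    name: String,
}

/// Parse a flag value such as `openvino:models/mobilenet` or
/// `openvino:models/mobilenet:classifier`.
///
/// Assumption: `:` is the only separator, so paths that themselves
/// contain `:` (e.g. Windows drive letters) are not handled here.
fn parse_preload_spec(spec: &str) -> Result<PreloadSpec, String> {
    let mut parts = spec.splitn(3, ':');

    let format = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or_else(|| format!("missing format in `{spec}`"))?;

    let directory = parts
        .next()
        .filter(|s| !s.is_empty())
        .map(PathBuf::from)
        .ok_or_else(|| format!("missing directory in `{spec}`"))?;

    let name = match parts.next() {
        // An explicit `:<name>` suffix wins.
        Some(n) if !n.is_empty() => n.to_string(),
        Some(_) => return Err(format!("empty name in `{spec}`")),
        // Otherwise fall back to the directory's final component.
        None => directory
            .file_name()
            .and_then(|n| n.to_str())
            .map(str::to_string)
            .ok_or_else(|| format!("cannot derive a name from `{spec}`"))?,
    };

    Ok(PreloadSpec {
        format: format.to_string(),
        directory,
        name,
    })
}

fn main() {
    for arg in ["openvino:models/mobilenet", "openvino:models/mobilenet:classifier"] {
        match parse_preload_spec(arg) {
            Ok(spec) => println!("{arg} -> {spec:?}"),
            Err(e) => eprintln!("error: {e}"),
        }
    }
}
```

Defaulting the name to the directory name mirrors the "make them available with the directory name" behavior described above, while the optional suffix gives users explicit control.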

Andrew Brown (Aug 09 2023 at 17:04):

My rationale for doing this in the CLI is to allow experimentation with this new API. This would also be available as a programmatic API on the wasi-nn host object (e.g., during construction) for embedders to use.
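(Editorial sketch) The embedder-facing side of this idea can be pictured as a host-side registry that is filled during construction and later queried by name instead of by raw bytes. The Rust sketch below is a hedged illustration only: `ModelRegistry`, `Graph`, `preload`, and `load_by_name` are hypothetical names chosen for this example and should not be read as the wasmtime-wasi-nn API.

```rust
use std::collections::HashMap;

/// Stand-in for a backend-specific loaded model.
/// Hypothetical type; a real host would hold a backend handle here.
#[derive(Debug, Clone, PartialEq)]
struct Graph {
    format: String,
    bytes: Vec<u8>,
}

/// Host-side map from model name to preloaded model (hypothetical).
#[derive(Default)]
struct ModelRegistry {
    graphs: HashMap<String, Graph>,
}

impl ModelRegistry {
    /// Register a model under `name`, e.g. while constructing the host.
    /// Rejects duplicate names so that a guest lookup is unambiguous.
    fn preload(&mut self, name: &str, format: &str, bytes: Vec<u8>) -> Result<(), String> {
        if self.graphs.contains_key(name) {
            return Err(format!("model `{name}` is already registered"));
        }
        self.graphs.insert(
            name.to_string(),
            Graph {
                format: format.to_string(),
                bytes,
            },
        );
        Ok(())
    }

    /// What a guest's by-name lookup could resolve to on the host:
    /// the guest passes only the name, never the model bytes.
    fn load_by_name(&self, name: &str) -> Option<&Graph> {
        self.graphs.get(name)
    }
}

fn main() {
    let mut registry = ModelRegistry::default();
    registry
        .preload("mobilenet", "openvino", vec![0u8; 4])
        .expect("first registration succeeds");

    match registry.load_by_name("mobilenet") {
        Some(graph) => println!("found {} model ({} bytes)", graph.format, graph.bytes.len()),
        None => println!("no such model"),
    }
}
```

Keeping the registry on the host means the embedder decides which models a guest can see, much as preopened directories limit which paths a guest can open.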

Andrew Brown (Aug 09 2023 at 17:05):

Let me know any feedback before I go off and implement a bunch of stuff! cc: @Alex Crichton, @Dan Gohman, @fitzgen (he/him)

Alex Crichton (Aug 09 2023 at 17:20):

I dunno much about wasi-nn so I can't really comment on whether this seems right or not, but as for having a CLI option for wasi-nn that seems reasonable since we have a bunch for wasi-common

Angel M (Aug 22 2023 at 07:28):

This would be amazing. I started integrating WASI-NN into Wasm Workers Server and I see named models as a great addition to the project. Could we help with this?

feat: add WASI-NN bindings to Wasm Workers Server by Angelmmiguel · Pull Request #201 · vmware-labs/wasm-workers-server

Introduce the new WASI-NN bindings to run Machine Learning (ML) inference in workers. It includes a new configuration parameter (features.wasi_nn) that allows you to set the allowed ML backends for...

Andrew Brown (Aug 22 2023 at 16:32):

wasi-nn: add named models by abrown · Pull Request #6854 · bytecodealliance/wasmtime

This implements named models in Wasmtime; see the commit messages for more details.

Andrew Brown (Aug 22 2023 at 16:33):

I had been tagging @Pat Hickey with that sequence of PRs but he's probably busy with other stuff; @Alex Crichton, @Dan Gohman... do either of you want to take a look?

Andrew Brown (Aug 22 2023 at 16:38):

@Angel M, beyond that, I think the feature should be relatively "done"; I have some additional PRs to make with some cleanups and a testing overhaul for wasi-nn and @Matthew Tamayo-Rios has an additional backend to add in https://github.com/bytecodealliance/wasmtime/pull/6867. If you're interested in helping out, though, there are many little things that could be "made nicer" that you will see as you use wasi-nn--PRs for any of those would be appreciated. Also, if you're interested in adding new backends that would be helpful!

Add kserve backend implementation for wasi-nn by geekbeast · Pull Request #6867 · bytecodealliance/wasmtime

This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predic...

Angel M (Aug 23 2023 at 06:37):

Angel M (Aug 23 2023 at 06:43):

Amazing! First thing, I will give it a try and test the named-models feature. This will allow wws to configure a set of predefined models per worker / function. Regarding the wasi-nn-PRs, is there any specific tag for those issues? Those little improvements seem to be a great way to get involved in the project :)

Adding new backends is definitely something we have in mind. TensorFlow Lite or PyTorch could be great additions. I know there are some security concerns related to TensorFlow. Not sure about your take on that one.

Angel M (Aug 23 2023 at 06:46):

@Andrew Brown another topic that I have in mind is LLMs and dynamic inputs / outputs. I'm still very new to AI / ML, so maybe you have already figured out how to work with LLMs and WASI-NN. However, I couldn't find any examples of those. Let me open a separate conversation about this so we can close this topic.

Angel M (Aug 23 2023 at 06:46):

Andrew Brown (Aug 23 2023 at 17:19):

Well, I haven't yet created issues for TODO work; I'm still in the middle of things so it's hard to see what should be fixed immediately and what I should postpone for later. I'll let you know once I have a bit more clarity on that. (e.g., see refactorings like https://github.com/bytecodealliance/wasmtime/pull/6893)

wasi-nn: remove `BackendKind`, add wrapper `struct`s by abrown · Pull Request #6893 · bytecodealliance/wasmtime

One improvement that came from discussions with @geekbeast is that BackendKind, the enum used for differentiating between ML implementation, is no longer necessary. Instead, we can use the generate...

Andrew Brown (Aug 23 2023 at 17:22):

Since TF accepts operators that can read/write files, do network I/O, etc., it seems like it would just open Wasmtime up to attacks. Some way of mitigating that would need to get figured out before moving https://github.com/bytecodealliance/wasmtime/pull/3977 forward. I haven't looked too closely at PyTorch yet.

Adding the TensorFlow backend to wasi-nn by brianjjones · Pull Request #3977 · bytecodealliance/wasmtime

Users will now be able to use either OpenVino or Tensorflow for their backend.

Andrew Brown (Aug 23 2023 at 17:23):

re: LLMs, maybe @Matthew Tamayo-Rios can comment further. He is planning to demo something at WasmCon and may have more comments about that.

Andrew Brown (Aug 23 2023 at 17:51):

@Angel M, I would say that any issues you find by using wasi-nn are extremely valuable. It is quite difficult to exhaustively test a thing like "ML," so eventually we need to have a better testing strategy.

Angel M (Aug 23 2023 at 18:35):

Thank you @Andrew Brown for the background on the different tasks and refactors. I will take a look at the different open issues to start getting familiar with the codebase.

Angel M (Aug 23 2023 at 18:38):

Regarding WasmCon, that's amazing! Do you plan to attend @Andrew Brown ? I will give two talks, so I'll be there for sure hehe. Would be great to chat with you both :big_smile:

Andrew Brown (Aug 23 2023 at 22:50):

Angel M (Aug 25 2023 at 14:29):

Amazing! We plan to work more and more on Wasm + AI, so it would be great to chat with you and learn more :)

Matthew Tamayo-Rios (Aug 28 2023 at 07:01):

I will also be there and will be showing off a prompt-based Stable Diffusion demo running through WASI-NN on top of Fastly's Compute@Edge infrastructure.

Matthew Tamayo-Rios (Aug 28 2023 at 07:01):

One thing to keep in mind as we consider additional backends is that we need to figure out a better testing story for said backends since most backends require loading the relevant library (openvino, libtorch, etc) in order to run.

Matthew Tamayo-Rios (Aug 28 2023 at 07:04):

The other big issue for TensorFlow is that the SavedModel format expects to be able to read from a directory on disk. There's an older h5 format that is more limited in functionality and can be read from bytes, but it is one more challenge to using TensorFlow models in a dynamic environment. There are some workarounds, but I'm not sure they work outside of Python (i.e., you can implement a virtual drive).

Matthew Tamayo-Rios (Aug 28 2023 at 07:05):

PyTorch also has a lot of outstanding security issues at the moment for untrusted models, because they use the pickle storage format.
