geekbeast opened PR #6867 from geekbeast:feature/kserve
to bytecodealliance:main
:
This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md).
This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.
This a draft PR and still needs a few more items:
- Implementing a kserve registry
- Wiring the kserve implementation of backend trait to kserve client
geekbeast updated PR #6867.
geekbeast edited PR #6867:
This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.
This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.
This a draft PR and still needs a few more items:
- Implementing a kserve registry
- Wiring the kserve implementation of backend trait to kserve client
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast has marked PR #6867 as ready for review.
geekbeast requested pchickey for a review on PR #6867.
geekbeast requested wasmtime-default-reviewers for a review on PR #6867.
geekbeast requested wasmtime-core-reviewers for a review on PR #6867.
pchickey requested abrown for a review on PR #6867.
abrown submitted PR review:
Before I do a closer review, here might be some things to look at:
- you'll probably want to go through and clean up the conflicts, TODOs, commented-out code, etc., so I don't have to wade through that
- you can split this into separate PRs to parallelize this if you want, e.g., I'm thinking of the "make wasi-nn host-
async
" or "improve the error handling parts" (I mean, you don't have to but if you want to move some stuff forward faster it might help)- I'm not sure what is going on with the tensor bytes type: that hasn't merged in the specification yet but I see a Git submodule update? What is going on with that?
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast edited PR #6867:
This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.
This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.
This PR also implements a kserve registry so that models can be loaded via load named models and adds some better error reporting within what is possible in the current framework.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
github-merge-queue[bot] updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
geekbeast updated PR #6867.
pchickey submitted PR review:
I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of
expect
outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.
pchickey submitted PR review:
I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of
expect
outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.
pchickey created PR review comment:
delete commented out code
pchickey created PR review comment:
for these debugging strings, better to use
tracing::error!
over eprintln
pchickey created PR review comment:
delete
pchickey created PR review comment:
this comment doesn't make sense, maybe its dead code?
pchickey created PR review comment:
commented out code, not clear to me whether this still needs to be implemented as part of this PR?
pchickey created PR review comment:
Instead of this helper function, use a
std::io::Cursor
and then thebyteorder::WriteBytesExt
trait at each of the call sites.
pchickey created PR review comment:
not clear what these commented out members are about
pchickey created PR review comment:
this code panics whenever the server url is malformed or the server doesn't respond correctly, please change it to return errors instead and handle those errors appropriately where this is invoked.
pchickey created PR review comment:
tracing::error! for this failure
pchickey created PR review comment:
tracing::debug! here
pchickey created PR review comment:
commented out code should be deleted?
pchickey created PR review comment:
? instead of expect
abrown submitted PR review:
I am in favor of merging this PR. Obviously there are a bunch of Rust-level programming issues to clean up before we do that (like @pchickey points out) but, at the higher wasi-nn level, this mostly makes sense. Maybe if we resolve most of that here, we can do the following in follow up PRs:
- testing: I'm a bit worried about the testing story. From what I see, some of these tests depend on
http://localhost:8000
, a real kserve implementation, but I don't see that being set up in CI--how are those tests passing? At some point this new functionality needs to be validated during CI so that we immediately know if we've broken something by refactoring- documentation: eventually, we should be able to look at the module documentation and get a general idea of what the kserve backend is trying to do ("it connects to ... then sends a GET request to do ..."). Also, explaining why several of APIs _must_ become
async
would be helpful laterOtherwise, I think the general idea of plugging in a new backend is a good one and validates the refactoring to these
Box<dyn ...>
wrappers. Let's just clean up the extra code here that @pchickey notes and we can move on from there; @geekbeast, let me know if you want to pair up to finish this.
mtr-fastly updated PR #6867.
geekbeast updated PR #6867.
Last updated: Jan 24 2025 at 00:11 UTC