wasmtime / PR #6867 Add kserve backend implementation for... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6867 Add kserve backend implementation for...

Wasmtime GitHub notifications bot (Aug 21 2023 at 09:53):

geekbeast opened PR #6867 from geekbeast:feature/kserve to bytecodealliance:main:

This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md).

This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.

This a draft PR and still needs a few more items:

Implementing a kserve registry

Wiring the kserve implementation of backend trait to kserve client

Wasmtime GitHub notifications bot (Aug 21 2023 at 10:55):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 21 2023 at 20:45):

geekbeast edited PR #6867:

This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.

This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.

This a draft PR and still needs a few more items:

Implementing a kserve registry

Wiring the kserve implementation of backend trait to kserve client

Wasmtime GitHub notifications bot (Aug 23 2023 at 06:26):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 23 2023 at 06:31):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 23 2023 at 06:36):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 23 2023 at 10:23):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 24 2023 at 11:26):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Aug 26 2023 at 06:23):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Sep 11 2023 at 19:49):

geekbeast has marked PR #6867 as ready for review.

Wasmtime GitHub notifications bot (Sep 11 2023 at 19:49):

geekbeast requested pchickey for a review on PR #6867.

Wasmtime GitHub notifications bot (Sep 11 2023 at 19:49):

geekbeast requested wasmtime-default-reviewers for a review on PR #6867.

Wasmtime GitHub notifications bot (Sep 11 2023 at 19:49):

geekbeast requested wasmtime-core-reviewers for a review on PR #6867.

Wasmtime GitHub notifications bot (Sep 11 2023 at 20:00):

pchickey requested abrown for a review on PR #6867.

Wasmtime GitHub notifications bot (Sep 19 2023 at 20:45):

abrown submitted PR review:

Before I do a closer review, here might be some things to look at:

you'll probably want to go through and clean up the conflicts, TODOs, commented-out code, etc., so I don't have to wade through that

you can split this into separate PRs to parallelize this if you want, e.g., I'm thinking of the "make wasi-nn host-async" or "improve the error handling parts" (I mean, you don't have to but if you want to move some stuff forward faster it might help)

I'm not sure what is going on with the tensor bytes type: that hasn't merged in the specification yet but I see a Git submodule update? What is going on with that?

Wasmtime GitHub notifications bot (Nov 07 2023 at 08:07):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:16):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:19):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:24):

geekbeast edited PR #6867:

This implements a kserve backend allowing forwarding of wasi-nn calls over http to servers implementing the kserve protocol (documented here https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md). It also makes certain API calls that are expected to be expensive, async.

This makes it easy to offload evaluation of inference workloads through async methods to an external service that can support all the various frameworks. Inference workloads tend to be resource heavy and the most popular frameworks have large security attack surfaces. Being able to control when and where they run will make it easier for people to safely use wasmtime with wasi-nn without having to parse and execute model on arbitrary inputs in process.

This PR also implements a kserve registry so that models can be loaded via load named models and adds some better error reporting within what is possible in the current framework.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:30):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:41):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 09:50):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 15:38):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 21:55):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 21:57):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 22:02):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 22:13):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 22:18):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 23:16):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 23:16):

github-merge-queue[bot] updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 23:20):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 07 2023 at 23:20):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 08 2023 at 02:09):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 08 2023 at 02:29):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 08 2023 at 02:31):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 08 2023 at 03:43):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 08 2023 at 04:21):

geekbeast updated PR #6867.

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey submitted PR review:

I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of expect outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey submitted PR review:

I don't have the context on wasi-nn and kserve to review this comprehensively but from the Rust prespective there are some idiomatic suggestions I provided, and in general this needs a pass to replace all uses of expect outside of #[test] with error handling, as well as cleaning out commented out code from earlier stages in the development process.

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

delete commented out code

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

for these debugging strings, better to use tracing::error! over eprintln

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

delete

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

this comment doesn't make sense, maybe its dead code?

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

commented out code, not clear to me whether this still needs to be implemented as part of this PR?

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

Instead of this helper function, use a std::io::Cursor and then the byteorder::WriteBytesExt trait at each of the call sites.

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

not clear what these commented out members are about

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

this code panics whenever the server url is malformed or the server doesn't respond correctly, please change it to return errors instead and handle those errors appropriately where this is invoked.

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

tracing::error! for this failure

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

tracing::debug! here

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

commented out code should be deleted?

Wasmtime GitHub notifications bot (Nov 09 2023 at 17:47):

pchickey created PR review comment:

? instead of expect

Wasmtime GitHub notifications bot (Nov 09 2023 at 20:33):

abrown submitted PR review:

I am in favor of merging this PR. Obviously there are a bunch of Rust-level programming issues to clean up before we do that (like @pchickey points out) but, at the higher wasi-nn level, this mostly makes sense. Maybe if we resolve most of that here, we can do the following in follow up PRs:

testing: I'm a bit worried about the testing story. From what I see, some of these tests depend on http://localhost:8000, a real kserve implementation, but I don't see that being set up in CI--how are those tests passing? At some point this new functionality needs to be validated during CI so that we immediately know if we've broken something by refactoring

documentation: eventually, we should be able to look at the module documentation and get a general idea of what the kserve backend is trying to do ("it connects to ... then sends a GET request to do ..."). Also, explaining why several of APIs _must_ become async would be helpful later

Otherwise, I think the general idea of plugging in a new backend is a good one and validates the refactoring to these Box<dyn ...> wrappers. Let's just clean up the extra code here that @pchickey notes and we can move on from there; @geekbeast, let me know if you want to pair up to finish this.

Wasmtime GitHub notifications bot (Mar 21 2024 at 22:58):

mtr-fastly updated PR #6867.

Wasmtime GitHub notifications bot (Jun 10 2024 at 16:47):

geekbeast updated PR #6867.

Last updated: Apr 16 2025 at 17:03 UTC