Stream: git-wasmtime

Topic: wasmtime / PR #8964 Add FP16 and I64 support for wasi-nn ...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 16 2024 at 13:48):

jianjunz opened PR #8964 from jianjunz:f16-i64 to bytecodealliance:main:

Some devices may not support FP32.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 02:22):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 02:48):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 03:56):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 07:21):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 08:29):

jianjunz has marked PR #8964 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 08:29):

jianjunz requested wasmtime-core-reviewers for a review on PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 08:29):

jianjunz requested alexcrichton for a review on PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 08:29):

jianjunz requested wasmtime-default-reviewers for a review on PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 17 2024 at 14:30):

alexcrichton requested abrown for a review on PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

When is this needed?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

Are we sure cast is right here? Poking around in windows-rs, I see that it has a CanInto implementation for TensorFeatureDescriptor here — doesn't that mean we should be using into() instead?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

        let inspectable =

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

You probably only need implement for this PR, right?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

Is this correct? ONNX is going to represent this type with 4 bytes instead of 2 bytes? If so, maybe we should add a comment describing why this is the case and linking to some documentation; otherwise, it's a bit surprising.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

I don't think this makes sense. The cargo test above runs both the unit tests within the library as well as the integration tests in the tests directory.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

Not sure if it's any faster, but this would avoid the size calculation:

let data = view.into_iter().map(f32::to_le_bytes).collect();

We might want to note somewhere that the LE ordering is here to match WebAssembly's ordering, not the platform we're running on (in which case always doing LE could be incorrect).

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

We can do this once at the top of the function.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

When doing this unsafe "cast" to a slice, we also need to make sure that the u8 pointer to tensor.data is aligned in such a way that the f32s in data will also be aligned to their 4-byte alignment. Here's an example of how to do the check from openvino-rs. And the docs for std::slice::from_raw_parts have all the details. I think it would be fine here to fail if things aren't aligned correctly (instead of trying to shift everything!), at least for the time being.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

                _ => unimplemented!("the winml backend only supports tensors, found: {}", kind),

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

                tensor.data.as_ptr().cast<i64>,

These days clippy tells me this is the preferred way to do this.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

This kind of comment can be moved from here to the unimplemented!() at the end of to_tensor so that both users and code readers can take advantage of this information.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

E.g., https://microsoft.github.io/windows-docs-rs/doc/windows/AI/MachineLearning/struct.TensorFloat16Bit.html#method.CreateFromArray.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

        TensorType::Fp32 => unsafe {

Isn't TensorType already imported?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

It took me a second to understand that what we're trying to do is "roundtrip" a tensor through to_inspectable and back through to_tensor. To make this more clear, can you look at factoring out some of the duplication here?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 01:40):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 01:44):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 01:44):

jianjunz created PR review comment:

You're right. The cargo test above also runs unit tests. https://github.com/bytecodealliance/wasmtime/actions/runs/9969920948/job/27547820669#step:8:409

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 01:45):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 01:45):

jianjunz created PR review comment:

Removed. They're not needed for this one.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

output_tensor's type is unknown at the top of the function.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

WinML may report an error

self.binding.Bind("input", tensor of shape [10]);
self.binding.Bind("input", tensor of shape [11]);  <-- error

But it works

self.binding.Bind("input", tensor of shape [10]);
self.binding.Clear();
self.binding.Bind("input", tensor of shape [11]);

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

WinML may use 2 bytes internally because it's fp16, but f16 is not officially supported by Rust stable, so we use f32 here.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Removed crate::wit::types::.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Removed.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

view's type is IVectorView, not a rust collection, so the short version doesn't work here.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 06:27):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 06:27):

jianjunz created PR review comment:

My understanding is cast here is for COM interfaces, similar as QueryInterface, while into is for converting derived class into base class?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 06:28):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 06:33):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 06:33):

jianjunz created PR review comment:

Replaced with cast.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 08:10):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 08:11):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 08:11):

jianjunz created PR review comment:

Thanks for the info. Added assertion when getting length.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 08:12):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2024 at 08:12):

jianjunz created PR review comment:

Implemented PartialEq for Tensor, so we don't need a lot of assert_eq! here.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

Ok, so this needs to be fixed then in this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

No, I don't think that is not correct. I checked out the code and let data = view.into_iter().flat_map(f32::to_le_bytes).collect(); (changing map to flat_map) seems to compile ok.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

We should probably be using shape here to verify that tensor.dimensions matches.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

Not sure that answers the question...

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

If you'd like to do this in a more standard way, I just learned of align_to which can be used to do the same thing.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

I mean cast here queries if the COM object can be convert into a derived class, while into is used to convert into a base class. That's my understanding.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

No, not in this one. It cannot be easily fixed by adding a self.binding.Clear() here because a model may have multiple input features. In this case, application calls set_input more than once.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

Thanks for pointing out the duplication here. I think we don't need to verify shape here, WinML will check it later, so shape is removed.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

flat_map works. Thanks.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2024 at 08:22):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

I think I understand what you're saying about Clear: it erases _all_ the bindings, even for other model inputs. We can't have that. But what happens in this conceivable sequence?

This is valid, though silly. The user should not have to face an error from wasi-nn in this case, right? But, if WinML is going to raise an error, then should we protect this some other way, e.g., by checking that the tensor shape is what the model expects (either A or B)?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

How about

let itensor = inspectable.cast<ITensor>()?;
let dimensions = dimensions_as_u32(&itensor.Shape()?)?;

or something like that?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

Ok, please add a comment explaining that and link to the docs I mentioned.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

I'm not familiar enough with COM so I'll take your word for it that this is the right approach.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:37):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 00:37):

abrown created PR review comment:

Probably should just #[derive(PartialEq)] since it's equivalent to this I think.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 02:16):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz created PR review comment:

That's a variable input, accepts both A and B (like a string with different length). I feel like this is a bug of WinML, or the calling flow is incorrect. Clear fixes the issue but I'm not sure if that's the only solution, so I'm not adding Clear at this time.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz created PR review comment:

It works, but GetAsVectorView is a method of TensorFloat16Bit. Then we'll need to cast itensor again to specific tensor type. The change above makes code clean but will it be a performance issue to cast twice?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 10:45):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 10:46):

jianjunz submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 06 2024 at 10:46):

jianjunz created PR review comment:

It works. Thanks.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2024 at 23:51):

jianjunz updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2024 at 00:06):

abrown updated PR #8964.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2024 at 16:38):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2024 at 16:54):

abrown merged PR #8964.


Last updated: Dec 23 2024 at 13:07 UTC