wasmtime / PR #8964 Add FP16 and I64 support for wasi-nn ... · git-wasmtime

Are we sure cast is right here? Poking around in windows-rs, I see that it has a CanInto implementation for TensorFeatureDescriptor here — doesn't that mean we should be using into() instead?

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

        let inspectable =

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

You probably only need implement for this PR, right?

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

Is this correct? ONNX is going to represent this type with 4 bytes instead of 2 bytes? If so, maybe we should add a comment describing why this is the case and linking to some documentation; otherwise, it's a bit surprising.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

I don't think this makes sense. The cargo test above runs both the unit tests within the library as well as the integration tests in the tests directory.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

Not sure if it's any faster, but this would avoid the size calculation:
let data = view.into_iter().map(f32::to_le_bytes).collect();
We might want to note somewhere that the LE ordering is here to match WebAssembly's ordering, not the platform we're running on (in which case always doing LE could be incorrect).

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

We can do this once at the top of the function.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

When doing this unsafe "cast" to a slice, we also need to make sure that the u8 pointer to tensor.data is aligned in such a way that the f32s in data will also be aligned to their 4-byte alignment. Here's an example of how to do the check from openvino-rs. And the docs for std::slice::from_raw_parts have all the details. I think it would be fine here to fail if things aren't aligned correctly (instead of trying to shift everything!), at least for the time being.
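
A minimal sketch of the check described above, not the code from the PR: it refuses to reinterpret a byte buffer as f32s unless the pointer is 4-byte aligned and the length is a whole number of f32s. The helper name bytes_as_f32s and the String error type are assumptions made for illustration.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Reinterpret `data` as a slice of `f32`, failing (rather than copying or
/// shifting) when the buffer is misaligned or has a trailing partial element.
/// Hypothetical helper; the name and error type are illustrative only.
fn bytes_as_f32s(data: &[u8]) -> Result<&[f32], String> {
    let ptr = data.as_ptr();
    // `from_raw_parts` requires the pointer to be aligned for the target type.
    if ptr as usize % align_of::<f32>() != 0 {
        return Err(format!(
            "tensor data at {:p} is not aligned to {} bytes",
            ptr,
            align_of::<f32>()
        ));
    }
    // The byte length must cover a whole number of elements.
    if data.len() % size_of::<f32>() != 0 {
        return Err(format!(
            "tensor data length {} is not a multiple of {}",
            data.len(),
            size_of::<f32>()
        ));
    }
    // SAFETY: alignment and length were checked above, every bit pattern is a
    // valid `f32`, and the returned slice borrows from `data`.
    Ok(unsafe { slice::from_raw_parts(ptr.cast::<f32>(), data.len() / size_of::<f32>()) })
}
```

Returning an error keeps the unsafe block sound without committing to a copy-and-realign fallback, which matches the "fail for the time being" suggestion.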

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

                _ => unimplemented!("the winml backend only supports tensors, found: {}", kind),

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

                tensor.data.as_ptr().cast::<i64>(),
These days clippy tells me this is the preferred way to do this.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

This kind of comment can be moved from here to the unimplemented!() at the end of to_tensor so that both users and code readers can take advantage of this information.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

E.g., https://microsoft.github.io/windows-docs-rs/doc/windows/AI/MachineLearning/struct.TensorFloat16Bit.html#method.CreateFromArray.

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

        TensorType::Fp32 => unsafe {
Isn't TensorType already imported?

Wasmtime GitHub notifications bot (Jul 18 2024 at 23:37):

abrown created PR review comment:

It took me a second to understand that what we're trying to do is "roundtrip" a tensor through to_inspectable and back through to_tensor. To make this more clear, can you look at factoring out some of the duplication here?

Wasmtime GitHub notifications bot (Jul 19 2024 at 01:40):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Jul 19 2024 at 01:44):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 01:44):

jianjunz created PR review comment:

You're right. The cargo test above also runs unit tests. https://github.com/bytecodealliance/wasmtime/actions/runs/9969920948/job/27547820669#step:8:409

Wasmtime GitHub notifications bot (Jul 19 2024 at 01:45):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 01:45):

jianjunz created PR review comment:

Removed. They're not needed for this one.

Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

output_tensor's type is unknown at the top of the function.

Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

WinML may report an error in this sequence:

self.binding.Bind("input", tensor of shape [10]);
self.binding.Bind("input", tensor of shape [11]);  <-- error

But this sequence works:

self.binding.Bind("input", tensor of shape [10]);
self.binding.Clear();
self.binding.Bind("input", tensor of shape [11]);

Wasmtime GitHub notifications bot (Jul 19 2024 at 02:33):

jianjunz created PR review comment:

WinML may use 2 bytes internally because it's fp16, but f16 is not officially supported by Rust stable, so we use f32 here.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Removed crate::wit::types::.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Done.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Removed.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

Done.

Wasmtime GitHub notifications bot (Jul 19 2024 at 04:52):

jianjunz created PR review comment:

view's type is IVectorView, not a rust collection, so the short version doesn't work here.

Wasmtime GitHub notifications bot (Jul 19 2024 at 06:27):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 06:27):

jianjunz created PR review comment:

My understanding is that cast here is for COM interfaces, similar to QueryInterface, while into is for converting a derived class into a base class?

Wasmtime GitHub notifications bot (Jul 19 2024 at 06:28):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Jul 19 2024 at 06:33):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 06:33):

jianjunz created PR review comment:

Replaced with cast.

Wasmtime GitHub notifications bot (Jul 19 2024 at 08:10):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Jul 19 2024 at 08:11):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 08:11):

jianjunz created PR review comment:

Thanks for the info. Added assertion when getting length.

Wasmtime GitHub notifications bot (Jul 19 2024 at 08:12):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 08:12):

jianjunz created PR review comment:

Implemented PartialEq for Tensor, so we don't need a lot of assert_eq! here.

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

Ok, so this needs to be fixed then in this PR?

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

No, I don't think that is correct. I checked out the code and let data = view.into_iter().flat_map(f32::to_le_bytes).collect(); (changing map to flat_map) seems to compile ok.
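
A small standalone sketch of the flat_map form, using a plain iterator of f32 in place of the WinML view (the helper name f32s_to_le_bytes is hypothetical). With map alone, each element becomes a [u8; 4] array, which does not collect into a Vec<u8>; flat_map flattens those arrays into individual bytes.

```rust
/// Serialize `f32` values into little-endian bytes, four bytes per value.
/// Little-endian is chosen to match WebAssembly's byte order, independent of
/// the host platform. Hypothetical helper for illustration.
fn f32s_to_le_bytes(values: impl IntoIterator<Item = f32>) -> Vec<u8> {
    values
        .into_iter()
        // `to_le_bytes` yields `[u8; 4]`; `flat_map` flattens each array.
        .flat_map(f32::to_le_bytes)
        .collect()
}
```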

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

We should probably be using shape here to verify that tensor.dimensions matches.

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

Not sure that answers the question...

Wasmtime GitHub notifications bot (Jul 24 2024 at 02:41):

abrown created PR review comment:

If you'd like to do this in a more standard way, I just learned of align_to which can be used to do the same thing.
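
A hedged sketch of how align_to could stand in for a manual pointer check, shown here for i64 data. The helper name bytes_as_i64s is an assumption. Note that the standard library documentation allows align_to to return a more conservative split than strictly necessary, so this check may reject a buffer it could have accepted, but it never accepts a misaligned one.

```rust
/// View `data` as `i64`s only when the whole buffer is usable, i.e. there are
/// no unaligned leading bytes and no leftover trailing bytes.
/// Hypothetical helper for illustration.
fn bytes_as_i64s(data: &[u8]) -> Option<&[i64]> {
    // SAFETY: every bit pattern is a valid `i64`, so reinterpreting the
    // aligned middle part of the buffer is sound.
    let (prefix, middle, suffix) = unsafe { data.align_to::<i64>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(middle)
    } else {
        None
    }
}
```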

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

I mean cast here queries whether the COM object can be converted into a derived class, while into is used to convert into a base class. That's my understanding.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

No, not in this one. It cannot be easily fixed by adding a self.binding.Clear() here because a model may have multiple input features. In that case, the application calls set_input more than once.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

Thanks for pointing out the duplication here. I think we don't need to verify shape here, WinML will check it later, so shape is removed.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Jul 24 2024 at 06:13):

jianjunz created PR review comment:

flat_map works. Thanks.

Wasmtime GitHub notifications bot (Jul 24 2024 at 08:22):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

I think I understand what you're saying about Clear: it erases _all_ the bindings, even for other model inputs. We can't have that. But what happens in this conceivable sequence?

Wasm guest calls set_input on input N with shape A

Wasm guest again calls set_input on input N with shape B

This is valid, though silly. The user should not have to face an error from wasi-nn in this case, right? But, if WinML is going to raise an error, then should we protect this some other way, e.g., by checking that the tensor shape is what the model expects (either A or B)?
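
One way the shape check suggested above might look, as a sketch only. It assumes the model's expected shape is available as a list of i64 in which a negative entry marks a variable dimension; the helper name shape_matches and that convention are assumptions, not something confirmed in this thread.

```rust
/// Return true when `actual` is acceptable for a model input whose declared
/// shape is `expected`. A negative expected dimension is treated as "any size"
/// (an assumption for this sketch). Hypothetical helper.
fn shape_matches(expected: &[i64], actual: &[u32]) -> bool {
    expected.len() == actual.len()
        && expected
            .iter()
            .zip(actual)
            .all(|(&want, &got)| want < 0 || want == i64::from(got))
}
```

Under this sketch, an input declared with a variable dimension accepts both shape A and shape B, while a fixed-shape input rejects a mismatch before the backend sees it.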

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

How about

let itensor = inspectable.cast::<ITensor>()?;
let dimensions = dimensions_as_u32(&itensor.Shape()?)?;

or something like that?

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

Ok, please add a comment explaining that and link to the docs I mentioned.

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:34):

abrown created PR review comment:

I'm not familiar enough with COM so I'll take your word for it that this is the right approach.

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:37):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Aug 06 2024 at 00:37):

abrown created PR review comment:

Probably should just #[derive(PartialEq)] since it's equivalent to this I think.
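
For illustration, a derived PartialEq on a simplified tensor type. The struct and enum below are stand-ins with assumed field names, not the actual wasi-nn definitions. The derive compares every field in order, which is what a hand-written field-by-field implementation would do.

```rust
/// Simplified stand-in for the tensor element type (assumed variants).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TensorType {
    Fp16,
    Fp32,
    I64,
}

/// Simplified stand-in for a tensor; field names are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct Tensor {
    dimensions: Vec<u32>,
    ty: TensorType,
    data: Vec<u8>,
}
```

With this in place, a roundtrip test can compare whole tensors with a single assert_eq! instead of one assertion per field.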

Wasmtime GitHub notifications bot (Aug 06 2024 at 02:16):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz created PR review comment:

That's a variable input that accepts both A and B (like a string with different lengths). I feel like this is a bug in WinML, or the calling flow is incorrect. Clear fixes the issue, but I'm not sure if that's the only solution, so I'm not adding Clear at this time.

Wasmtime GitHub notifications bot (Aug 06 2024 at 02:43):

jianjunz created PR review comment:

It works, but GetAsVectorView is a method of TensorFloat16Bit, so we'll need to cast itensor again to the specific tensor type. The change above makes the code cleaner, but will casting twice be a performance issue?

Wasmtime GitHub notifications bot (Aug 06 2024 at 10:45):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Aug 06 2024 at 10:46):

jianjunz submitted PR review.

Wasmtime GitHub notifications bot (Aug 06 2024 at 10:46):

jianjunz created PR review comment:

It works. Thanks.

Wasmtime GitHub notifications bot (Aug 07 2024 at 23:51):

jianjunz updated PR #8964.

Wasmtime GitHub notifications bot (Aug 08 2024 at 00:06):

abrown updated PR #8964.

Wasmtime GitHub notifications bot (Aug 08 2024 at 16:38):

abrown submitted PR review.

Wasmtime GitHub notifications bot (Aug 08 2024 at 16:54):

abrown merged PR #8964.

Last updated: Apr 17 2025 at 22:03 UTC