Stream: git-wasmtime

Topic: wasmtime / issue #8391 Inconsistent results for wasi-nn d...


Wasmtime GitHub notifications bot (Apr 17 2024 at 08:55):

jianjunz added the bug label to Issue #8391.

Wasmtime GitHub notifications bot (Apr 17 2024 at 08:55):

jianjunz opened issue #8391:

Test Case

The tests nn_image_classification, nn_image_classification_named, and nn_image_classification_onnx produce different inference results. The first two use the OpenVINO backend; the third uses the onnxruntime backend.

Steps to Reproduce

Run the cases above locally or check the output of GitHub Actions.

An example output:
https://github.com/bytecodealliance/wasmtime/actions/runs/8716252474/job/23909494323#step:17:3914

Expected Results

Since both backends use MobileNet v2 and the same input tensor data, they should produce similar results. We don't expect the results to be identical, because the model formats differ.
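One way to make "similar but not identical" testable is to compare only the predicted class indices, since raw scores from different model formats need not share a scale. The sketch below is a hypothetical check, not the wasmtime test code: it assumes each backend's output has already been reduced to a list of top-k class indices, and it requires the same top-1 class plus a minimum overlap in the top-k set (the threshold is an arbitrary assumption).

```rust
/// Hypothetical helper: number of class indices that appear in both top-k lists.
fn top_k_overlap(a: &[usize], b: &[usize]) -> usize {
    a.iter().filter(|&&idx| b.contains(&idx)).count()
}

/// Hypothetical check: two backends "agree" if they pick the same top-1 class
/// and share at least `min_overlap` classes in their top-k lists.
/// Scores are deliberately ignored because their scale depends on the model.
fn results_are_similar(a: &[usize], b: &[usize], min_overlap: usize) -> bool {
    !a.is_empty() && !b.is_empty() && a[0] == b[0] && top_k_overlap(a, b) >= min_overlap
}
```

Applied to the top-5 indices reported in this issue ([963, 762, 909, 926, 567] vs. [470, 862, 626, 644, 556]), the overlap is zero, so this check would flag the mismatch.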

Actual Results

The result of the wasi-nn OpenVINO backend is [InferenceResult(963, 0.7113049), InferenceResult(762, 0.07070768), InferenceResult(909, 0.036356032), InferenceResult(926, 0.015456118), InferenceResult(567, 0.015344023)].

The result of the ONNX backend is [InferenceResult(470, 479.08182), InferenceResult(862, 378.7252), InferenceResult(626, 364.8759), InferenceResult(644, 334.28488), InferenceResult(556, 288.65884)].

CI for the WinML backend is not enabled yet, but it produces the same result as the ONNX backend.
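The lists above are top-5 `(class index, score)` pairs taken from each backend's output tensor. A minimal sketch of that reduction, assuming the output is a flat `f32` slice with one score per class; the `InferenceResult` struct here is a hypothetical mirror of the printed pairs, not the type from the wasmtime tests:

```rust
/// Hypothetical mirror of the `InferenceResult(index, score)` pairs printed above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InferenceResult(usize, f32);

/// Return the `k` highest-scoring classes from a flat output tensor,
/// sorted from highest to lowest score.
fn top_k(output: &[f32], k: usize) -> Vec<InferenceResult> {
    let mut results: Vec<InferenceResult> = output
        .iter()
        .enumerate()
        .map(|(i, &score)| InferenceResult(i, score))
        .collect();
    // Sort descending by score; `total_cmp` gives a total order even for NaN.
    results.sort_by(|a, b| b.1.total_cmp(&a.1));
    results.truncate(k);
    results
}
```

The ranking this produces does not depend on whether a softmax was applied to the scores, since softmax is monotonic; only the magnitudes change. That is why comparing class indices, rather than scores, is the more meaningful check across backends.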

Versions and Environment

Wasmtime version or commit: 19.0.1

Operating system: Windows

Architecture: x86_64

Extra Info

Although all these tests use MobileNet v2, the OpenVINO model requires input data in BGR format with mean values of 127.5, 127.5, 127.5. The ONNX model (used by both the onnxruntime and WinML backends) requires input data in RGB format, scaled to the range [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] (per the ONNX model description).
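The two preprocessing requirements above can be sketched as follows. This is a minimal illustration, not the code used by the wasmtime tests or the wasi-nn examples: it assumes the decoded image is interleaved RGB bytes in HWC order and that both models take planar NCHW `f32` data with a batch size of 1. The OpenVINO variant only reorders to BGR and subtracts the mean, because the issue mentions no scale factor for that model.

```rust
/// Hypothetical helper: ONNX MobileNet v2 preprocessing as described in the issue.
/// Input: interleaved RGB bytes in HWC order (assumed image layout).
/// Output: planar (NCHW, batch of 1) f32 data, scaled to [0, 1] and normalized.
fn preprocess_onnx(rgb_hwc: &[u8], height: usize, width: usize) -> Vec<f32> {
    const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
    const STD: [f32; 3] = [0.229, 0.224, 0.225];
    assert_eq!(rgb_hwc.len(), height * width * 3);
    let plane = height * width;
    let mut out = vec![0.0f32; 3 * plane];
    for pixel in 0..plane {
        for c in 0..3 {
            // Scale the byte to [0, 1], then normalize per channel.
            let x = rgb_hwc[pixel * 3 + c] as f32 / 255.0;
            out[c * plane + pixel] = (x - MEAN[c]) / STD[c];
        }
    }
    out
}

/// Hypothetical helper: OpenVINO-style preprocessing as described in the issue.
/// Reorders RGB to BGR and subtracts a mean of 127.5 from every channel.
/// Output layout is planar (NCHW, batch of 1), which is an assumption.
fn preprocess_openvino(rgb_hwc: &[u8], height: usize, width: usize) -> Vec<f32> {
    const MEAN: f32 = 127.5;
    assert_eq!(rgb_hwc.len(), height * width * 3);
    let plane = height * width;
    let mut out = vec![0.0f32; 3 * plane];
    for pixel in 0..plane {
        for c in 0..3 {
            // Output channel 0 is blue, which is input channel 2 (and vice versa).
            let x = rgb_hwc[pixel * 3 + (2 - c)] as f32;
            out[c * plane + pixel] = x - MEAN;
        }
    }
    out
}
```

Feeding the same tensor to both models without applying the matching function is the kind of mismatch that could explain the divergent results above.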

I'm not sure whether we missed the input data preprocessing or whether it is done somewhere else.

The test case for the WinML backend currently uses a different input. I'm trying to unify the inputs for all backends so we can double-check correctness.

Wasmtime GitHub notifications bot (Apr 18 2024 at 17:19):

abrown commented on issue #8391:

cc: @devigned

Wasmtime GitHub notifications bot (Apr 22 2024 at 06:32):

jianjunz commented on issue #8391:

The wasi-nn example classification-component-onnx pre-processes its images here. Applying the same preprocessing to the test inputs should fix this issue for the onnxruntime and WinML backends.

Wasmtime GitHub notifications bot (Apr 22 2024 at 15:10):

devigned commented on issue #8391:

Nice catch, @jianjunz! Thank you for opening this issue.

I'm happy to open a PR for this, but if you are interested, I'd gladly give it a review. Are you interested in contributing?

Wasmtime GitHub notifications bot (Apr 22 2024 at 15:10):

devigned edited a comment on issue #8391:

Nice catch, @jianjunz! Thank you for opening this issue.

I'm happy to open a PR for this, but if you are interested, I'd gladly review a PR from you. Are you interested in contributing?

Wasmtime GitHub notifications bot (Apr 23 2024 at 09:48):

jianjunz commented on issue #8391:

Thanks, David. I've opened #8442 to fix this issue. It also makes the ONNX Runtime and WinML backends share the same test code, since both use ONNX models.

Wasmtime GitHub notifications bot (Jun 11 2024 at 17:30):

abrown closed issue #8391.


Last updated: Dec 23 2024 at 12:05 UTC