Stream: git-wasmtime

Topic: wasmtime / issue #8571 Changing a multiplication operatio...


view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:37):

hungryzzz added the bug label to Issue #8571.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:37):

hungryzzz opened issue #8571:

Test Case

cases.zip

Steps to Reproduce

I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the same for Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">

For bad.wasm, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:37):

hungryzzz edited issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the same for Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">

For bad.wasm, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:40):

hungryzzz edited issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the execution time before and after the modification is nearly the same for Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">

For bad.wasm, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:40):

hungryzzz edited issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the execution time before and after the modification is nearly the same in Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">

For bad.wasm, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 13:58):

bjorn3 commented on issue #8571:

Division is a lot more expensive than multiplication on pretty much every cpu. It may be that the other wasm engines are optimizing it away, or schedule the instructions slightly differently in a way that masks most of the latency on modern super scalar cpu's.

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 14:31):

hungryzzz commented on issue #8571:

Yes, you're right, but the default backend of Wasmer is also Cranelift, the execution time before and after the modification in Wasmer is nearly the same. So I am wondering if it is possible that the subsequent commits introduce some bugs.

<img width="1366" alt="截屏2024-05-07 22 27 01" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/650171d3-f6db-4e60-9ad1-0a0811c4d026">

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 16:06):

cfallin closed issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the execution time before and after the modification is nearly the same in Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">

For bad.wasm, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 07 2024 at 16:06):

cfallin commented on issue #8571:

@hungryzzz our current version of the mid-end optimizer can't yet rewrite divs by constants into multiplies by magic constants; we have #6049 for this. I'm going to go ahead and close this issue in favor of that as it's a duplicate report.

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2024 at 07:04):

hungryzzz edited issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the execution time before and after the modification is nearly the same in Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:

/home/ringzzz/wasm_runtime/wasmer/target/release/wasmer $filename.wasm  1.76s user 0.01s system 100% cpu 1.766 total
/home/ringzzz/wasm_runtime/WasmEdge/build/tools/wasmedge/wasmedge   1.18s user 0.00s system 99% cpu 1.179 total
/home/ringzzz/wasm_runtime/wasmtime/target/release/wasmtime run    0.74s user 0.00s system 100% cpu 0.741 total

For bad.wasm, the execution time in different runtimes are as follows:

/home/ringzzz/wasm_runtime/wasmer/target/release/wasmer $filename.wasm  1.77s user 0.00s system 100% cpu 1.765 total
/home/ringzzz/wasm_runtime/WasmEdge/build/tools/wasmedge/wasmedge   1.18s user 0.00s system 99% cpu 1.181 total
/home/ringzzz/wasm_runtime/wasmtime/target/release/wasmtime run    4.68s user 0.00s system 100% cpu 4.680 total

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2024 at 07:05):

hungryzzz edited a comment on issue #8571:

Yes, you're right, but the default backend of Wasmer is also Cranelift, the execution time before and after the modification in Wasmer is nearly the same. So I am wondering if it is possible that the subsequent commits introduce some bugs.

  case  time wasmer run --cranelift good.wasm
~/wasmer/target/release/wasmer run --cranelift   1.76s user 0.00s system 100% cpu 1.758 total
➜  case time wasmer run --cranelift bad.wasm
~/wasmer/target/release/wasmer run --cranelift   1.75s user 0.01s system 100% cpu 1.759 total

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2024 at 07:06):

hungryzzz edited issue #8571:

Test Case

cases.zip

Steps to Reproduce

Hi, I run the attached two cases(good.wasm&bad.wasm) in Wasmtime and WasmEdge, and collect their execution time respectively.

Expected Results & Actual Results

Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of Wasmtime increases by more than 5 times, while the execution time before and after the modification is nearly the same in Wasmedge and Wasmer.

![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)

For good.wasm, the execution time in different runtimes are as follows:

~/wasmer/target/release/wasmer $filename.wasm  1.76s user 0.01s system 100% cpu 1.766 total
~/WasmEdge/build/tools/wasmedge/wasmedge   1.18s user 0.00s system 99% cpu 1.179 total
~/wasmtime/target/release/wasmtime run    0.74s user 0.00s system 100% cpu 0.741 total

For bad.wasm, the execution time in different runtimes are as follows:

~/wasmer/target/release/wasmer $filename.wasm  1.77s user 0.00s system 100% cpu 1.765 total
~/WasmEdge/build/tools/wasmedge/wasmedge   1.18s user 0.00s system 99% cpu 1.181 total
~/wasmtime/target/release/wasmtime run    4.68s user 0.00s system 100% cpu 4.680 total

Versions and Environment

Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831

Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic

Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2024 at 08:16):

hungryzzz commented on issue #8571:

@cfallin Hi, I check the generated machine code of bad.wasm and good.wasm, and find that, in the wasm code, the result of i32.div_u or i32.mul will be dropped directly, and the machine code of good.wasm will eliminate the multiplication operation, however, the machine code of bad.wasm will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation under some magic contants?)
By the way, I also replace i32.mul to other operator, e.g., i32.sub and i32.add, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.

;; part of bad.wasm
i32.const 0
i32.const 1
i32.div_u ;; diff
drop

;; part of good.wasm
i32.const 0
i32.const 1
i32.mul ;; diff
drop

report.zip

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2024 at 08:16):

hungryzzz edited a comment on issue #8571:

@cfallin Hi, I check the generated machine code of bad.wasm and good.wasm, and find that, in the wasm code, the result of i32.div_u or i32.mul will be dropped directly, and the machine code of good.wasm will eliminate the multiplication operation, however, the machine code of bad.wasm will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation?)
By the way, I also replace i32.mul to other operator, e.g., i32.sub and i32.add, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.

;; part of bad.wasm
i32.const 0
i32.const 1
i32.div_u ;; diff
drop

;; part of good.wasm
i32.const 0
i32.const 1
i32.mul ;; diff
drop

report.zip

view this post on Zulip Wasmtime GitHub notifications bot (Jun 16 2024 at 08:31):

hungryzzz edited a comment on issue #8571:

@cfallin Hi, I check the generated machine code of bad.wasm and good.wasm, and find that, in the wasm code, the result of i32.div_u or i32.mul will be dropped directly, and the machine code of good.wasm will eliminate the multiplication operation, however, the machine code of bad.wasm will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation?)
By the way, I also replace i32.mul to other operator, e.g., i32.sub and i32.add, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.

;; part of bad.wasm
i32.const 0
i32.const 1
i32.div_u ;; diff
drop

;; part of good.wasm
i32.const 0
i32.const 1
i32.mul ;; diff
drop
# the division operation in machine code of bad.wasm

xor %eax,%eax
mov $0x1,%esi
xor %rdx,%rdx
div %esi

report.zip

view this post on Zulip Wasmtime GitHub notifications bot (Jun 17 2024 at 01:24):

cfallin commented on issue #8571:

Yes, divides also cannot be removed currently, even if dead: they may have side-effects in general (and Cranelift's predicate for "is this instruction side-effecting" doesn't know about constant input values, so it can't reason that a divide-by-constant-1 will never reach the divide-by-zero trap case). That's why a drop'd i32.div_u still appears in codegen while the i32.mul does not.


Last updated: Dec 23 2024 at 12:05 UTC