hungryzzz added the bug label to Issue #8571.
hungryzzz opened issue #8571:
Test Case
Steps to Reproduce
I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the same forWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">For
bad.wasm
, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
hungryzzz edited issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the same forWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">For
bad.wasm
, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
hungryzzz edited issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the execution time before and after the modification is nearly the same forWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">For
bad.wasm
, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
hungryzzz edited issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the execution time before and after the modification is nearly the same inWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">For
bad.wasm
, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
bjorn3 commented on issue #8571:
Division is a lot more expensive than multiplication on pretty much every cpu. It may be that the other wasm engines are optimizing it away, or schedule the instructions slightly differently in a way that masks most of the latency on modern super scalar cpu's.
hungryzzz commented on issue #8571:
Yes, you're right, but the default backend of
Wasmer
is alsoCranelift
, the execution time before and after the modification inWasmer
is nearly the same. So I am wondering if it is possible that the subsequent commits introduce some bugs.<img width="1366" alt="截屏2024-05-07 22 27 01" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/650171d3-f6db-4e60-9ad1-0a0811c4d026">
cfallin closed issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the execution time before and after the modification is nearly the same inWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:
<img width="1214" alt="截屏2024-05-07 21 32 15" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/41a41f7d-72ab-4233-81eb-2600a6cb021f">For
bad.wasm
, the execution time in different runtimes are as follows:
<img width="1191" alt="截屏2024-05-07 21 33 33" src="https://github.com/bytecodealliance/wasmtime/assets/32137313/2f281b67-4568-40d9-adba-240d069b6fdf">Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
cfallin commented on issue #8571:
@hungryzzz our current version of the mid-end optimizer can't yet rewrite divs by constants into multiplies by magic constants; we have #6049 for this. I'm going to go ahead and close this issue in favor of that as it's a duplicate report.
hungryzzz edited issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the execution time before and after the modification is nearly the same inWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:/home/ringzzz/wasm_runtime/wasmer/target/release/wasmer $filename.wasm 1.76s user 0.01s system 100% cpu 1.766 total /home/ringzzz/wasm_runtime/WasmEdge/build/tools/wasmedge/wasmedge 1.18s user 0.00s system 99% cpu 1.179 total /home/ringzzz/wasm_runtime/wasmtime/target/release/wasmtime run 0.74s user 0.00s system 100% cpu 0.741 total
For
bad.wasm
, the execution time in different runtimes are as follows:/home/ringzzz/wasm_runtime/wasmer/target/release/wasmer $filename.wasm 1.77s user 0.00s system 100% cpu 1.765 total /home/ringzzz/wasm_runtime/WasmEdge/build/tools/wasmedge/wasmedge 1.18s user 0.00s system 99% cpu 1.181 total /home/ringzzz/wasm_runtime/wasmtime/target/release/wasmtime run 4.68s user 0.00s system 100% cpu 4.680 total
Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
hungryzzz edited a comment on issue #8571:
Yes, you're right, but the default backend of
Wasmer
is alsoCranelift
, the execution time before and after the modification inWasmer
is nearly the same. So I am wondering if it is possible that the subsequent commits introduce some bugs.➜ case ✗ time wasmer run --cranelift good.wasm ~/wasmer/target/release/wasmer run --cranelift 1.76s user 0.00s system 100% cpu 1.758 total ➜ case✗ time wasmer run --cranelift bad.wasm ~/wasmer/target/release/wasmer run --cranelift 1.75s user 0.01s system 100% cpu 1.759 total
hungryzzz edited issue #8571:
Test Case
Steps to Reproduce
Hi, I run the attached two cases(
good.wasm
&bad.wasm
) inWasmtime
andWasmEdge
, and collect their execution time respectively.Expected Results & Actual Results
Actually the only difference between the attached two cases is as follow, i.e., a multiplication operation to a division one, but the execution time of
Wasmtime
increases by more than 5 times, while the execution time before and after the modification is nearly the same inWasmedge
andWasmer
.![截屏2024-05-07 21 18 58](https://github.com/bytecodealliance/wasmtime/assets/32137313/a5c51594-b60f-466e-8517-48977fed9da3)
For
good.wasm
, the execution time in different runtimes are as follows:~/wasmer/target/release/wasmer $filename.wasm 1.76s user 0.01s system 100% cpu 1.766 total ~/WasmEdge/build/tools/wasmedge/wasmedge 1.18s user 0.00s system 99% cpu 1.179 total ~/wasmtime/target/release/wasmtime run 0.74s user 0.00s system 100% cpu 0.741 total
For
bad.wasm
, the execution time in different runtimes are as follows:~/wasmer/target/release/wasmer $filename.wasm 1.77s user 0.00s system 100% cpu 1.765 total ~/WasmEdge/build/tools/wasmedge/wasmedge 1.18s user 0.00s system 99% cpu 1.181 total ~/wasmtime/target/release/wasmtime run 4.68s user 0.00s system 100% cpu 4.680 total
Versions and Environment
Wasmtime version or commit: d0cf46a098d97bab9c4cf54a2f3aeaddea7d3831
Operating system: Linux ringzzz-OptiPlex-7070 5.15.0-97-generic
Architecture: Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz
hungryzzz commented on issue #8571:
@cfallin Hi, I check the generated machine code of
bad.wasm
andgood.wasm
, and find that, in the wasm code, the result ofi32.div_u
ori32.mul
will be dropped directly, and the machine code ofgood.wasm
will eliminate the multiplication operation, however, the machine code ofbad.wasm
will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation under some magic contants?)
By the way, I also replacei32.mul
to other operator, e.g.,i32.sub
andi32.add
, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.;; part of bad.wasm i32.const 0 i32.const 1 i32.div_u ;; diff drop ;; part of good.wasm i32.const 0 i32.const 1 i32.mul ;; diff drop
hungryzzz edited a comment on issue #8571:
@cfallin Hi, I check the generated machine code of
bad.wasm
andgood.wasm
, and find that, in the wasm code, the result ofi32.div_u
ori32.mul
will be dropped directly, and the machine code ofgood.wasm
will eliminate the multiplication operation, however, the machine code ofbad.wasm
will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation?)
By the way, I also replacei32.mul
to other operator, e.g.,i32.sub
andi32.add
, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.;; part of bad.wasm i32.const 0 i32.const 1 i32.div_u ;; diff drop ;; part of good.wasm i32.const 0 i32.const 1 i32.mul ;; diff drop
hungryzzz edited a comment on issue #8571:
@cfallin Hi, I check the generated machine code of
bad.wasm
andgood.wasm
, and find that, in the wasm code, the result ofi32.div_u
ori32.mul
will be dropped directly, and the machine code ofgood.wasm
will eliminate the multiplication operation, however, the machine code ofbad.wasm
will still do the division operation.
Does this have anything to do with "rewrite divs by constants into multiplies by magic constants"? (In my view, this optimization aims to speed up the division operation?)
By the way, I also replacei32.mul
to other operator, e.g.,i32.sub
andi32.add
, all of this operator will be eliminated directly. But I don't understand why only the division operator will not be eliminated.;; part of bad.wasm i32.const 0 i32.const 1 i32.div_u ;; diff drop ;; part of good.wasm i32.const 0 i32.const 1 i32.mul ;; diff drop
# the division operation in machine code of bad.wasm xor %eax,%eax mov $0x1,%esi xor %rdx,%rdx div %esi
cfallin commented on issue #8571:
Yes, divides also cannot be removed currently, even if dead: they may have side-effects in general (and Cranelift's predicate for "is this instruction side-effecting" doesn't know about constant input values, so it can't reason that a divide-by-constant-1 will never reach the divide-by-zero trap case). That's why a
drop
'di32.div_u
still appears in codegen while thei32.mul
does not.
Last updated: Nov 22 2024 at 16:03 UTC