liutao-liu added the bug label to Issue #7973.
liutao-liu opened issue #7973:
Test Case
test.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main() {
        FILE *fp;
        char str[14];
        for (int i = 1; i < 1000; i++) {
            int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
            for (int j = 0; j < 1000; j++) {
                lseek(fd, i * 10 + j * 13, SEEK_SET);
                write(fd, "hello world! ", 13);
            }
            fsync(fd);
            close(fd);
        }
        return 0;
    }

Steps to Reproduce
- First, compile test.c from the test case above into a Wasm module using the WASI SDK:
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- Second, AOT-compile the module with Wasmtime to produce test.aot:
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- Third, time the run. It takes about 40 seconds:
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
Wasmtime takes about the same time as native and WAMR.
Actual Results
Wasmtime takes about 40 seconds.
The same test.c takes only about 2 seconds natively or under WAMR.
Versions and Environment
Wasmtime version: 16.0.0
Operating system: Ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
```
# profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
```
# profile for native
perf record -g -k mono ./test
perf report
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)
System Call Times Statistics
As the following figures show, Wasmtime makes about three times as many system calls as native.
Is this because Wasmtime uses Tokio for file I/O, so that the number of file I/O operations is three times that of native, resulting in poor performance?
```
# strace for wasmtime
strace -c wasmtime run --allow-precompiled --dir ./ test.aot
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
# strace for native (the same as wamr)
strace -c ./test
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)
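The 40 seconds reported by `time` covers the whole process, including loading test.aot. To look at the cost of the I/O calls alone, the inner loop can be timed from inside the program. The following is only a sketch: `time_seek_write` is a hypothetical helper that is not part of the issue, and it assumes `clock_gettime` with `CLOCK_MONOTONIC` is available on both the native and the WASI target.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from the issue): runs `count` lseek+write pairs
 * like the inner loop of test.c and returns the elapsed wall-clock time in
 * nanoseconds, or -1 on error. */
static int64_t time_seek_write(const char *path, int count) {
    struct timespec start, end;

    int fd = open(path, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        close(fd);
        return -1;
    }
    for (int j = 0; j < count; j++) {
        /* Same two calls per record as the original test case. */
        if (lseek(fd, (off_t)j * 13, SEEK_SET) < 0 ||
            write(fd, "hello world! ", 13) != 13) {
            close(fd);
            return -1;
        }
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        close(fd);
        return -1;
    }
    close(fd);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

Dividing the returned value by `count` gives a rough per-iteration cost that can be compared between the native build and the Wasmtime run.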
liutao-liu edited issue #7973:
Test Case
test.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main() {
        FILE *fp;
        char str[14];
        for (int i = 1; i < 1000; i++) {
            int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
            for (int j = 0; j < 1000; j++) {
                lseek(fd, i * 10 + j * 13, SEEK_SET);
                write(fd, "hello world! ", 13);
            }
            fsync(fd);
            close(fd);
        }
        return 0;
    }

Steps to Reproduce
- First, compile test.c from the test case above into a Wasm module using the WASI SDK:
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- Second, AOT-compile the module with Wasmtime to produce test.aot:
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- Third, time the run. It takes about 40 seconds:
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
Wasmtime takes about the same time as native and WAMR.
Actual Results
Wasmtime takes about 40 seconds.
The same test.c takes only about 2 seconds natively or under WAMR.
Versions and Environment
Wasmtime version: 16.0.0
Operating system: Ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
```
# profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data
```
As the following figure shows, most of the performance hotspots are in Tokio. This is because Wasmtime uses Tokio to implement the file I/O interface, which involves:
```
__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
```
# profile for native
perf record -g -k mono ./test
perf report
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)
#### System Call Times Statistics
As the following figures show, Wasmtime makes about three times as many system calls as native.
Is this because Wasmtime uses **tokio** for file I/O, so that the number of file I/O operations is three times that of native, resulting in poor performance?
```
# strace for wasmtime
strace -c wasmtime run --allow-precompiled --dir ./ test.aot
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
# strace for native (the same as wamr)
strace -c ./test
```
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)
**Why do we use Tokio to implement file I/O? Have we considered performance?**
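Each iteration of the inner loop in test.c issues two calls, `lseek` and `write`, which correspond to the `fd_seek` and `fd_write` imports listed in the profile section. One way to check how much of the slowdown scales with the number of host calls is to fold the two into a single positional write. The sketch below does that with POSIX `pwrite`. The helper name `write_pattern` and its parameters are made up for illustration, and it assumes the libc in use provides `pwrite`. This is an experiment to narrow down the cause, not a fix for the underlying behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the issue): performs the inner loop of
 * test.c for one value of i, but uses pwrite() so each record costs one
 * call instead of an lseek() followed by a write().
 * Returns 0 on success, -1 on the first failure. */
static int write_pattern(const char *path, int i, int count) {
    static const char record[] = "hello world! ";  /* 13 bytes plus NUL */
    const size_t len = sizeof(record) - 1;

    int fd = open(path, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    for (int j = 0; j < count; j++) {
        /* Same offsets as test.c: i * 10 + j * 13. */
        off_t offset = (off_t)i * 10 + (off_t)j * 13;
        if (pwrite(fd, record, len, offset) != (ssize_t)len) {
            close(fd);
            return -1;
        }
    }

    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling `write_pattern("test.txt", i, 1000)` for `i` from 1 to 999 writes the same data at the same offsets as the original loop, so its run time under Wasmtime can be compared directly with the original test case.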
Last updated: Dec 23 2024 at 13:07 UTC