liutao-liu edited issue #7973:
Test Case
test.c
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> int main() { FILE *fp; char str[14]; for (int i = 1; i < 1000; i++) { int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR); for (int j = 0; j < 1000; j++) { lseek(fd, i * 10 + j * 13, SEEK_SET); write(fd, "hello world! ", 13); } fsync(fd); close(fd); } return 0; }
Steps to Reproduce
- first,compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- second, WASMTIME AOT compile and generate machine code to obtain test.aot.
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.Versions and Environment
Wasmtime version :16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
# profile for wasmtime perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot sudo perf inject --jit --input perf.data --output perf.jit.data perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:
__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write ``` ![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
profile for native
perf record -g -k mono ./test
perf report![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2) #### System Call Times Statistics As shown in the following figure, the number of wasmtime system call times is three times that of native. Is it because wasmtime uses **tokio** for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance? ``` # strace for wasmtime strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2) **Why do we use Tokio to implement file I/O? Have we considered performance?** ~~~
liutao-liu edited issue #7973:
Test Case
test.c
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> int main() { FILE *fp; char str[14]; for (int i = 1; i < 1000; i++) { int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR); for (int j = 0; j < 1000; j++) { lseek(fd, i * 10 + j * 13, SEEK_SET); write(fd, "hello world! ", 13); } fsync(fd); close(fd); } return 0; }
Steps to Reproduce
- first,compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- second, WASMTIME AOT compile ,got test.aot.
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.Versions and Environment
Wasmtime version :16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
# profile for wasmtime perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot sudo perf inject --jit --input perf.data --output perf.jit.data perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:
__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write ``` ![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
profile for native
perf record -g -k mono ./test
perf report![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2) #### System Call Times Statistics As shown in the following figure, the number of wasmtime system call times is three times that of native. Is it because wasmtime uses **tokio** for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance? ``` # strace for wasmtime strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2) **Why do we use Tokio to implement file I/O? Have we considered performance?** ~~~
liutao-liu edited issue #7973:
Test Case
test.c
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> int main() { FILE *fp; char str[14]; for (int i = 1; i < 1000; i++) { int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR); for (int j = 0; j < 1000; j++) { lseek(fd, i * 10 + j * 13, SEEK_SET); write(fd, "hello world! ", 13); } fsync(fd); close(fd); } return 0; }
Steps to Reproduce
- first,compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- second, WASMTIME AOT compile ,got test.aot.
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.Versions and Environment
Wasmtime version :16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
# profile for wasmtime perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot sudo perf inject --jit --input perf.data --output perf.jit.data perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:
__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write ``` ![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
profile for native
perf record -g -k mono ./test
perf report![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2) #### System Call Times Statistics As shown in the following figure, the number of wasmtime system call times is three times that of native. Is it because wasmtime uses **tokio** to implement file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance? ``` # strace for wasmtime strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2) **Why do we use Tokio to implement file I/O? Have we considered performance?** ~~~
pchickey commented on issue #7973:
Thanks for this detailed report!
The short answer is, haven't benchmarked performance of file IO yet, and right off the top of my head we have a couple optimization ideas we haven't explored because we were just trying to get things out the door.
I will dig in deeper and see if we can come up with some improvements here.
pchickey edited a comment on issue #7973:
Thanks for this detailed report!
The short answer is, we haven't benchmarked performance of file IO yet, and right off the top of my head we have a couple optimization ideas we haven't explored because we were just trying to get things out the door.
I will dig in deeper and see if we can come up with some improvements here.
liutao-liu commented on issue #7973:
Thank you for your prompt response.
Besides tokio, does wasmtime have any other I/O solutions?
liutao-liu edited a comment on issue #7973:
Thank you for your prompt response.
Besides tokio, does wasmtime have any other I/O solutions?Or is there any configuration on wasmtime that can improve tokio's io performance a bit?
pchickey commented on issue #7973:
Those are great questions. This is a long answer but this is a pretty involved topic, so please excuse this wall of text, and let me know if there is anything about this I should explain better.
We chose to use
tokio
forwasmtime-wasi
's implementation partly because for many production embedders (i.e. not users of thewasmtime-cli
binary, but Fastly and Fermyon and Cosmonic and etc who use thewasmtime
crate directly in their server software), wasmtime and WASI need to be implemented on top of an async runtime, and tokio is the async of choice for all of our known embedders.Another significant factor is that
wasmtime-wasi-http
is built on an HTTP implementation and needs to integrate with thepollable
s andstream
s provided bywasmtime-wasi
. For production HTTP stacks in Rust,hyper
, which is built ontokio
, is essentially the only option. A significant fraction of all internet traffic passes through hyper today - I'm not aware of any credible alternatives.In order to fit in with the same
pollable
andstream
resources as wasi-sockets, wasi-http, and etc, wasi-filesystem is also implemented on top of tokio. Because Linux doesn't provide a non-blocking way to do File IO (except forio_uring
, which afaik no production-ready Rust systems are using yet, though hopefully they will soon), the correct way to do File IO in tokio is to move that work onto a special blocking IO thread-pool managed by tokio, which costs two synchronizations between threads per blocking operation for the executor to hand the work off from an async task, and to get notified of completion to resume the async task - that our hypothesis for where thefutex
andepoll_pwait
syscalls are coming from.I don't actually know where the extra
write
syscalls are coming from, but I haven't had time to dig in deeper yet. There may be an obvious low-hanging fruit there? If we could it cut down to just onewrite
per WASI fileblocking-write-and-flush
that would be a nice win.Finally, why does wasmtime-cli, which is a totally synchronous Rust program, use
tokio
under the hood to implement the WASI interface? Basically, because we only had time to implementwasmtime-wasi
once. When you usewasmtime-wasi
with a synchronous wasmtime (i.e.wasmtime::Config::async_support(false)
, which is the default),wasmtime-wasi
provides theLinker
with a shim that lazily creates a privatetokio
runtime, and calls the async Rust implementations of all thepollable
andstream
bits (and various otherwasi-filesystem
operations that turn into blocking-syscalls) underneath that private tokio.There is no configuration available to change whether blocking file IO is moved to separate thread or not - its very fundamental to how tokio and the wasmtime-wasi implementation work. The only way to change that behavior would be to rewrite wasmtime-wasi with a completely different internal architecture, to solely use synchronous Rust.
A rewrite of
wasmtime-wasi
for the synchronous register of Rust would take a significant amount of time to write and maintain, and it would be challenging to integrate with the rest of the ecosystem - it would essentially cleave the wasmtime ecosystem in two. For example,wasmtime-wasi-http
, or whatever other crates folks write beyond this tree that integrate withwasmtime-wasi
- and thats virtually every interface that does any sort of IO, since streams and pollables are so fundamental - currently use wasmtime-wasi'spub trait Subscribe
,pub trait
HostInputStream, and
pub trait HostOutputStream, which all use async Rust functions and assume a tokio executor. We don't expect that problem to change anytime soon, and in fact we may expect it to become even more deeply integrated with
wasmtime` as the Component Model works on having native async as part of the next big effort.To give historical context: I designed
wasi-common
so that tokio was an optional dependency, and we were able to get away with it during WASI Preview 1 because WASI didnt yet have streams or an extensible poll interface. Our experience embeddingwasi-common
in various contexts where we needed to wait on e.g. HTTP request readiness, or treat HTTP bodies as a stream, highlighted that the design of WASI Preview 1'spoll_oneoff
was not the right design for composable systems. So, we changed the design of WASI itself in Preview 2 to have apollable
resource that can be created by any interface that needs to express waiting on readiness, and with that we more or less got forced into using async Rust forasync trait Subscribe
as the host implementation of eachpollable
.We may be able to find a way, for synchronous embeddings, to break some of tokio's rules about performing blocking syscalls on the "main thread", because that operation should only affect wasmtime-wasi and other crates that build on top of it. If we could break those rules, we could provide a faster path to perform some blocking file IO operations (likely
input-stream.blocking-read
andoutput-stream.blocking-write-and-flush
only) without the cost of context switches. However, thats just a guess we have, and I am apprehensive to break tokio's rules because the consequences of that for other crates (like wasmtime-wasi-http, as well as whatever other code that others implement outside this tree) could both be pretty severe, and difficult to understand or document. We will see if we can investigate that more, but to be totally honest its not at the top of my priorities right now. If you can help us understand the business case for improving performance, that may help us move it up in priority.
pchickey edited a comment on issue #7973:
Those are great questions. This is a long answer but this is a pretty involved topic, so please excuse this wall of text, and let me know if there is anything about this I should explain better.
We chose to use
tokio
forwasmtime-wasi
's implementation partly because for many production embedders (i.e. not users of thewasmtime-cli
binary, but Fastly and Fermyon and Cosmonic and etc who use thewasmtime
crate directly in their server software), wasmtime and WASI need to be implemented on top of an async runtime, and tokio is the async ecosystem of choice for all of our known embedders.Another significant factor is that
wasmtime-wasi-http
is built on an HTTP implementation and needs to integrate with thepollable
s andstream
s provided bywasmtime-wasi
. For production HTTP stacks in Rust,hyper
, which is built ontokio
, is essentially the only option. A significant fraction of all internet traffic passes through hyper today - I'm not aware of any credible alternatives.In order to fit in with the same
pollable
andstream
resources as wasi-sockets, wasi-http, and etc, wasi-filesystem is also implemented on top of tokio. Because Linux doesn't provide a non-blocking way to do File IO (except forio_uring
, which afaik no production-ready Rust systems are using yet, though hopefully they will soon), the correct way to do File IO in tokio is to move that work onto a special blocking IO thread-pool managed by tokio, which costs two synchronizations between threads per blocking operation for the executor to hand the work off from an async task, and to get notified of completion to resume the async task - that our hypothesis for where thefutex
andepoll_pwait
syscalls are coming from.I don't actually know where the extra
write
syscalls are coming from, but I haven't had time to dig in deeper yet. There may be an obvious low-hanging fruit there? If we could it cut down to just onewrite
per WASI fileblocking-write-and-flush
that would be a nice win.Finally, why does wasmtime-cli, which is a totally synchronous Rust program, use
tokio
under the hood to implement the WASI interface? Basically, because we only had time to implementwasmtime-wasi
once. When you usewasmtime-wasi
with a synchronous wasmtime (i.e.wasmtime::Config::async_support(false)
, which is the default),wasmtime-wasi
provides theLinker
with a shim that lazily creates a privatetokio
runtime, and calls the async Rust implementations of all thepollable
andstream
bits (and various otherwasi-filesystem
operations that turn into blocking-syscalls) underneath that private tokio.There is no configuration available to change whether blocking file IO is moved to separate thread or not - its very fundamental to how tokio and the wasmtime-wasi implementation work. The only way to change that behavior would be to rewrite wasmtime-wasi with a completely different internal architecture, to solely use synchronous Rust.
A rewrite of
wasmtime-wasi
for the synchronous register of Rust would take a significant amount of time to write and maintain, and it would be challenging to integrate with the rest of the ecosystem - it would essentially cleave the wasmtime ecosystem in two. For example,wasmtime-wasi-http
, or whatever other crates folks write beyond this tree that integrate withwasmtime-wasi
- and thats virtually every interface that does any sort of IO, since streams and pollables are so fundamental - currently use wasmtime-wasi'spub trait Subscribe
,pub trait
HostInputStream, and
pub trait HostOutputStream, which all use async Rust functions and assume a tokio executor. We don't expect that problem to change anytime soon, and in fact we may expect it to become even more deeply integrated with
wasmtime` as the Component Model works on having native async as part of the next big effort.To give historical context: I designed
wasi-common
so that tokio was an optional dependency, and we were able to get away with it during WASI Preview 1 because WASI didnt yet have streams or an extensible poll interface. Our experience embeddingwasi-common
in various contexts where we needed to wait on e.g. HTTP request readiness, or treat HTTP bodies as a stream, highlighted that the design of WASI Preview 1'spoll_oneoff
was not the right design for composable systems. So, we changed the design of WASI itself in Preview 2 to have apollable
resource that can be created by any interface that needs to express waiting on readiness, and with that we more or less got forced into using async Rust forasync trait Subscribe
as the host implementation of eachpollable
.We may be able to find a way, for synchronous embeddings, to break some of tokio's rules about performing blocking syscalls on the "main thread", because that operation should only affect wasmtime-wasi and other crates that build on top of it. If we could break those rules, we could provide a faster path to perform some blocking file IO operations (likely
input-stream.blocking-read
andoutput-stream.blocking-write-and-flush
only) without the cost of context switches. However, thats just a guess we have, and I am apprehensive to break tokio's rules because the consequences of that for other crates (like wasmtime-wasi-http, as well as whatever other code that others implement outside this tree) could both be pretty severe, and difficult to understand or document. We will see if we can investigate that more, but to be totally honest its not at the top of my priorities right now. If you can help us understand the business case for improving performance, that may help us move it up in priority.
pchickey edited a comment on issue #7973:
Those are great questions. This is a long answer but this is a pretty involved topic, so please excuse this wall of text, and let me know if there is anything about this I should explain better.
We chose to use
tokio
forwasmtime-wasi
's implementation partly because for many production embedders (i.e. not users of thewasmtime-cli
binary, but Fastly and Fermyon and Cosmonic and etc who use thewasmtime
crate directly in their server software), wasmtime and WASI need to be implemented on top of an async runtime, and tokio is the async ecosystem of choice for all of our known embedders.Another significant factor is that
wasmtime-wasi-http
is built on an HTTP implementation and needs to integrate with thepollable
s andstream
s provided bywasmtime-wasi
. For production HTTP stacks in Rust,hyper
, which is built ontokio
, is essentially the only option. A significant fraction of all internet traffic passes through hyper today - I'm not aware of any credible alternatives.In order to fit in with the same
pollable
andstream
resources as wasi-sockets, wasi-http, and etc, wasi-filesystem is also implemented on top of tokio. Because Linux doesn't provide a non-blocking way to do File IO (except forio_uring
, which afaik no production-ready Rust systems are using yet, though hopefully they will soon), the correct way to do File IO in tokio is to move that work onto a special blocking IO thread-pool managed by tokio, which costs two synchronizations between threads per blocking operation for the executor to hand the work off from an async task, and to get notified of completion to resume the async task - that our hypothesis for where thefutex
andepoll_pwait
syscalls are coming from.I don't actually know where the extra
write
syscalls are coming from, but I haven't had time to dig in deeper yet. There may be an obvious low-hanging fruit there? If we could it cut down to just onewrite
per WASI fileblocking-write-and-flush
that would be a nice win.Finally, why does wasmtime-cli, which is a totally synchronous Rust program, use
tokio
under the hood to implement the WASI interface? Basically, because we only had time to implementwasmtime-wasi
once. When you usewasmtime-wasi
with a synchronous wasmtime (i.e.wasmtime::Config::async_support(false)
, which is the default),wasmtime-wasi
provides theLinker
with a shim that lazily creates a privatetokio
runtime, and calls the async Rust implementations of all thepollable
andstream
bits (and various otherwasi-filesystem
operations that turn into blocking-syscalls) underneath that private tokio.There is no configuration available to change whether blocking file IO is moved to separate thread or not - its very fundamental to how tokio and the wasmtime-wasi implementation work. The only way to change that behavior would be to rewrite wasmtime-wasi with a completely different internal architecture, to solely use synchronous Rust.
A rewrite of
wasmtime-wasi
for the synchronous register of Rust would take a significant amount of time to write and maintain, and it would be challenging to integrate with the rest of the ecosystem - it would essentially cleave the wasmtime ecosystem in two. For example,wasmtime-wasi-http
, or whatever other crates folks write beyond this tree that integrate withwasmtime-wasi
- and thats virtually every interface that does any sort of IO, since streams and pollables are so fundamental - currently use wasmtime-wasi'spub trait Subscribe
,pub trait HostInputStream
, andpub trait HostOutputStream
, which all use async Rust functions and assume a tokio executor. We don't expect that problem to change anytime soon, and in fact we may expect it to become even more deeply integrated withwasmtime
as the Component Model works on having native async as part of the next big effort.To give historical context: I designed
wasi-common
so that tokio was an optional dependency, and we were able to get away with it during WASI Preview 1 because WASI didnt yet have streams or an extensible poll interface. Our experience embeddingwasi-common
in various contexts where we needed to wait on e.g. HTTP request readiness, or treat HTTP bodies as a stream, highlighted that the design of WASI Preview 1'spoll_oneoff
was not the right design for composable systems. So, we changed the design of WASI itself in Preview 2 to have apollable
resource that can be created by any interface that needs to express waiting on readiness, and with that we more or less got forced into using async Rust forasync trait Subscribe
as the host implementation of eachpollable
.We may be able to find a way, for synchronous embeddings, to break some of tokio's rules about performing blocking syscalls on the "main thread", because that operation should only affect wasmtime-wasi and other crates that build on top of it. If we could break those rules, we could provide a faster path to perform some blocking file IO operations (likely
input-stream.blocking-read
andoutput-stream.blocking-write-and-flush
only) without the cost of context switches. However, thats just a guess we have, and I am apprehensive to break tokio's rules because the consequences of that for other crates (like wasmtime-wasi-http, as well as whatever other code that others implement outside this tree) could both be pretty severe, and difficult to understand or document. We will see if we can investigate that more, but to be totally honest its not at the top of my priorities right now. If you can help us understand the business case for improving performance, that may help us move it up in priority.
liutao-liu commented on issue #7973:
Thank you very much for explaining the reasons for choosing tokio in such detail.
Currently, I am using FlashDB as my database. FlashDB two file APIs: LIBC file API (like fopen/fread/fwrte/fclose) and Posix file API (like open/read/write/close). After being compiled by the wasi-sdk, the two types of file APIs invoke the same type of interface: __imported_wasi_snapshot_preview1_fd_read.
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write...and so on, please.For the FlashDB scenario, I now expect the file IO performance of wasmtime to be close to native, so I plan to try to modify the implementation of wasmtime's file IO, and directly use rust's file IO instead of using tokio. This is just a try, of course, very much look forward to your suggestions for changes.
liutao-liu edited a comment on issue #7973:
Thank you very much for explaining the reasons for choosing tokio in such detail.
Currently, I am using FlashDB as my database. FlashDB has two types of file APIs: LIBC file API (like fopen/fread/fwrte/fclose) and Posix file API (like open/read/write/close). After being compiled by the wasi-sdk, the two types of file APIs invoke the same type of interface: __imported_wasi_snapshot_preview1_fd_read.
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write...and so on, please.For the FlashDB scenario, I now expect the file IO performance of wasmtime to be close to native, so I plan to try to modify the implementation of wasmtime's file IO, and directly use rust's file IO instead of using tokio. This is just a try, of course, very much look forward to your suggestions for changes.
liutao-liu edited a comment on issue #7973:
Thank you very much for explaining the reasons for choosing tokio in such detail.
Currently, I am using FlashDB as my database. FlashDB has two types of file APIs: LIBC file API (like fopen/fread/fwrte/fclose) and Posix file API (like open/read/write/close). After being compiled by the wasi-sdk, the two types of file APIs invoke the same type of interface:
__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write...and so on, please.For the FlashDB scenario, I now expect the file IO performance of wasmtime to be close to native, so I plan to try to modify the implementation of wasmtime's file IO, and directly use rust's file IO instead of using tokio. This is just a try, of course, very much look forward to your suggestions for changes.
liutao-liu edited a comment on issue #7973:
Thank you very much for explaining the reasons for choosing tokio in such detail.
Currently, I am using FlashDB as my database. FlashDB has two types of file APIs: LIBC file API (like fopen/fread/fwrte/fclose) and Posix file API (like open/read/write/close). After being compiled by the wasi-sdk, the two types of file APIs invoke the same type of interface:
__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write...and so on, please.For the FlashDB scenario, I now expect the file IO performance of wasmtime to be close to native, so I plan to try to modify the implementation of wasmtime's file IO, and directly use rust's file IO instead of using tokio. This is just a try, of course, very much look forward to your suggestions.
squillace commented on issue #7973:
@pchickey: Just to add to the users of async wasmtime at scale, Microsoft also uses this, and has built it into the containerd/runwasi project for generalized Kubernetes usage as well. Great description of how this came about, thank you for the time and effort.
pchickey commented on issue #7973:
@liutao-liu Thanks for your response, that use case makes sense and its one that many users might encounter whether using FlashDB or sqlite or etc.
One detail I glossed over: we are using
cap_std::fs::File
(a thin wrapper on std File) to actually perform the io operations, nottokio::fs::File
, but we are doing so inside atokio::task::spawn_blocking
, which makes it morally equivalent to the way tokio File wraps std File. The trick to avoiding the thread synchronizations is getting rid ofspawn_blocking
, which is architecturally kinda tricky to avoid, but it may be possible to hack through it, especially if you only needwasmtime-cli
to work and not the more difficult composition scenarios I outlined.
liutao-liu edited a comment on issue #7973:
Thank you very much for explaining the reasons for choosing tokio in such detail.
Currently, I am using FlashDB as my database. FlashDB has two types of file APIs: LIBC file API (like fopen/fread/fwrte/fclose) and Posix file API (like open/read/write/close). After being compiled by the wasi-sdk, the two types of file APIs invoke the same type of interface:
__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write...and so on,.For the FlashDB scenario, I now expect the file IO performance of wasmtime to be close to native, so I plan to try to modify the implementation of wasmtime's file IO, and directly use rust's file IO instead of using tokio. This is just a try, of course, very much look forward to your suggestions.
liutao-liu edited issue #7973:
Test Case
test.c
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> int main() { FILE *fp; char str[14]; for (int i = 1; i < 1000; i++) { int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR); for (int j = 0; j < 1000; j++) { lseek(fd, i * 10 + j * 13, SEEK_SET); write(fd, "hello world! ", 13); } fsync(fd); close(fd); } return 0; }
Steps to Reproduce
- first,compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- second, WASMTIME AOT compile ,got test.aot.
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 23 seconds.
The same test.c, native or wamr only takes about 2 seconds.Versions and Environment
Wasmtime version :16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
# profile for wasmtime perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot sudo perf inject --jit --input perf.data --output perf.jit.data perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:
__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write ``` ![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
profile for native
perf record -g -k mono ./test
perf report![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2) #### System Call Times Statistics As shown in the following figure, the number of wasmtime system call times is three times that of native. Is it because wasmtime uses **tokio** to implement file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance? ``` # strace for wasmtime strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2) **Why do we use Tokio to implement file I/O? Have we considered performance?** ~~~
liutao-liu commented on issue #7973:
Hi @pchickey,thank you for your modification tips. After I made the following changes according to your tips, the performance of wasmtime has been greatly improved.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1ceb425a-a1d7-4344-b970-ab1ed0e6bfa0)
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/3988e271-3584-4ec9-9e06-1f17824ec487)In the above test case, the wasmtime time has been optimized from 23 seconds to 6 seconds. As you might expect, the
futex
system call is gone. Also, thewrite
andread
system calls becomepwrite64
andpread64
. I haven't figured out why yet.Also, is it necessary to submit a PR for my changes? I added a run option to control whether or not to block in tokio.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/85a40960-7471-4988-b8c7-a5336245900c)
liutao-liu edited a comment on issue #7973:
Hi @pchickey,thank you for your modification tips. After I made the following changes according to your tips, the performance of wasmtime has been greatly improved.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1ceb425a-a1d7-4344-b970-ab1ed0e6bfa0)
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/3988e271-3584-4ec9-9e06-1f17824ec487)In the above test case, the wasmtime time has been optimized from 23 seconds to 6 seconds. As you might expect, the
futex
system call is gone. Also, thewrite
andread
system calls becomepwrite64
andpread64
,i haven't figured out why yet.Is it necessary to submit a PR for my changes? I added a run option to control whether or not to block in tokio.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/85a40960-7471-4988-b8c7-a5336245900c)
liutao-liu edited a comment on issue #7973:
Hi @pchickey,thank you for your modification tips. After I made the following changes according to your tips, the performance of wasmtime has been greatly improved.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1ceb425a-a1d7-4344-b970-ab1ed0e6bfa0)
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/3988e271-3584-4ec9-9e06-1f17824ec487)In the above test case, the wasmtime time has been optimized from 23 seconds to 6 seconds. As you might expect, the
futex
system call is gone. Also, thewrite
andread
system calls becomepwrite64
andpread64
,i haven't figured out why yet.Is it necessary to submit a PR for my changes? I added a run option to control whether or not to block in tokio. I think it is still necessary to provide an option for users to choose whether to use tokio
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/85a40960-7471-4988-b8c7-a5336245900c)
alexcrichton commented on issue #7973:
Thanks for testing that out @liutao-liu! I think it'd be reasonable to land something along these lines into the CLI itself, although I'd personally prefer to avoid a flag here since it'd be best to have the behavior turned on by default. What I might propose is something like:
- Add a new flag to
WasiCtx
that indicates whether blocking the current thread is ok. This defaults tofaults
but the CLI would set it totrue
. That would then be consulted duringspawn_blocking
to do what you're doing, but from configuration viaWasiCtx
instead.- For the second slowdown, avoiding
in_tokio
, I think the best solution would be to do that as part of the CLI. For example the CLI could callin_tokio
originally and that way all of wasm is executed within a tokio context. I believe that would hit the fast path in thein_tokio
function where nothing is done.Would that work for you?
liutao-liu commented on issue #7973:
Thanks for testing that out @liutao-liu! I think it'd be reasonable to land something along these lines into the CLI itself, although I'd personally prefer to avoid a flag here since it'd be best to have the behavior turned on by default. What I might propose is something like:
- Add a new flag to
WasiCtx
that indicates whether blocking the current thread is ok. This defaults tofaults
but the CLI would set it totrue
. That would then be consulted duringspawn_blocking
to do what you're doing, but from configuration viaWasiCtx
instead.- For the second slowdown, avoiding
in_tokio
, I think the best solution would be to do that as part of the CLI. For example the CLI could callin_tokio
originally and that way all of wasm is executed within a tokio context. I believe that would hit the fast path in thein_tokio
function where nothing is done.Would that work for you?
I understand your first propose, I can move the flag to wasictx.
But I don't understand your second propose. Do you mean to run the whole wasmtime in tokio? Can you explain that in more detail?
liutao-liu edited a comment on issue #7973:
Thanks for testing that out @liutao-liu! I think it'd be reasonable to land something along these lines into the CLI itself, although I'd personally prefer to avoid a flag here since it'd be best to have the behavior turned on by default. What I might propose is something like:
- Add a new flag to
WasiCtx
that indicates whether blocking the current thread is ok. This defaults tofaults
but the CLI would set it totrue
. That would then be consulted duringspawn_blocking
to do what you're doing, but from configuration viaWasiCtx
instead.- For the second slowdown, avoiding
in_tokio
, I think the best solution would be to do that as part of the CLI. For example the CLI could callin_tokio
originally and that way all of wasm is executed within a tokio context. I believe that would hit the fast path in thein_tokio
function where nothing is done.Would that work for you?
I understand your first propose, I can move the flag to wasictx. That's a good idea, it would be simpler.
But I don't understand your second propose. Do you mean to run the whole wasmtime in tokio? Can you explain that in more detail?
alexcrichton commented on issue #7973:
For the second point, sorry I think this is actually the right function, namely
with_ambient_tokio_runtime
.Can you try wrapping this invocation of
Func::call
in that function and see if it improves the performance you're seeing?
liutao-liu commented on issue #7973:
Thanks for testing that out @liutao-liu! I think it'd be reasonable to land something along these lines into the CLI itself, although I'd personally prefer to avoid a flag here since it'd be best to have the behavior turned on by default. What I might propose is something like:
- Add a new flag to
WasiCtx
that indicates whether blocking the current thread is ok. This defaults tofaults
but the CLI would set it totrue
. That would then be consulted duringspawn_blocking
to do what you're doing, but from configuration viaWasiCtx
instead.- For the second slowdown, avoiding
in_tokio
, I think the best solution would be to do that as part of the CLI. For example the CLI could callin_tokio
originally and that way all of wasm is executed within a tokio context. I believe that would hit the fast path in thein_tokio
function where nothing is done.Would that work for you?
Hello @alexcrichton , I tested the solution you proposed and it actually took an average of 14 seconds. Compared with the original solution (23 seconds), this improvement is not ideal. This is because each I/O operation is performed by invoking spawn_blocking, which still causes a large number of asynchronous waits.
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/af1930ad-aa22-43d2-beac-444ba7d2b850)
alexcrichton commented on issue #7973:
Sorry, but to confirm, did you keep the changes mentioned above, e.g. the
--block-file-io-in-tokio
flag?I sketched out what those changes might look like in this commit, but I think we'll both want to skip the
spawn_blocking
(conditionally) and additionally have thewith_ambient_tokio_runtime
bits.Can you confirm whether that commit has the performance that you're looking for?
liutao-liu commented on issue #7973:
Sorry, but to confirm, did you keep the changes mentioned above, e.g. the
--block-file-io-in-tokio
flag?I sketched out what those changes might look like in this commit, but I think we'll both want to skip the
spawn_blocking
(conditionally) and additionally have thewith_ambient_tokio_runtime
bits.Can you confirm whether that commit has the performance that you're looking for?
With these two changes, it takes 6 seconds, no further improvement.
Also, you mentioned the new flags added in wasictx, which can't be read here.in_tokio
liutao-liu edited a comment on issue #7973:
Sorry, but to confirm, did you keep the changes mentioned above, e.g. the
--block-file-io-in-tokio
flag?I sketched out what those changes might look like in this commit, but I think we'll both want to skip the
spawn_blocking
(conditionally) and additionally have thewith_ambient_tokio_runtime
bits.Can you confirm whether that commit has the performance that you're looking for?
With these two changes, it takes 6 seconds, hardly any further improvement.
Also, you mentioned the new flags added in wasictx, which can't be read here.in_tokio
alexcrichton commented on issue #7973:
Oops sorry I forgot to actually turn the option on. I do realize that the goal is to avoid
spawn_blocking
, and most of the patch I linked was doing that. If you try this branch's latest commit that should remove thespawn_blocking
. Can you test and see if that performance is what you're looking for?
liutao-liu commented on issue #7973:
Oops sorry I forgot to actually turn the option on. I do realize that the goal is to avoid
spawn_blocking
, and most of the patch I linked was doing that. If you try this branch's latest commit that should remove thespawn_blocking
. Can you test and see if that performance is what you're looking for?Using your latest commit, it took 7 seconds. You haven't changed the code here. After I changed it here, it took 6 seconds.
fn in_tokio<F: Future>(future: F) -> Result<F::Output> { Ok(futures::executor::block_on(future)) }
liutao-liu edited a comment on issue #7973:
Oops sorry I forgot to actually turn the option on. I do realize that the goal is to avoid
spawn_blocking
, and most of the patch I linked was doing that. If you try this branch's latest commit that should remove thespawn_blocking
. Can you test and see if that performance is what you're looking for?Using your latest commit, it took 7 seconds. You haven't changed the code here. After I modified it based on yours, it took 6 seconds.
fn in_tokio<F: Future>(future: F) -> Result<F::Output> { Ok(futures::executor::block_on(future)) }
alexcrichton commented on issue #7973:
Ok, thanks for confirming!
You're right that I didn't change
in_tokio
and that was intentional. Changingin_tokio
would break other functionality, even on an opt-in basis, so we can't quite so easily switch to a different executor and assume that it works. Some users ofin_tokio
require the tokio event loop to get turned, and other users don't necessarily need it. To use a totally different executor we'd have to classify which is which.
alexcrichton commented on issue #7973:
I've opened https://github.com/bytecodealliance/wasmtime/pull/8190 with the changes I made above cleaned up a bit. That's probably at least a good new baseline to start from in terms of optimizing.
liutao-liu commented on issue #7973:
hi @pchickey,
I'm still working on optimizing wasmtime's fd_read\fd_write performance, and I've foundBorrowChecker
to be a performance hotspot, at around 7% cpu. I looked at the history of commits forcrates/wiggle
and couldn't see whyBorrowChecker
was introduced. Do you know the historical background to the introduction ofBorrowChecker
?
liutao-liu edited a comment on issue #7973:
hi @pchickey,
I'm still working on optimizing wasmtime's fd_read\fd_write performance, and I've foundBorrowChecker
to be a performance hotspot, at around 7% cpu. I looked at the history of commits forcrates/wiggle
and couldn't see whyBorrowChecker
was introduced. Do you know the historical background to the introduction ofBorrowChecker
?![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1e53b770-66a1-47c6-92be-a0ce71ba1500)
liutao-liu edited a comment on issue #7973:
/hi @pchickey,
I'm still working on optimizing wasmtime's fd_read\fd_write performance, and I've foundBorrowChecker
to be a performance hotspot, at around 7% cpu. I looked at the history of commits forcrates/wiggle
and couldn't see whyBorrowChecker
was introduced. Do you know the historical background to the introduction ofBorrowChecker
?
Which practical application scenarios require the use ofBorrowChecker
, and does the cli method also require the use ofBorrowChecker
?![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1e53b770-66a1-47c6-92be-a0ce71ba1500)
liutao-liu commented on issue #7973:
Wasmtime's
borrow checker
should be modeled after Rust'sborrow checker
, right? Theborrow checker
of Rust will only be executed by the compiler, so it will not have a negative impact on runtime performance. Theborrow checker
of wasmtime is executed during runtime, which can have a negative impact on performance. I don't know in what scenario there will be a simultaneous occurrence ofshare borrow
andmut borrow
forguest memory
, especiallywasmtime-cli
.
liutao-liu edited a comment on issue #7973:
//hi @pchickey,
I'm still working on optimizing wasmtime's fd_read\fd_write performance, and I've foundBorrowChecker
to be a performance hotspot, at around 7% cpu. I looked at the history of commits forcrates/wiggle
and couldn't see whyBorrowChecker
was introduced. Do you know the historical background to the introduction ofBorrowChecker
?
Which practical application scenarios require the use ofBorrowChecker
, and does the wasmtime-cli also require the use ofBorrowChecker
?![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/1e53b770-66a1-47c6-92be-a0ce71ba1500)
liutao-liu edited a comment on issue #7973:
Wasmtime's
borrow checker
should be modeled after Rust'sborrow checker
, right? Theborrow checker
of Rust will only be executed by the compiler in the static code analysis phase, so it will not have a negative impact on runtime performance. Theborrow checker
of wasmtime is executed during runtime, which can have a negative impact on performance. I don't know in what scenario there will be a simultaneous occurrence ofshare borrow
andmut borrow
forguest memory
, especiallywasmtime-cli
.
liutao-liu commented on issue #7973:
Issue#734 have some explaination of the backgroud of
borrow checker
, theproposal
mentioned in the issue is which proposal?
liutao-liu edited a comment on issue #7973:
Wasmtime's
borrow checker
should be modeled after Rust'sborrow checker
, right? Theborrow checker
of Rust will only be executed by the compiler in the static code analysis phase, so it will not have a negative impact on runtime performance. Theborrow checker
of wasmtime is executed during runtime, which can have a negative impact on performance. I don't know why is wasmtime not allowing hosts to hold simultaneouslyshare borrow
andmut borrow
pointer ofguest memory
. I feel like this can be relaxed like C language, with.wasm
ensuring memory security.
liutao-liu edited a comment on issue #7973:
Wasmtime's
borrow checker
should be modeled after Rust'sborrow checker
, right? Theborrow checker
of Rust will only be executed by the compiler in the static code analysis phase, so it will not have a negative impact on runtime performance. Theborrow checker
of wasmtime is executed during runtime, which can have a negative impact on performance. I don't know why is wasmtime not allowing hosts to hold simultaneouslyshare borrow
andmut borrow
pointer ofguest memory
. I feel like this can be relaxed like C/C++, with.wasm
ensuring memory security.
iximeow commented on issue #7973:
I don't know why is wasmtime not allowing hosts to hold simultaneously
share borrow
andmut borrow
pointer ofguest memory
. I feel like this can be relaxed like C/C++, with.wasm
ensuring memory security.wasm memory sandboxing is not really related to
BorrowChecker
-BorrowChecker
ensures that regardless of what a wasm program may provide to Wasmtime as hostcall arguments, Wasmtime cannot be caused to violate Rust safety rules. imagine if an embedder exposed a host-optimizedmemcpy(dest, src, len)
; if a wasm program called that with overlappingdest
andsrc
, implementing such a function by just forwarding the arguments to libc'smemcpy()
would yield UB in Wasmtime, even though the wasm guest was safely sandboxed from directly accessing memory outside its heap.(
BorrowChecker
also ensures references are properly aligned and lie entirely inside the wasm memory they ought to be in, which are checks even a C/C++ embedder of wasmtime would need and would cause some overhead. this is all pretty similar to the kinds of checkspread(2)
and similar do before using user inputs)while it might be possible to defer these checks in some cases (passthrough to a host
pread(2)
is probably a good example..) it's still tricky in Wasmtime because we don't know that pread(2) isn't happening at the end of a call chain like:
wasm -> embedder_hostcall(buf, len) -> BorrowChecker in wasmtime tracks &[buf] -> call back into wasm -> wasm calls wasi-common's fd_read(buf, len)
. so fromfd_read
's perspective it's only safe to pass through arguments to the host OS _if Wasmtime has no other borrow overlapping with the buffer to be read into_.iirc the proposal in #734 is basically what became
BorrowChecker
andBorrowChecker
is still what what we plan to use even aswig/wit/witx
has evolved, though Pat or Alex would definitely be more familiar with that.re. simultaneously holding
share borrow
andmut borrow
of guest memory,BorrowChecker
exists partially to support that, erroring if those borrows would overlap. disallowing concurrent borrows of guest memory would makeBorrowChecker
much simpler and faster, but a much more limited host interface :sweat_smile:
alexcrichton commented on issue #7973:
The
BorrowChecker
type is quite old in Wasmtime and I don't have anything to add about its rationale over what @iximeow already said. That being said I believe it's over-powered relative to what we need it to do, hence the cost you're seeing, so I posted https://github.com/bytecodealliance/wasmtime/pull/8277 which should remove the performance issue you're seeing related toBorrowChecker
. That comes at a cost of redefining what it does, but I believe that should be ok given how it's used today.
alexcrichton closed issue #7973:
Test Case
test.c
#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> int main() { FILE *fp; char str[14]; for (int i = 1; i < 1000; i++) { int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR); for (int j = 0; j < 1000; j++) { lseek(fd, i * 10 + j * 13, SEEK_SET); write(fd, "hello world! ", 13); } fsync(fd); close(fd); } return 0; }
Steps to Reproduce
- first,compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
- second, WASMTIME AOT compile ,got test.aot.
wasmtime compile -W simd,relaxed-simd test.wasm -o test.aot
- third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results
wasmtime takes about the same time as native and wamr.
Actual Results
Wasmtime takes about 23 seconds.
The same test.c, native or wamr only takes about 2 seconds.Versions and Environment
Wasmtime version :16.0.0
Operating system: ubuntu 20.04
Architecture: aarch64 (same as x86 for this case)
Extra Info
Profile
# profile for wasmtime perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot sudo perf inject --jit --input perf.data --output perf.jit.data perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:
__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write ``` ![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
profile for native
perf record -g -k mono ./test
perf report![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2) #### System Call Times Statistics As shown in the following figure, the number of wasmtime system call times is three times that of native. Is it because wasmtime uses **tokio** to implement file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance? ``` # strace for wasmtime strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)
```
strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2) **Why do we use Tokio to implement file I/O? Have we considered performance?** ~~~
alexcrichton commented on issue #7973:
IIRC this was fixed in #8303, so I'm going to close this
Last updated: Jan 24 2025 at 00:11 UTC