Stream: git-wasmtime

Topic: wasmtime / issue #7973 performance of wasmtime file I/O i...

Wasmtime GitHub notifications bot (Feb 21 2024 at 06:54):

liutao-liu added the bug label to Issue #7973.

Wasmtime GitHub notifications bot (Feb 21 2024 at 06:54):

liutao-liu opened issue #7973:

Test Case

test.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    FILE *fp;
    char str[14];
    for (int i = 1; i < 1000; i++)
    {

        int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        for (int j = 0; j < 1000; j++)
        {

            lseek(fd, i * 10 + j * 13, SEEK_SET);
            write(fd, "hello world! ", 13);
        }
        fsync(fd);
        close(fd);
    }

    return 0;
}
Steps to Reproduce

first，compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
second, WASMTIME AOT compile and generate machine code to obtain test.aot.
wasmtime compile  -W simd,relaxed-simd test.wasm -o test.aot
third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results

wasmtime takes about the same time as native and wamr.

Actual Results

Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.

Versions and Environment

Wasmtime version :16.0.0

Operating system: ubuntu 20.04

Architecture: aarch64 (same as x86 for this case)

Extra Info

Profile
#  profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
#  profile for native
perf record -g -k mono ./test
perf report
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)

System Call Times Statistics

As shown in the following figure, the number of wasmtime system call times is three times that of native.
Is it because wasmtime uses tokio for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance?
```

strace for wasmtime

strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)

 ```
# strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)

Wasmtime GitHub notifications bot (Feb 21 2024 at 06:57):

liutao-liu edited issue #7973:

Test Case

test.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    FILE *fp;
    char str[14];
    for (int i = 1; i < 1000; i++)
    {

        int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        for (int j = 0; j < 1000; j++)
        {

            lseek(fd, i * 10 + j * 13, SEEK_SET);
            write(fd, "hello world! ", 13);
        }
        fsync(fd);
        close(fd);
    }

    return 0;
}
Steps to Reproduce

first，compile test.c in the preceding test case into WASM bytecode using the WASI SDK
wasi-sdk/bin/clang -O3 test.c -o test.wasm
second, WASMTIME AOT compile and generate machine code to obtain test.aot.
wasmtime compile  -W simd,relaxed-simd test.wasm -o test.aot
third, Test Case Running Duration. It takes about 40 seconds.
time wasmtime run --allow-precompiled --dir ./ test.aot
Expected Results

wasmtime takes about the same time as native and wamr.

Actual Results

Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.

Versions and Environment

Wasmtime version :16.0.0

Operating system: ubuntu 20.04

Architecture: aarch64 (same as x86 for this case)

Extra Info

Profile
#  profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data
As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:__imported_wasi_snapshot_preview1_fd_read __imported_wasi_snapshot_preview1_fd_seek __imported_wasi_snapshot_preview1_fd_sync __imported_wasi_snapshot_preview1_fd_write

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)
#  profile for native
perf record -g -k mono ./test
perf report
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)

System Call Times Statistics

As shown in the following figure, the number of wasmtime system call times is three times that of native.
Is it because wasmtime uses tokio for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance?
```

strace for wasmtime

strace -c wasmtime run --allow-precompiled --dir ./ test.aot
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)

 ```
# strace for native( ths same as wamr )
strace -c ./test
![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)

Wasmtime GitHub notifications bot (Feb 21 2024 at 06:58):

liutao-liu edited issue #7973:

Test Case

test.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    FILE *fp;
    char str[14];
    for (int i = 1; i < 1000; i++)
    {

        int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        for (int j = 0; j < 1000; j++)
        {

            lseek(fd, i * 10 + j * 13, SEEK_SET);
            write(fd, "hello world! ", 13);
        }
        fsync(fd);
        close(fd);
    }

    return 0;
}

Steps to Reproduce

first，compile test.c in the preceding test case into WASM bytecode using the WASI SDK

wasi-sdk/bin/clang -O3 test.c -o test.wasm

second, WASMTIME AOT compile and generate machine code to obtain test.aot.

wasmtime compile  -W simd,relaxed-simd test.wasm -o test.aot

third, Test Case Running Duration. It takes about 40 seconds.

time wasmtime run --allow-precompiled --dir ./ test.aot

Expected Results

wasmtime takes about the same time as native and wamr.

Actual Results

Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.

Versions and Environment

Wasmtime version :16.0.0

Operating system: ubuntu 20.04

Architecture: aarch64 (same as x86 for this case)

Extra Info

Profile

#  profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data

As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:

__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write
 ```

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)

profile for native

perf record -g -k mono ./test
perf report

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)

#### System Call Times Statistics
As shown in the following figure, the number of wasmtime system call times is three times that of native.
Is it because wasmtime uses **tokio** for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance?
 ```
# strace for wasmtime
strace -c wasmtime run --allow-precompiled --dir ./ test.aot

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)

```

strace for native( ths same as wamr )

strace -c ./test

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)

~~~

Wasmtime GitHub notifications bot (Feb 21 2024 at 06:59):

liutao-liu edited issue #7973:

Test Case

test.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    FILE *fp;
    char str[14];
    for (int i = 1; i < 1000; i++)
    {

        int fd = open("test.txt", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        for (int j = 0; j < 1000; j++)
        {

            lseek(fd, i * 10 + j * 13, SEEK_SET);
            write(fd, "hello world! ", 13);
        }
        fsync(fd);
        close(fd);
    }

    return 0;
}

Steps to Reproduce

first，compile test.c in the preceding test case into WASM bytecode using the WASI SDK

wasi-sdk/bin/clang -O3 test.c -o test.wasm

second, WASMTIME AOT compile and generate machine code to obtain test.aot.

wasmtime compile  -W simd,relaxed-simd test.wasm -o test.aot

third, Test Case Running Duration. It takes about 40 seconds.

time wasmtime run --allow-precompiled --dir ./ test.aot

Expected Results

wasmtime takes about the same time as native and wamr.

Actual Results

Wasmtime takes about 40 seconds.
The same test.c, native or wamr only takes about 2 seconds.

Versions and Environment

Wasmtime version :16.0.0

Operating system: ubuntu 20.04

Architecture: aarch64 (same as x86 for this case)

Extra Info

Profile

#  profile for wasmtime
perf record -g -k mono wasmtime run --profile=jitdump --allow-precompiled --dir ./ test.aot
sudo perf inject --jit --input perf.data --output perf.jit.data
perf report -i perf.jit.data

As shown in the following figure, most performance hotspots are on Tokio. This is because wasmtime uses Tokio to implement the file I/O interface, involving:

__imported_wasi_snapshot_preview1_fd_read
__imported_wasi_snapshot_preview1_fd_seek
__imported_wasi_snapshot_preview1_fd_sync
__imported_wasi_snapshot_preview1_fd_write
 ```

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/082e97b5-3a03-40bf-87fa-9041de881571)

profile for native

perf record -g -k mono ./test
perf report

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/995f3fb1-164d-4b6c-93ba-f7c7d71b67b2)

#### System Call Times Statistics
As shown in the following figure, the number of wasmtime system call times is three times that of native.
Is it because wasmtime uses **tokio** for file IO operations, and the number of file I/O operations is three times that of native, resulting in poor performance?
 ```
# strace for wasmtime
strace -c wasmtime run --allow-precompiled --dir ./ test.aot

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/b964aa40-e4ae-4378-a01d-3a604f8b1447)

```

strace for native( ths same as wamr )

strace -c ./test

![image](https://github.com/bytecodealliance/wasmtime/assets/10509166/c351808f-f8cc-4b28-8a74-c94f600242f2)


**Why do we use Tokio to implement file I/O? Have we considered performance?**

~~~

Last updated: Apr 18 2025 at 07:03 UTC