wycats opened Issue #2662:
Thanks for opening a bug report! Please answer the questions below
if they're relevant and delete this text before submitting.
- What are the steps to reproduce the issue?
I have pushed a repository at https://github.com/wycats/wand/tree/str-wasmtime. The branch `str-wasmtime` is self-contained.

The script `./script/all.sh` runs all of the build steps. The final step, `./script/run.sh`, runs a Rust executable that uses the `wasmtime` embedding API. Its source is located in `./crates/wand-cli`. The Rust source for the wasm is located in `./src`.

The build scripts use `wasm-pack` to compile and optimize the wasm. They copy the `.wasm` file into a `data` directory in the `wand-cli` crate, which uses `include_bytes!` to get the bytes.

The executable has a bunch of `println!`s that list out the execution steps and how much time has elapsed.
- What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?

I expect the `Module::new` step to be relatively fast. Instead, it takes several seconds on my machine.
- Which Wasmtime version / commit hash / branch are you using?
0.22.
- If relevant, can you include some extra information about your environment?
(Rust version, operating system, architecture...)

```
$ rustc --version
rustc 1.52.0-nightly (d1206f950 2021-02-15)
$ cargo --version
cargo 1.51.0-nightly (ab64d1393 2021-02-10)
$ uname -a
Linux XPS-13-2019 4.19.84-microsoft-standard #1 SMP Wed Nov 13 11:44:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal
```

I'm on Ubuntu 20.04 inside of WSL2 on Windows 10.
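For readers who want the shape of the reproduction without cloning the repo, the embedding described above boils down to roughly the following (a minimal sketch; the path and error handling are illustrative, not taken from the wand-cli source):

```rust
use std::time::Instant;
use wasmtime::{Engine, Module};

// Hypothetical path: the build scripts copy the wasm-pack output into a
// `data` directory inside the wand-cli crate.
static WASM: &[u8] = include_bytes!("../data/module.wasm");

fn main() {
    let engine = Engine::default();

    // This is the step the issue is about: compiling the module.
    let start = Instant::now();
    let _module = Module::new(&engine, WASM).expect("failed to compile module");
    println!("Module::new took {:?}", start.elapsed());
}
```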
alexcrichton commented on Issue #2662:
Thanks for the report! To help minimize this a bit further, this is the wasm file in question, and as you've mentioned the slower-than-expected part here is `Module::new`, aka compiling the module.

Executing your project I see compile times of ~2s, and to poke around a bit I made a smaller script which simply compiles this module. The timings on the `main` branch (soon to be 0.23.0) I get are:
- old backend, opt speed - 634ms
- old backend, opt size - 641ms
- old backend, opt none - 561ms
- new backend, opt speed - 766ms
- new backend, opt size - 752ms
- new backend, opt none - 869ms
So it looks like the new backend is not actually a win here. I'm not entirely sure how to explain the discrepancy between your project taking 2.5s to compile and an isolated script being ~2x faster. If I use the release commit of 0.22.0 I get slightly slower compile times by ~250ms, but still not in the ~2s range.
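(Not the actual script Alex used, but a rough sketch of how such a per-opt-level comparison can be set up with the `wasmtime::Config` knobs, assuming an `anyhow` dependency for error handling. At the time the old/new x64 backend was selected at build time via a cargo feature, so only the Cranelift opt level is varied here; mapping "size" to `OptLevel::SpeedAndSize` is my assumption, and the path is illustrative.)

```rust
use std::time::Instant;
use wasmtime::{Config, Engine, Module, OptLevel};

fn main() -> anyhow::Result<()> {
    // Illustrative path: point this at the module under test.
    let wasm = std::fs::read("module.wasm")?;

    for (name, opt) in vec![
        ("speed", OptLevel::Speed),
        ("size", OptLevel::SpeedAndSize),
        ("none", OptLevel::None),
    ] {
        let mut config = Config::new();
        config.cranelift_opt_level(opt);
        // Note: Engine::new returns a Result in newer wasmtime releases.
        let engine = Engine::new(&config);

        let start = Instant::now();
        let _module = Module::new(&engine, &wasm)?;
        println!("opt {}: {:?}", name, start.elapsed());
    }
    Ok(())
}
```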
alexcrichton commented on Issue #2662:
Aha, I have found the source of the 2x slowdown over main. Your project enables logging with tracing-subscriber (no doubt because our wasi example does the same), but this apparently has the adverse effect of having `log_enabled!(LogLevel::Debug)` return `true`. This means that very expensive debug logs like this are happening at runtime, and that's accounting for a very large portion of the runtime in your example.

After commenting out the tracing-subscriber business it runs in the same amount of time I saw in my script (e.g. 2x faster).
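For context on why a spurious `true` hurts, the pattern being defeated looks roughly like this (a generic illustration with the `log` 0.4 API, not the actual wasmtime code; `expensive_dump` is a made-up stand-in for the costly debug formatting):

```rust
use log::{debug, log_enabled, Level};

// Hypothetical stand-in for an expensive debug dump of compiler state.
fn expensive_dump(data: &[u32]) -> String {
    data.iter().map(|v| format!("{:#010x}\n", v)).collect()
}

fn compile_step(data: &[u32]) {
    // Intended fast path: only build the dump when a logger actually wants
    // Debug output. If log_enabled! reports true even though the subscriber
    // will discard the message, this expensive work runs on every call.
    if log_enabled!(Level::Debug) {
        debug!("lowered:\n{}", expensive_dump(data));
    }
}

fn main() {
    compile_step(&[0xdead_beef, 0x1234_5678]);
}
```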
alexcrichton commented on Issue #2662:
For further speedups beyond the numbers above there's a zulip thread going now as well -- https://bytecodealliance.zulipchat.com/#narrow/stream/225524-cranelift-new-backend/topic/wasm.20file.20that.20is.20slower.20to.20compile.20in.20new.20BE.20than.20old.20BE
peterhuene commented on Issue #2662:
So I believe the way `tracing-subscriber` uses the `tracing-log` crate has the unfortunate consequence of formatting log strings for every level, regardless of whether or not the logger is "enabled" for that level (in this case, dependent upon the env filter respecting `RUST_LOG`).

The `tracing-log` feature is enabled by default in `tracing-subscriber`, which causes initialization of the global logger. This uses a default `Builder` that ends up setting the maximum level to the most verbose level.

All the level-specific log macros end up as an invocation of the `log!` macro, which checks the level against `log::max_level()` to see if it needs to format the string and calls `__private_api_log`. `__private_api_log` calls `Log::enabled` on the logger to check to see if it wants the event; but at that point the string has already been formatted, as only `log::max_level` is checked for that.

And since the max level is the most verbose, all strings are formatted, but the logger in `tracing-log` ends up telling the `log` crate it isn't interested in the event after all.
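For reference, the subscriber setup being discussed is roughly the one in the wasi example, something like the following (a sketch, assuming the `env-filter` feature is enabled; with the `tracing-log` bridge on, this also installs a global `log` logger as described above):

```rust
use tracing_subscriber::{EnvFilter, FmtSubscriber};

fn main() {
    // RUST_LOG controls what the subscriber keeps, but (before
    // tracing-subscriber 0.2.16) the bridge still left log::max_level()
    // at the most verbose setting.
    FmtSubscriber::builder()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    // ... create the wasmtime Engine/Module here ...
}
```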
wycats commented on Issue #2662:
> Your project enables logging with tracing-subscriber (no doubt because our wasi example does the same)

You're right! I stared at that line a bunch of times and kept telling myself "I copied it from an official example", and never got around to trying the whole thing with the line commented out.
peterhuene edited a comment on Issue #2662:
So I believe the way `tracing-subscriber` uses the `tracing-log` crate also has the unfortunate consequence of formatting log strings for every level, regardless of whether or not the logger is "enabled" for that level (in this case, dependent upon the env filter respecting `RUST_LOG`).

The `tracing-log` feature is enabled by default in `tracing-subscriber`, which causes initialization of the global logger. This uses a default `Builder` that ends up setting the maximum level to the most verbose level.

All the level-specific log macros end up as an invocation of the `log!` macro, which checks the level against `log::max_level()` to see if it needs to format the string and calls `__private_api_log`. `__private_api_log` calls `Log::enabled` on the logger to check to see if it wants the event; but at that point the string has already been formatted, as only `log::max_level` is checked for that.

And since the max level is the most verbose, all strings are formatted, but the logger in `tracing-log` ends up telling the `log` crate it isn't interested in the event after all.

As Alex also noted, this would also affect any explicit checks to see if the log level is enabled (`log_enabled!` has the same check against `log::max_level`), so extra work is also being performed outside of the macro string formatting too.

Edit: my analysis was wrong, as I was looking at the wrong function (I was accidentally looking at `__private_api_enabled`, saw the `&str` arg, and my brain assumed a formatted message rather than the target name). String formatting is delayed until the logger is asked if the level is enabled.

This only affects the `log_enabled!` macro, it seems.
softprops commented on Issue #2662:
I was just going to reach out about this when I found this open issue.
I was wondering if anyone has done a baseline comparison with wasmer. I recently came across this article, which mentions a breakthrough in compile-time performance. I was wondering how wasmtime compares.
The context for me is a small project that runs Fastly wasm apps inside a local HTTP server: https://github.com/softprops/fasttime
The compilation step where I load the wasm module can take seconds. I was wondering if work was being done to improve this.
bjorn3 commented on Issue #2662:
That article mentions parallel compilation as the reason it is faster. This is already the default with Wasmtime.
alexcrichton commented on Issue #2662:
@softprops do you have a wasm module that you can share so we can dig in?
softprops commented on Issue #2662:
Sure. Where's the best place to upload it?
In CI, I build a test app and then run it through some integration tests. It's uploaded as an artifact, so I think you can download it directly.
Here's a recent GitHub actions job https://github.com/softprops/fasttime/actions/runs/586883035
Alternatively if there's anything I can do on my end, trace logging etc, I'd be happy to help. Wasmtime is such an amazing project!
alexcrichton commented on Issue #2662:
That works! You can also upload directly here in your comment by compressing it with gzip (GitHub doesn't allow raw wasm file uploads).
I did some timing though, and your wasm module compiles in 86ms for me. That means that something deeper may be at play here; perhaps you could open a separate issue for this?
softprops commented on Issue #2662:
holy smokes that's fast! could it be the wasmtime version I'm using (0.22) by chance?
I'm mainly getting a sense from this code which loads the wasm file from a path.
alexcrichton commented on Issue #2662:
What sort of timings are you seeing for compiling that module? I don't think we had major perf improvements in 0.22, so if you're an order of magnitude different or so from what I'm seeing then there's probably a different issue.
pchickey commented on Issue #2662:
The latest tracing-subscriber 0.2.16 resolves the root cause of the issue Alex determined early in this thread. Downstream users should be able to `cargo update` to get this fix.
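(For example, updating just that one crate without touching the rest of the lockfile:)

```
cargo update -p tracing-subscriber
```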
softprops commented on Issue #2662:
> What sort of timings are you seeing for compiling that module? I don't think we had major perf improvements in 0.22, so if you're an order of magnitude different or so from what I'm seeing then there's probably a different issue.

I'll look into upgrading to wasmtime@0.23 this weekend, but with wasmtime@0.22 it's not uncommon for a 2.7mb wasm file to take multiple seconds to compile with `Module::from_file(&engine, file)?;`. It's about 6s on average for me, which is why my head started spinning when you mentioned 82ms:

```
◌ Loading module... ✔ Loaded module in 6.273588s
```

I'm using the default `Engine` instance and calling `Module::from_file` once here. I haven't yet tried loading the file separately and then calling `Module::new` with just the bytes.

If it's helpful for extra context, I'm on a Mac using Rust 1.49:

```
rustc --version
rustc 1.49.0 (e1884a8e3 2020-12-29)
cargo --version
cargo 1.49.0 (d00d64df9 2020-12-05)
uname -a
Darwin doug-mbp 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
```

If I could get my 6s module compile time down to your 82ms, it would be game-changing for turnaround time on testing for me.
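A sketch of that comparison, in case it helps (illustrative path; both calls end up compiling the same bytes, so they should behave essentially the same, and an `anyhow` dependency is assumed for error handling):

```rust
use std::time::Instant;
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Illustrative path to the module under test.
    let path = "tests/app/target/wasm32-wasi/release/app.wasm";

    // Compile straight from a file on disk.
    let start = Instant::now();
    let _m = Module::from_file(&engine, path)?;
    println!("Module::from_file: {:?}", start.elapsed());

    // Read the bytes first, then compile them.
    let bytes = std::fs::read(path)?;
    let start = Instant::now();
    let _m = Module::new(&engine, &bytes)?;
    println!("read + Module::new: {:?}", start.elapsed());
    Ok(())
}
```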
softprops commented on Issue #2662:
Upgrading to 0.23 didn't seem to make a difference in module compile times for me.
lqd commented on Issue #2662:
@softprops I'm not exactly sure about these steps to reproduce, but I cloned `fasttime` just now (but locally changed `user-agent-parser` to depend on `onig` this way to make my life easier on Windows), built the `tests/app` for `wasm32-wasi`, and have these results:

```
$ ll -h tests/app/target/wasm32-wasi/release/app.wasm
-rw-r--r-- 2 lqd 197121 3.2M Feb 27 11:02 tests/app/target/wasm32-wasi/release/app.wasm
$ cargo run --release -- -w ./tests/app/target/wasm32-wasi/release/app.wasm
    Finished release [optimized] target(s) in 0.30s
     Running `target\release\fasttime.exe -w ./tests/app/target/wasm32-wasi/release/app.wasm`
◌ Loading module... ✔ Loaded module in 150.6526ms
✨ DEBUG: {}
● Listening on http://127.0.0.1:3000
```
softprops commented on Issue #2662:
Wow. I wonder if this is just a macOS-specific issue.
lqd edited a comment on Issue #2662:
@softprops I'm not exactly sure about these steps to reproduce, but I cloned `fasttime` just now (but locally changed `user-agent-parser` to depend on `onig` this way to make my life easier on Windows), built the `tests/app` for `wasm32-wasi`, and have these results:

```
$ ll -h tests/app/target/wasm32-wasi/release/app.wasm
-rw-r--r-- 2 lqd 197121 3.2M Feb 27 11:02 tests/app/target/wasm32-wasi/release/app.wasm
$ cargo run --release -- -w ./tests/app/target/wasm32-wasi/release/app.wasm
    Finished release [optimized] target(s) in 0.30s
     Running `target\release\fasttime.exe -w ./tests/app/target/wasm32-wasi/release/app.wasm`
◌ Loading module... ✔ Loaded module in 150.6526ms
✨ DEBUG: {}
● Listening on http://127.0.0.1:3000
```

EDIT: those measurements are on a nightly from a couple days ago. On 1.46.0, the wasm is smaller (the 2.7mb size you mentioned) and `fasttime` loads it here in 113ms as well.

EDIT 2: I've tested this on Linux on the same machine as well, and the numbers are similar.
lqd commented on Issue #2662:
Maybe.
At this point this looks different enough from the OP that it's probably better to open a dedicated issue.
FWIW, I myself can't reproduce a `Module::from_file` slowdown with your wasm test app on Windows, Linux, or a 10-year-old MacBook Air.
alexcrichton commented on Issue #2662:
@softprops Checking out the latest softprops/fasttime@94faa0e7bd351b67e8d1444e0bc09c32764111b2 on macOS, I get the same timings as @lqd got above (using the same app as well). Loading the module you referenced above, it says it compiles in ~90ms.

To confirm, are you sure that you're building fasttime with `--release`? 60ms to 6s sounds like a Rust release-mode thing that would cause the slowdown. Otherwise, do you perhaps have any environment variables configured like `RUST_LOG` or similar?
softprops commented on Issue #2662:
@alexcrichton you win! There's a world of difference in wasm compile times between debug and release mode. I tried both and go from ~6s to ~300ms, which is a night-and-day improvement for me. That's magic!
```
./target/debug/fasttime -w tests/app/target/wasm32-wasi/release/app.wasm
◌ Loading module... ✔ Loaded module in 6.793324s
✨ DEBUG: {}
● Listening on http://127.0.0.1:3000
```

```
target/release/fasttime -w tests/app/target/wasm32-wasi/release/app.wasm
◌ Loading module... ✔ Loaded module in 309.38ms
✨ DEBUG: {}
● Listening on http://127.0.0.1:3000
```
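For anyone who wants to keep iterating with debug builds of the embedder but still get release-speed module compilation, one possible approach (not discussed in this thread, offered only as a sketch) is Cargo's per-package profile overrides, which optimize dependencies such as wasmtime and Cranelift while leaving the local crate unoptimized:

```toml
# Sketch: in the embedder's Cargo.toml. Dependencies get opt-level 3 even in
# plain `cargo build`; the local crate keeps fast, unoptimized debug builds.
[profile.dev.package."*"]
opt-level = 3
```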
alexcrichton commented on Issue #2662:
Ok great! I'm gonna go ahead and close this issue since I believe this was all largely handled.
alexcrichton closed Issue #2662: