bongjunj requested wasmtime-compiler-reviewers for a review on PR #12303.
bongjunj opened PR #12303 from bongjunj:optimize-isle-compilation to bytecodealliance:main:
Overview
The current implementation of ISLE code generation emits a single Rust function for each ISLE term, and `rustc` compiles the Rust code to integrate ISLE with other modules of Cranelift.
This can cause a compilation bottleneck when an ISLE term contains numerous rules and ends up producing a very large Rust function, since `rustc` cannot compile such functions efficiently. Notably, the term `simplify`, which implements the mid-end peephole optimizations, has about 500 rules, and the symptom is already visible. (And the ruleset is growing!) The following report shows that, when compiling Cranelift, `cranelift-codegen` takes most of the time: cargo-timing.html
Plus: a timing report for compiling wasmtime:
With this PR, the ISLE codegen helps `rustc` by wrapping a large match statement generated by `islec` in a closure. By introducing closures, a portion of a huge Rust function (for an ISLE term) can be split into several smaller compilation units for `rustc`, reducing the compilation time.
Evaluation
On the `main` branch, the build times for `cranelift-codegen` before/after this optimization are measured. This PR saves ~2 seconds.
- After: 23.81 sec
    ~/wasmtime (optimize-isle-compilation)> cargo clean
    Removed 10031 files, 12.6GiB total
    ~/wasmtime (optimize-isle-compilation)> time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   23.81 secs    fish           external
       usr time   43.59 secs    0.00 micros   43.59 secs
       sys time    5.04 secs  592.00 micros    5.04 secs
- Before: 26.04 sec
    ~/wasmtime ((dev))> cargo clean t
    Removed 1356 files, 1.0GiB total
    ~/wasmtime ((dev))> time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   26.04 secs    fish           external
       usr time   46.58 secs    0.00 micros   46.58 secs
       sys time    5.15 secs  583.00 micros    5.15 secs
- Cargo Timing Reports
  - Before: cargo-timing-old.html
  - After: cargo-timing-new.html
Extra
I am experimenting with over 6,000 ISLE `simplify` rules.
With this PR, the compilation time dropped from 841.37 seconds to 69.34 seconds (more than an 11x speedup).
    ~/w/cranelift ((many-rules-no-opt))> cargo clean && time cargo build -p cranelift-codegen
    ________________________________________________________
    Executed in  841.37 secs    fish           external
       usr time  826.92 secs  325.00 micros  826.92 secs
       sys time   49.98 secs  135.00 micros   49.98 secs
    ~/w/cranelift (many-rules-opt)> cargo clean && time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   69.34 secs    fish           external
       usr time   98.32 secs  362.00 micros   98.32 secs
       sys time    6.42 secs  149.00 micros    6.42 secs
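The closure-wrapping idea described in the Overview can be pictured with a small Rust sketch. This is not the code `islec` emits: the function names, the integer "opcodes", and the rule bodies are all hypothetical stand-ins, chosen only to show the shape of the change. The first function keeps every arm body inline; the second wraps each arm body in a closure that is called immediately, so each body becomes its own closure body from `rustc`'s point of view.

```rust
// Hypothetical stand-in for an ISLE-generated constructor. Real generated
// code matches on Cranelift IR rather than on plain integers.

// Shape before: every arm body is inlined into one large function.
fn simplify_monolithic(opcode: u32, x: i64, y: i64) -> Option<i64> {
    match opcode {
        0 => {
            // Imagine hundreds of nested rule checks here.
            if y == 0 {
                return Some(x);
            }
            None
        }
        1 => {
            if y == 1 {
                return Some(x);
            }
            if y == 0 {
                return Some(0);
            }
            None
        }
        _ => None,
    }
}

// Shape after: each large arm body is wrapped in a closure and called at once.
fn simplify_split(opcode: u32, x: i64, y: i64) -> Option<i64> {
    match opcode {
        0 => {
            let arm = || -> Option<i64> {
                if y == 0 {
                    return Some(x); // returns from the closure only
                }
                None
            };
            arm()
        }
        1 => {
            let arm = || -> Option<i64> {
                if y == 1 {
                    return Some(x);
                }
                if y == 0 {
                    return Some(0);
                }
                None
            };
            arm()
        }
        _ => None,
    }
}

fn main() {
    // Both shapes must compute the same result for every input.
    for opcode in 0..3 {
        for y in 0..3 {
            assert_eq!(
                simplify_monolithic(opcode, 7, y),
                simplify_split(opcode, 7, y)
            );
        }
    }
    println!("both shapes agree");
}
```

The behavior is unchanged; only the way the body is partitioned differs, which is why the discussion focuses on whether the extra call boundary costs anything at run time.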
bongjunj requested fitzgen for a review on PR #12303.
bongjunj edited PR #12303.
bongjunj edited PR #12303.
cfallin commented on PR #12303:
@bongjunj thanks for this experimentation!
Before moving further, would you mind benchmarking compilation time as well? One of the reasons for putting all of the codegen for an ISLE term in one function is so that the compiler can optimize the code together; splitting that code between functions and (especially) introducing ABI boundaries may produce slowdowns when Cranelift runs. IMHO, it's worth it to spend a few extra seconds when compiling Cranelift if it makes Cranelift itself run faster. If no effect, of course, then no problem and I'll be happy to review this. Thanks!
github-actions[bot] commented on PR #12303:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton commented on PR #12303:
An 11x reduction in compile time is quite massive, so even if this is a ~NN% regression in compile-time-of-wasm-code I think this would be a great thing to land at least for debug builds and maybe optionally for release builds with some sort of tunable too.
cfallin commented on PR #12303:
Note, that's an 11x reduction with Bongjun's huge new ruleset; ~10% on `main`. That's neat but I wouldn't regress compile time for it IMHO.
fitzgen commented on PR #12303:
If it does result in Wasm compilation time regressions, it may still make sense to have this as an option of the ISLE compiler that we can enable during development or something, if we can make the maintenance burden minimal.
fitzgen submitted PR review.
fitzgen created PR review comment:
Regarding my previous comment: it could potentially make sense to make `MATCH_ARM_BODY_CLOSURE_THRESHOLD` a dynamic ISLE compilation option, rather than a constant, and then tweak that value in Cranelift's invocation of the ISLE compiler depending on whether a cargo feature is enabled or whether this is a release vs debug build of Cranelift or something.
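One way to picture the suggestion above is a code emitter that takes the threshold as a run-time option instead of a constant. The sketch below is purely illustrative: `CodegenOptions`, `emit_arm`, and the "number of body lines" size metric are hypothetical and are not taken from the ISLE compiler, whose real size metric is not stated in this thread.

```rust
/// Hypothetical codegen options; `None` disables closure splitting entirely.
struct CodegenOptions {
    match_arm_split_threshold: Option<usize>,
}

/// Emit one match arm as Rust source text. `body` holds the arm's statements;
/// its length stands in for whatever size metric a real emitter would use.
fn emit_arm(opts: &CodegenOptions, pattern: &str, body: &[&str]) -> String {
    // Wrap only when splitting is enabled and the body exceeds the threshold.
    let wrap = opts
        .match_arm_split_threshold
        .map_or(false, |threshold| body.len() > threshold);

    let mut out = format!("{} => {{\n", pattern);
    if wrap {
        out.push_str("    let arm = || {\n");
    }
    for line in body {
        out.push_str(if wrap { "        " } else { "    " });
        out.push_str(line);
        out.push('\n');
    }
    if wrap {
        out.push_str("    };\n    arm()\n");
    }
    out.push_str("}\n");
    out
}

fn main() {
    let body = ["check_rule_1();", "check_rule_2();", "check_rule_3();"];

    // A low threshold forces the closure wrapper for this three-line body.
    let split = CodegenOptions { match_arm_split_threshold: Some(2) };
    print!("{}", emit_arm(&split, "Opcode::Iadd", &body));

    // With splitting disabled, the body is emitted inline as before.
    let inline = CodegenOptions { match_arm_split_threshold: None };
    print!("{}", emit_arm(&inline, "Opcode::Iadd", &body));
}
```

Keeping the threshold in an options struct lets the caller (for example a build script) pick a value per build profile without touching the emitter.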
bongjunj edited PR #12303.
bongjunj edited PR #12303:
Overview
The current implementation of ISLE code generation emits a single Rust function for each ISLE term, and `rustc` compiles the Rust code to integrate ISLE with other modules of Cranelift.
This can cause a compilation bottleneck when an ISLE term contains numerous rules and ends up producing a very large Rust function, since `rustc` cannot compile such functions efficiently. Notably, the term `simplify`, which implements the mid-end peephole optimizations, has about 500 rules, and the symptom is already visible. (And the ruleset is growing!) The following report shows that, when compiling Cranelift, `cranelift-codegen` takes most of the time: cargo-timing.html
With this PR, the ISLE codegen helps `rustc` by wrapping a large match statement generated by `islec` in a closure. By introducing closures, a portion of a huge Rust function (for an ISLE term) can be split into several smaller compilation units for `rustc`, reducing the compilation time.
Evaluation
On the `main` branch, the build times for `cranelift-codegen` before/after this optimization are measured. This PR saves ~2 seconds on my machine (x86-64, 64 cores, 512GB memory).
- After: 23.81 sec
    ~/wasmtime (optimize-isle-compilation)> cargo clean
    Removed 10031 files, 12.6GiB total
    ~/wasmtime (optimize-isle-compilation)> time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   23.81 secs    fish           external
       usr time   43.59 secs    0.00 micros   43.59 secs
       sys time    5.04 secs  592.00 micros    5.04 secs
- Before: 26.04 sec
    ~/wasmtime ((dev))> cargo clean t
    Removed 1356 files, 1.0GiB total
    ~/wasmtime ((dev))> time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   26.04 secs    fish           external
       usr time   46.58 secs    0.00 micros   46.58 secs
       sys time    5.15 secs  583.00 micros    5.15 secs
- Cargo Timing Reports
  - Before: cargo-timing-old.html
  - After: cargo-timing-new.html
Extra
I am experimenting with over 6,000 ISLE `simplify` rules.
With this PR, the compilation time dropped from 841.37 seconds to 69.34 seconds (more than an 11x speedup).
    ~/w/cranelift ((many-rules-no-opt))> cargo clean && time cargo build -p cranelift-codegen
    ________________________________________________________
    Executed in  841.37 secs    fish           external
       usr time  826.92 secs  325.00 micros  826.92 secs
       sys time   49.98 secs  135.00 micros   49.98 secs
    ~/w/cranelift (many-rules-opt)> cargo clean && time cargo build -q -p cranelift-codegen
    ________________________________________________________
    Executed in   69.34 secs    fish           external
       usr time   98.32 secs  362.00 micros   98.32 secs
       sys time    6.42 secs  149.00 micros    6.42 secs
bongjunj commented on PR #12303:
@cfallin
Thanks for the comment! I've run the benchmarks and measured the compilation time.
It turned out that the compilation overhead is close to zero.
This is probably because the change introduces closures only for large pattern matches.
The raw data (for 10 iterations, the default setting of sightglass-cli) is below:
| bench | optimize-isle-compilation | main | overhead |
|---|---|---|---|
| bz2 | 273,462,445 | 271,631,283 | 0.67% |
| pulldown-cmark | 623,480,512 | 622,976,525 | 0.08% |
| spidermonkey | 21,592,595,980 | 21,523,101,083 | 0.32% |
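For reference, the overhead column can be reproduced from the two measurement columns as (branch - main) / main. The short sketch below redoes that arithmetic; the function name `overhead_percent` is just an illustrative choice, and the numbers are copied from the data above.

```rust
/// Relative overhead of `branch` over `base`, in percent.
fn overhead_percent(branch: u64, base: u64) -> f64 {
    (branch as f64 - base as f64) / base as f64 * 100.0
}

fn main() {
    // (benchmark, optimize-isle-compilation, main), as reported above.
    let rows = [
        ("bz2", 273_462_445u64, 271_631_283u64),
        ("pulldown-cmark", 623_480_512, 622_976_525),
        ("spidermonkey", 21_592_595_980, 21_523_101_083),
    ];
    for (name, branch, base) in rows.iter() {
        println!("{}: {:.2}%", name, overhead_percent(*branch, *base));
    }
}
```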
bongjunj submitted PR review.
bongjunj created PR review comment:
FYI I experimented with three values for this: 128, 256, 512.
The best one was 256 for the current ruleset (~500 rules), and 512 for my massive ruleset (~6,700 rules).
There could be some relation between the threshold value and the ruleset size. Additionally, the parallelism available in the development environment could have an effect too.
However, as this could affect the quality of the compiled Cranelift, I assumed at first that sticking to the best value was a good idea.
cfallin commented on PR #12303:
Sorry for the delayed response here -- was traveling then lost track of this PR.
Thanks for the experiments! I think I agree with Nick that this would be good to have as an option, probably off by default. 0.67% compile time is still an unfortunate regression to take (on the flip side, 1% speedups make us pretty happy and take work to find), and is more important than the time taken to compile Cranelift, since usually you compile Cranelift once then use that compiled Cranelift-containing program many times. But when developing Cranelift, it would be useful to have this option.
Could you put this under a Cargo feature that alters the behavior of `islec` as invoked by `cranelift-codegen`'s `build.rs`?
cfallin edited a comment on PR #12303:
Sorry for the delayed response here -- was traveling then lost track of this PR.
Thanks for the experiments! I think I agree with Nick that this would be good to have as an option, probably off by default. 0.67% compile time is still an unfortunate regression to take (on the flip side, 1% speedups make us pretty happy and take work to find), and is (usually) more important than the time taken to compile Cranelift, since usually you compile Cranelift once then use that compiled Cranelift-containing program many times. But when developing Cranelift, it would be useful to have this option.
Could you put this under a Cargo feature that alters the behavior of `islec` as invoked by `cranelift-codegen`'s `build.rs`?
bongjunj commented on PR #12303:
@cfallin thanks for the comment!
I think I can handle this by this weekend. I will mention you when I finish the feature.
Thank you.
bongjunj updated PR #12303.
bongjunj updated PR #12303.
bongjunj updated PR #12303.
bongjunj commented on PR #12303:
@cfallin This is now wrapped under a Cargo feature `isle-split-match` on the crate `cranelift-codegen`.
@fitzgen Now we have a controllable splitting threshold, via setting an environment variable:
ISLE_SPLIT_MATCH_THRESHOLD=512
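A build script could combine the feature and the environment variable roughly as sketched below. Only the names `isle-split-match` and `ISLE_SPLIT_MATCH_THRESHOLD` come from this thread; the helper `split_threshold`, the fallback value of 256 (the value reported above as best for the current ruleset), and the fallback-on-parse-error behavior are assumptions, not the merged implementation. Cargo exposes enabled features to build scripts as `CARGO_FEATURE_<NAME>` environment variables, which is what `main` checks.

```rust
use std::env;

/// Hypothetical fallback used when the variable is unset or unparsable.
const DEFAULT_THRESHOLD: usize = 256;

/// Decide the threshold to hand to the ISLE compiler: `None` means
/// "do not split", `Some(n)` means "wrap match arm bodies larger than n".
fn split_threshold(feature_enabled: bool, env_value: Option<&str>) -> Option<usize> {
    if !feature_enabled {
        return None;
    }
    let parsed = env_value.and_then(|v| v.trim().parse::<usize>().ok());
    Some(parsed.unwrap_or(DEFAULT_THRESHOLD))
}

fn main() {
    // In a build script, Cargo sets this variable when the feature is enabled.
    let enabled = env::var_os("CARGO_FEATURE_ISLE_SPLIT_MATCH").is_some();
    let raw = env::var("ISLE_SPLIT_MATCH_THRESHOLD").ok();
    println!("{:?}", split_threshold(enabled, raw.as_deref()));
}
```

Under these assumptions, a build with the feature on and `ISLE_SPLIT_MATCH_THRESHOLD=512` set would yield `Some(512)`, and a build without the feature would yield `None` regardless of the variable.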
cfallin submitted PR review:
LGTM -- thanks very much for the patience here!
cfallin added PR #12303 Optimize ISLE compilation to the merge queue.
cfallin merged PR #12303.
cfallin removed PR #12303 Optimize ISLE compilation from the merge queue.
Last updated: Jan 29 2026 at 13:25 UTC