Stream: git-wasmtime

Topic: wasmtime / PR #12303 Optimize ISLE compilation


view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 08:48):

bongjunj requested wasmtime-compiler-reviewers for a review on PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 08:48):

bongjunj opened PR #12303 from bongjunj:optimize-isle-compilation to bytecodealliance:main:

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

Overview

The current implementation of ISLE code generation emits a single Rust function for each ISLE term, and rustc compiles the Rust code to integrate ISLE with other modules of Cranelift.

This can become a compilation bottleneck when an ISLE term has many rules and therefore produces a very large Rust function, since rustc cannot compile such functions efficiently. Notably, the term simplify, which implements the mid-end peephole optimizations, has about 500 rules, and the symptom is already visible. (And the ruleset is growing!) As the following report shows, cranelift-codegen takes most of the time when compiling Cranelift: cargo-timing.html

Plus: a timing report for compiling wasmtime:

With this PR, the ISLE codegen helps rustc by wrapping a large match statement generated by islec in a closure. The closures split a huge Rust function (for an ISLE term) into several smaller pieces for rustc to process, which reduces the compilation time.
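
The wrapping described above can be sketched roughly as follows. This is a minimal illustration and not the code islec emits: the `Op` enum, the rules, and the function names `simplify_monolithic` and `simplify_split` are all hypothetical. It only shows the shape of the transformation, where a large match body is moved into a closure that is called immediately, so the behavior stays the same while the enclosing function body gets smaller.

```rust
// Hypothetical stand-in for an ISLE-generated term function; the type, the
// rules, and the names are illustrative and are not what islec emits.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Add(i64, i64),
    Mul(i64, i64),
    Const(i64),
}

// Before: every rule lives in one function body, so rustc has to process
// the whole match as a single large function.
fn simplify_monolithic(op: Op) -> Option<Op> {
    match op {
        Op::Add(x, 0) | Op::Add(0, x) => Some(Op::Const(x)),
        Op::Mul(x, 1) | Op::Mul(1, x) => Some(Op::Const(x)),
        Op::Mul(_, 0) | Op::Mul(0, _) => Some(Op::Const(0)),
        _ => None,
    }
}

// After: each large arm body is wrapped in a closure that is invoked right
// away. The rules and their order are unchanged; only the nesting differs.
fn simplify_split(op: Op) -> Option<Op> {
    match op {
        Op::Add(a, b) => (|| match (a, b) {
            (x, 0) | (0, x) => Some(Op::Const(x)),
            _ => None,
        })(),
        Op::Mul(a, b) => (|| match (a, b) {
            (x, 1) | (1, x) => Some(Op::Const(x)),
            (_, 0) | (0, _) => Some(Op::Const(0)),
            _ => None,
        })(),
        Op::Const(_) => None,
    }
}

fn main() {
    for op in [Op::Add(7, 0), Op::Mul(3, 1), Op::Mul(9, 0), Op::Const(4)] {
        println!("{:?} -> {:?}", op, simplify_split(op));
    }
}
```

In a real generated file the arm bodies would be far larger than these, and the wrapping would presumably apply only to bodies above some size threshold.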

Evaluation

Build times for cranelift-codegen were measured before and after this optimization, on top of the main branch. This PR saves ~2 seconds (26.04 s down to 23.81 s).

~/wasmtime (optimize-isle-compilation)> cargo clean
     Removed 10031 files, 12.6GiB total
~/wasmtime (optimize-isle-compilation)> time cargo build -q -p cranelift-codegen

________________________________________________________
Executed in   23.81 secs    fish           external
   usr time   43.59 secs    0.00 micros   43.59 secs
   sys time    5.04 secs  592.00 micros    5.04 secs
~/wasmtime ((dev))> cargo clean
     Removed 1356 files, 1.0GiB total
~/wasmtime ((dev))> time cargo build -q -p cranelift-codegen

________________________________________________________
Executed in   26.04 secs    fish           external
   usr time   46.58 secs    0.00 micros   46.58 secs
   sys time    5.15 secs  583.00 micros    5.15 secs

Extra

I am experimenting with over 6,000 ISLE simplify rules.
With this PR, the compilation time dropped from 841.37 seconds to 69.34 seconds (an over 11x speedup).

~/w/cranelift ((many-rules-no-opt))> cargo clean && time cargo build -p cranelift-codegen
________________________________________________________
Executed in  841.37 secs    fish           external
   usr time  826.92 secs  325.00 micros  826.92 secs
   sys time   49.98 secs  135.00 micros   49.98 secs

 ~/w/cranelift (many-rules-opt)> cargo clean && time cargo build -q -p cranelift-codegen
________________________________________________________
Executed in   69.34 secs    fish           external
   usr time   98.32 secs  362.00 micros   98.32 secs
   sys time    6.42 secs  149.00 micros    6.42 secs

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 08:48):

bongjunj requested fitzgen for a review on PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 08:48):

bongjunj edited PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 08:52):

bongjunj edited PR #12303:

Overview

The current implementation of ISLE code generation emits a single Rust function for each ISLE term, and rustc compiles the Rust code to integrate ISLE with other modules of Cranelift.

This can become a compilation bottleneck when an ISLE term has many rules and therefore produces a very large Rust function, since rustc cannot compile such functions efficiently. Notably, the term simplify, which implements the mid-end peephole optimizations, has about 500 rules, and the symptom is already visible. (And the ruleset is growing!) As the following report shows, cranelift-codegen takes most of the time when compiling Cranelift: cargo-timing.html

Plus: a timing report for compiling wasmtime:

With this PR, the ISLE codegen helps rustc by wrapping a large match statement generated by islec in a closure. The closures split a huge Rust function (for an ISLE term) into several smaller pieces for rustc to process, which reduces the compilation time.

Evaluation

Build times for cranelift-codegen were measured before and after this optimization, on top of the main branch. This PR saves ~2 seconds (26.04 s down to 23.81 s) on my machine (x86-64, 64 cores, 512GB memory).

~/wasmtime (optimize-isle-compilation)> cargo clean
     Removed 10031 files, 12.6GiB total
~/wasmtime (optimize-isle-compilation)> time cargo build -q -p cranelift-codegen

________________________________________________________
Executed in   23.81 secs    fish           external
   usr time   43.59 secs    0.00 micros   43.59 secs
   sys time    5.04 secs  592.00 micros    5.04 secs
~/wasmtime ((dev))> cargo clean
     Removed 1356 files, 1.0GiB total
~/wasmtime ((dev))> time cargo build -q -p cranelift-codegen

________________________________________________________
Executed in   26.04 secs    fish           external
   usr time   46.58 secs    0.00 micros   46.58 secs
   sys time    5.15 secs  583.00 micros    5.15 secs

Extra

I am experimenting with over 6,000 ISLE simplify rules.
With this PR, the compilation time dropped from 841.37 seconds to 69.34 seconds (an over 11x speedup).

~/w/cranelift ((many-rules-no-opt))> cargo clean && time cargo build -p cranelift-codegen
________________________________________________________
Executed in  841.37 secs    fish           external
   usr time  826.92 secs  325.00 micros  826.92 secs
   sys time   49.98 secs  135.00 micros   49.98 secs

 ~/w/cranelift (many-rules-opt)> cargo clean && time cargo build -q -p cranelift-codegen
________________________________________________________
Executed in   69.34 secs    fish           external
   usr time   98.32 secs  362.00 micros   98.32 secs
   sys time    6.42 secs  149.00 micros    6.42 secs

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 09:16):

cfallin commented on PR #12303:

@bongjunj thanks for this experimentation!

Before moving further, would you mind benchmarking compilation time as well? One of the reasons for putting all of the codegen for an ISLE term in one function is so that the compiler can optimize the code together; splitting that code between functions and (especially) introducing ABI boundaries may produce slowdowns when Cranelift runs. IMHO, it's worth it to spend a few extra seconds when compiling Cranelift if it makes Cranelift itself run faster. If no effect, of course, then no problem and I'll be happy to review this. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 13:03):

github-actions[bot] commented on PR #12303:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "isle"

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 16:09):

alexcrichton commented on PR #12303:

An 11x reduction in compile time is quite massive, so even if this is a ~NN% regression in compile-time-of-wasm-code I think this would be a great thing to land at least for debug builds and maybe optionally for release builds with some sort of tunable too.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 16:11):

cfallin commented on PR #12303:

Note, that's an 11x reduction with Bongjun's huge new ruleset; ~10% on main. That's neat but I wouldn't regress compile time for it IMHO.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 17:32):

fitzgen commented on PR #12303:

If it does result in Wasm compilation time regressions, it may still make sense to have this as an option of the ISLE compiler that we can enable during development or something, if we can make the maintenance burden minimal.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 17:35):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2026 at 17:35):

fitzgen created PR review comment:

Regarding my previous comment: it could potentially make sense to make MATCH_ARM_BODY_CLOSURE_THRESHOLD a dynamic ISLE compilation option, rather than a constant, and then tweak that value in Cranelift's invocation of the ISLE compiler depending on if a cargo feature is enabled or whether this is a release vs debug build of Cranelift or something.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2026 at 02:02):

bongjunj edited PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2026 at 04:25):

bongjunj edited PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2026 at 05:04):

bongjunj commented on PR #12303:

@cfallin

Thanks for the comment! I've run the benchmarks and measured the compilation time.
It turned out that the compilation overhead is close to zero.
This is probably because closures are introduced only for large pattern matches.
The raw data (for 10 iterations, the default setting of sightglass-cli) is below:

bench             optimize-isle-compilation             main   overhead
bz2                             273,462,445      271,631,283      0.67%
pulldown-cmark                  623,480,512      622,976,525      0.08%
spidermonkey                 21,592,595,980   21,523,101,083      0.32%
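
For reference, the overhead column can be reproduced from the two raw columns as (branch - main) / main. The sketch below recomputes it from the figures above; the function name `overhead_percent` is just an illustrative helper, not part of sightglass.

```rust
// Relative overhead of `branch` over `baseline`, in percent.
fn overhead_percent(branch: u64, baseline: u64) -> f64 {
    (branch as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // (benchmark, optimize-isle-compilation, main), copied from the table above.
    let rows = [
        ("bz2", 273_462_445u64, 271_631_283u64),
        ("pulldown-cmark", 623_480_512, 622_976_525),
        ("spidermonkey", 21_592_595_980, 21_523_101_083),
    ];
    for (name, branch, baseline) in rows {
        println!("{}: {:.2}%", name, overhead_percent(branch, baseline));
    }
}
```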

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2026 at 05:11):

bongjunj submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 12 2026 at 05:11):

bongjunj created PR review comment:

FYI, I experimented with three values for this: 128, 256, and 512.
The best one was 256 for the current ruleset (~500 rules), and 512 for my massive ruleset (~6,700 rules).
There could be some relation between the threshold value and the ruleset size. The parallelism available in the development environment could also have an effect.

However, as this could affect the quality of the compiled Cranelift, I assumed at first that sticking to the best value was a good idea.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 20 2026 at 00:44):

cfallin commented on PR #12303:

Sorry for the delayed response here -- was traveling then lost track of this PR.

Thanks for the experiments! I think I agree with Nick that this would be good to have as an option, probably off by default. 0.67% compile time is still an unfortunate regression to take (on the flip side, 1% speedups make us pretty happy and take work to find), and is more important than the time taken to compile Cranelift, since usually you compile Cranelift once then use that compiled Cranelift-containing program many times. But when developing Cranelift, it would be useful to have this option.

Could you put this under a Cargo feature that alters the behavior of islec as invoked by cranelift-codegen's build.rs?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 20 2026 at 00:44):

cfallin edited a comment on PR #12303:

Sorry for the delayed response here -- was traveling then lost track of this PR.

Thanks for the experiments! I think I agree with Nick that this would be good to have as an option, probably off by default. 0.67% compile time is still an unfortunate regression to take (on the flip side, 1% speedups make us pretty happy and take work to find), and is (usually) more important than the time taken to compile Cranelift, since usually you compile Cranelift once then use that compiled Cranelift-containing program many times. But when developing Cranelift, it would be useful to have this option.

Could you put this under a Cargo feature that alters the behavior of islec as invoked by cranelift-codegen's build.rs?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 21 2026 at 10:42):

bongjunj commented on PR #12303:

@cfallin thanks for the comment!
I think I can handle this by this weekend. I will mention you when I finish the feature.

Thank you.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 03:41):

bongjunj updated PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 04:15):

bongjunj updated PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 04:20):

bongjunj updated PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 04:22):

bongjunj commented on PR #12303:

@cfallin This is now wrapped under a Cargo feature isle-split-match on the crate cranelift-codegen.

@fitzgen Now we have a controllable splitting threshold, via setting an environment variable: ISLE_SPLIT_MATCH_THRESHOLD=512
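
A rough sketch of how a build script might combine the two knobs is shown below. It is not the merged build.rs. The helper `split_threshold`, the fallback value `ASSUMED_DEFAULT_THRESHOLD`, and the rule that the environment variable only matters when the feature is on are all assumptions made for illustration. What is standard is that Cargo exposes enabled features to build scripts as `CARGO_FEATURE_<NAME>` variables, with the name uppercased and `-` replaced by `_`.

```rust
use std::env;

// Assumed fallback for illustration only. The thread reports 256 as the best
// value for the ~500-rule set, but the default the PR ships is not stated.
const ASSUMED_DEFAULT_THRESHOLD: usize = 256;

/// Hypothetical helper: picks the split threshold from the feature flag and
/// the raw environment value. `None` means "do not split at all".
fn split_threshold(feature_enabled: bool, env_value: Option<&str>) -> Option<usize> {
    if !feature_enabled {
        return None;
    }
    // Fall back to the assumed default when the variable is unset or invalid.
    let threshold = env_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(ASSUMED_DEFAULT_THRESHOLD);
    Some(threshold)
}

fn main() {
    // Set by Cargo for build scripts when the `isle-split-match` feature is on.
    let feature_enabled = env::var_os("CARGO_FEATURE_ISLE_SPLIT_MATCH").is_some();
    let raw = env::var("ISLE_SPLIT_MATCH_THRESHOLD").ok();
    match split_threshold(feature_enabled, raw.as_deref()) {
        Some(n) => println!("match splitting enabled, threshold = {}", n),
        None => println!("match splitting disabled"),
    }
}
```

Under these assumptions, a development build would be invoked along the lines of `ISLE_SPLIT_MATCH_THRESHOLD=512 cargo build -p cranelift-codegen --features isle-split-match`, while default builds keep the single-function output.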

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 16:39):

cfallin submitted PR review:

LGTM -- thanks very much for the patience here!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 16:39):

cfallin added PR #12303 Optimize ISLE compilation to the merge queue.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 17:08):

cfallin merged PR #12303.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2026 at 17:08):

cfallin removed PR #12303 Optimize ISLE compilation from the merge queue.


Last updated: Jan 29 2026 at 13:25 UTC