wasmtime / issue #12086 Cranelift: deoptimizing rules in ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #12086 Cranelift: deoptimizing rules in ...

Wasmtime GitHub notifications bot (Nov 25 2025 at 07:49):

bongjunj opened issue #12086:

Hi, this is a follow-up to #cranelift > Deoptimizing ISLE rules?.

As mentioned in the Zulip topic above,
I suspect these ISLE rules can degrade performance, since they can increase the number of instructions and thus the overall computation cost.

I evaluated the impact of the rules on the sightglass benchmark suite.
Starting from the main branch of Cranelift, I removed the rules to create a no-demorgan version,
and compared execution and compilation performance, measured in CPU cycles, using sightglass-cli.
The averages over 10 repetitions are presented in the table below.

Removing the rules yields a 22.82% execution speedup and 13.34% lower compilation cost for shootout-keccak; the impact is negligible for the other benchmarks.

speedup = (main - nodemorgan) / nodemorgan

overhead = (nodemorgan - main) / main
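As a quick check of the two formulas, here is a minimal Python sketch that plugs in the shootout-keccak cycle counts from the table. The function and variable names are made up for illustration; only the numbers and the two formulas come from the issue. The third value (cycles saved relative to main) is not a column in the table and is computed here only to contrast it with the speedup figure.

```python
# Cycle counts for shootout-keccak, copied from the table in this issue.
EXEC_MAIN = 48_797_823
EXEC_NO_DEMORGAN = 39_731_401
COMPILE_MAIN = 292_241_108
COMPILE_NO_DEMORGAN = 253_254_413


def speedup(main: int, no_demorgan: int) -> float:
    """Execution speedup as defined in the issue: (main - nodemorgan) / nodemorgan."""
    return (main - no_demorgan) / no_demorgan


def overhead(main: int, no_demorgan: int) -> float:
    """Compilation overhead as defined in the issue: (nodemorgan - main) / main."""
    return (no_demorgan - main) / main


exec_speedup = speedup(EXEC_MAIN, EXEC_NO_DEMORGAN)
compile_overhead = overhead(COMPILE_MAIN, COMPILE_NO_DEMORGAN)
# Fraction of main's execution cycles that disappear; not a table column.
exec_reduction = (EXEC_MAIN - EXEC_NO_DEMORGAN) / EXEC_MAIN

print(f"speedup:   {exec_speedup:.2%}")
print(f"overhead:  {compile_overhead:.2%}")
print(f"reduction: {exec_reduction:.2%}")
```

This reproduces the 22.82% speedup and the -13.34% overhead in the keccak row. Because the speedup is normalized by the no-demorgan count, it is larger than the share of main's execution cycles that is saved, which comes out to roughly 18.6%.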

The rules exist for normalization: they push bnot instructions down the expression tree so that other simplification rules and GVN can exploit the normalized form.
However, this data suggests that the normalization is under-exploited and can degrade performance for keccak.
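To illustrate why pushing bnot down can go either way, here is a small Python sketch. It assumes the rules in question have the De Morgan shape, i.e. bnot(band x y) becomes bor(bnot x)(bnot y) and likewise for bor; the issue does not list the exact rules, so that shape is an assumption. The classes and functions below are hypothetical and are not Cranelift or ISLE APIs. Operators are counted once per distinct subtree, as a rough stand-in for what GVN would merge.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Or:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Not, And, Or]
MASK = 0xFF  # evaluate on 8-bit values to keep the example small


def push_not_down(e: Expr) -> Expr:
    """Apply De Morgan's laws top-down and cancel double negation."""
    if isinstance(e, Not):
        inner = e.arg
        if isinstance(inner, Not):  # ~~x => x
            return push_not_down(inner.arg)
        if isinstance(inner, And):  # ~(x & y) => ~x | ~y
            return Or(push_not_down(Not(inner.lhs)), push_not_down(Not(inner.rhs)))
        if isinstance(inner, Or):  # ~(x | y) => ~x & ~y
            return And(push_not_down(Not(inner.lhs)), push_not_down(Not(inner.rhs)))
        return e
    if isinstance(e, (And, Or)):
        return type(e)(push_not_down(e.lhs), push_not_down(e.rhs))
    return e


def op_count(e: Expr) -> int:
    """Count distinct operator nodes; identical subtrees are counted once."""
    seen = set()

    def walk(n: Expr) -> None:
        if isinstance(n, Var) or n in seen:
            return
        seen.add(n)
        if isinstance(n, Not):
            walk(n.arg)
        else:
            walk(n.lhs)
            walk(n.rhs)

    walk(e)
    return len(seen)


def evaluate(e: Expr, env: dict) -> int:
    """Evaluate the expression bitwise on MASK-wide integers."""
    if isinstance(e, Var):
        return env[e.name] & MASK
    if isinstance(e, Not):
        return ~evaluate(e.arg, env) & MASK
    if isinstance(e, And):
        return evaluate(e.lhs, env) & evaluate(e.rhs, env)
    return evaluate(e.lhs, env) | evaluate(e.rhs, env)


x, y = Var("x"), Var("y")
grows = Not(And(x, y))              # nothing below cancels the new bnots
shrinks = Not(And(Not(x), Not(y)))  # the new bnots cancel existing ones

env = {"x": 0b1100_1010, "y": 0b1010_0110}
for expr in (grows, shrinks):
    rewritten = push_not_down(expr)
    assert evaluate(expr, env) == evaluate(rewritten, env)  # same value
    print(op_count(expr), "->", op_count(rewritten))
```

In this toy model the first expression goes from 2 operators to 3, while the second goes from 4 to 1. The rewrite only pays off when the pushed-down bnot meets something that cancels or shares it, which is one way to read the "under-exploited" observation above.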

Benchmark	Execution (main)	Execution (no-demorgan)	Speedup	Compilation (main)	Compilation (no-demorgan)	Overhead
blake3-scalar	821,287	820,526	0.09%	334,025,595	335,494,881	0.44%
blake3-simd	904,391	902,968	0.16%	215,000,060	216,887,637	0.88%
bz2	123,375,319	123,972,237	-0.48%	323,600,329	325,059,640	0.45%
pulldown-cmark	7,527,276	7,528,312	-0.01%	685,357,632	687,241,047	0.27%
regex	287,113,532	287,013,209	0.03%	1,623,122,606	1,628,150,390	0.31%
shootout-ackermann	7,766,207	7,769,915	-0.05%	98,442,831	99,049,461	0.62%
shootout-base64	377,876,186	377,986,725	-0.03%	94,119,875	94,616,896	0.53%
shootout-ctype	796,212,604	796,195,661	0.00%	90,769,795	90,728,785	-0.05%
shootout-ed25519	11,062,252,786	11,041,529,973	0.19%	505,160,708	511,230,333	1.20%
shootout-fib2	2,991,817,344	2,991,783,776	0.00%	67,992,267	68,207,110	0.32%
shootout-gimli	5,143,297	5,153,384	-0.20%	5,846,157	5,843,358	-0.05%
shootout-heapsort	2,374,978,997	2,375,615,158	-0.03%	29,560,353	29,690,677	0.44%
shootout-keccak	48,797,823	39,731,401	22.82%	292,241,108	253,254,413	-13.34%
shootout-matrix	697,653,531	697,060,503	0.09%	93,389,415	93,786,432	0.43%
shootout-memmove	37,572,864	37,679,507	-0.28%	95,438,341	95,867,972	0.45%
shootout-minicsv	1,239,552,532	1,241,534,405	-0.16%	15,630,009	15,681,735	0.33%
shootout-nestedloop	645	621	3.93%	66,921,453	67,062,411	0.21%
shootout-random	439,552,157	439,582,225	-0.01%	67,809,477	68,091,002	0.42%
shootout-ratelimit	50,251,247	50,384,183	-0.26%	92,983,888	92,922,059	-0.07%
shootout-seqhash	15,249,759,981	15,255,584,809	-0.04%	126,530,505	127,360,907	0.66%
shootout-sieve	844,263,240	844,508,149	-0.03%	67,092,099	67,628,053	0.80%
shootout-switch	153,597,929	153,627,912	-0.02%	144,493,947	144,955,423	0.32%
shootout-xblabla20	4,924,967	4,926,304	-0.03%	96,463,543	96,976,115	0.53%
shootout-xchacha20	6,468,729	6,467,746	0.02%	96,534,043	96,825,264	0.30%
spidermonkey	742,879,491	744,941,257	-0.28%	23,687,766,691	23,749,945,287	0.26%

Wasmtime GitHub notifications bot (Nov 25 2025 at 07:53):

bongjunj edited issue #12086:

Wasmtime GitHub notifications bot (Nov 25 2025 at 07:53):

bongjunj edited issue #12086:

Wasmtime GitHub notifications bot (Nov 25 2025 at 07:53):

bongjunj edited issue #12086:

Hi, this is a follow-up to #cranelift > Deoptimizing ISLE rules?.

As mentioned in the Zulip topic above,
I suspect these ISLE rules can degrade performance, since they can increase the number of instructions and thus the overall computation cost.

I evaluated the impact of the rules on the sightglass benchmark suite.
Starting from the main branch of Cranelift, I removed the rules to create a no-demorgan version,
and compared execution and compilation performance, measured in CPU cycles, using sightglass-cli.
The averages over 10 repetitions are presented in the table below.
My machine is x86-64 Linux with 64 cores and 512 GB of memory.

Removing the rules yields a 22.82% execution speedup and 13.34% lower compilation cost for shootout-keccak; the impact is negligible for the other benchmarks.

speedup = (main - nodemorgan) / nodemorgan

overhead = (nodemorgan - main) / main

The rules exist for normalization: they push bnot instructions down the expression tree so that other simplification rules and GVN can exploit the normalized form.
However, this data suggests that the normalization is under-exploited and can degrade performance for keccak.

Benchmark	Execution (main)	Execution (no-demorgan)	Speedup	Compilation (main)	Compilation (no-demorgan)	Overhead
blake3-scalar	821,287	820,526	0.09%	334,025,595	335,494,881	0.44%
blake3-simd	904,391	902,968	0.16%	215,000,060	216,887,637	0.88%
bz2	123,375,319	123,972,237	-0.48%	323,600,329	325,059,640	0.45%
pulldown-cmark	7,527,276	7,528,312	-0.01%	685,357,632	687,241,047	0.27%
regex	287,113,532	287,013,209	0.03%	1,623,122,606	1,628,150,390	0.31%
shootout-ackermann	7,766,207	7,769,915	-0.05%	98,442,831	99,049,461	0.62%
shootout-base64	377,876,186	377,986,725	-0.03%	94,119,875	94,616,896	0.53%
shootout-ctype	796,212,604	796,195,661	0.00%	90,769,795	90,728,785	-0.05%
shootout-ed25519	11,062,252,786	11,041,529,973	0.19%	505,160,708	511,230,333	1.20%
shootout-fib2	2,991,817,344	2,991,783,776	0.00%	67,992,267	68,207,110	0.32%
shootout-gimli	5,143,297	5,153,384	-0.20%	5,846,157	5,843,358	-0.05%
shootout-heapsort	2,374,978,997	2,375,615,158	-0.03%	29,560,353	29,690,677	0.44%
shootout-keccak	48,797,823	39,731,401	22.82%	292,241,108	253,254,413	-13.34%
shootout-matrix	697,653,531	697,060,503	0.09%	93,389,415	93,786,432	0.43%
shootout-memmove	37,572,864	37,679,507	-0.28%	95,438,341	95,867,972	0.45%
shootout-minicsv	1,239,552,532	1,241,534,405	-0.16%	15,630,009	15,681,735	0.33%
shootout-nestedloop	645	621	3.93%	66,921,453	67,062,411	0.21%
shootout-random	439,552,157	439,582,225	-0.01%	67,809,477	68,091,002	0.42%
shootout-ratelimit	50,251,247	50,384,183	-0.26%	92,983,888	92,922,059	-0.07%
shootout-seqhash	15,249,759,981	15,255,584,809	-0.04%	126,530,505	127,360,907	0.66%
shootout-sieve	844,263,240	844,508,149	-0.03%	67,092,099	67,628,053	0.80%
shootout-switch	153,597,929	153,627,912	-0.02%	144,493,947	144,955,423	0.32%
shootout-xblabla20	4,924,967	4,926,304	-0.03%	96,463,543	96,976,115	0.53%
shootout-xchacha20	6,468,729	6,467,746	0.02%	96,534,043	96,825,264	0.30%
spidermonkey	742,879,491	744,941,257	-0.28%	23,687,766,691	23,749,945,287	0.26%

Wasmtime GitHub notifications bot (Nov 25 2025 at 08:01):

cfallin commented on issue #12086:

Thanks for this analysis!

I am reading the data as: small but positive impact on most benchmarks; with one large negative outlier (keccak) as you mention. To me that suggests some adverse impact in an inner loop; perhaps we can fix it. Would you be able to dig a bit deeper on the benchmark (e.g. by profiling and comparing the hottest basic blocks before and after) to see if you can narrow down the cause?

Wasmtime GitHub notifications bot (Nov 25 2025 at 11:27):

bongjunj commented on issue #12086:

Thanks for the suggestion! Gonna run those analyses.

Wasmtime GitHub notifications bot (Dec 08 2025 at 05:22):

bongjunj closed issue #12086:

Wasmtime GitHub notifications bot (Dec 08 2025 at 05:22):

bongjunj commented on issue #12086:

https://github.com/bytecodealliance/wasmtime/pull/12127

Last updated: Jul 29 2026 at 05:03 UTC