It seems Pulley is 30% slower than Wasmi. I'm curious why that is.
Hey there, thanks for noting the results of your exploration -- do you happen to have some more details on the use case you were benchmarking and/or some code for a reproduction or benchmark?
While I'm sure others may chime in with speculation, there is so much that could possibly contribute to performance differences in setup, execution, use case, and more that I don't think there's much anyone could meaningfully comment on with your current comment alone!
There was a good amount of discussion around this in the Wasmtime and Cranelift meetings a few months ago when @Alex Crichton was working on optimizing Pulley -- we landed on some hypotheses, and some TODOs to investigate further (basically: count opcodes executed to determine whether it was per-opcode dispatch overhead, or something else)
All that to say (i) yes, this is known and (ii) it needs more investigation and work
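As a rough illustration of the "count opcodes executed" idea, here is a minimal sketch of an interpreter loop that tallies each opcode it dispatches. The `Op` enum, `run_counted`, and `ns_per_op` are hypothetical names invented for this example; they are not Pulley's opcodes or any Wasmtime API.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical toy stack bytecode (NOT Pulley's real opcode set).
#[derive(Clone, Copy, Debug)]
enum Op {
    Const(f32),
    Add,
    Mul,
    Halt,
}

impl Op {
    /// Stable name used as the key in the counter table.
    fn name(self) -> &'static str {
        match self {
            Op::Const(_) => "const",
            Op::Add => "add",
            Op::Mul => "mul",
            Op::Halt => "halt",
        }
    }
}

/// Runs straight-line `code` and returns the top of the stack together
/// with the number of times each opcode was dispatched.
fn run_counted(code: &[Op]) -> (f32, BTreeMap<&'static str, u64>) {
    let mut stack: Vec<f32> = Vec::new();
    let mut counts = BTreeMap::new();
    for op in code.iter().copied() {
        // Count every dispatch, including the final `Halt`.
        *counts.entry(op.name()).or_insert(0u64) += 1;
        match op {
            Op::Const(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
            Op::Halt => break,
        }
    }
    (stack.pop().unwrap_or(0.0), counts)
}

/// Average wall-clock nanoseconds per dispatched opcode.
fn ns_per_op(elapsed: Duration, total_ops: u64) -> f64 {
    elapsed.as_nanos() as f64 / total_ops as f64
}
```

Dividing the measured run time by the total count gives an average cost per opcode. If two interpreters show a similar cost per opcode but one executes many more opcodes for the same Wasm, the gap is more likely in the instruction set or lowering than in dispatch itself.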
Victor Adossi said:
Hey there, thanks for noting the results of your exploration -- do you happen to have some more details on the use case you were benchmarking and/or some code for a reproduction or benchmark?
While I'm sure others may chime in with speculation, there is so much that could possibly contribute to performance differences in setup, execution, use case, and more that I don't think there's much anyone could meaningfully comment on with your current comment alone!
My test code is simple, just linear computations.
use glam::{Mat3A, Vec2};

#[unsafe(no_mangle)]
extern "system" fn test() -> f32 {
    let mut a = Vec2::new(0.0, 0.0);
    for i in 0..1000000 {
        let p = Mat3A::from_angle(i as f32);
        a = p.transform_point2(Vec2::from_angle(i as f32));
    }
    a.x
}
Pulley takes about 0.5 s, Wasmi about 0.35 s, and Cranelift about 0.018 s.
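To put those timings side by side, the small sketch below computes the ratios, assuming all three figures are in seconds for the same workload. The helper names `slowdown` and `time_saved` are made up for this example.

```rust
/// How many times longer the slow run takes than the fast run.
fn slowdown(slow_secs: f64, fast_secs: f64) -> f64 {
    slow_secs / fast_secs
}

/// Fraction of the slow run's time that the fast run saves.
fn time_saved(slow_secs: f64, fast_secs: f64) -> f64 {
    1.0 - fast_secs / slow_secs
}

/// Reported timings from the message above, assumed to be seconds.
const PULLEY_SECS: f64 = 0.5;
const WASMI_SECS: f64 = 0.35;
const CRANELIFT_SECS: f64 = 0.018;
```

With these numbers, `slowdown(PULLEY_SECS, WASMI_SECS)` is about 1.43 and `time_saved(PULLEY_SECS, WASMI_SECS)` is about 0.30, so the "30%" figure matches Wasmi taking roughly 30% less time, which is the same as Pulley taking roughly 43% longer. `slowdown(PULLEY_SECS, CRANELIFT_SECS)` is about 27.8.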
Chris Fallin said:
There was a good amount of discussion around this in the Wasmtime and Cranelift meetings a few months ago when Alex Crichton was working on optimizing Pulley -- we landed on some hypotheses, and some TODOs to investigate further (basically: count opcodes executed to determine whether it was per-opcode dispatch overhead, or something else)
That's good news. The Wasmtime community is always the best. :+1:
Thanks for the sample program. It makes sense to me that Pulley is slower here: I haven't spent any time optimizing float ops in Pulley, so while things work, they're not fast.
For example, there are no folded loads, no branch-and-compare ops, and no immediate folding.
All easy ish to implement though! Could you open an issue for this?
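To illustrate why fused ops matter in an interpreter, here is a toy register machine that runs the same float loop twice: once with separate constant, add, compare, and branch ops, and once with an add-immediate op and a fused compare-and-branch op. Everything here (`Insn`, `run`, `unfused`, `fused`) is hypothetical and is not Pulley's actual instruction set. Folded loads are not shown, but they follow the same idea of doing more work per dispatch.

```rust
/// Hypothetical toy register bytecode (NOT Pulley's real opcodes).
#[derive(Clone, Copy)]
enum Insn {
    ConstF { dst: usize, imm: f32 },
    AddF { dst: usize, a: usize, b: usize },
    LtF { dst: usize, a: usize, b: usize },
    BrIf { cond: usize, target: usize },
    /// Add with the constant folded into the instruction.
    AddFImm { dst: usize, src: usize, imm: f32 },
    /// Compare and branch fused into one instruction.
    BrIfLtF { a: usize, b: usize, target: usize },
    Halt,
}

/// Interprets `code` and returns how many instructions were dispatched.
fn run(code: &[Insn], regs: &mut [f32; 4]) -> u64 {
    let mut pc = 0;
    let mut dispatched = 0u64;
    loop {
        dispatched += 1;
        match code[pc] {
            Insn::ConstF { dst, imm } => { regs[dst] = imm; pc += 1; }
            Insn::AddF { dst, a, b } => { regs[dst] = regs[a] + regs[b]; pc += 1; }
            Insn::LtF { dst, a, b } => {
                regs[dst] = if regs[a] < regs[b] { 1.0 } else { 0.0 };
                pc += 1;
            }
            Insn::BrIf { cond, target } => {
                pc = if regs[cond] != 0.0 { target } else { pc + 1 };
            }
            Insn::AddFImm { dst, src, imm } => { regs[dst] = regs[src] + imm; pc += 1; }
            Insn::BrIfLtF { a, b, target } => {
                pc = if regs[a] < regs[b] { target } else { pc + 1 };
            }
            Insn::Halt => return dispatched,
        }
    }
}

/// Loop `r0 += 1.0 while r0 < r1` using four ops per iteration.
fn unfused() -> Vec<Insn> {
    vec![
        Insn::ConstF { dst: 2, imm: 1.0 },
        Insn::AddF { dst: 0, a: 0, b: 2 },
        Insn::LtF { dst: 3, a: 0, b: 1 },
        Insn::BrIf { cond: 3, target: 0 },
        Insn::Halt,
    ]
}

/// The same loop using two fused ops per iteration.
fn fused() -> Vec<Insn> {
    vec![
        Insn::AddFImm { dst: 0, src: 0, imm: 1.0 },
        Insn::BrIfLtF { a: 0, b: 1, target: 0 },
        Insn::Halt,
    ]
}
```

Starting from registers `[0.0, 10.0, 0.0, 0.0]`, the unfused program dispatches 41 instructions and the fused one 21, with the same final value in `r0`. In a real interpreter each dispatch carries fixed overhead, so halving the count in a hot loop can matter a lot.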
Alex Crichton said:
All easy ish to implement though! Could you open an issue for this?
Issue created. Do you have a timeline for this?
Last updated: Dec 06 2025 at 05:03 UTC