Stream: general

Topic: pulley vs wasmi


Hoping White (Apr 07 2025 at 08:33):

It seems Pulley is 30% slower than Wasmi. I am curious why that is.

Victor Adossi (Apr 07 2025 at 14:03):

Hey there, thanks for noting the results of your exploration -- do you happen to have some more details on the use case you were benchmarking and/or some code for a reproduction or benchmark?

While I'm sure others may chime in with speculation, there is so much that could possibly contribute to performance differences in setup, execution, use case, and more that I don't think there's much anyone could meaningfully comment on with your current comment alone!

Chris Fallin (Apr 07 2025 at 17:22):

There was a good amount of discussion around this in the Wasmtime and Cranelift meetings a few months ago when @Alex Crichton was working on optimizing Pulley -- we landed on some hypotheses, and some TODOs to investigate further (basically: count opcodes executed to determine whether it was per-opcode dispatch overhead, or something else)

Chris Fallin (Apr 07 2025 at 17:22):

All that to say (i) yes, this is known and (ii) it needs more investigation and work
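To make the "count opcodes executed" idea concrete, here is a minimal sketch of a toy register-based bytecode interpreter that tallies every dispatch. It is not Pulley's code: the `Op` enum, `run`, and `count_to` are hypothetical names invented for illustration, and the instruction set is made up.

```rust
/// Hypothetical toy instruction set (not Pulley's real opcodes).
#[derive(Clone, Copy)]
enum Op {
    /// regs[dst] = imm
    Const { dst: usize, imm: i64 },
    /// regs[dst] = regs[a] + regs[b]
    Add { dst: usize, a: usize, b: usize },
    /// regs[dst] = 1 if regs[a] < regs[b], else 0
    Lt { dst: usize, a: usize, b: usize },
    /// Jump to `target` if regs[cond] != 0
    BrIf { cond: usize, target: usize },
    /// Stop and return regs[0]
    Halt,
}

/// Runs `code` and returns (value of r0, number of opcodes dispatched).
fn run(code: &[Op]) -> (i64, u64) {
    let mut regs = [0i64; 4];
    let mut pc = 0usize;
    let mut executed = 0u64;
    loop {
        let op = code[pc];
        executed += 1; // one tick per dispatched opcode
        pc += 1;
        match op {
            Op::Const { dst, imm } => regs[dst] = imm,
            Op::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
            Op::Lt { dst, a, b } => regs[dst] = (regs[a] < regs[b]) as i64,
            Op::BrIf { cond, target } => {
                if regs[cond] != 0 {
                    pc = target;
                }
            }
            Op::Halt => return (regs[0], executed),
        }
    }
}

/// Counts from 0 up to `n` (for n >= 1) in a do-while style loop.
fn count_to(n: i64) -> (i64, u64) {
    let code = [
        Op::Const { dst: 0, imm: 0 },    // r0 = 0 (counter)
        Op::Const { dst: 1, imm: 1 },    // r1 = 1 (step)
        Op::Const { dst: 2, imm: n },    // r2 = n (limit)
        Op::Add { dst: 0, a: 0, b: 1 },  // pc 3: r0 += r1
        Op::Lt { dst: 3, a: 0, b: 2 },   // r3 = r0 < r2
        Op::BrIf { cond: 3, target: 3 }, // loop while r3 != 0
        Op::Halt,
    ];
    run(&code)
}
```

Calling `count_to(10)` returns the final counter together with the dispatch count (3 setup ops, 3 ops per iteration, 1 halt). Dividing measured wall-clock time by such a count gives a rough per-opcode cost, which is one way to tell dispatch overhead apart from other sources of slowness.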

Hoping White (Apr 08 2025 at 01:53):

Victor Adossi said:

Hey there, thanks for noting the results of your exploration -- do you happen to have some more details on the use case you were benchmarking and/or some code for a reproduction or benchmark?

While I'm sure others may chime in with speculation, there is so much that could possibly contribute to performance differences in setup, execution, use case, and more that I don't think there's much anyone could meaningfully comment on with your current comment alone!

My test code is simple, just linear computations.

use glam::{Mat3A, Vec2};

#[unsafe(no_mangle)]
extern "system" fn test() -> f32 {
    let mut a = Vec2::new(0.0, 0.0);
    for i in 0..1000000 {
        let p = Mat3A::from_angle(i as f32);
        a = p.transform_point2(Vec2::from_angle(i as f32));
    }
    a.x
}

Pulley takes about 0.5s, Wasmi about 0.35s, and Cranelift about 0.018s.
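For reference, the relative slowdowns implied by these timings can be worked out with a few lines of arithmetic. This is only a sketch over the three figures reported above, assuming they are wall-clock seconds for the same workload; the constant and function names are hypothetical.

```rust
// Timings as reported in the message above (assumed to be wall-clock seconds).
const PULLEY_SECS: f64 = 0.5;
const WASMI_SECS: f64 = 0.35;
const CRANELIFT_SECS: f64 = 0.018;

/// How many times slower `slow_secs` is than `fast_secs`.
fn slowdown(slow_secs: f64, fast_secs: f64) -> f64 {
    slow_secs / fast_secs
}

/// Prints the pairwise ratios between the three reported timings.
fn report() {
    println!("Pulley vs Wasmi:     {:.2}x", slowdown(PULLEY_SECS, WASMI_SECS));
    println!("Pulley vs Cranelift: {:.2}x", slowdown(PULLEY_SECS, CRANELIFT_SECS));
    println!("Wasmi vs Cranelift:  {:.2}x", slowdown(WASMI_SECS, CRANELIFT_SECS));
}
```

By these numbers Pulley takes roughly 1.43x as long as Wasmi (equivalently, Wasmi takes 30% less time than Pulley), and both interpreters are more than an order of magnitude slower than Cranelift-compiled code on this workload.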

Hoping White (Apr 08 2025 at 01:54):

Chris Fallin said:

There was a good amount of discussion around this in the Wasmtime and Cranelift meetings a few months ago when Alex Crichton was working on optimizing Pulley -- we landed on some hypotheses, and some TODOs to investigate further (basically: count opcodes executed to determine whether it was per-opcode dispatch overhead, or something else)

That's good news. The Wasmtime community is always the best. :+1:

Alex Crichton (Apr 08 2025 at 05:06):

Thanks for the sample program, and it makes sense to me that Pulley is slower. I haven't spent any time optimizing float ops in Pulley, so while things work they're not fast

Alex Crichton (Apr 08 2025 at 05:07):

For example, there are no folded loads, branch-and-compare ops, or immediate folding
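As a rough illustration of why fused ops matter in an interpreter, the sketch below runs the same counting loop twice on a toy bytecode machine: once with separate add, compare, and branch ops, and once with an add-immediate op and a fused compare-and-branch op. Everything here is hypothetical (the `Op` enum and function names are invented, it uses integers rather than floats, and it does not show folded loads); it is not how Pulley encodes or executes instructions.

```rust
/// Hypothetical toy instruction set with both unfused and fused forms.
#[derive(Clone, Copy)]
enum Op {
    Const { dst: usize, imm: i64 },               // regs[dst] = imm
    Add { dst: usize, a: usize, b: usize },       // regs[dst] = regs[a] + regs[b]
    Lt { dst: usize, a: usize, b: usize },        // regs[dst] = regs[a] < regs[b]
    BrIf { cond: usize, target: usize },          // jump if regs[cond] != 0
    AddImm { dst: usize, src: usize, imm: i64 },  // immediate folded into the add
    BrIfLt { a: usize, b: usize, target: usize }, // compare and branch in one op
    Halt,
}

/// Runs `code` and returns (value of r0, number of opcodes dispatched).
fn run(code: &[Op]) -> (i64, u64) {
    let mut regs = [0i64; 4];
    let (mut pc, mut executed) = (0usize, 0u64);
    loop {
        let op = code[pc];
        executed += 1;
        pc += 1;
        match op {
            Op::Const { dst, imm } => regs[dst] = imm,
            Op::Add { dst, a, b } => regs[dst] = regs[a] + regs[b],
            Op::Lt { dst, a, b } => regs[dst] = (regs[a] < regs[b]) as i64,
            Op::BrIf { cond, target } => {
                if regs[cond] != 0 {
                    pc = target;
                }
            }
            Op::AddImm { dst, src, imm } => regs[dst] = regs[src] + imm,
            Op::BrIfLt { a, b, target } => {
                if regs[a] < regs[b] {
                    pc = target;
                }
            }
            Op::Halt => return (regs[0], executed),
        }
    }
}

/// Loop body uses three separate ops: Add, Lt, BrIf.
fn unfused_loop(n: i64) -> (i64, u64) {
    run(&[
        Op::Const { dst: 0, imm: 0 },
        Op::Const { dst: 1, imm: 1 },
        Op::Const { dst: 2, imm: n },
        Op::Add { dst: 0, a: 0, b: 1 }, // pc 3
        Op::Lt { dst: 3, a: 0, b: 2 },
        Op::BrIf { cond: 3, target: 3 },
        Op::Halt,
    ])
}

/// Same loop, but the body is two ops: AddImm and BrIfLt.
fn fused_loop(n: i64) -> (i64, u64) {
    run(&[
        Op::Const { dst: 0, imm: 0 },
        Op::Const { dst: 1, imm: n },
        Op::AddImm { dst: 0, src: 0, imm: 1 }, // pc 2
        Op::BrIfLt { a: 0, b: 1, target: 2 },
        Op::Halt,
    ])
}
```

Both loops compute the same result, but the fused version dispatches two ops per iteration instead of three. In an interpreter where each dispatch has a fixed cost, that kind of reduction is where fused instructions would be expected to help.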

Alex Crichton (Apr 08 2025 at 05:07):

All easy-ish to implement though! Could you open an issue for this?

Hoping White (Apr 08 2025 at 05:47):

Alex Crichton said:

All easy ish to implement though! Could you open an issue for this?

Issue created. Do you have a timeline for this?


Last updated: Dec 06 2025 at 05:03 UTC