lazytiger opened issue #10545:
    use glam::{Mat3A, Vec2};

    #[unsafe(no_mangle)]
    extern "system" fn test() -> f32 {
        let mut a = Vec2::new(0.0, 0.0);
        for i in 0..1000000 {
            let p = Mat3A::from_angle(i as f32);
            a = p.transform_point2(Vec2::from_angle(i as f32));
        }
        a.x
    }

The above code runs 30% slower with pulley than in wasmi.
alexcrichton added the pulley label to Issue #10545.
github-actions[bot] commented on issue #10545:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "pulley"
Thus the following users have been cc'd because of the following labels:
- fitzgen: pulley
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
</details>
alexcrichton commented on issue #10545:
Thanks for this! This is expected right now in the sense that non-integer related operations basically haven't been optimized at all. There's a fair amount of low-hanging fruit here:
- There are no compare-and-branch instructions for floats, only integers.
- There are no immediate-related optimizations for floats, such as add-reg-and-immediate.
- Pulley's opcode design right now is 1-byte "base" opcodes and 3-byte "extended" opcodes, and all float ops are 3-byte extended ops meaning they take 2 turns of the interpreter loop to process.
The first and second are mostly just a matter of adding more instructions and adding Cranelift lowerings in a similar manner to integer lowerings. The third is probably going to require Pulley to switch to a 2-byte opcode namespace instead of a simple/extended split. That is a larger refactoring, and its impact on integer ops should also be measured.
I'll note that I won't personally have time to work on this in the near future, but I wanted to at least write these down.
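To make the point about extended opcodes concrete, here is a minimal sketch of a toy interpreter loop that counts dispatch steps. It is not Pulley's actual encoding or interpreter: the opcode values, the `OP_EXTENDED` prefix byte, and the helper names (`count_dispatches`, `int_program`, `float_program`) are all hypothetical, and operands are omitted. It only shows why an op behind a 3-byte extended encoding costs two dispatches where a 1-byte base op costs one.

```rust
// Toy bytecode for illustration only. These values are invented and do
// NOT correspond to Pulley's real opcode numbering or layout.
const OP_HALT: u8 = 0x00; // 1-byte base opcode: stop the loop
const OP_IADD: u8 = 0x01; // 1-byte base opcode: stands in for an integer op
const OP_EXTENDED: u8 = 0xFF; // prefix byte introducing an extended opcode
const EXT_FADD: u16 = 0x0001; // 2-byte extended opcode: stands in for a float op

/// Walks the toy program and returns how many dispatch steps were needed.
/// The ops themselves do no work; only the decoding cost is modelled.
fn count_dispatches(code: &[u8]) -> usize {
    let mut pc = 0;
    let mut dispatches = 0;
    loop {
        // First dispatch: branch on the 1-byte base opcode.
        dispatches += 1;
        let op = code[pc];
        pc += 1;
        match op {
            OP_HALT => return dispatches,
            OP_IADD => { /* an integer add would execute here */ }
            OP_EXTENDED => {
                // Second dispatch: decode and branch on the 2-byte extended opcode.
                dispatches += 1;
                let ext = u16::from_le_bytes([code[pc], code[pc + 1]]);
                pc += 2;
                match ext {
                    EXT_FADD => { /* a float add would execute here */ }
                    _ => panic!("unknown extended opcode {ext:#x}"),
                }
            }
            _ => panic!("unknown base opcode {op:#x}"),
        }
    }
}

/// Builds a program of `n` integer ops followed by a halt.
fn int_program(n: usize) -> Vec<u8> {
    let mut code = vec![OP_IADD; n];
    code.push(OP_HALT);
    code
}

/// Builds a program of `n` float ops (prefix + 2-byte opcode) followed by a halt.
fn float_program(n: usize) -> Vec<u8> {
    let mut code = Vec::new();
    for _ in 0..n {
        code.push(OP_EXTENDED);
        code.extend_from_slice(&EXT_FADD.to_le_bytes());
    }
    code.push(OP_HALT);
    code
}
```

In this model, `count_dispatches(&int_program(n))` is `n + 1` while `count_dispatches(&float_program(n))` is `2 * n + 1`. A flat 2-byte opcode namespace would remove the second dispatch for float ops, at the price of a wider fetch for every op, which is why the effect on integer ops would need measuring.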
Last updated: Dec 06 2025 at 07:03 UTC