Stream: git-wasmtime

Topic: wasmtime / issue #11506 Compiling wasmtime for the embedd...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 00:36):

jsturtevant opened issue #11506:

Test Case

A module/component with a simple C function that will trigger this:

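#include <math.h>   // lrintf
#include <stdio.h>  // printf

int RoundToNearestInt()
{
    float c = 1.331 * 24.0;  // 31.944
    float r = lrintf(c);     // rounds per the current rounding mode (to nearest by default)
    printf("rounded answer: %f\n", r); // should print 32
    return r;
}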
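For reference, the expected value can be checked on a host with a working float ABI. This is a minimal Rust sketch, not part of the original report. The function names are hypothetical, and `round` is used as a stand-in for `lrintf`, which is safe here because the input is not a tie.

```rust
/// Mirrors the C test case: the product is computed in double precision
/// and then narrowed to single precision, as the C assignment does.
fn product() -> f32 {
    (1.331_f64 * 24.0) as f32
}

/// Stand-in for `lrintf` under the default rounding mode. `round` rounds
/// ties away from zero while `lrintf` rounds ties to even, but the two
/// agree for any non-tie input such as this one.
fn rounded_to_nearest(x: f32) -> f32 {
    x.round()
}
```

Calling `rounded_to_nearest(product())` yields `32.0`, the value the test case expects.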

Steps to Reproduce

  1. Build wasmtime for an embedding using the x86_64-unknown-none target, similar to how https://github.com/bytecodealliance/wasmtime/tree/release-36.0.0/examples/min-platform does it.
  2. Pre-compile the module/component: https://github.com/bytecodealliance/wasmtime/blob/ebce5d453464d3b5fcc6f9391a9b21fd6307844d/examples/min-platform/src/main.rs#L66-L82
  3. Run the wasm.

Expected Results

rounding is completed properly

Actual Results

incorrect result. Example above prints 31

Versions and Environment

Wasmtime version or commit: was using 34+

Operating system: Linux

Architecture: x86

Extra Info

Anything else you'd like to add?
I wasn't able to reproduce this with the min-platform example. I believe this is because it's actually compiling for a Linux platform?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 00:36):

jsturtevant added the bug label to Issue #11506.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 00:52):

cfallin commented on issue #11506:

Is it possible that you have a nonstandard FPU configuration in your embedded environment? (MXCSR settings for example)

That's the only thing I can think of personally -- we otherwise generate exactly the same machine code for Wasm-on-x86_64 whether that's running in a standard Wasmtime on Linux or a no_std build (given the same compiler settings for ISA level, etc., but those are all orthogonal to platform).

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 15:40):

jsturtevant commented on issue #11506:

> Is it possible that you have a nonstandard FPU configuration in your embedded environment? (MXCSR settings for example)

I did initially check this, and we have the setting of 0x1f80.

I believe I tracked it down to the fact that rustc doesn't generate SSE and SSE2 instructions by default with the target x86_64-unknown-none, whereas Cranelift does. So when wasmtime transitions through libcalls, the wrong registers are set up when passing floating-point arguments to the floating-point builtin functions. Adding SSE and SSE2 to the features in rustc fixed the issue.

I guess I expected Cranelift to generate similar code to rustc when targeting x86_64-unknown-none. I also thought Cranelift might perform a compatibility check to ensure the target configuration aligns with rustc's assumptions. Or maybe provide some docs on what features should be enabled as a baseline when compiling wasmtime to x86_64-unknown-none with rustc to match the same target in Cranelift. I realize there needs to be some level of knowledge here when tuning it further, but that initial mismatch was hard to detect.

Is there something else I might have missed that causes this mismatch?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 15:52):

alexcrichton commented on issue #11506:

Ah yes there's definitely an ABI mismatch here that I missed with x86_64-unknown-none. Effectively we're not doing proper checking in Wasmtime that SSE/SSE2 are enabled at compile time. Cranelift assumes SSE/SSE2 are enabled, but the x86_64-unknown-none target is "soft float", which means that the libcalls aren't matched up in their ABIs.

@jsturtevant questions for you:

The "naive" fix for this is to check that the host has SSE/SSE2 enabled. If we added that to Wasmtime though then all embeddings would cease to work with x86_64-unknown-none because that doesn't have these features enabled. Given that I'd ideally like to make sure there's a path to keeping things working on your end first.
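A minimal sketch of the compile-time check described above, assuming the decision can be made purely from rustc's `target_feature` cfgs. The function names are hypothetical and are not Wasmtime APIs. Non-x86_64 hosts are treated as compatible only to keep the sketch short.

```rust
/// Hypothetical helper: reports whether the host crate was compiled with
/// SSE and SSE2 enabled, which is what Cranelift assumes on x86_64.
/// On x86_64-unknown-none both features are disabled by default, so this
/// evaluates to `false` there.
const fn host_has_sse_float_abi() -> bool {
    if cfg!(target_arch = "x86_64") {
        cfg!(target_feature = "sse") && cfg!(target_feature = "sse2")
    } else {
        // Assumption for this sketch: other architectures are out of scope.
        true
    }
}

/// Hypothetical gate that an embedding could run before loading compiled code.
fn check_float_abi() -> Result<(), &'static str> {
    if host_has_sse_float_abi() {
        Ok(())
    } else {
        Err("host was compiled without SSE/SSE2; float libcalls would use a mismatched ABI")
    }
}
```

As noted above, a check like this on its own would reject every default x86_64-unknown-none build, which is why it is only the "naive" fix.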

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:00):

jsturtevant commented on issue #11506:

We would like to enable floats in the guest and potentially in the future add support for more advanced SIMD. I was able to get this working by setting target-feature=-soft-float,+sse,+sse2 in the .cargo/config.toml to add SSE when building wasmtime for the target x86_64-unknown-none. Is that a valid option or do I need to be going about it differently?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:06):

alexcrichton commented on issue #11506:

That technically can work for now but you should be getting warnings along the lines of:

warning: target feature `soft-float` must be enabled to ensure that the ABI of the current target can be implemented correctly
  |
  = note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
  = note: for more information, see issue #116344 <https://github.com/rust-lang/rust/issues/116344>

warning: unstable feature specified for `-Ctarget-feature`: `soft-float`
  |
  = note: this feature is not stably supported; its behavior can change in the future

warning: 2 warnings emitted

The main problem is that these features affect the ABI, which means the entire world needs to agree on them, and the Rust standard library wasn't built with them (unless you're using -Zbuild-std). That's where a custom JSON target spec comes in: it forcibly enables the SSE2 ABI for floats for the entire target. I was looking and I don't believe Rust has a pre-baked "bare metal" target which is allowed to use floats.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:16):

jsturtevant commented on issue #11506:

> That technically can work for now but you should be getting warnings along the lines of:

I saw that, and reading through the issue it seemed to be related to i686, which we currently don't target, but maybe I read it wrong or didn't understand all the implications. Are we going to see other ABI issues with these settings?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:30):

alexcrichton commented on issue #11506:

Ah yeah these warnings definitely affect x86_64 as well. This has to do with a thorny set of issues where more-or-less when you change the ABI the whole world has to agree on it. The -Ctarget-feature=-soft-float flag is effectively fundamentally incompatible with this because it means that you're only changing the ABI for part of the world (your crates) and not the whole world (e.g. the precompiled Rust standard library). This will cause issues if a float value is passed between the two, so for example if f32::from_bits or something like that wasn't inlined (e.g. maybe you made it a function pointer) then the standard library would return the value in a GPR while your local compilation would expect it in a XMM register (due to differing ABIs).

Effectively ABI-changing flags shouldn't have existed in the first place and ABI-changing things need to be part of target, not flags to a local compilation unit. Two ideas to solve this:

Ideally what would happen today is we would add a check into Wasmtime about "which float ABI is being used?" and assert that it's not "soft". If it's "soft" we would emit a first-class error preventing a Module from being created that is allowed to use floats because it would mean that Cranelift's ABI is mismatched with Wasmtime's ABI.

@jsturtevant how viable is it to use -Zbuild-std plus a custom JSON target spec? If not it might be worth poking around Cranelift to see how hard it would be to implement a "soft float" variant of the system-v ABI

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:36):

cfallin commented on issue #11506:

Hmm -- I am realizing that we internally have the same issue with our "Wasmtime in weird embedded environment" use-case. We are technically inside a Linux process but need to avoid syscalls, so we build x86_64-unknown-none as well. Hardfloat is fine though -- we expect the full suite of SIMD instructions to work. I had blindly copied the min-platform example previously. We also should really use stable Rust -- nightly for -Zbuild-std is going to be a tough pill to swallow. What do you recommend @alexcrichton ?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:48):

jsturtevant commented on issue #11506:

> Ah yeah these warnings definitely affect x86_64 as well. This has to do with a thorny set of issues where more-or-less when you change the ABI the whole world has to agree on it. The -Ctarget-feature=-soft-float flag is effectively fundamentally incompatible with this because it means that you're only changing the ABI for part of the world (your crates) and not the whole world (e.g. the precompiled Rust standard library). This will cause issues if a float value is passed between the two, so for example if f32::from_bits or something like that wasn't inlined (e.g. maybe you made it a function pointer) then the standard library would return the value in a GPR while your local compilation would expect it in a XMM register (due to differing ABIs).

This makes a lot of sense, and thanks for taking the time to explain it clearly. I thought -Ctarget-feature=-soft-float was OK because of the comment in https://doc.rust-lang.org/rustc/targets/known-issues.html, which says soft floats are off in Rust's core libraries:

> Using software emulated floats ("soft-floats") disables usage of xmm registers, but parts of Rust's core libraries (e.g. std::f32 or std::f64) are compiled without soft-floats and expect parameters to be passed in xmm registers.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:56):

jsturtevant commented on issue #11506:

> Ideally what would happen today is we would add a check into Wasmtime about "which float ABI is being used?" and assert that it's not "soft". If it's "soft" we would emit a first-class error preventing a Module from being created that is allowed to use floats because it would mean that Cranelift's ABI is mismatched with Wasmtime's ABI.

Isn't -Ctarget-feature=-soft-float turning soft-float off? So those ABIs would match?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 16:59):

alexcrichton commented on issue #11506:

@jsturtevant ah I think that documentation might be slightly confusing, but what I believe that's trying to say is that libcore uses floats, and for x86_64-unknown-none it's compiled with "soft floats" meaning it's not using xmm registers. If you were then to compile your code with "hard floats" instead, that would result in an ABI mismatch if you tried to call float functions in libcore/libstd.

> Isn't -Ctarget-feature=-soft-float turning soft-float off?

Sort of, sort of not. The main reason this doesn't work is it only affects your local compilation unit not others. So for example if you tried to communicate with libcore (which wasn't compiled with this) you'd get the same ABI mismatch. Soft float is also weird in LLVM since there's also -Csoft-float=n to rustc and it's specified in "weird ways" other than just -Ctarget-feature. AFAIK there's a few ways to configure it and they're not all quite right. Regardless though you can't escape from "libcore is compiled differently than your code".

> What do you recommend @alexcrichton ?

Ah good point! Given the constraint of stable Rust here's two possible ideas (one more to add on the one I had above)

  1. Implement the soft-float ABI in Cranelift and update Wasmtime to enable this ABI when compiling modules for x86_64-unknown-none. This would only affect the ABI (floats in GPRs, not in XMMs) and wouldn't affect generated code (which would still use XMMs and float-related instructions). Wasmtime would then additionally check at runtime that SSE and SSE2 are detected as otherwise the Cranelift-generated code is invalid.
  2. Update the ABI of libcalls to avoid floats. For example we could pass things on the stack or in GPRs manually. That would be a Wasmtime-local fix and would be sufficient I think since the only time floats are in registers in wasm<->host transitions is in libcalls. For this we'd probably want to add some sort of assertion in Cranelift that floats aren't actually used on the ABI at all (e.g. still have a "soft float" feature but instead of implementing it we just assert we don't need to implement it)
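Option 2 can be illustrated with a small sketch, assuming a hypothetical libcall whose signature carries the float as its raw bit pattern. The names below are illustrative and are not Wasmtime's actual libcalls. `f32::ceil` needs `std` here, so a no_std build would need a different implementation of the rounding itself.

```rust
/// Hypothetical libcall with a float-free signature: the `f32` travels as
/// a `u32` bit pattern, so it is passed and returned in integer registers
/// regardless of whether the host uses a soft-float or SSE float ABI.
extern "C" fn ceil_f32_via_bits(bits: u32) -> u32 {
    f32::from_bits(bits).ceil().to_bits()
}

/// Caller side: convert to bits before the call and back afterwards.
/// In the real system this conversion would be done by generated code
/// (a move between an XMM register and a GPR).
fn call_ceil(x: f32) -> f32 {
    f32::from_bits(ceil_f32_via_bits(x.to_bits()))
}
```

Because no `f32` or `f64` appears in the `extern "C"` signature, both sides agree on the calling convention even when they disagree on how floats would be passed.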

view this post on Zulip Wasmtime GitHub notifications bot (Aug 22 2025 at 17:55):

jsturtevant commented on issue #11506:

> Sort of, sort of not. The main reason this doesn't work is it only affects your local compilation unit not others. So for example if you tried to communicate with libcore (which wasn't compiled with this) you'd get the same ABI mismatch.

Got it, our code has it off but we didn't compile libcore with those same settings on the x86_64-unknown-none so libcore would have it on. We would need to compile libcore with the same settings via -Zbuild-std. Thanks again for the patience and explanations!

We also would like to take advantage of the full suite of SIMD.

I might not have enough understanding of the tradeoffs here, but my rather naive initial thought is that option (2), "Update the ABI of libcalls to avoid floats", seems reasonable for being able to use stable Rust.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2025 at 00:47):

alexcrichton commented on issue #11506:

Ok I've managed to reproduce this by applying the following diff on top of https://github.com/bytecodealliance/wasmtime/pull/11516:

<details>

commit b1bf88344905ac4813ec4a7bce58e475678b175f
Author: Alex Crichton <alex@alexcrichton.com>
Date:   Fri Aug 22 17:15:17 2025 -0700

    wip

diff --git a/examples/min-platform/embedding/src/lib.rs b/examples/min-platform/embedding/src/lib.rs
index 460ea5d2c8..9448c3785e 100644
--- a/examples/min-platform/embedding/src/lib.rs
+++ b/examples/min-platform/embedding/src/lib.rs
@@ -4,7 +4,7 @@
 extern crate alloc;

 use alloc::string::ToString;
-use anyhow::Result;
+use anyhow::{Result, ensure};
 use core::ptr;
 use wasmtime::{Engine, Instance, Linker, Module, Store};

@@ -29,6 +29,8 @@ pub unsafe extern "C" fn run(
     simple_add_size: usize,
     simple_host_fn_module: *const u8,
     simple_host_fn_size: usize,
+    simple_floats_module: *const u8,
+    simple_floats_size: usize,
 ) -> usize {
     unsafe {
         let buf = core::slice::from_raw_parts_mut(error_buf, error_size);
@@ -36,7 +38,8 @@ pub unsafe extern "C" fn run(
         let simple_add = core::slice::from_raw_parts(simple_add_module, simple_add_size);
         let simple_host_fn =
             core::slice::from_raw_parts(simple_host_fn_module, simple_host_fn_size);
-        match run_result(smoke, simple_add, simple_host_fn) {
+        let simple_floats = core::slice::from_raw_parts(simple_floats_module, simple_floats_size);
+        match run_result(smoke, simple_add, simple_host_fn, simple_floats) {
             Ok(()) => 0,
             Err(e) => {
                 let msg = format!("{e:?}");
@@ -52,10 +55,12 @@ fn run_result(
     smoke_module: &[u8],
     simple_add_module: &[u8],
     simple_host_fn_module: &[u8],
+    simple_floats_module: &[u8],
 ) -> Result<()> {
     smoke(smoke_module)?;
     simple_add(simple_add_module)?;
     simple_host_fn(simple_host_fn_module)?;
+    simple_floats(simple_floats_module)?;
     Ok(())
 }

@@ -78,7 +83,7 @@ fn simple_add(module: &[u8]) -> Result<()> {
     let mut store = Store::new(&engine, ());
     let instance = Linker::new(&engine).instantiate(&mut store, &module)?;
     let func = instance.get_typed_func::<(u32, u32), u32>(&mut store, "add")?;
-    assert_eq!(func.call(&mut store, (2, 3))?, 5);
+    ensure!(func.call(&mut store, (2, 3))? == 5);
     Ok(())
 }

@@ -93,7 +98,20 @@ fn simple_host_fn(module: &[u8]) -> Result<()> {
     let mut store = Store::new(&engine, ());
     let instance = linker.instantiate(&mut store, &module)?;
     let func = instance.get_typed_func::<(u32, u32, u32), u32>(&mut store, "add_and_mul")?;
-    assert_eq!(func.call(&mut store, (2, 3, 4))?, 10);
+    ensure!(func.call(&mut store, (2, 3, 4))? == 10);
+    Ok(())
+}
+
+fn simple_floats(module: &[u8]) -> Result<()> {
+    let engine = Engine::default();
+    let module = match deserialize(&engine, module)? {
+        Some(module) => module,
+        None => panic!(),
+    };
+    let mut store = Store::new(&engine, ());
+    let instance = Linker::new(&engine).instantiate(&mut store, &module)?;
+    let func = instance.get_typed_func::<(f32, f32), f32>(&mut store, "frob")?;
+    ensure!(func.call(&mut store, (1.4, 3.2))? == 5.);
     Ok(())
 }

diff --git a/examples/min-platform/src/main.rs b/examples/min-platform/src/main.rs
index fd1867c6d5..7de14846da 100644
--- a/examples/min-platform/src/main.rs
+++ b/examples/min-platform/src/main.rs
@@ -95,6 +95,16 @@ fn main() -> Result<()> {
             )
         "#,
     )?;
+    let simple_floats = engine.precompile_module(
+        br#"
+            (module
+                (func (export "frob") (param f32 f32) (result f32)
+                    (f32.ceil (local.get 0))
+                    (f32.floor (local.get 0))
+                    f32.add)
+            )
+        "#,
+    )?;

     // Next is an example of running this embedding, which also serves as test
     // that basic functionality actually works.
@@ -134,6 +144,8 @@ fn main() -> Result<()> {
                 usize,
                 *const u8,
                 usize,
+                *const u8,
+                usize,
             ) -> usize,
         > = lib
             .get(b"run")
@@ -149,6 +161,8 @@ fn main() -> Result<()> {
             simple_add.len(),
             simple_host_fn.as_ptr(),
             simple_host_fn.len(),
+            simple_floats.as_ptr(),
+            simple_floats.len(),
         );
         error_buf.set_len(len);

</details>

that yields:

$ MIN_PLATFORM_TEST_DISABLE_WASI=1 WASMTIME_SIGNALS_BASED_TRAPS=1 ./build.sh x86_64-unknown-none
...
Error: Condition failed: `func.call(&mut store, (1.4, 3.2))? == 5.` (2.8 vs 5.0)

and some simple debugging shows the libcalls getting invalid arguments.


Personally I'm tempted to (a) add a "soft float" feature to Cranelift and assert in ABI code that if this feature is enabled that no floats are used, and then (b) update libcalls to unconditionally use GPR arguments/results instead of XMM args/results. (aka do the mov-xmm-to-gpr in Cranelift). Then Wasmtime would configure this soft float flag for the x86_64-unknown-none target and would additionally add runtime checks for SSE/SSE2 which it lacks today.

That should get everything working between compiled wasm code and Wasmtime. Libcalls are slow but they always are. Libcalls are easily avoided as well by enabling more CPU features (e.g. up to SSE4.1 or even up to AVX). Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.
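A rough sketch of what a runtime SSE/SSE2 check could look like without `std`, assuming direct use of the CPUID instruction through `core::arch`. The type and function names are hypothetical, and this is not how Wasmtime implements its checks. The bit positions are the ones documented for CPUID leaf 1.

```rust
/// Feature bits relevant to this thread, decoded from CPUID leaf 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SseSupport {
    sse: bool,
    sse2: bool,
    sse41: bool,
}

/// Decode the ECX/EDX feature words returned by CPUID leaf 1:
/// EDX bit 25 is SSE, EDX bit 26 is SSE2, ECX bit 19 is SSE4.1.
fn decode_leaf1(ecx: u32, edx: u32) -> SseSupport {
    SseSupport {
        sse: edx & (1 << 25) != 0,
        sse2: edx & (1 << 26) != 0,
        sse41: ecx & (1 << 19) != 0,
    }
}

/// Query the running CPU. Only meaningful on x86_64.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // `__cpuid` is a safe fn on newer toolchains
fn detect() -> Option<SseSupport> {
    let leaf = unsafe { core::arch::x86_64::__cpuid(1) };
    Some(decode_leaf1(leaf.ecx, leaf.edx))
}

/// On other architectures there is nothing to report.
#[cfg(not(target_arch = "x86_64"))]
fn detect() -> Option<SseSupport> {
    None
}
```

An embedding could call `detect()` at startup and refuse to run float-using modules when `sse` or `sse2` is false; `sse41` indicates whether the rounding libcalls discussed here could be avoided.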

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2025 at 01:05):

cfallin commented on issue #11506:

A few thoughts on "real softfloat":

I wonder if it would make sense to statically compile-error if the target has softfloat configured?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2025 at 09:48):

bjorn3 commented on issue #11506:

For the Linux kernel there are functions you can use to delimit a section of code where xmm registers are used. This will then handle saving and restoring the registers as needed.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:15):

jsturtevant commented on issue #11506:

> Libcalls are easily avoided as well by enabling more CPU features (e.g. up to SSE4.1 or even up to AVX). Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.

This means enabling more CPU features in Cranelift's compilation? So in this case it would mean that the wasm module doesn't transition through a libcall to do the rounding? This might be an option for us. Is there a way to detect when libcalls are likely to happen? I am wondering if there might be other edge cases that we want to avoid.

> Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.

This makes sense.

> In our embedding it's looking like we'll actually go the way of a custom target and bite the bullet on a nightly requirement for -Zbuild-std -- the key factor for us is that we're also linking with other code (a bunch of legacy C) that isn't using a softfloat ABI, so our hand is basically forced. I suspect others might be in the same boat...

Are you linking wasmtime code with C? Or the module/component?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:23):

cfallin commented on issue #11506:

> Are you linking wasmtime code with c?

Yes! Wasmtime no_std dropped into a big legacy C codebase. It technically runs on a Linux base but avoids all syscalls for Reasons, so we've used all of the embedded platform functionality in Wasmtime (thanks Alex!) to make this work. Thus our interest in this topic!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 16:46):

alexcrichton commented on issue #11506:

@jsturtevant correct yeah, if you enable everything up through SSE4.1 it should avoid almost all libcalls with floats. One more feature is FMA as well, but that's only for relaxed-simd support.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 27 2025 at 11:11):

syntactically commented on issue #11506:

> if you enable everything up through SSE4.1 it should avoid almost all libcalls with floats

Just to confirm my understanding here---is this ABI mismatch not a problem with normal import/export trampolines (only with the libcall trampolines) because those tend to pass arguments on the Cranelift side in memory?

> I'm not sure that this is a terribly interesting target configuration for the folks in this thread, at least?

The kernel target configuration for rustc is definitely interesting to us, but that is a different discussion so I won't go further off topic here :)

> Personally I'm tempted to (a) add a "soft float" feature to Cranelift and assert in ABI code that if this feature is enabled that no floats are used, and then (b) update libcalls to unconditionally use GPR arguments/results instead of XMM args/results. (aka do the mov-xmm-to-gpr in Cranelift). Then Wasmtime would configure this soft float flag for the x86_64-unknown-none target and would additionally add runtime checks for SSE/SSE2 which it lacks today.

I think that broadly makes sense, although where exactly would the new ABI flag make sure that "no floats are used"?

Separately: insofar as the Cranelift ABI ends up being different from the host ABI, do wasmtime_{set,long}jmp() need to preserve the extended cranelift ABI or just the host ABI? I think that currently every longjmp ends up trapping, so perhaps only the latter?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 27 2025 at 14:47):

alexcrichton commented on issue #11506:

> is this ABI mismatch not a problem with normal import/export trampolines

Correct yeah, they only use GPRs in the ABI and in-memory bits contain floats.

> where exactly would the new ABI flag make sure that "no floats are used"?

My thinking is that this'd happen somewhere in the ABI code in Cranelift where if the flag was set it'd assert that f32 and f64 weren't used in the signature of a function at all. So an internal assert in Cranelift which we'd be careful to avoid in Wasmtime (e.g. by reporting an error at compile time or changing things internally)

> do wasmtime_{set,long}jmp() need to preserve the extended cranelift ABI or just the host ABI

That's a good question! Something I hadn't really considered before but I believe the answer is "both". This adds more fuel to the fire to me of burning down these functions entirely...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 27 2025 at 19:42):

alexcrichton commented on issue #11506:

I pushed up https://github.com/bytecodealliance/wasmtime/pull/11553 to resolve this issue. Basically it'll make loading code on x86_64-unknown-none a hard error by default. The error has an escape hatch which documents some of the hazards embedders need to look out for.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 28 2025 at 19:54):

alexcrichton closed issue #11506.


Last updated: Dec 06 2025 at 06:05 UTC