jsturtevant opened issue #11506:
Test Case
A module/component with a simple C function that will trigger this:
int RoundToNearestInt() {
    float c = 1.331 * 24.0;
    float r = lrintf(c);
    printf("rounded answer: %f\n", r); // should print 32
    return r;
}
Steps to Reproduce
1. Build wasmtime for an embedding using the x86_64-unknown-none target, similar to how https://github.com/bytecodealliance/wasmtime/tree/release-36.0.0/examples/min-platform does it.
2. Pre-compile the module/component: https://github.com/bytecodealliance/wasmtime/blob/ebce5d453464d3b5fcc6f9391a9b21fd6307844d/examples/min-platform/src/main.rs#L66-L82
3. Run the wasm.
Expected Results
rounding is completed properly
Actual Results
incorrect result. Example above prints 31
Versions and Environment
Wasmtime version or commit: was using 34+
Operating system: Linux
Architecture: x86
Extra Info
Anything else you'd like to add?
I wasn't able to reproduce this with the min-platform example. I believe this is because it's actually compiling to a Linux platform?
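For reference, the expected value can be checked by hand: 1.331 × 24.0 = 31.944, which rounds to 32 under round-to-nearest. The following is a minimal Rust sketch of the same arithmetic. `round_to_nearest_int` is a hypothetical stand-in for the C function, and `f32::round` stands in for `lrintf` (the two differ only on exact .5 ties, which this value is not).

```rust
/// Hypothetical Rust stand-in for the C test case in the report.
fn round_to_nearest_int() -> i32 {
    // The C expression is evaluated in double precision, then narrowed to float.
    let c = (1.331_f64 * 24.0) as f32;
    // `f32::round` rounds half away from zero; `lrintf` follows the current
    // rounding mode (ties-to-even by default). They agree away from .5 ties.
    let r = c.round();
    r as i32
}

/// What the function would return if the rounding step had no effect.
fn truncated_without_rounding() -> i32 {
    ((1.331_f64 * 24.0) as f32) as i32
}

fn main() {
    println!("rounded: {}", round_to_nearest_int());
    println!("unrounded, truncated: {}", truncated_without_rounding());
}
```

Truncating the unrounded value yields 31, the number reported above. That would be consistent with the rounding call having had no effect, though this reading is an inference and not something the report states.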
jsturtevant added the bug label to Issue #11506.
cfallin commented on issue #11506:
Is it possible that you have a nonstandard FPU configuration in your embedded environment? (MXCSR settings for example)
That's the only thing I can think of personally -- we otherwise generate exactly the same machine code for Wasm-on-x86_64 whether that's running in a standard Wasmtime on Linux or a no_std build (given the same compiler settings for ISA level, etc., but those are all orthogonal to platform).
jsturtevant commented on issue #11506:
Is it possible that you have a nonstandard FPU configuration in your embedded environment? (MXCSR settings for example)
I did initially check this, and we have the setting of 0x1f80.
I believe I tracked it down to the fact that rustc doesn't generate SSE and SSE2 instructions by default with the target x86_64-unknown-none, whereas Cranelift does. So when wasmtime transitions through libcalls, the wrong registers are set up when passing floating-point arguments to the floating-point builtin functions. Adding SSE and SSE2 to the features in rustc fixed the issue.
I guess I expected Cranelift to generate similar code to rustc when targeting x86_64-unknown-none. I also thought Cranelift might perform a compatibility check to ensure the target configuration aligns with rustc's assumptions. Or maybe provide some docs on what features should be enabled as a baseline when compiling wasmtime to x86_64-unknown-none with rustc to match the same target in Cranelift. I realize there needs to be some level of knowledge here when tuning it further, but that initial mismatch was hard to detect.
Is there something else I might have missed that causes this mismatch?
alexcrichton commented on issue #11506:
Ah yes there's definitely an ABI mismatch here that I missed with x86_64-unknown-none. Effectively we're not doing proper checking in Wasmtime that SSE/SSE2 are enabled at compile time. Cranelift assumes SSE/SSE2 are enabled, but the x86_64-unknown-none target is "soft float" which means that the libcalls aren't matched up in their ABIs.
@jsturtevant questions for you:
- Do you want to forbid floats in guests? That's a configuration option we don't currently expose today but we could do so. Wasmtime would then only require SSE/SSE2 features if floats are enabled, and otherwise it would reject this module in question since it's using floats.
- Do you want to allow floats in guests? To do this I think that the x86_64-unknown-none target may not be suitable for your embedding. That target cannot have SSE/SSE2 enabled due to ABI constraints. You'd have to make a new (probably JSON-based) target which has the features enabled. That would then fix this issue because you'd be compiling for a target that does indeed have SSE/SSE2.
The "naive" fix for this is to check that the host has SSE/SSE2 enabled. If we added that to Wasmtime though then all embeddings would cease to work with x86_64-unknown-none because that doesn't have these features enabled. Given that I'd ideally like to make sure there's a path to keeping things working on your end first.
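The "naive" compile-time check mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: `HostFloatFeatures`, `compiled_features`, and `check_float_abi` are hypothetical names and not Wasmtime APIs. `cfg!(target_feature = ...)` only reflects the features rustc enabled for the crate being compiled, which says nothing about how a precompiled libcore was built.

```rust
/// Float-related features the host-side crate was compiled with (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct HostFloatFeatures {
    sse: bool,
    sse2: bool,
}

/// Reads the features rustc enabled for *this* compilation unit only.
/// On non-x86 targets both values are simply `false`.
fn compiled_features() -> HostFloatFeatures {
    HostFloatFeatures {
        sse: cfg!(target_feature = "sse"),
        sse2: cfg!(target_feature = "sse2"),
    }
}

/// Hypothetical policy: a module that uses floats is accepted only when the
/// host code was compiled with both SSE and SSE2 enabled.
fn check_float_abi(host: HostFloatFeatures, module_uses_floats: bool) -> Result<(), String> {
    if !module_uses_floats || (host.sse && host.sse2) {
        return Ok(());
    }
    Err(format!(
        "host compiled without SSE/SSE2 ({host:?}); float ABI may not match compiled wasm code"
    ))
}

fn main() {
    let host = compiled_features();
    println!("{host:?} -> {:?}", check_float_abi(host, true));
}
```

As the comment above points out, enforcing a check like this unconditionally would reject every float-using module on x86_64-unknown-none, which is why it is described as naive.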
jsturtevant commented on issue #11506:
We would like to enable floats in the guest and potentially in the future add support for more advanced SIMD. I was able to get this working by setting target-feature=-soft-float,+sse,+sse2 in the .cargo/config.toml to add SSE when building wasmtime for the target x86_64-unknown-none. Is that a valid option or do I need to be going about it differently?
alexcrichton commented on issue #11506:
That technically can work for now but you should be getting warnings along the lines of:
warning: target feature `soft-float` must be enabled to ensure that the ABI of the current target can be implemented correctly
  |
  = note: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
  = note: for more information, see issue #116344 <https://github.com/rust-lang/rust/issues/116344>

warning: unstable feature specified for `-Ctarget-feature`: `soft-float`
  |
  = note: this feature is not stably supported; its behavior can change in the future

warning: 2 warnings emitted

The main problem is that these features affect the ABI which means the entire world needs to agree on that, and the Rust standard library wasn't built with that (unless you're using -Zbuild-std). That's where a custom JSON target spec comes in: it forcibly enables the SSE2 ABI for floats for the entire target. I was looking and I don't believe Rust has a pre-baked "bare metal" target which is allowed to use floats.
jsturtevant commented on issue #11506:
That technically can work for now but you should be getting warnings along the lines of:
I saw that and, reading through the issue, it seemed to be related to i686, which we currently don't target, but maybe I read this wrong or didn't understand all the implications. Are we going to see other ABI issues with these settings?
alexcrichton commented on issue #11506:
Ah yeah these warnings definitely affect x86_64 as well. This has to do with a thorny set of issues where more-or-less when you change the ABI the whole world has to agree on it. The -Ctarget-feature=-soft-float flag is effectively fundamentally incompatible with this because it means that you're only changing the ABI for part of the world (your crates) and not the whole world (e.g. the precompiled Rust standard library). This will cause issues if a float value is passed between the two, so for example if f32::from_bits or something like that wasn't inlined (e.g. maybe you made it a function pointer) then the standard library would return the value in a GPR while your local compilation would expect it in a XMM register (due to differing ABIs).
Effectively ABI-changing flags shouldn't have existed in the first place and ABI-changing things need to be part of target, not flags to a local compilation unit. Two ideas to solve this:
- We could add support for the "soft float" ABI to Cranelift. That way Cranelift would use a different ABI for the x86_64-unknown-none target and would match rustc. That would resolve the issue at hand, and Cranelift could still otherwise use floats everywhere to its heart's content. We would still need to perform a runtime check that sse/sse2 are present in Wasmtime, but that wouldn't be hard to add.
- You could switch to using a JSON target which enables SSE/SSE2 and disables soft-float. That would mean you'd have to use -Zbuild-std plus a JSON target spec, neither of which are stable in Rust right now. That would update Rust codegen to match Cranelift's ABI, and we would again still need to double-check SSE/SSE2 at runtime in Wasmtime.
Ideally what would happen today is we would add a check into Wasmtime about "which float ABI is being used?" and assert that it's not "soft". If it's "soft" we would emit a first-class error preventing a Module from being created that is allowed to use floats because it would mean that Cranelift's ABI is mismatched with Wasmtime's ABI.
@jsturtevant how viable is it to use -Zbuild-std plus a custom JSON target spec? If not it might be worth poking around Cranelift to see how hard it would be to implement a "soft float" variant of the system-v ABI.
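The f32::from_bits scenario described in this comment can be sketched as follows. The helper name `call_through_pointer` is hypothetical. The sketch only shows where a float would cross a real call boundary into the standard library. On a target where both sides share one float ABI it returns the correct value, and the mismatch would appear only when the crate and the precompiled standard library disagree about float registers.

```rust
use std::hint::black_box;

/// Calls `f32::from_bits` through a function pointer so the call is less
/// likely to be inlined, leaving a genuine call that returns a float.
fn call_through_pointer(bits: u32) -> f32 {
    let f: fn(u32) -> f32 = f32::from_bits;
    // `black_box` discourages the optimizer from seeing through the pointer.
    let f = black_box(f);
    // With mismatched ABIs, the callee would place the result in a GPR
    // while this caller would read it from an XMM register.
    f(bits)
}

fn main() {
    let x = call_through_pointer(1.5_f32.to_bits());
    println!("{x}");
}
```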
cfallin commented on issue #11506:
Hmm -- I am realizing that we internally have the same issue with our "Wasmtime in weird embedded environment" use-case. We are technically inside a Linux process but need to avoid syscalls, so we build for x86_64-unknown-none as well. Hardfloat is fine though -- we expect the full suite of SIMD instructions to work. I had blindly copied the min-platform example previously. We also should really use stable Rust -- nightly for -Zbuild-std is going to be a tough pill to swallow. What do you recommend @alexcrichton ?
jsturtevant commented on issue #11506:
Ah yeah these warnings definitely affect x86_64 as well. This has to do with a thorny set of issues where more-or-less when you change the ABI the whole world has to agree on it. The -Ctarget-feature=-soft-float flag is effectively fundamentally incompatible with this because it means that you're only changing the ABI for part of the world (your crates) and not the whole world (e.g. the precompiled Rust standard library). This will cause issues if a float value is passed between the two, so for example if f32::from_bits or something like that wasn't inlined (e.g. maybe you made it a function pointer) then the standard library would return the value in a GPR while your local compilation would expect it in a XMM register (due to differing ABIs).
This makes a lot of sense, and thanks for taking the time to explain it so clearly. I thought -Ctarget-feature=-soft-float was OK because of the comment in https://doc.rust-lang.org/rustc/targets/known-issues.html which says soft floats are off in Rust's core libraries:
Using software emulated floats ("soft-floats") disables usage of xmm registers, but parts of Rust's core libraries (e.g. std::f32 or std::f64) are compiled without soft-floats and expect parameters to be passed in xmm registers.
jsturtevant commented on issue #11506:
Ideally what would happen today is we would add a check into Wasmtime about "which float ABI is being used?" and assert that it's not "soft". If it's "soft" we would emit a first-class error preventing a Module from being created that is allowed to use floats because it would mean that Cranelift's ABI is mismatched with Wasmtime's ABI.
Isn't -Ctarget-feature=-soft-float turning soft-float off? So those ABIs would match?
alexcrichton commented on issue #11506:
@jsturtevant ah I think that documentation might be slightly confusing, but what I believe that's trying to say is that libcore uses floats, and for x86_64-unknown-none it's compiled with "soft floats" meaning it's not using xmm registers. If you were then to compile your code with "hard floats" instead, that would result in an ABI mismatch if you tried to call float functions in libcore/libstd.
Isn't -Ctarget-feature=-soft-float turning soft-float off?
Sort of, sort of not. The main reason this doesn't work is it only affects your local compilation unit not others. So for example if you tried to communicate with libcore (which wasn't compiled with this) you'd get the same ABI mismatch. Soft float is also weird in LLVM since there's also -Csoft-float=n to rustc and it's specified in "weird ways" other than just -Ctarget-feature. AFAIK there's a few ways to configure it and they're not all quite right. Regardless though you can't escape from "libcore is compiled differently than your code".
What do you recommend @alexcrichton ?
Ah good point! Given the constraint of stable Rust here are two possible ideas (one more to add to the one I had above):
- Implement the soft-float ABI in Cranelift and update Wasmtime to enable this ABI when compiling modules for x86_64-unknown-none. This would only affect the ABI (floats in GPRs, not in XMMs) and wouldn't affect generated code (which would still use XMMs and float-related instructions). Wasmtime would then additionally check at runtime that SSE and SSE2 are detected as otherwise the Cranelift-generated code is invalid.
- Update the ABI of libcalls to avoid floats. For example we could pass things on the stack or in GPRs manually. That would be a Wasmtime-local fix and would be sufficient I think since the only time floats are in registers in wasm<->host transitions is in libcalls. For this we'd probably want to add some sort of assertion in Cranelift that floats aren't actually used on the ABI at all (e.g. still have a "soft float" feature but instead of implementing it we just assert we don't need to implement it)
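The second idea (keeping floats out of the libcall signature) can be sketched as below. The names `ceil_f32_bits` and `ceil_via_libcall` are hypothetical and are not Wasmtime's actual libcalls. The point is only that a float passed as its raw bit pattern travels in integer registers under both the soft-float and hard-float conventions.

```rust
/// Hypothetical libcall with a float-free signature: the f32 is passed and
/// returned as its raw u32 bit pattern, so only integer registers are used.
extern "C" fn ceil_f32_bits(bits: u32) -> u32 {
    f32::from_bits(bits).ceil().to_bits()
}

/// Caller-side wrapper: convert to bits, make the call, convert back.
/// In compiled wasm code this would be a move from an XMM register to a GPR
/// before the call and the reverse move afterwards.
fn ceil_via_libcall(x: f32) -> f32 {
    f32::from_bits(ceil_f32_bits(x.to_bits()))
}

fn main() {
    println!("{}", ceil_via_libcall(1.4));
}
```

The cost is an extra pair of register moves per libcall, which fits the later remark in this thread that libcalls are already a slow path.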
jsturtevant commented on issue #11506:
Sort of, sort of not. The main reason this doesn't work is it only affects your local compilation unit not others. So for example if you tried to communicate with libcore (which wasn't compiled with this) you'd get the same ABI mismatch.
Got it, our code has it off but we didn't compile libcore with those same settings for x86_64-unknown-none, so libcore would have it on. We would need to compile libcore with the same settings via -Zbuild-std. Thanks again for the patience and explanations!
We also would like to take advantage of the full suite of SIMD.
I might not have enough understanding of the tradeoff here, but my rather naive initial thought is that option (2), "Update the ABI of libcalls to avoid floats", seems reasonable for being able to use stable Rust.
alexcrichton commented on issue #11506:
Ok I've managed to reproduce this by applying the following diff on top of https://github.com/bytecodealliance/wasmtime/pull/11516:
<details>
commit b1bf88344905ac4813ec4a7bce58e475678b175f
Author: Alex Crichton <alex@alexcrichton.com>
Date:   Fri Aug 22 17:15:17 2025 -0700

    wip

diff --git a/examples/min-platform/embedding/src/lib.rs b/examples/min-platform/embedding/src/lib.rs
index 460ea5d2c8..9448c3785e 100644
--- a/examples/min-platform/embedding/src/lib.rs
+++ b/examples/min-platform/embedding/src/lib.rs
@@ -4,7 +4,7 @@ extern crate alloc;
 use alloc::string::ToString;
-use anyhow::Result;
+use anyhow::{Result, ensure};
 use core::ptr;
 use wasmtime::{Engine, Instance, Linker, Module, Store};
@@ -29,6 +29,8 @@ pub unsafe extern "C" fn run(
     simple_add_size: usize,
     simple_host_fn_module: *const u8,
     simple_host_fn_size: usize,
+    simple_floats_module: *const u8,
+    simple_floats_size: usize,
 ) -> usize {
     unsafe {
         let buf = core::slice::from_raw_parts_mut(error_buf, error_size);
@@ -36,7 +38,8 @@ pub unsafe extern "C" fn run(
         let simple_add = core::slice::from_raw_parts(simple_add_module, simple_add_size);
         let simple_host_fn =
             core::slice::from_raw_parts(simple_host_fn_module, simple_host_fn_size);
-        match run_result(smoke, simple_add, simple_host_fn) {
+        let simple_floats = core::slice::from_raw_parts(simple_floats_module, simple_floats_size);
+        match run_result(smoke, simple_add, simple_host_fn, simple_floats) {
             Ok(()) => 0,
             Err(e) => {
                 let msg = format!("{e:?}");
@@ -52,10 +55,12 @@ fn run_result(
     smoke_module: &[u8],
     simple_add_module: &[u8],
     simple_host_fn_module: &[u8],
+    simple_floats_module: &[u8],
 ) -> Result<()> {
     smoke(smoke_module)?;
     simple_add(simple_add_module)?;
     simple_host_fn(simple_host_fn_module)?;
+    simple_floats(simple_floats_module)?;
     Ok(())
 }
@@ -78,7 +83,7 @@ fn simple_add(module: &[u8]) -> Result<()> {
     let mut store = Store::new(&engine, ());
     let instance = Linker::new(&engine).instantiate(&mut store, &module)?;
     let func = instance.get_typed_func::<(u32, u32), u32>(&mut store, "add")?;
-    assert_eq!(func.call(&mut store, (2, 3))?, 5);
+    ensure!(func.call(&mut store, (2, 3))? == 5);
     Ok(())
 }
@@ -93,7 +98,20 @@ fn simple_host_fn(module: &[u8]) -> Result<()> {
     let mut store = Store::new(&engine, ());
     let instance = linker.instantiate(&mut store, &module)?;
     let func = instance.get_typed_func::<(u32, u32, u32), u32>(&mut store, "add_and_mul")?;
-    assert_eq!(func.call(&mut store, (2, 3, 4))?, 10);
+    ensure!(func.call(&mut store, (2, 3, 4))? == 10);
+    Ok(())
+}
+
+fn simple_floats(module: &[u8]) -> Result<()> {
+    let engine = Engine::default();
+    let module = match deserialize(&engine, module)? {
+        Some(module) => module,
+        None => panic!(),
+    };
+    let mut store = Store::new(&engine, ());
+    let instance = Linker::new(&engine).instantiate(&mut store, &module)?;
+    let func = instance.get_typed_func::<(f32, f32), f32>(&mut store, "frob")?;
+    ensure!(func.call(&mut store, (1.4, 3.2))? == 5.);
     Ok(())
 }
diff --git a/examples/min-platform/src/main.rs b/examples/min-platform/src/main.rs
index fd1867c6d5..7de14846da 100644
--- a/examples/min-platform/src/main.rs
+++ b/examples/min-platform/src/main.rs
@@ -95,6 +95,16 @@ fn main() -> Result<()> {
             )
         "#,
     )?;
+    let simple_floats = engine.precompile_module(
+        br#"
+            (module
+                (func (export "frob") (param f32 f32) (result f32)
+                    (f32.ceil (local.get 0))
+                    (f32.floor (local.get 0))
+                    f32.add)
+            )
+        "#,
+    )?;
     // Next is an example of running this embedding, which also serves as test
     // that basic functionality actually works.
@@ -134,6 +144,8 @@ fn main() -> Result<()> {
                 usize,
                 *const u8,
                 usize,
+                *const u8,
+                usize,
             ) -> usize,
         > = lib
             .get(b"run")
@@ -149,6 +161,8 @@ fn main() -> Result<()> {
             simple_add.len(),
             simple_host_fn.as_ptr(),
             simple_host_fn.len(),
+            simple_floats.as_ptr(),
+            simple_floats.len(),
         );
         error_buf.set_len(len);
</details>
that yields:
$ MIN_PLATFORM_TEST_DISABLE_WASI=1 WASMTIME_SIGNALS_BASED_TRAPS=1 ./build.sh x86_64-unknown-none
...
Error: Condition failed: `func.call(&mut store, (1.4, 3.2))? == 5.` (2.8 vs 5.0)
and some simple debugging shows the libcalls getting invalid arguments.
Personally I'm tempted to (a) add a "soft float" feature to Cranelift and assert in ABI code that if this feature is enabled that no floats are used, and then (b) update libcalls to unconditionally use GPR arguments/results instead of XMM args/results. (aka do the mov-xmm-to-gpr in Cranelift). Then Wasmtime would configure this soft float flag for the x86_64-unknown-none target and would additionally add runtime checks for SSE/SSE2 which it lacks today.
That should get everything working between compiled wasm code and Wasmtime. Libcalls are slow but they always are. Libcalls are easily avoided as well by enabling more CPU features (e.g. up to SSE4.1 or even up to AVX). Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.
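One way to picture how a result like 2.8 could arise is the toy register model below. It is a deliberately simplified sketch and not a trace of real machine code. `Regs`, `soft_float_callee`, `hard_float_call`, and `frob_mismatched` are hypothetical names, and the assumption that the callee leaves XMM0 untouched is made purely for illustration.

```rust
/// Toy register file holding only the registers relevant to this example.
#[derive(Default)]
struct Regs {
    xmm0: u32, // low 32 bits of XMM0, interpreted as f32 bits
    rdi: u64,  // first integer argument register (System V)
    rax: u64,  // integer return register
}

/// Callee built with a soft-float ABI: reads its argument from RDI and
/// writes its result to RAX. It never looks at XMM0.
fn soft_float_callee(regs: &mut Regs, op: fn(f32) -> f32) {
    let arg = f32::from_bits(regs.rdi as u32);
    regs.rax = op(arg).to_bits() as u64;
}

/// Caller using a hard-float ABI: places the argument in XMM0 and reads the
/// result back from XMM0, so it just sees its own argument again.
fn hard_float_call(regs: &mut Regs, x: f32, op: fn(f32) -> f32) -> f32 {
    regs.xmm0 = x.to_bits();
    soft_float_callee(regs, op);
    f32::from_bits(regs.xmm0)
}

/// Mirrors the "frob" test module: ceil(x) + floor(x), both via mismatched calls.
fn frob_mismatched(x: f32) -> f32 {
    let mut regs = Regs::default();
    let c = hard_float_call(&mut regs, x, f32::ceil);
    let f = hard_float_call(&mut regs, x, f32::floor);
    c + f
}

fn main() {
    println!("mismatched: {}", frob_mismatched(1.4));
    println!("matched:    {}", 1.4_f32.ceil() + 1.4_f32.floor());
}
```

In this model each libcall hands back its own input, so the sum is 1.4 + 1.4 = 2.8, whereas matching ABIs would give 2 + 1 = 3.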
cfallin commented on issue #11506:
A few thoughts on "real softfloat":
- In our embedding it's looking like we'll actually go the way of a custom target and bite the bullet on a nightly requirement for -Zbuild-std -- the key factor for us is that we're also linking with other code (a bunch of legacy C) that isn't using a softfloat ABI, so our hand is basically forced. I suspect others might be in the same boat...
- In a world that does require softfloat, like the Linux kernel, I don't know if it's safe to even use XMMs while still upholding an "ABI at call boundaries" invariant and saving them as clobbers -- at the very least it interacts weirdly with the lazy state-switching stuff that x86 does and the Linux kernel optionally uses. A little experimentation (Godbolt link) shows that Rust compiled to x86_64-unknown-none uses helpers like __adddf3 even for a simple a + b where a: f64, b: f64 (Yikes!) I'm not sure that this is a terribly interesting target configuration for the folks in this thread, at least?
I wonder if it would make sense to statically compile-error if the target has softfloat configured?
bjorn3 commented on issue #11506:
For the Linux kernel there are functions you can use to delimit a section of code where xmm registers are used. This will then handle saving and restoring the registers as needed.
jsturtevant commented on issue #11506:
Libcalls are easily avoided as well by enabling more CPU features (e.g. up to SSE4.1 or even up to AVX). Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.
This means enabling more CPU features in Cranelift's compilation? So in this case it would mean that the wasm module doesn't transition through a libcall to do the rounding? This might be an option for us. Is there a way to detect when libcalls are likely to happen? I am wondering if there might be other edge cases that we want to avoid.
Given that I'm not keen on investing a lot of effort into this when performance is basically secondary.
This makes sense.
In our embedding it's looking like we'll actually go the way of a custom target and bite the bullet on a nightly requirement for -Zbuild-std -- the key factor for us is that we're also linking with other code (a bunch of legacy C) that isn't using a softfloat ABI, so our hand is basically forced. I suspect others might be in the same boat...
Are you linking wasmtime code with C? Or the module/component?
cfallin commented on issue #11506:
Are you linking wasmtime code with c?
Yes! Wasmtime no_std dropped into a big legacy C codebase. It technically runs on a Linux base but avoids all syscalls for Reasons, so we've used all of the embedded platform functionality in Wasmtime (thanks Alex!) to make this work. Thus our interest in this topic!
alexcrichton commented on issue #11506:
@jsturtevant correct yeah, if you enable everything up through SSE4.1 it should avoid almost all libcalls with floats. One more feature is FMA as well, but that's only for relaxed-simd support.
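A runtime check for the features named in this thread could be sketched as below. This assumes a std environment: `std::is_x86_feature_detected!` is not available in a no_std build such as x86_64-unknown-none, where an embedding would need its own CPUID-based detection. `WANTED`, `detect`, and `missing` are hypothetical names.

```rust
/// Features mentioned in this thread: SSE/SSE2 as the baseline, SSE4.1 to
/// avoid most float libcalls, FMA for relaxed-simd.
const WANTED: [&str; 4] = ["sse", "sse2", "sse4.1", "fma"];

/// Runtime detection on x86_64 (std only).
#[cfg(target_arch = "x86_64")]
fn detect(name: &str) -> bool {
    match name {
        "sse" => std::is_x86_feature_detected!("sse"),
        "sse2" => std::is_x86_feature_detected!("sse2"),
        "sse4.1" => std::is_x86_feature_detected!("sse4.1"),
        "fma" => std::is_x86_feature_detected!("fma"),
        _ => false,
    }
}

/// On other architectures none of these features exist.
#[cfg(not(target_arch = "x86_64"))]
fn detect(_name: &str) -> bool {
    false
}

/// Returns the wanted features that the given detector reports as absent.
fn missing(wanted: &[&'static str], detect: impl Fn(&str) -> bool) -> Vec<&'static str> {
    wanted.iter().copied().filter(|f| !detect(f)).collect()
}

fn main() {
    println!("missing features: {:?}", missing(&WANTED, detect));
}
```

Detecting a feature on the host is a separate step from telling the compiler to emit code that uses it. This sketch covers only the detection side.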
syntactically commented on issue #11506:
if you enable everything up through SSE4.1 it should avoid almost all libcalls with floats
Just to confirm my understanding here---is this ABI mismatch not a problem with normal import/export trampolines (only with the libcall trampolines) because those tend to pass arguments on the Cranelift side in memory?
I'm not sure that this is a terribly interesting target configuration for the folks in this thread, at least?
The kernel target configuration for rustc is definitely interesting to us, but that is a different discussion so I won't go further off topic here :)
Personally I'm tempted to (a) add a "soft float" feature to Cranelift and assert in ABI code that if this feature is enabled that no floats are used, and then (b) update libcalls to unconditionally use GPR arguments/results instead of XMM args/results. (aka do the mov-xmm-to-gpr in Cranelift). Then Wasmtime would configure this soft float flag for the x86_64-unknown-none target and would additionally add runtime checks for SSE/SSE2 which it lacks today.
I think that broadly makes sense, although where exactly would the new ABI flag make sure that "no floats are used"?
Separately: insofar as the Cranelift ABI ends up being different from the host ABI, do wasmtime_{set,long}jmp() need to preserve the extended cranelift ABI or just the host ABI? I think that currently every longjmp ends up trapping, so perhaps only the latter?
alexcrichton commented on issue #11506:
is this ABI mismatch not a problem with normal import/export trampolines
Correct yeah, they only use GPRs in the ABI and in-memory bits contain floats.
where exactly would the new ABI flag make sure that "no floats are used"?
My thinking is that this'd happen somewhere in the ABI code in Cranelift where if the flag was set it'd assert that f32 and f64 weren't used in the signature of a function at all. So an internal assert in Cranelift which we'd be careful to avoid in Wasmtime (e.g. by reporting an error at compile time or changing things internally)
do wasmtime_{set,long}jmp() need to preserve the extended cranelift ABI or just the host ABI
That's a good question! Something I hadn't really considered before but I believe the answer is "both". This adds more fuel to the fire to me of burning down these functions entirely...
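The "no floats in the signature" assertion described in this comment might look something like the following sketch. `ValType`, `Signature`, and `check_signature` are hypothetical types written for illustration and are not Cranelift's actual data structures. The sketch returns an error where the comment describes an internal assert.

```rust
/// Simplified value types (hypothetical; not Cranelift's type representation).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

/// Simplified function signature.
struct Signature {
    params: Vec<ValType>,
    returns: Vec<ValType>,
}

/// With the soft-float flag set, reject any signature that mentions f32/f64.
/// Without the flag, every signature is accepted.
fn check_signature(sig: &Signature, soft_float: bool) -> Result<(), String> {
    if !soft_float {
        return Ok(());
    }
    let offending = sig
        .params
        .iter()
        .chain(sig.returns.iter())
        .copied()
        .find(|t| matches!(t, ValType::F32 | ValType::F64));
    match offending {
        Some(t) => Err(format!("{t:?} in a signature is not supported with soft-float")),
        None => Ok(()),
    }
}

fn main() {
    let sig = Signature { params: vec![ValType::F32], returns: vec![ValType::F32] };
    println!("{:?}", check_signature(&sig, true));
}
```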
alexcrichton commented on issue #11506:
I pushed up https://github.com/bytecodealliance/wasmtime/pull/11553 to resolve this issue. Basically it'll make loading code on x86_64-unknown-none a hard error by default. The error has an escape hatch which documents some of the hazards embedders need to look out for.
alexcrichton closed issue #11506.
Last updated: Dec 06 2025 at 06:05 UTC