candymate added the bug label to Issue #8179.
candymate added the fuzz-bug label to Issue #8179.
candymate opened issue #8179:
Test Case
// main.rs use wasmtime::*; fn main() -> Result<()> { let mut config = Config::default(); config.strategy(Strategy::Cranelift); config.cranelift_opt_level(OptLevel::None); config.cranelift_nan_canonicalization(true); // doesn't matter: both options show the same result let engine = Engine::new(&config)?; let wat = r#" (module (type (;0;) (func (param f64) (result f32))) (import "mem" "mem" (memory (;0;) 1)) (func (;0;) (type 0) (param f64) (result f32) local.get 0 f64.const -0x1 ;; anything with negative sign f64.copysign f32.demote_f64) (export "main" (func 0))) "#; let module = Module::new(&engine, wat)?; let mut store = Store::new(&engine, ()); let memory_ty = MemoryType::new(1, None); let memory = Memory::new(&mut store, memory_ty.clone())?; let instance = Instance::new(&mut store, &module, &[memory.into()])?; let main = instance.get_func(&mut store, "main") .expect("`main` was not an exported function"); let params = vec![ Val::F64(0x7ff8000000000000), // +nan (canonicalized) ]; let mut results = vec![Val::F32(0)]; println!("Opt level None: {:?}", main.call( &mut store, ¶ms, &mut results )); println!("{:?}", results); Ok(()) }
[package] name = "wasmtime-wrapper" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] # wasmtime = "18.0.3" wasmtime = { path = "../wasmtime/crates/wasmtime" } # commit: 3deaa8780bbef21648f2d9eebabcbae909c4a090(current latest, Date: Mon Mar 18 18:03:37 2024 -0700)
Steps to reproduce
Compare the following executions:
cargo run --release cargo run --release --target=aarch64-unknown-linux-gnu cargo run --release --target=s390x-unknown-linux-gnu ----- cargo run --release --target=riscv64gc-unknown-linux-gnu
QEMU run options I'm currently using are the following:
qemu-aarch64 -L /usr/aarch64-linux-gnu -E LD_LIBRARY_PATH=/usr/aarch64-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/aarch64-unknown-linux-gnu/release/wasmtime-wrapper qemu-s390x -L /usr/s390x-linux-gnu -E LD_LIBRARY_PATH=/usr/s390x-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/s390x-unknown-linux-gnu/release/wasmtime-wrapper qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0,zba=true,zbb=true,zbs=true,zbc=true,zbkb=true,zcb=true,zicond=true -L /usr/riscv64-linux-gnu -E LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/riscv64gc-unknown-linux-gnu/release/wasmtime-wrapper
Expected Results
Results from all architecture should be the same value of +NaN or -NaN (with
cranelift_nan_canonicalization
option)Actual Results
Execution results in a negative canonicalized NaN value,
0xffc00000
(regardless ofcranelift_nan_canonicalization
option)Opt level None: Ok(()) [F32(4290772992)]
However, RISC-V64 shows the result
0x7fc00000
, which is a positive canonicalized NaN value:Opt level None: Ok(()) [F32(2143289344)]
Versions and Environment
- wasmtime version
- commit: 3deaa8780bbef21648f2d9eebabcbae909c4a090 (current latest, Date: Mon Mar 18 18:03:37 2024 -0700)
- However, also checked on v18.0.3
- Operating system & architecture: Ubuntu 22.04.3 LTS, Arch: x86_64
- QEMU version: all architectures (aarch64, s390x, riscv64) in
version 8.2.1 (v8.2.1)
Extra Info
- As both results are canonicalized versions of NaN (and conforms to spec), I think this might not be a bug - however, this is worth checking
- Specification of
f64.copysign
: https://webassembly.github.io/spec/core/exec/numerics.html#xref-exec-numerics-op-fcopysign-mathrm-fcopysign-n-z-1-z-2cranelift_nan_canonicalization
option doesn't change the result- Uploading as public (not a security vuln.) since the bug only exhibits a semantic difference and I don't think this can cause a security impact. Also, this is related to
cranelift_nan_canonicalization
flag, which isfalse
by default.
alexcrichton commented on issue #8179:
Thanks (as always!) @candymate!
@sunfishcode I'm wondering if you've got some historical context around the nan canonicalization pass in Cranelift? (given the original implementation in https://github.com/bytecodealliance/cranelift/pull/322 and issue in https://github.com/bytecodealliance/cranelift/issues/311, sorry for the blast from the past)
We've got a small set of opcodes that receive the canonicalization sequence with the nan canonicalization pass but it seems that it's intentionally not a comprehensive set of opcodes which work with floats. Given this issue I'm wondering if https://github.com/bytecodealliance/wasmtime/pull/8146 was perhaps the wrong way to fix that issue? Or is the "answer" here to add the canonicalization sequence to all float-related operations?
Put another way, it seems that https://github.com/bytecodealliance/cranelift/pull/322 was built with the assumption that most platforms behaved the same way with respect to most float operations and NaN except for the ones that are listed in canonicalization. For example x64, aarch64, and s390x all do the same thing above where they faithfully copy the sign bit into a NaN payload. Does riscv64 violate the original assumptions of the nan canonicalization pass then? Given https://github.com/bytecodealliance/wasmtime/issues/7322 and my read of risc-v-the-spec I think that canonical NaNs always pop out, so it seems to me that we'll need the canonicalization sequence for all float operations instead of just a small set?
alexcrichton commented on issue #8179:
Or, alternatively you mention
For these purposes, fneg, fabs, and fcopysign are not considered to count as "arithmetic" instructions, because they're defined to just operate in terms of the sign bit and leave all other bits alone.
So is the answer to consider the riscv backend buggy here and requiring a different codegen pattern for these instructions?
sunfishcode commented on issue #8179:
IEEE 754 defines abs, neg, and copysign to be "Sign bit operations" that only modify the sign bit. They're fully deterministic, so not only do we not need to canonicalize them; we shouldn't, because they're supposed to preserve arbitrary NaN payloads.
fitzgen commented on issue #8179:
It would be nice, sometime, to revise our NaN canonicalization pass to only canonicalize NaNs when floats cross an "observability boundary" rather than after each possibly-NaN-producing operation. So we would canonicalize when
- converting a float to an int
- storing a float to memory (wasm linear memory or the memory location of a wasm global definition)
- returning a float
and I think that is it?
Might be easier to exhaustively list these boundaries than it is to exhaustively list individual operations that can produce NaNs. Probably also results in better generated code too, since we are generally doing the minimum amount of work.
sunfishcode commented on issue #8179:
@fitzgen also passing a float (to an exported function), and
global.set
to an exported global. And yeah, we should do that; the only reason I didn't do that before is that, at the time, I was just looking to do the simplest possible thing that would work.@alexcrichton RISC-V has a copysign instruction and it looks like cranelift uses it. My guess here is that something's going on with demote. Could this be the same issue as https://github.com/bytecodealliance/wasmtime/issues/7322?
sunfishcode edited a comment on issue #8179:
@fitzgen also passing a float (to an exported function), and
global.set
to an exported global. And yeah, we should do that; the only reason I didn't do that before is that, at the time, I was just looking to do the simplest possible thing that would work.@alexcrichton RISC-V has a copysign instruction and it looks like cranelift uses it. My guess here is that something's going on with demote. Could this be the same issue as https://github.com/bytecodealliance/wasmtime/issues/8145?
Edit: copy and past the right issue number
fitzgen edited a comment on issue #8179:
It would be nice, sometime, to revise our NaN canonicalization pass to only canonicalize NaNs when floats cross an "observability boundary" rather than after each possibly-NaN-producing operation. So we would canonicalize when
- converting a float to an int
- storing a float to memory (wasm linear memory or the memory location of an (imported or exported) wasm global definition)
- returning a float
- passing a float as an argument
and I think that is it?
Might be easier to exhaustively list these boundaries than it is to exhaustively list individual operations that can produce NaNs. Probably also results in better generated code too, since we are generally doing the minimum amount of work.
sunfishcode commented on issue #8179:
@fitzgen Also several things involving
v128
, because it doesn't statically distinguish between float and int.
fitzgen commented on issue #8179:
Ah, that's annoying, because now we are back to enumerating individual ops :-/
alexcrichton commented on issue #8179:
My guess here is that something's going on with demote
I think you're right yeah, I opened https://github.com/bytecodealliance/wasmtime/pull/8182 which adds canonicalization for promote/demote since I think that's the source of the nondeterminism here.
alexcrichton closed issue #8179:
Test Case
// main.rs use wasmtime::*; fn main() -> Result<()> { let mut config = Config::default(); config.strategy(Strategy::Cranelift); config.cranelift_opt_level(OptLevel::None); config.cranelift_nan_canonicalization(true); // doesn't matter: both options show the same result let engine = Engine::new(&config)?; let wat = r#" (module (type (;0;) (func (param f64) (result f32))) (import "mem" "mem" (memory (;0;) 1)) (func (;0;) (type 0) (param f64) (result f32) local.get 0 f64.const -0x1 ;; anything with negative sign f64.copysign f32.demote_f64) (export "main" (func 0))) "#; let module = Module::new(&engine, wat)?; let mut store = Store::new(&engine, ()); let memory_ty = MemoryType::new(1, None); let memory = Memory::new(&mut store, memory_ty.clone())?; let instance = Instance::new(&mut store, &module, &[memory.into()])?; let main = instance.get_func(&mut store, "main") .expect("`main` was not an exported function"); let params = vec![ Val::F64(0x7ff8000000000000), // +nan (canonicalized) ]; let mut results = vec![Val::F32(0)]; println!("Opt level None: {:?}", main.call( &mut store, ¶ms, &mut results )); println!("{:?}", results); Ok(()) }
[package] name = "wasmtime-wrapper" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] # wasmtime = "18.0.3" wasmtime = { path = "../wasmtime/crates/wasmtime" } # commit: 3deaa8780bbef21648f2d9eebabcbae909c4a090(current latest, Date: Mon Mar 18 18:03:37 2024 -0700)
Steps to reproduce
Compare the following executions:
cargo run --release cargo run --release --target=aarch64-unknown-linux-gnu cargo run --release --target=s390x-unknown-linux-gnu ----- cargo run --release --target=riscv64gc-unknown-linux-gnu
QEMU run options I'm currently using are the following:
qemu-aarch64 -L /usr/aarch64-linux-gnu -E LD_LIBRARY_PATH=/usr/aarch64-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/aarch64-unknown-linux-gnu/release/wasmtime-wrapper qemu-s390x -L /usr/s390x-linux-gnu -E LD_LIBRARY_PATH=/usr/s390x-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/s390x-unknown-linux-gnu/release/wasmtime-wrapper qemu-riscv64 -cpu rv64,v=true,vlen=128,vext_spec=v1.0,zba=true,zbb=true,zbs=true,zbc=true,zbkb=true,zcb=true,zicond=true -L /usr/riscv64-linux-gnu -E LD_LIBRARY_PATH=/usr/riscv64-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 target/riscv64gc-unknown-linux-gnu/release/wasmtime-wrapper
Expected Results
Results from all architecture should be the same value of +NaN or -NaN (with
cranelift_nan_canonicalization
option)Actual Results
Execution results in a negative canonicalized NaN value,
0xffc00000
(regardless ofcranelift_nan_canonicalization
option)Opt level None: Ok(()) [F32(4290772992)]
However, RISC-V64 shows the result
0x7fc00000
, which is a positive canonicalized NaN value:Opt level None: Ok(()) [F32(2143289344)]
Versions and Environment
- wasmtime version
- commit: 3deaa8780bbef21648f2d9eebabcbae909c4a090 (current latest, Date: Mon Mar 18 18:03:37 2024 -0700)
- However, also checked on v18.0.3
- Operating system & architecture: Ubuntu 22.04.3 LTS, Arch: x86_64
- QEMU version: all architectures (aarch64, s390x, riscv64) in
version 8.2.1 (v8.2.1)
Extra Info
- As both results are canonicalized versions of NaN (and conforms to spec), I think this might not be a bug - however, this is worth checking
- Specification of
f64.copysign
: https://webassembly.github.io/spec/core/exec/numerics.html#xref-exec-numerics-op-fcopysign-mathrm-fcopysign-n-z-1-z-2cranelift_nan_canonicalization
option doesn't change the result- Uploading as public (not a security vuln.) since the bug only exhibits a semantic difference and I don't think this can cause a security impact. Also, this is related to
cranelift_nan_canonicalization
flag, which isfalse
by default.
Last updated: Nov 22 2024 at 17:03 UTC