candymate opened issue #8112:
Test Case
// main.rs use wasmtime::*; fn main() -> Result<()> { let mut config = Config::default(); config.strategy(Strategy::Cranelift); config.cranelift_opt_level(OptLevel::None); let engine1 = Engine::new(&config)?; let mut new_config = config.clone(); new_config.cranelift_opt_level(OptLevel::Speed); let engine2 = Engine::new(&new_config)?; let wat = r#" (module (type (;0;) (func (param i32 v128 v128) (result f64))) (import "mem" "mem" (memory (;0;) 1)) (func (;0;) (type 0) (param i32 v128 v128) (result f64) (local f64) local.get 0 local.get 1 local.get 2 i8x16.extract_lane_s 11 i16x8.shl v128.store16_lane align=1 1 local.get 0 i32.load align=1 f64.load align=1 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) i32.const 1 select) (export "main" (func 0))) "#; let module1 = Module::new(&engine1, wat)?; let module2 = Module::new(&engine2, wat)?; let mut store1 = Store::new(&engine1, ()); let mut store2 = Store::new(&engine2, ()); let memory_ty = MemoryType::new(1, None); let memory1 = Memory::new(&mut store1, memory_ty.clone())?; let memory2 = Memory::new(&mut store2, memory_ty)?; let instance1 = Instance::new(&mut store1, &module1, &[memory1.into()])?; let instance2 = Instance::new(&mut store2, &module2, &[memory2.into()])?; let main1 = instance1.get_func(&mut store1, "main") .expect("`main` was not an exported function"); let main2 = instance2.get_func(&mut store2, "main") .expect("`main` was not an exported function"); let params = vec![ Val::I32(4097), Val::V128(0x10203cccdcecf807f7e7dfffefdfc.into()), Val::V128(0xd7a950c126c2ce62f786d3c83f7914e.into()), ]; let mut results1 = vec![Val::null()]; let mut results2 = vec![Val::null()]; println!("Opt level None: {:?}", main1.call( &mut store1, ¶ms, &mut results1 )); println!("--------------"); println!("Opt level Speed: {:?}", main2.call( &mut store2, ¶ms, &mut results2 )); Ok(()) }
[package] name = "wasmtime-wrapper" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] wasmtime = "18.0.3"
Steps to Reproduce
cargo run --release
Expected Results
Both results from none and speed optimizations should show the same result (both in error or successful execution)
Actual Results
Opt level None: Err(error while executing at wasm backtrace: 0: 0x59 - <unknown>!<wasm function 0> Caused by: 0: memory fault at wasm address 0x10000 in linear memory of size 0x10000 1: wasm trap: out of bounds memory access) -------------- Opt level Speed: Ok(())
Versions and Environment
- wasmtime version: v18.0.3 (but also on v18.0.2)
- Operating system & architecture: Ubuntu 22.04.3 LTS, Arch: x86_64
Extra Info
- The test case is not fully minimized - need to minimize more
- Does not work on other architectures (arm64, s390x, riscv64)
cfallin commented on issue #8112:
I'm able to reproduce on
main
and confirm that on x86-64, we see the OOB on no-opt and successful execution on opt. On aarch64, we see successful execution in both cases.The CLIF before opt is
function u0:0(i64 vmctx, i64, i32, i8x16, i8x16) -> f64 fast { gv0 = vmctx gv1 = load.i64 notrap aligned readonly gv0+8 gv2 = load.i64 notrap aligned gv1 gv3 = vmctx gv4 = load.i64 notrap aligned readonly gv3+72 gv5 = load.i64 notrap aligned readonly checked gv4 sig0 = (i64 vmctx, i32 uext, i32 uext, i32 uext) -> i32 uext system_v sig1 = (i64 vmctx, i32 uext, i32 uext, i32 uext) -> i32 uext system_v sig2 = (i64 vmctx, i32 uext, i32 uext) -> i32 uext system_v sig3 = (i64 vmctx, i32 uext) -> i32 uext system_v stack_limit = gv2 block0(v0: i64, v1: i64, v2: i32, v3: i8x16, v4: i8x16): @0033 v6 = f64const 0.0 @003b v7 = extractlane v4, 11 @003b v8 = sextend.i32 v7 @003e v9 = bitcast.i16x8 little v3 @003e v10 = ishl v9, v8 @0041 v11 = extractlane v10, 1 @0041 v12 = uextend.i64 v2 @0041 v13 = global_value.i64 gv5 @0041 v14 = iadd v13, v12 @0041 store little heap v11, v14 @0048 v15 = uextend.i64 v2 @0048 v16 = global_value.i64 gv5 @0048 v17 = iadd v16, v15 @0048 v18 = load.i32 little heap v17 @004b v19 = uextend.i64 v18 @004b v20 = global_value.i64 gv5 @004b v21 = iadd v20, v19 @004b v22 = load.f64 little heap v21 @004e v23 = f64const 0x1.fffffffe00000p31 @0057 v24 = iconst.i32 1 @0059 v25 = select v24, v22, v23 ; v24 = 1, v23 = 0x1.fffffffe00000p31 @005a jump block1(v25) block1(v5: f64): @005a return v5 }
and after opt is
function u0:0(i64 vmctx, i64, i32, i8x16, i8x16) -> f64 fast { gv0 = vmctx gv1 = load.i64 notrap aligned readonly gv0+8 gv2 = load.i64 notrap aligned gv1 gv3 = vmctx gv4 = load.i64 notrap aligned readonly gv3+72 gv5 = load.i64 notrap aligned readonly checked gv4 sig0 = (i64 vmctx, i32 uext, i32 uext, i32 uext) -> i32 uext system_v sig1 = (i64 vmctx, i32 uext, i32 uext, i32 uext) -> i32 uext system_v sig2 = (i64 vmctx, i32 uext, i32 uext) -> i32 uext system_v sig3 = (i64 vmctx, i32 uext) -> i32 uext system_v stack_limit = gv2 block0(v0: i64, v1: i64, v2: i32, v3: i8x16, v4: i8x16): v27 -> v0 v29 -> v0 v31 -> v0 @003e v9 = bitcast.i16x8 little v3 @003b v7 = extractlane v4, 11 v32 = ishl v9, v7 @0041 v11 = extractlane v32, 1 @0041 v26 = load.i64 notrap aligned readonly v0+72 @0041 v13 = load.i64 notrap aligned readonly checked v26 @0041 v12 = uextend.i64 v2 @0041 v14 = iadd v13, v12 @0041 store little heap v11, v14 @0048 v18 = load.i32 little heap v14 @004b v19 = uextend.i64 v18 @004b v21 = iadd v13, v19 @004b v22 = load.f64 little heap v21 @005a jump block1 block1: @005a return v22 }
The faulting access is the last load. The no-opt x86-64 machine code is
0000000000000000 <wasm[0]::function[0]>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 4c 8b 57 08 mov 0x8(%rdi),%r10 8: 4d 8b 12 mov (%r10),%r10 b: 49 39 e2 cmp %rsp,%r10 e: 0f 87 67 00 00 00 ja 7b <wasm[0]::function[0]+0x7b> 14: c4 c3 79 14 c9 0b vpextrb $0xb,%xmm1,%r9d 1a: 41 0f be c9 movsbl %r9b,%ecx 1e: 48 83 e1 0f and $0xf,%rcx 22: c5 f9 6e c9 vmovd %ecx,%xmm1 26: c5 f9 f1 d1 vpsllw %xmm1,%xmm0,%xmm2 2a: 44 8b ca mov %edx,%r9d 2d: 4c 8b 57 48 mov 0x48(%rdi),%r10 31: 4d 8b 12 mov (%r10),%r10 34: c4 83 79 15 14 0a 01 vpextrw $0x1,%xmm2,(%r10,%r9,1) 3b: 44 8b d2 mov %edx,%r10d 3e: 4c 8b 5f 48 mov 0x48(%rdi),%r11 42: 4d 8b 1b mov (%r11),%r11 45: 47 8b 14 13 mov (%r11,%r10,1),%r10d 49: 4c 8b 5f 48 mov 0x48(%rdi),%r11 4d: 4d 8b 1b mov (%r11),%r11 50: 49 b9 00 00 e0 ff ff movabs $0x41efffffffe00000,%r9 57: ff ef 41 5a: c4 c1 f9 6e c1 vmovq %r9,%xmm0 5f: be 01 00 00 00 mov $0x1,%esi 64: f3 43 0f 6f 0c 13 movdqu (%r11,%r10,1),%xmm1 6a: 85 f6 test %esi,%esi 6c: 0f 84 04 00 00 00 je 76 <wasm[0]::function[0]+0x76> 72: f2 0f 10 c1 movsd %xmm1,%xmm0 76: 48 89 ec mov %rbp,%rsp 79: 5d pop %rbp 7a: c3 retq 7b: 0f 0b ud2
and note in particular that the
load.f64
is translated to amovdqu
. Capturing the SIGSEGV in gdb I see the offset (r10
here) as0xfff8
confirming that aload.f64
would have succeeded while a 128-bit load indeed goes OOB. It seems that this should only be possible with this lowering rule but that explicitly guards withty_vec128
.That's as far as I've gotten tonight; cc @abrown maybe due to weird vector lowering issue?
alexcrichton commented on issue #8112:
Jinx @cfallin! Had this written up when I saw yours come in as well, so I think this can add some extra information:
Thanks for the report! I believe that the opt-level=speed behavior is correct here and I believe that Wasmtime has a bug in its x64 backend which affects the unoptimized lowering. I'm going to place some more details in this issue but @candymate we'll take it from here, this should be an easy fix and my initial thinking is that this shouldn't impact much else in the system.
It appears that with this input:
;; foo.clif function %a(i32, f64, i64) -> f64 { block0(v0: i32, v1: f64, v2: i64): v3 = load.f64 v2 v4 = select v0, v3, v1 return v4 }
Cranelift generates:
$ cargo run compile ./foo.clif --target x86_64 -D ... Disassembly of 25 bytes: 0: 55 pushq %rbp 1: 48 89 e5 movq %rsp, %rbp 4: f3 0f 6f 26 movdqu (%rsi), %xmm4 ;; this instruction is a 16-byte load, not 8-byte load 8: 85 ff testl %edi, %edi a: 0f 84 04 00 00 00 je 0x14 10: f2 0f 10 c4 movsd %xmm4, %xmm0 14: 48 89 ec movq %rbp, %rsp 17: 5d popq %rbp 18: c3 retq
The bug here is at offset 0x4, namely that
movdqu
is used to load a 64-bit floating-point value. This instruction actually loads 16-bytes instead of 8. This affects the WebAssembly you pasted because the address being loaded from is 8 away from the end of memory, so an 8-byte load is correct but a 16-byte load runs off the edge of memory, triggering a fault.For runnable reproductions, @candymate's original test case can be turned into:
(module $mem (memory (export "mem") 1)) (module (import "mem" "mem" (memory 1)) (func (export "main") (param i32 v128 v128) (result f64) (local f64) local.get 0 ;; 4097 local.get 1 local.get 2 i8x16.extract_lane_s 11 ;; => 0x83 i16x8.shl ;; => param1 << 3 v128.store16_lane align=1 1 ;; store 16-bits at 4097 local.get 0 ;; 4097 i32.load align=1 ;; load previous 16-bits f64.load align=1 ;; load again f64.const 0x1.fffffffep+31 (;=4.29497e+09;) i32.const 1 select)) (assert_return (invoke "main" (i32.const 4097) (v128.const i64x2 0x807f7e7dfffefdfc 0x10203_cccd_cecf) (v128.const i64x2 0x2f786d3c83f7914e 0xd7a950c126c2ce6) ) (f64.const 0))
and run with:
$ wasmtime wast -O opt-level=0 foo.wast $ wasmtime wast foo.wast
In WebAssembly this can be minimized to:
(module (memory 1) (func (export "main") (param i32 f64 i32) (result f64) local.get 2 f64.load f64.const 0 local.get 0 select return)) (assert_return (invoke "main" (i32.const 1) (f64.const 1) (i32.const 0xfff8) ) (f64.const 0))
which fails both with and without optimizations as it hits the buggy lowering rule.
I believe the bug is here-ish because
XmmCmove
which is used for theselect
has an aligned argument, and the fallback to eagerly load the bytes as they may be unaligned doesn't take size into account and loads the full xmm register.
cfallin commented on issue #8112:
Yep I just found the same :-) About to put up a PR.
cfallin commented on issue #8112:
In a little more detail, my diagnosis is: the
XmmCmove
pseudo-inst is fine as a 16-byte move; XMMs are 128 bits indeed; the issue is that thecmove_from_values
helper, which is used by lowerings ofselect
, has a case here where it creates theXmmCmove
while passing theValue
through to anXmmMemAligned
which allows load fusion via the available implicit conversions, guarding only byis_xmm_type
.is_xmm_type
is true even for$F64
because floats live in XMMs, though they don't take the full width. So we have an incorrect implicit widening. My plan is to disallow load fusion in this case.
cfallin closed issue #8112:
Test Case
// main.rs use wasmtime::*; fn main() -> Result<()> { let mut config = Config::default(); config.strategy(Strategy::Cranelift); config.cranelift_opt_level(OptLevel::None); let engine1 = Engine::new(&config)?; let mut new_config = config.clone(); new_config.cranelift_opt_level(OptLevel::Speed); let engine2 = Engine::new(&new_config)?; let wat = r#" (module (type (;0;) (func (param i32 v128 v128) (result f64))) (import "mem" "mem" (memory (;0;) 1)) (func (;0;) (type 0) (param i32 v128 v128) (result f64) (local f64) local.get 0 local.get 1 local.get 2 i8x16.extract_lane_s 11 i16x8.shl v128.store16_lane align=1 1 local.get 0 i32.load align=1 f64.load align=1 f64.const 0x1.fffffffep+31 (;=4.29497e+09;) i32.const 1 select) (export "main" (func 0))) "#; let module1 = Module::new(&engine1, wat)?; let module2 = Module::new(&engine2, wat)?; let mut store1 = Store::new(&engine1, ()); let mut store2 = Store::new(&engine2, ()); let memory_ty = MemoryType::new(1, None); let memory1 = Memory::new(&mut store1, memory_ty.clone())?; let memory2 = Memory::new(&mut store2, memory_ty)?; let instance1 = Instance::new(&mut store1, &module1, &[memory1.into()])?; let instance2 = Instance::new(&mut store2, &module2, &[memory2.into()])?; let main1 = instance1.get_func(&mut store1, "main") .expect("`main` was not an exported function"); let main2 = instance2.get_func(&mut store2, "main") .expect("`main` was not an exported function"); let params = vec![ Val::I32(4097), Val::V128(0x10203cccdcecf807f7e7dfffefdfc.into()), Val::V128(0xd7a950c126c2ce62f786d3c83f7914e.into()), ]; let mut results1 = vec![Val::null()]; let mut results2 = vec![Val::null()]; println!("Opt level None: {:?}", main1.call( &mut store1, ¶ms, &mut results1 )); println!("--------------"); println!("Opt level Speed: {:?}", main2.call( &mut store2, ¶ms, &mut results2 )); Ok(()) }
[package] name = "wasmtime-wrapper" version = "0.1.0" edition = "2021" # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] wasmtime = "18.0.3"
Steps to Reproduce
cargo run --release
Expected Results
Both results from none and speed optimizations should show the same result (both in error or successful execution)
Actual Results
Opt level None: Err(error while executing at wasm backtrace: 0: 0x59 - <unknown>!<wasm function 0> Caused by: 0: memory fault at wasm address 0x10000 in linear memory of size 0x10000 1: wasm trap: out of bounds memory access) -------------- Opt level Speed: Ok(())
Versions and Environment
- wasmtime version: v18.0.3 (but also on v18.0.2)
- Operating system & architecture: Ubuntu 22.04.3 LTS, Arch: x86_64
Extra Info
- The test case is not fully minimized - need to minimize more
- Does not work on other architectures (arm64, s390x, riscv64)
fitzgen commented on issue #8112:
Idle thought: it is worth thinking about what kinds of fuzzing we can do help uncover these sorts of bugs. It intuitively seems like doing various operations on the last word of memory is "likely" to turn up "interesting" things.
alexcrichton commented on issue #8112:
Also, if you don't mind me asking, @candymate how did the test case from above come about? Was it a reduction from a larger application or perhaps fuzz-generated? If fuzz-generated and you're willing we'd be interested to learn how and possibly integrate it with Wasmtime's preexisting fuzzing as well
candymate commented on issue #8112:
Also, if you don't mind me asking, @candymate how did the test case from above come about? Was it a reduction from a larger application or perhaps fuzz-generated? If fuzz-generated and you're willing we'd be interested to learn how and possibly integrate it with Wasmtime's preexisting fuzzing as well
The test case is generated from the fuzzer we are currently developing. However, I cannot disclose details about it for now. Hopefully, I can disclose the details of the fuzzer within this year, but I'm not sure about the date.
candymate edited a comment on issue #8112:
Also, if you don't mind me asking, @candymate how did the test case from above come about? Was it a reduction from a larger application or perhaps fuzz-generated? If fuzz-generated and you're willing we'd be interested to learn how and possibly integrate it with Wasmtime's preexisting fuzzing as well
The test case is generated from the fuzzer we are currently developing. However, I cannot disclose details about it for now. Hopefully, I can disclose the details within this year, but I'm not sure about the date.
alexcrichton commented on issue #8112:
Oh nice! Sounds good, and we'll definitely be eager to read more about it once details are available!
alexcrichton added the fuzz-bug label to Issue #8112.
alexcrichton added the cranelift:area:x64 label to Issue #8112.
Last updated: Dec 23 2024 at 12:05 UTC