abrown labeled issue #4786:
Test Case
No test case produced.
Steps to Reproduce
$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none
Run for enough time to crash.
Expected Results
Not to crash.
Actual Results
... === Execution rate (1023 successes / 6000 attempted modules): 17.05% === wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30% wasm-smith: 44.81%, single-inst: 55.19% ... #6851 NEW cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart- #6862 NEW cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart- #6864 NEW cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit- #6869 NEW cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"- #6988 NEW cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes- ──────────────────────────────────────────────────────────────────────────────── Error: Fuzz target exited with signal: 11 (core dumped)
Versions and Environment
Wasmtime version or commit:
main
Operating system: Fedora 35
Architecture: x86_64
abrown opened issue #4786:
Test Case
No test case produced.
Steps to Reproduce
$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none
Run for enough time to crash.
Expected Results
Not to crash.
Actual Results
... === Execution rate (1023 successes / 6000 attempted modules): 17.05% === wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30% wasm-smith: 44.81%, single-inst: 55.19% ... #6851 NEW cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart- #6862 NEW cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart- #6864 NEW cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit- #6869 NEW cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"- #6988 NEW cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes- ──────────────────────────────────────────────────────────────────────────────── Error: Fuzz target exited with signal: 11 (core dumped)
Versions and Environment
Wasmtime version or commit:
main
Operating system: Fedora 35
Architecture: x86_64
abrown commented on issue #4786:
cc: @alexcrichton, guess that crash we saw wasn't the
spec
interpreter after all.
abrown labeled issue #4786:
Test Case
No test case produced.
Steps to Reproduce
$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none
Run for enough time to crash.
Expected Results
Not to crash.
Actual Results
... === Execution rate (1023 successes / 6000 attempted modules): 17.05% === wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30% wasm-smith: 44.81%, single-inst: 55.19% ... #6851 NEW cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart- #6862 NEW cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart- #6864 NEW cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit- #6869 NEW cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"- #6988 NEW cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes- ──────────────────────────────────────────────────────────────────────────────── Error: Fuzz target exited with signal: 11 (core dumped)
Versions and Environment
Wasmtime version or commit:
main
Operating system: Fedora 35
Architecture: x86_64
alexcrichton commented on issue #4786:
I haven't been able to reproduce this locally unfortunately. When I let the fuzzer run long enough it hit https://github.com/bytecodealliance/wasmtime/issues/4812 before hitting this fault.
Can you try running the fuzzer in
gdb
to see if you can grab a stack trace of the fault? (presuming it doesn't take too long to fault). When usinggdb
though it's difficult b/c segfaults can normally happen for wasm and there's no great way to handle this. I was able to dohandle SIGILL nostop
locally in gdb but if you dohandle SIGSEGV nostop
I think that when it actually hits the crashing failure it lets the process exit which is no good, so you may have to babysit it a bit and double-check thatSIGSEGV
doesn't look like a normal wasm SIGSEGV (rr
might be super helpful here in this regard)
bjorn3 commented on issue #4786:
Maybe you can do
handle SIGSEGV nostop
but also add a breakpoint where the signal handler propagates the SIGSEGV if it didn't originate from the wasm module?
alexcrichton commented on issue #4786:
I've been unable to fuzz with v8 locally for the longest time so I tried to dig into this today. Turns out it's an issue with memory protection keys so this'll only affect new enough x64 hardware with new enough glibc. That I think would explain why Andrew initially ran into this and I couldn't reproduce. I think that would also help explain why it hasn't showed up on oss-fuzz at all despite being trivial to reproduce locally.
My best read of the situation is that if you run with v8 and differential fuzzing long enough locally it will eventually crash. This crash, when I've inspected it, always looks like this:
Thread 1 "differential" received signal SIGSEGV, Segmentation fault. 0x00002061775c8170 in ?? () (gdb) disas $rip,$rip+20 Dump of assembler code from 0x2061775c8170 to 0x2061775c8184: => 0x00002061775c8170: jmp *0x2(%rip) # 0x2061775c8178
The memory being accessed is just below
%rip
itself, and the memory address of 0x00002061775c8178 is indeed readable in gdb. This is in fact an indirect jump toBuiltins_WasmLiftoffFrameSetup
all the time. The problem is that if you print$_siginfo.si_code
in gdb it yields 4 which corresponds toSEGV_PKUERR
.So the problem here is that JIT code isn't allowed to read this code due to MPK and the status of the PKRU. The corruption here seems to be:
- V8 looks like it uses a single pkey at most.
- This is entirely unrelated to Wasmtime's PKRU support, if that's disabled the issue still happens.
- V8 assumes that JIT code executes with pkey 1.
- V8 uses pkeys to disallow writes to JIT code with pkey 1.
- V8 assumes a steady-state of pkey 1 is accessible (executable/readable) but not writable
- V8, when it modifies JIT code, frobs the PKRU to teporarily allow writes
This is all fine and dandy and Wasmtime shouldn't interfere with it, except it does. The problem is apparently signal handlers (it's always signal handlers). V8 assumes a steady-state PKRU register of 0x5...58. That means no access to keys >=2, readonly/executable access to pkey 1, and all access to pkey 0. When a signal is received, however, it resets the PKRU register to 0x5..54. This notably disables access to all keys >=1, disagreeing with V8's steady state.
The problem here is that when Wasmtime executes, hits a signal, recovers, and goes back into V8, now V8 is in a corrupt state. V8 thinks JIT code can access pkey 1, but it in fact cannot. A reproduction of this problem can be found in this gist which uses the
v8
crate to run some code, then a segfault is generated and caught, then code is run again. This second execution crashes.
Ok so what to do about this. I found that
v8::V8::set_flags_from_string("--no-memory_protection_keys")
looks promising. That fixes the standalone test case here but it causes V8 to segfault in a different location when passed during fuzzing. According to the man page about pkeys signal handlers need to manually restore the state after they're received. Looks likesig{set,long}jmp
don't solve this automatically.I have run out of ideas of how to fix this. If OSS-Fuzz picks up hardware with MPK we'll probably have to disable differential execution with v8 if nothing else changes.
alexcrichton commented on issue #4786:
That fixes the standalone test case here but it causes V8 to segfault in a different location when passed during fuzzing.
Looks like this flag doesn't fully disable MPK. The crash is in V8 with
si_code = 4
which shows pkey misconfiguration again. So looks like that flag only partially disables MPK.
Last updated: Dec 23 2024 at 12:05 UTC