Stream: git-wasmtime

Topic: wasmtime / issue #4786 fuzz: differential V8 engine occas...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 25 2022 at 19:47):

abrown labeled issue #4786:

Test Case

No test case produced.

Steps to Reproduce

$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none

Run for enough time to crash.

Expected Results

Not to crash.

Actual Results

...
=== Execution rate (1023 successes / 6000 attempted modules): 17.05% ===
        wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30%
        wasm-smith: 44.81%, single-inst: 55.19%
...
#6851   NEW    cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart-
#6862   NEW    cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart-
#6864   NEW    cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit-
#6869   NEW    cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"-
#6988   NEW    cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes-
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 11 (core dumped)

Versions and Environment

Wasmtime version or commit: main

Operating system: Fedora 35

Architecture: x86_64

view this post on Zulip Wasmtime GitHub notifications bot (Aug 25 2022 at 19:47):

abrown opened issue #4786:

Test Case

No test case produced.

Steps to Reproduce

$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none

Run for enough time to crash.

Expected Results

Not to crash.

Actual Results

...
=== Execution rate (1023 successes / 6000 attempted modules): 17.05% ===
        wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30%
        wasm-smith: 44.81%, single-inst: 55.19%
...
#6851   NEW    cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart-
#6862   NEW    cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart-
#6864   NEW    cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit-
#6869   NEW    cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"-
#6988   NEW    cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes-
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 11 (core dumped)

Versions and Environment

Wasmtime version or commit: main

Operating system: Fedora 35

Architecture: x86_64

view this post on Zulip Wasmtime GitHub notifications bot (Aug 25 2022 at 19:48):

abrown commented on issue #4786:

cc: @alexcrichton, guess that crash we saw wasn't the spec interpreter after all.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 25 2022 at 19:48):

abrown labeled issue #4786:

Test Case

No test case produced.

Steps to Reproduce

$ ALLOWED_ENGINES=-spec,-wasmi cargo +nightly fuzz run differential -s none

Run for enough time to crash.

Expected Results

Not to crash.

Actual Results

...
=== Execution rate (1023 successes / 6000 attempted modules): 17.05% ===
        wasmi: 0.00%, spec: 0.00%, wasmtime: 87.70%, v8: 12.30%
        wasm-smith: 44.81%, single-inst: 55.19%
...
#6851   NEW    cov: 20559 ft: 63794 corp: 1188/146Kb lim: 170 exec/s: 360 rss: 106Mb L: 138/170 MS: 2 CopyPart-CopyPart-
#6862   NEW    cov: 20559 ft: 63795 corp: 1189/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 132/170 MS: 1 CopyPart-
#6864   NEW    cov: 20560 ft: 63796 corp: 1190/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 103/170 MS: 2 EraseBytes-ChangeBit-
#6869   NEW    cov: 20563 ft: 63809 corp: 1191/146Kb lim: 170 exec/s: 361 rss: 106Mb L: 169/170 MS: 5 ShuffleBytes-InsertRepeatedBytes-InsertRepeatedBytes-ShuffleBytes-PersAutoDict- DE: "\xff\xff\xff\x07"-
#6988   NEW    cov: 20563 ft: 63838 corp: 1192/146Kb lim: 170 exec/s: 349 rss: 106Mb L: 129/170 MS: 4 InsertRepeatedBytes-ChangeBit-EraseBytes-InsertRepeatedBytes-
────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with signal: 11 (core dumped)

Versions and Environment

Wasmtime version or commit: main

Operating system: Fedora 35

Architecture: x86_64

view this post on Zulip Wasmtime GitHub notifications bot (Aug 30 2022 at 04:12):

alexcrichton commented on issue #4786:

I haven't been able to reproduce this locally unfortunately. When I let the fuzzer run long enough it hit https://github.com/bytecodealliance/wasmtime/issues/4812 before hitting this fault.

Can you try running the fuzzer in gdb to see if you can grab a stack trace of the fault? (presuming it doesn't take too long to fault). When using gdb though it's difficult b/c segfaults can normally happen for wasm and there's no great way to handle this. I was able to do handle SIGILL nostop locally in gdb but if you do handle SIGSEGV nostop I think that when it actually hits the crashing failure it lets the process exit which is no good, so you may have to babysit it a bit and double-check that SIGSEGV doesn't look like a normal wasm SIGSEGV (rr might be super helpful here in this regard)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 30 2022 at 07:38):

bjorn3 commented on issue #4786:

Maybe you can do handle SIGSEGV nostop but also add a breakpoint where the signal handler propagates the SIGSEGV if it didn't originate from the wasm module?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2024 at 22:21):

alexcrichton commented on issue #4786:

I've been unable to fuzz with v8 locally for the longest time so I tried to dig into this today. Turns out it's an issue with memory protection keys so this'll only affect new enough x64 hardware with new enough glibc. That I think would explain why Andrew initially ran into this and I couldn't reproduce. I think that would also help explain why it hasn't showed up on oss-fuzz at all despite being trivial to reproduce locally.

My best read of the situation is that if you run with v8 and differential fuzzing long enough locally it will eventually crash. This crash, when I've inspected it, always looks like this:

Thread 1 "differential" received signal SIGSEGV, Segmentation fault.
0x00002061775c8170 in ?? ()
(gdb) disas $rip,$rip+20
Dump of assembler code from 0x2061775c8170 to 0x2061775c8184:
=> 0x00002061775c8170:  jmp    *0x2(%rip)        # 0x2061775c8178

The memory being accessed is just below %rip itself, and the memory address of 0x00002061775c8178 is indeed readable in gdb. This is in fact an indirect jump to Builtins_WasmLiftoffFrameSetup all the time. The problem is that if you print $_siginfo.si_code in gdb it yields 4 which corresponds to SEGV_PKUERR.

So the problem here is that JIT code isn't allowed to read this code due to MPK and the status of the PKRU. The corruption here seems to be:

This is all fine and dandy and Wasmtime shouldn't interfere with it, except it does. The problem is apparently signal handlers (it's always signal handlers). V8 assumes a steady-state PKRU register of 0x5...58. That means no access to keys >=2, readonly/executable access to pkey 1, and all access to pkey 0. When a signal is received, however, it resets the PKRU register to 0x5..54. This notably disables access to all keys >=1, disagreeing with V8's steady state.

The problem here is that when Wasmtime executes, hits a signal, recovers, and goes back into V8, now V8 is in a corrupt state. V8 thinks JIT code can access pkey 1, but it in fact cannot. A reproduction of this problem can be found in this gist which uses the v8 crate to run some code, then a segfault is generated and caught, then code is run again. This second execution crashes.


Ok so what to do about this. I found that v8::V8::set_flags_from_string("--no-memory_protection_keys") looks promising. That fixes the standalone test case here but it causes V8 to segfault in a different location when passed during fuzzing. According to the man page about pkeys signal handlers need to manually restore the state after they're received. Looks like sig{set,long}jmp don't solve this automatically.

I have run out of ideas of how to fix this. If OSS-Fuzz picks up hardware with MPK we'll probably have to disable differential execution with v8 if nothing else changes.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2024 at 22:26):

alexcrichton commented on issue #4786:

That fixes the standalone test case here but it causes V8 to segfault in a different location when passed during fuzzing.

Looks like this flag doesn't fully disable MPK. The crash is in V8 with si_code = 4 which shows pkey misconfiguration again. So looks like that flag only partially disables MPK.


Last updated: Jan 24 2025 at 00:11 UTC