Stream: git-wasmtime

Topic: wasmtime / issue #3186 Timeouts in spec interpreter fuzzing


view this post on Zulip Wasmtime GitHub notifications bot (Aug 13 2021 at 20:53):

alexcrichton opened issue #3186:

I... don't really know how we want to handle this. It appears that the spec interpreter is showing extremely slow behavior when processing nested blocks. For example this module (reduce from a fuzz case):

(module
  (func
    global.get 0
    i32.eqz
    if
      unreachable
    end
    global.get 0
    i32.const 1
    i32.sub
    global.set 0

    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    call 0
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    )
  (global (mut i32) (i32.const 1000))
  (export "" (func 0)))

takes many dozens of seconds to run in the interpreter. If the block ... end is all removed though it finishes near instantaneously.

In some sense this may be something we can fix in the interpreter (not that I have any idea how to really read or write ocaml), but in the more general sense this is probably something we'll want to protect against in the fuzzers. AFAIK the interpreter isn't built to be fast, it's just built to run, so we're probably going to run into a lot more of these where "simple" modules take disproportionately long to run in the interpreter.

My only guess, though, of how to do this would be to fork a process with the interpreter and get the results back via a pipe or something like that, giving us the ability to SIGKILL if it times out. Apart from that though I don't know how we'd time out the interpreter...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 13 2021 at 22:24):

abrown commented on issue #3186:

I can take a look at the OCaml side next week to see why this so slow, but, yeah, in general the in-process interpreter execution is going to be prone to this type of thing. Let's talk more about a general solution in the Wasmtime meeting?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 13 2021 at 22:24):

abrown assigned issue #3186:

I... don't really know how we want to handle this. It appears that the spec interpreter is showing extremely slow behavior when processing nested blocks. For example this module (reduce from a fuzz case):

(module
  (func
    global.get 0
    i32.eqz
    if
      unreachable
    end
    global.get 0
    i32.const 1
    i32.sub
    global.set 0

    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    call 0
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    )
  (global (mut i32) (i32.const 1000))
  (export "" (func 0)))

takes many dozens of seconds to run in the interpreter. If the block ... end is all removed though it finishes near instantaneously.

In some sense this may be something we can fix in the interpreter (not that I have any idea how to really read or write ocaml), but in the more general sense this is probably something we'll want to protect against in the fuzzers. AFAIK the interpreter isn't built to be fast, it's just built to run, so we're probably going to run into a lot more of these where "simple" modules take disproportionately long to run in the interpreter.

My only guess, though, of how to do this would be to fork a process with the interpreter and get the results back via a pipe or something like that, giving us the ability to SIGKILL if it times out. Apart from that though I don't know how we'd time out the interpreter...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2021 at 15:20):

alexcrichton labeled issue #3186 (assigned to abrown):

I... don't really know how we want to handle this. It appears that the spec interpreter is showing extremely slow behavior when processing nested blocks. For example this module (reduce from a fuzz case):

(module
  (func
    global.get 0
    i32.eqz
    if
      unreachable
    end
    global.get 0
    i32.const 1
    i32.sub
    global.set 0

    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    call 0
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    )
  (global (mut i32) (i32.const 1000))
  (export "" (func 0)))

takes many dozens of seconds to run in the interpreter. If the block ... end is all removed though it finishes near instantaneously.

In some sense this may be something we can fix in the interpreter (not that I have any idea how to really read or write ocaml), but in the more general sense this is probably something we'll want to protect against in the fuzzers. AFAIK the interpreter isn't built to be fast, it's just built to run, so we're probably going to run into a lot more of these where "simple" modules take disproportionately long to run in the interpreter.

My only guess, though, of how to do this would be to fork a process with the interpreter and get the results back via a pipe or something like that, giving us the ability to SIGKILL if it times out. Apart from that though I don't know how we'd time out the interpreter...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2021 at 15:20):

alexcrichton labeled issue #3186 (assigned to abrown):

I... don't really know how we want to handle this. It appears that the spec interpreter is showing extremely slow behavior when processing nested blocks. For example this module (reduce from a fuzz case):

(module
  (func
    global.get 0
    i32.eqz
    if
      unreachable
    end
    global.get 0
    i32.const 1
    i32.sub
    global.set 0

    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    call 0
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    )
  (global (mut i32) (i32.const 1000))
  (export "" (func 0)))

takes many dozens of seconds to run in the interpreter. If the block ... end is all removed though it finishes near instantaneously.

In some sense this may be something we can fix in the interpreter (not that I have any idea how to really read or write ocaml), but in the more general sense this is probably something we'll want to protect against in the fuzzers. AFAIK the interpreter isn't built to be fast, it's just built to run, so we're probably going to run into a lot more of these where "simple" modules take disproportionately long to run in the interpreter.

My only guess, though, of how to do this would be to fork a process with the interpreter and get the results back via a pipe or something like that, giving us the ability to SIGKILL if it times out. Apart from that though I don't know how we'd time out the interpreter...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2021 at 15:21):

github-actions[bot] commented on issue #3186:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "fuzzing"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2021 at 16:14):

alexcrichton commented on issue #3186:

Personally I don't want to get into the business of maintaining, optimizing, etc, the spec interpreter. If others want to take up that mantle though that can work! (it's why we have a fork after all...)

To clarify, though, I don't think this is an interpreter-vs-compiler slowdown. That certainly has a slowdown but something here is clearly quadratic or greater behavior in the interpreter. I expect timeouts between wasmtime and the interpreter are all likely to fall in this category where it's bugs to egregiously slow behavior in the interpreter rather than "oh well interpretation is a bit slower". The fuel limit we have in each module is only at 1000 right now which severly limits the actual amount of executed wasm to the point that it shouldn't ever actually time out, even in a slow interpreter.

Unless OCaml provides the native ability to interrupt and exit actively running OCaml code though I don't think that we really have a great recourse here because I don't think the subprocess idea is really that feasible. This'd end up meaning that the spec fuzzer is less useful because it will end up fuzzing less as it hits bugs like this.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 19 2021 at 23:53):

cfallin commented on issue #3186:

I dug into this a bit more today, just to understand if it was something we could easily fix or if it was more fundamental.

The basic issue seems to be that the interpreter has O(n^2) behavior because of the way it steps configurations when it is deep in a nested labels-and-callframes configuration. Specifically, if in the step function we start with a configuration Block (Block (Block (Block ...))), we will step through Label (Plain Block (Block (Block ...))), then Label (Label (Plain Block (Block ...))), then Label (Label (Label (Block ...))), etc. This much is fine: there are O(n) steps to execute the above example, i.e., the number of steps per Wasm instruction executed is asymptotically constant.

However, the way that the step function operates causes each step to take O(|config|) time, where |config| is the current size of the "configuration" (the term trees above). Specifically, when a (Label ...) is at the root, we recurse into the sub-configuration with a recursive call to step, and step the inner configuration once; then we rebuild the outer configuration, which ends up allocating a new heap object. This happens here:

    | Label (n, es0, code'), vs ->
      let c' = step {c with code = code'} in
      vs, [Label (n, es0, c'.code) @@ e.at]

The end result is the O(n^2) behavior we observe. Note that Wasm calls are handled by putting a Frame term in the configuration and stepping within that (i.e., the Wasm callstack is embedded in the configuration tree) -- see here -- so the 1000-deep recursive calls in the example above contribute to the quadratic term as well.

When I profiled an optimized build of the Wasm spec interpreter running the above example, I saw about 15% of time spent in the code snippet above, and about 50% of time in the various parts of the garbage collector + allocator.

I think that we could fix this by replacing the recursive step invocations with effectively a step*, i.e., "step to completion". This would cause issues for any resource-limiting ("run for exactly this many steps") built on top of the step function, but for our use case we only care about the final result. Alternately we could have a stepN that takes a nonzero N. This might not be terrible to do; I'd be willing to try it out and then file an issue on WebAssembly/spec; but there's also the risk that this would not be accepted upstream for clarity or other reasons. Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2021 at 00:54):

cfallin commented on issue #3186:

Well, I went and got nerd-sniped... I fixed the issue in this branch of the spec interpreter. I'll see about contributing this back to WebAssembly/spec tomorrow.

In the meantime we could potentially put this in our fork. (Happy to start a longer conversation about maintenance effort of doing actual fixes on our fork, vs. just merging new specs or whatnot -- I hope that bugfixes such as this don't become common!)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2021 at 14:33):

alexcrichton commented on issue #3186:

Oh wow, thanks for digging into that @cfallin! Fingers crossed for how it goes upstream :fingers_crossed:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2021 at 16:32):

abrown commented on issue #3186:

Yeah, nice work! What if we create a fuzzing branch on https://github.com/bytecodealliance/wasm-spec-mirror that contains fixes like this and, eventually, any proposals we merge in? If your commit gets merged upstream then fuzzing can match main?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2021 at 16:45):

cfallin commented on issue #3186:

Sure, I can push a fuzzing branch and we can switch to that for now!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2021 at 19:10):

alexcrichton commented on issue #3186:

I'm going to close this in favor of https://github.com/bytecodealliance/wasmtime/issues/3251

view this post on Zulip Wasmtime GitHub notifications bot (Sep 03 2021 at 19:10):

alexcrichton closed issue #3186 (assigned to abrown):

I... don't really know how we want to handle this. It appears that the spec interpreter is showing extremely slow behavior when processing nested blocks. For example this module (reduce from a fuzz case):

(module
  (func
    global.get 0
    i32.eqz
    if
      unreachable
    end
    global.get 0
    i32.const 1
    i32.sub
    global.set 0

    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    block
    call 0
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    end
    )
  (global (mut i32) (i32.const 1000))
  (export "" (func 0)))

takes many dozens of seconds to run in the interpreter. If the block ... end is all removed though it finishes near instantaneously.

In some sense this may be something we can fix in the interpreter (not that I have any idea how to really read or write ocaml), but in the more general sense this is probably something we'll want to protect against in the fuzzers. AFAIK the interpreter isn't built to be fast, it's just built to run, so we're probably going to run into a lot more of these where "simple" modules take disproportionately long to run in the interpreter.

My only guess, though, of how to do this would be to fork a process with the interpreter and get the results back via a pipe or something like that, giving us the ability to SIGKILL if it times out. Apart from that though I don't know how we'd time out the interpreter...


Last updated: Oct 23 2024 at 20:03 UTC