Stream: git-wasmtime

Topic: wasmtime / PR #2611 Implement limiting WebAssembly execut...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2021 at 22:32):

alexcrichton opened PR #2611 from fuel to main:

This PR lifts a feature from Lucet to wasmtime where generated code can count instructions, or account for "fuel", during execution. The purpose of this feature is similar to the interrupt support via InterruptHandle, but it allows interrupting a wasm module deterministically instead of relying on a timer. Additionally, a future goal building on this PR is to extend the async support of wasmtime to leverage fuel to periodically "interrupt" executing wasm code and yield back to the host. This would ensure wasmtime futures never take "too long" in Future::poll: code that would otherwise run for a while would yield back to the host, allowing preemption and/or other things like future timeouts.

The implementation here is copied nearly verbatim from Lucet itself, with tweaks as appropriate for Wasmtime's different vmctx representation. The main difference is that Wasmtime's fuel counter is two levels of indirection away from the vmctx rather than one, as in Lucet. To help with this, a new Variable caches the VMInterrupts pointer so the same value isn't reloaded from the vmctx each time.
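A minimal Rust sketch of the layout described above, under stated assumptions: the `VMContext`/`VMInterrupts` stand-ins, the `run_blocks` helper, and the convention that the counter starts at minus the remaining fuel and signals exhaustion once it goes above zero are all hypothetical, not Wasmtime's actual definitions. It shows the two hops from the vmctx to the counter, the pointer being loaded once per function rather than once per block, and why the stopping point is deterministic: fuel is charged per block, not per timer tick.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for Wasmtime's `VMInterrupts`; only the fuel field is modeled.
pub struct VMInterrupts {
    /// Assumed convention: starts at -(fuel remaining) and counts up toward 0.
    pub fuel_consumed: Cell<i64>,
}

/// Hypothetical stand-in for the vmctx. It stores a pointer to `VMInterrupts`,
/// so reaching the counter takes two hops: vmctx -> interrupts -> fuel_consumed.
pub struct VMContext<'a> {
    pub interrupts: &'a VMInterrupts,
}

#[derive(Debug, PartialEq)]
pub struct OutOfFuel {
    /// Index of the block during which fuel ran out.
    pub block: usize,
}

/// Models a compiled function whose i-th basic block costs `block_costs[i]` fuel.
pub fn run_blocks(vmctx: &VMContext<'_>, block_costs: &[i64]) -> Result<(), OutOfFuel> {
    // Load the `VMInterrupts` pointer once, mirroring the cached Variable that
    // avoids reloading it from the vmctx for every block.
    let interrupts = vmctx.interrupts;
    for (block, &cost) in block_costs.iter().enumerate() {
        let consumed = interrupts.fuel_consumed.get() + cost;
        interrupts.fuel_consumed.set(consumed);
        // Going above zero means this block needed more fuel than remained.
        if consumed > 0 {
            return Err(OutOfFuel { block });
        }
    }
    Ok(())
}
```

Running the same block costs against the same starting fuel always fails at the same block index, which is the determinism a wall-clock timer cannot offer.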

Support for this feature is exposed through a few new APIs:

The current behavior, which cannot be changed, is that a wasm trap is generated when fuel runs out. I hope to make this configurable in the future so that, for async stores, running out of fuel automatically re-injects more fuel, but only after a yield back to the host happens.

I've done a bit of benchmarking with this using criterion and the benchmarks here -- https://github.com/bytecodealliance/sightglass/tree/main/benchmarks-next. The benchmarks are relatively limited at this time but still produced some useful data. They show a 35-45% slowdown on my personal laptop in the runtime execution of the benchmarked portion of the code for blake3-scalar and shootout-ackermann. At least for ackermann this is somewhat expected, because the loops and function calls are all tiny, so the overhead is quite noticeable. For blake3-scalar I assume it's similar but haven't dug in yet. Note that these numbers were taken with the new backend, since the old x86 backend seems significantly worse than the x64 one.

I do think there might be some relatively low-hanging fruit with respect to performance, but further tweaks would require changes to cranelift itself to improve instruction selection. For example, one optimization might be to not have a fuel_var at all and instead periodically emit addq $fuel_consumed, offset(%vminterrupts_ptr), which avoids consuming an extra register. Similarly, cmpq $0, offset(%vminterrupts_ptr) could be generated for the check. I couldn't get the x64 backend to emit those forms of instructions at this time, though, and I'm also not 100% certain it'll be faster.

Note that for now this doesn't depend on the async PR, but once both land I plan to open a follow-up PR implementing the periodic-yield option.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2021 at 22:34):

alexcrichton requested cfallin for a review on PR #2611.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2021 at 22:44):

alexcrichton updated PR #2611 from fuel to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:14):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:14):

fitzgen created PR Review Comment:

Maybe worth noting somewhere around here that, for the purposes of fuel, we don't care about (implicit) branches due to traps in the middle of a block?
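To make the reviewer's point concrete, here is a hypothetical sketch of where accumulated fuel might be flushed while translating a straight-line run of operators. The `Op` enum, the one-unit-per-operator cost, and the chosen flush points are illustrative assumptions, not Wasmtime's translator: explicit control flow (branches, calls, block ends) ends an accounting region, while an operator that can trap (here `DivS`) does not, even though a trap is an implicit exit from the block.

```rust
/// A hypothetical, heavily reduced operator set.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Const,
    Add,
    /// Signed division: may trap on divide-by-zero, i.e. an *implicit* branch.
    DivS,
    /// Conditional branch: explicit control flow.
    BrIf,
    Call,
    End,
}

/// Returns `(index, fuel)` pairs: at operator `index`, the `fuel` accumulated
/// since the previous flush (including this operator) is committed to the
/// counter. Every operator costs 1 in this sketch.
pub fn fuel_flush_points(ops: &[Op]) -> Vec<(usize, u64)> {
    let mut flushes = Vec::new();
    let mut pending = 0u64;
    for (i, &op) in ops.iter().enumerate() {
        pending += 1;
        match op {
            // Explicit control flow leaves the straight-line region, so the
            // fuel gathered so far must be committed here.
            Op::BrIf | Op::Call | Op::End => {
                flushes.push((i, pending));
                pending = 0;
            }
            // A trapping op is treated like any other: if it does trap, the
            // not-yet-flushed fuel is simply never recorded.
            Op::Const | Op::Add | Op::DivS => {}
        }
    }
    flushes
}
```

The design choice being illustrated: adding a flush before every possibly-trapping operator would multiply the number of counter updates for little benefit, since a trap ends execution anyway.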

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:14):

fitzgen submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:16):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:16):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:16):

cfallin created PR Review Comment:

Is there any reason we can't add to the existing fuel_adj value here, in order to continue accumulating the consumed-fuel count and return the true total from fuel_consumed()?

(In that case I might also call this add_fuel(), and adjust existing rather than overwrite the fuel-consumed counter...)
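A minimal sketch of the accumulate-rather-than-overwrite semantics suggested here. The `FuelAccount` type, its field layout, and the `consume` helper (standing in for generated code bumping the counter) are hypothetical; the idea the comment proposes is to keep a host-side adjustment (`fuel_adj`) so that `add_fuel` can be called repeatedly without resetting the running total reported by `fuel_consumed()`.

```rust
/// Hypothetical host-side view of a store's fuel accounting.
pub struct FuelAccount {
    /// Counter bumped by generated code; assumed to hold `consumed - total_added`,
    /// so it is negative while fuel remains.
    counter: i64,
    /// Total fuel ever added; lets us recover the true consumed total.
    fuel_adj: i64,
}

impl FuelAccount {
    pub fn new() -> Self {
        FuelAccount { counter: 0, fuel_adj: 0 }
    }

    /// Adds fuel on top of whatever remains, instead of overwriting it.
    pub fn add_fuel(&mut self, fuel: u64) {
        // Clamp to i64 range; saturate rather than wrap on extreme values.
        let fuel = fuel.min(i64::MAX as u64) as i64;
        self.counter = self.counter.saturating_sub(fuel);
        self.fuel_adj = self.fuel_adj.saturating_add(fuel);
    }

    /// Stand-in for wasm execution charging `amount` fuel.
    pub fn consume(&mut self, amount: i64) {
        self.counter = self.counter.saturating_add(amount);
    }

    /// Total fuel consumed since creation, across every `add_fuel` call.
    pub fn fuel_consumed(&self) -> i64 {
        self.fuel_adj.saturating_add(self.counter)
    }

    /// Fuel still available (zero once exhausted).
    pub fn fuel_remaining(&self) -> i64 {
        self.counter.saturating_neg().max(0)
    }
}
```

With an overwriting setter, the second refill in the usage below would reset the reported total to zero; with the adjustment field, the total keeps growing.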

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 00:16):

cfallin created PR Review Comment:

Can you say more about the expected way to run this in an executor-loop system with timeslicing (as in Lucet's fuel implementation)?

Specifically, is the common case that we'll run for a timeslice and then unwind here using the trap mechanism back to the Wasm entry point, at which point some higher-level wrapper might yield a future or similar?

My two concerns are:

All of my concerns above go away if we have a way to plug in a custom out-of-gas handler; I couldn't find an API that would let one do this (one would need to pass in a custom TrapInfo trait impl, I think?), though I may have missed it. Alternatively, is the more complete interface coming with the later async work?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:15):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:15):

alexcrichton created PR Review Comment:

Oh sure yeah, the "raise a trap" implementation here is primarily just so we can test it in this PR without all the async bits landed yet. Additionally it helps us fuzz the implementation well by ensuring wasm is always consuming fuel when executing. For an async yield-every-so-often implementation my plan is to:

The implementation would be relatively simple: basically calling suspend here after notifying the future executor that we're immediately ready to be polled again. I would imagine this would all be guarded by a basic if in this out-of-gas handler which either traps or yields.

Efficiency-wise I think it should be quite fast, because it's a fiber switch and no unwinding happens (not even longjmp). Upon resumption we'd simply return from this function and wasm would keep going. Resumption-wise we should be good as well, thanks to fibers. Basically the trap path won't be used at all for timeslicing; it's just a way for me to land this PR before the async fiddly bits are in.
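A toy model of the plan described above, assuming a single `if` (here a `match`) in the out-of-gas handler that either traps (the behavior in this PR) or yields and gets refilled (the planned async behavior). The `OutOfGas`, `Guest`, and `Trap` names are hypothetical. Real Wasmtime would suspend a fiber rather than return from a loop, but from the executor's side the sketch behaves similarly: one `Poll::Pending` per exhausted timeslice and no unwinding.

```rust
use std::task::Poll;

/// Hypothetical policy for what the out-of-gas handler does.
#[derive(Clone, Copy)]
pub enum OutOfGas {
    /// Behavior in this PR: raise a trap.
    Trap,
    /// Planned async behavior (sketch): yield to the executor, then continue
    /// with this much fresh fuel.
    YieldAndRefill(u64),
}

#[derive(Debug, PartialEq)]
pub struct Trap;

/// Toy guest: `remaining_work` units of work, each costing 1 fuel.
pub struct Guest {
    pub remaining_work: u64,
    pub fuel: u64,
}

impl Guest {
    /// One poll: run until the work is done or fuel runs out.
    pub fn poll(&mut self, policy: OutOfGas) -> Poll<Result<(), Trap>> {
        while self.remaining_work > 0 {
            if self.fuel == 0 {
                // The out-of-gas handler: a single branch picks trap vs. yield.
                return match policy {
                    OutOfGas::Trap => Poll::Ready(Err(Trap)),
                    OutOfGas::YieldAndRefill(amount) => {
                        // The guest does no work until the next poll, so
                        // refilling now only takes effect after the yield.
                        self.fuel = amount;
                        Poll::Pending
                    }
                };
            }
            self.fuel -= 1;
            self.remaining_work -= 1;
        }
        Poll::Ready(Ok(()))
    }
}
```

With 10 units of work and 4 fuel per slice, the yield policy returns `Pending` twice and then completes, while the trap policy stops after the first slice with 6 units left.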

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:16):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:16):

alexcrichton created PR Review Comment:

I was a tiny bit worried that a long-lived store might overflow the i64 counter but that may not be too realistic. Do you think that'd be rare enough that we should just switch this to an add instead of a set?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:17):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:17):

alexcrichton created PR Review Comment:

Ah indeed, good point!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 15:22):

alexcrichton updated PR #2611 from fuel to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 18:52):

alexcrichton updated PR #2611 from fuel to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 19:01):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 19:01):

cfallin created PR Review Comment:

IMHO it's nicer not to have the "the fuel-used value we return only counts since the last set" property -- it has the potential to become a subtle stats bug later.

Doing some quick math, a 2^63 max count, at 1B Wasm ops per second, gives us 2^33 or 8B seconds of runtime before overflow, which is ~250 years. Sometime before the year 2270 we can come back and upgrade to an i128 :-)
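The back-of-envelope estimate above can be checked with a few lines of Rust. The steady rate of 10^9 fuel units per second is the comment's assumption; using i64::MAX exactly and 365.25-day (Julian) years gives roughly 292 years, somewhat more than the rounded ~250 quoted, so the conclusion that overflow is not a practical concern holds.

```rust
/// Seconds in a Julian year (365.25 days).
const SECS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;

/// Years until an i64 counter starting at 0 overflows when it grows by
/// `units_per_sec` every second (a steady rate is assumed).
pub fn years_until_overflow(units_per_sec: f64) -> f64 {
    let seconds = i64::MAX as f64 / units_per_sec;
    seconds / SECS_PER_YEAR
}
```

At 10^9 units per second this returns about 292 years.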

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 19:03):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 19:03):

cfallin created PR Review Comment:

Makes sense! Happy to see this go in as-is, then.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 21:46):

alexcrichton updated PR #2611 from fuel to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 21:46):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 21:46):

alexcrichton created PR Review Comment:

Heh good point!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 21:55):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2021 at 22:10):

alexcrichton updated PR #2611 from fuel to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2021 at 14:57):

alexcrichton merged PR #2611.


Last updated: Dec 23 2024 at 12:05 UTC