alexcrichton opened PR #2611 from fuel to main:
This PR lifts a feature from Lucet to wasmtime where generated code can count instructions or account for "fuel" during execution. The purpose of this feature is similar to the interrupt support via `InterruptHandle`, but it mainly allows deterministically interrupting a wasm module instead of relying on a timer. Additionally a future goal of this PR is to extend the `async` support of wasmtime to leverage fuel to periodically "interrupt" executing wasm code to yield back to the host. This would enable wasmtime futures to never take "too long" in `Future::poll` since if they would otherwise take a while they'd yield back to the host and allow preemption and/or other things like future timeouts.
The implementation here is nearly copied verbatim from Lucet itself, with tweaks as appropriate for the different vmctx representation in Wasmtime. The main difference is that Wasmtime's fuel counter is two levels of indirection away from the vmctx rather than one in Lucet. To help with this a new `Variable` stores the `VMInterrupts` pointer value to avoid reloading the same value each time from the vmctx.
Support for this feature is exposed through a few new APIs:
- `Config::consume_fuel` - enables codegen options for wasm to consume fuel, and behaves similarly to `interruptable`.
- `Store::set_fuel_remaining` - this is how fuel is injected into a `Store` for execution of wasm. Note that stores always start with 0 fuel so this is required to be called.
- `Store::fuel_consumed` - this can be used to check how much fuel has been consumed so far.
The current behavior, which cannot be changed, is that when fuel runs out a wasm trap is generated. I hope to make this configurable in the future so that for async stores when fuel runs out it's automatically re-injected with fuel but only after a yield back to the host happens.
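To make the behavior described above concrete, here is a minimal, self-contained sketch of fuel metering. It is a toy model, not wasmtime's implementation: `FuelStore`, `OutOfFuel`, `consume`, and `run_loop` are hypothetical names, and only `set_fuel_remaining` and `fuel_consumed` mirror the method names listed in the PR description. The `Err(OutOfFuel)` result stands in for the wasm trap.

```rust
/// Stand-in for the wasm trap raised when fuel runs out (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfFuel;

/// Toy model of a fuel-metered store; NOT wasmtime's `Store`.
#[derive(Debug, Default)]
pub struct FuelStore {
    remaining: u64,
    consumed: u64,
}

impl FuelStore {
    /// A new store starts with zero fuel, so fuel must be injected before running.
    pub fn new() -> Self {
        Self::default()
    }

    /// Inject fuel for subsequent execution.
    pub fn set_fuel_remaining(&mut self, fuel: u64) {
        self.remaining = fuel;
    }

    /// Total fuel charged so far.
    pub fn fuel_consumed(&self) -> u64 {
        self.consumed
    }

    /// Charge `cost` units, or fail deterministically if not enough fuel is left.
    pub fn consume(&mut self, cost: u64) -> Result<(), OutOfFuel> {
        if cost > self.remaining {
            return Err(OutOfFuel);
        }
        self.remaining -= cost;
        self.consumed += cost;
        Ok(())
    }
}

/// Simulated guest loop: each iteration costs one unit of fuel.
pub fn run_loop(store: &mut FuelStore, iterations: u64) -> Result<u64, OutOfFuel> {
    let mut done = 0;
    for _ in 0..iterations {
        store.consume(1)?;
        done += 1;
    }
    Ok(done)
}
```

Because the cutoff depends only on how much work was charged, the same input stops at the same point on every run, which is the contrast with a timer-based interrupt drawn in the description.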
I've done a bit of benchmarking with this using criterion and the benchmarks here -- https://github.com/bytecodealliance/sightglass/tree/main/benchmarks-next. The benchmarks are relatively limited at this time but they were able to produce some useful data in the meantime. This shows a 35-45% slowdown on my personal laptop for the runtime execution of the benchmarked portion of the code for blake3-scalar and shootout-ackermann. At least for ackermann this is somewhat expected because the loops/function calls are all tiny, so the overhead is quite noticeable. For blake3-scalar I assume it's similar but haven't dug in yet. Note that these numbers were with the new backend since the old x86 backend seems significantly worse than the x64 one.
I do think there might be some relatively low-hanging fruit with respect to performance, but further tweaks would require changes to cranelift itself to optimize instruction selection. For example one optimization might be to not have a `fuel_var` and instead periodically do `addq $fuel_consumed, offset(%vminterrupts_ptr)` which avoids consuming extra registers. Similarly `cmpq $0, offset(%vminterrupts_ptr)` could be generated as well. I couldn't get the x64 backend to emit those forms of instructions at this time though. I'm also not 100% certain that it'll be faster.
Note that for now this doesn't depend on the `async` PR, but I plan on having a future PR after these two land which implements the periodically-yield option.
alexcrichton requested cfallin for a review on PR #2611.
alexcrichton updated PR #2611 from fuel to main.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Maybe worth noting somewhere around here that, for the purposes of fuel, we don't care about (implicit) branches due to traps in the middle of a block?
fitzgen submitted PR Review.
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Is there any reason we can't add to the existing `fuel_adj` value here, in order to continue accumulating the consumed-fuel count and return the true total from `fuel_consumed()`?
(In that case I might also call this `add_fuel()`, and adjust existing rather than overwrite the fuel-consumed counter...)
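The difference between overwriting and accumulating can be sketched with a small model. This is an illustration under assumptions, not the PR's code: `FuelCounter` and `burn` are hypothetical, and the representation (a counter that generated code increments, kept negative while fuel remains, plus a host-side `adj` offset) is assumed for the sake of the example. Only the names `fuel_adj`, `add_fuel`, `set_fuel_remaining`, and `fuel_consumed` come from the thread.

```rust
/// Hypothetical model of the counters discussed in the review.
/// ASSUMPTION: `consumed` is what generated code bumps; it stays negative
/// while fuel remains, and reaching zero or above means "out of fuel".
#[derive(Debug, Default)]
pub struct FuelCounter {
    consumed: i64,
    adj: i64, // host-side adjustment (the `fuel_adj` of the review comment)
}

impl FuelCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Overwriting variant: resets both values, so earlier usage is forgotten.
    pub fn set_fuel_remaining(&mut self, fuel: i64) {
        self.consumed = -fuel;
        self.adj = fuel;
    }

    /// Accumulating variant: adjusts both values, so the running total survives.
    pub fn add_fuel(&mut self, fuel: i64) {
        self.consumed -= fuel;
        self.adj += fuel;
    }

    /// Simulates generated code charging `n` units.
    pub fn burn(&mut self, n: i64) {
        self.consumed += n;
    }

    /// Fuel used so far, as seen by the host.
    pub fn fuel_consumed(&self) -> i64 {
        self.consumed + self.adj
    }

    pub fn out_of_fuel(&self) -> bool {
        self.consumed >= 0
    }
}
```

With the overwriting variant, a second injection makes `fuel_consumed()` report usage only since that injection; with the accumulating variant it keeps reporting the total.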
cfallin created PR Review Comment:
Can you say more about the expected way to run this in an executor-loop system with timeslicing (as in Lucet's fuel implementation)?
Specifically, is the common case that we'll run for a timeslice and then unwind here using the trap mechanism back to the Wasm entry point, at which point some higher-level wrapper might yield a future or similar?
My two concerns are:
- Efficiency: invoking the unwind mechanism and heap-allocating an error is somewhat heavyweight;
- Ability to resume: it looks like `raise_lib_trap` eventually invokes `Unwind` in wasmtime-runtime's helpers.c, which uses longjmp to escape the Wasm stack frames. Does this mean that we can't resume at the point where fuel was exhausted (i.e., it's a terminating trap rather than a resumable one)?
All of my concerns above go away if we have a way to plug in a custom out-of-gas handler; I couldn't find an API that would let one do this (one would need to pass in a custom `TrapInfo` trait impl I think?) though I may have missed it. Or alternately, is the more complete interface coming with later async work?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh sure yeah, the "raise a trap" implementation here is primarily just so we can test it in this PR without all the async bits landed yet. Additionally it helps us fuzz the implementation well by ensuring wasm is always consuming fuel when executing. For an async yield-every-so-often implementation my plan is to:
- Still have this bake into a `Store` without the ability to register a custom handler (although that's still possible, but fraught with correctness issues on the caller's part)
- Instead of raising a trap here this would simply switch off the fiber. With async we are guaranteed that all wasm is always executing on a fiber, so this is possible.
- The `Store` would have some sort of flag/configuration where in async mode you could request that N fuel is used up and when finished it yields to the current future and then injects N more fuel when it comes back.
The implementation would be relatively simple, basically calling suspend here after notifying the future executor that we're already ready for another `poll`. I would imagine that this would all be guarded by a basic `if` in this out-of-gas handler which either traps or yields.
Efficiency-wise I think it should be quite fast because it's a fiber switch and no unwinding happens (not even longjmp). Upon resumption we'd simply return from this function and wasm would keep going. Resumption-wise we should be good as well due to fibers and whatnot. Basically the trap stuff won't happen at all for the timeslicing, it's just a way for me to land this PR before the async fiddly bits are here.
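The "either traps or yields" branch can be sketched as a small simulation. This is only a model of the control flow under stated assumptions: `OutOfGas`, `Outcome`, and `run` are hypothetical names, and the yield is represented by a counter rather than a real fiber switch or executor wake-up.

```rust
/// Hypothetical policy for what happens when fuel hits zero.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutOfGas {
    /// Terminate execution (the behavior this PR implements).
    Trap,
    /// Yield to the host, then continue with `n` more units of fuel.
    YieldAndRefuel(u64),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// All work completed; `yields` counts how often control went back to the host.
    Finished { yields: u32 },
    /// Execution stopped after `completed` units of work.
    Trapped { completed: u64 },
}

/// Runs `work` units of simulated guest work, one fuel unit each.
pub fn run(work: u64, initial_fuel: u64, policy: OutOfGas) -> Outcome {
    let mut fuel = initial_fuel;
    let mut yields = 0u32;
    for done in 0..work {
        if fuel == 0 {
            // The out-of-gas handler: a single branch picks trap or yield.
            match policy {
                OutOfGas::YieldAndRefuel(n) if n > 0 => {
                    // A real implementation would suspend the fiber here and
                    // resume on the next poll; this model only counts it.
                    yields += 1;
                    fuel = n;
                }
                // `Trap`, or a refuel amount of zero that could never progress.
                _ => return Outcome::Trapped { completed: done },
            }
        }
        fuel -= 1;
    }
    Outcome::Finished { yields }
}
```

In the yielding case the work resumes where it left off, which matches the point above that no unwinding is involved; in the trapping case the remaining work is abandoned.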
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I was a tiny bit worried that a long-lived store might overflow the `i64` counter but that may not be too realistic. Do you think that'd be rare enough that we should just switch this to an add instead of a set?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah indeed, good point!
alexcrichton updated PR #2611 from fuel to main.
alexcrichton updated PR #2611 from fuel to main.
cfallin submitted PR Review.
cfallin created PR Review Comment:
IMHO it's nicer to not have the "fuel-used value that we return is only since the last set" property -- it has the potential to become a subtle stats bug later.
Doing some quick math, a 2^63 max count, at 1B Wasm ops per second, gives us 2^33 or 8B seconds of runtime before overflow, which is ~250 years. Sometime before the year 2270 we can come back and upgrade to an `i128` :-)
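The back-of-the-envelope estimate can be checked with a few lines of integer arithmetic. The helper name `years_until_overflow` and the 365-day year are assumptions made for this sketch; the 1B ops per second rate is the one used in the comment above.

```rust
/// Whole years a signed 64-bit counter can run before overflowing,
/// given a constant rate of operations per second.
/// ASSUMPTION: a year is 365 days; `ops_per_second` must be non-zero.
pub fn years_until_overflow(ops_per_second: u64) -> u64 {
    const SECONDS_PER_YEAR: u64 = 365 * 24 * 60 * 60; // 31_536_000
    let max_count = i64::MAX as u64; // 2^63 - 1
    (max_count / ops_per_second) / SECONDS_PER_YEAR
}
```

At exactly 10^9 ops per second this gives 292 years, so the rounded "~250 years" figure is on the conservative side and the conclusion stands.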
cfallin submitted PR Review.
cfallin created PR Review Comment:
Makes sense! Happy to see this go in as-is, then.
alexcrichton updated PR #2611 from fuel to main.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Heh good point!
cfallin submitted PR Review.
alexcrichton updated PR #2611 from fuel to main.
alexcrichton merged PR #2611.
Last updated: Dec 23 2024 at 12:05 UTC