Stream: git-wasmtime

Topic: wasmtime / issue #3927 Fold `setjmp` into Cranelift-gener...


Wasmtime GitHub notifications bot (Mar 14 2022 at 16:15):

alexcrichton opened issue #3927:

This issue comes out of a discussion that @lukewagner, @fitzgen, and I were having recently. We were thinking again about how Wasmtime implements calls into WebAssembly and about some of the overhead associated with that. Currently it's surprisingly expensive relative to wasm->host transitions: host->wasm is on the order of 20-30ns, whereas wasm->host is on the order of 3-5ns.

One of the major costs of entering WebAssembly is that we have to call setjmp. Not only is setjmp complicated since it's platform-specific, but as seen there it's also written in C. We can't call setjmp from Rust (it "returns twice" and the Rust compiler doesn't inform LLVM of that, so optimizations could go awry), which means entering WebAssembly is even further de-optimized because all arguments must pass through the stack. This closure captures all arguments into WebAssembly and is forced to be on the stack, as we pass a single pointer to C, which then calls back into it.
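
To make the "single pointer to C" pattern concrete, here is a minimal sketch of passing a closure through a C-ABI callback. It is not Wasmtime's actual code: `enter_with_callback`, `call_closure`, and `invoke_through_shim` are hypothetical names, and the C helper is modeled in Rust (without any real `setjmp` call) so the example stays self-contained.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for the C helper. A real shim would call `setjmp`
/// before invoking the callback; this model only forwards the call.
unsafe extern "C" fn enter_with_callback(
    callback: unsafe extern "C" fn(*mut c_void),
    payload: *mut c_void,
) {
    unsafe { callback(payload) }
}

/// Monomorphized shim that turns the opaque pointer back into the closure
/// and runs it.
unsafe extern "C" fn call_closure<F: FnMut()>(payload: *mut c_void) {
    let closure = unsafe { &mut *(payload as *mut F) };
    closure();
}

/// Runs `closure` through the C-ABI shim. The closure, together with every
/// argument it captured, lives in this stack frame and is reached through a
/// single pointer, so nothing can be passed in registers.
pub fn invoke_through_shim<F: FnMut()>(mut closure: F) {
    unsafe {
        enter_with_callback(call_closure::<F>, &mut closure as *mut F as *mut c_void);
    }
}
```

A caller would write something like `invoke_through_shim(|| result = a + b)`, where `a`, `b`, and `result` stand in for the wasm arguments and return slot. Everything the closure touches has to be spilled to memory so that the shim can reach it through the pointer.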

Another complication with the current strategy for entering WebAssembly is that, in a future world with the wasm exceptions proposal, whatever is chosen to implement exceptions at the Cranelift level is highly unlikely to be exposed to native stable Rust with the full fidelity required, meaning that we couldn't actually write a "catch" block in Rust (and probably not C).

To solve all these issues, @lukewagner mentioned we could do something like SpiderMonkey which is to have specialized entry trampolines into WebAssembly code. Currently our trampolines are primarily just converting from a dynamic stack-based layout to a particular System-V ABI signature, which isn't really all that interesting. Instead, though, we could specifically have a trampoline that receives the appropriate arguments, sets up a "catch" frame, and then enters the desired WebAssembly code. This could have a number of benefits:

Another possible idea: currently trampolines are one-per-function-signature, which means that they always contain an indirect call to a target. Instead we could explore a one-per-export scheme, which would let the trampoline call statically into the correct export (no indirect function call necessary). That is another possible route to optimizing this.
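
The difference between the two trampoline schemes can be sketched in plain Rust. This is only an illustration under assumptions: `export_add` stands in for compiled wasm code, the `&[i64]` slice stands in for the dynamic stack-based argument layout, and all function names are hypothetical.

```rust
/// Stand-in for a compiled wasm export with signature (i32, i32) -> i32.
fn export_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// One-per-signature: shared by every (i32, i32) -> i32 function, so the
/// target arrives as a pointer and the call is necessarily indirect.
fn trampoline_per_signature(target: fn(i32, i32) -> i32, args: &[i64], ret: &mut i64) {
    *ret = i64::from(target(args[0] as i32, args[1] as i32));
}

/// One-per-export: the callee is known when the trampoline is generated,
/// so the call is direct and can be resolved statically.
fn trampoline_export_add(args: &[i64], ret: &mut i64) {
    *ret = i64::from(export_add(args[0] as i32, args[1] as i32));
}
```

Both trampolines unpack the same dynamic argument layout. The per-export version trades more generated code (one trampoline per export) for the removal of the indirect call.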


The implementation of setjmp/longjmp in Cranelift is likely to be pretty nontrivial, which is why I wanted to open an issue on this and get some feedback before implementing. I also don't think that this is pressing enough right now that we should implement it, but it's good to have in our back pocket if we run into issues with the overhead of host->wasm transitions. I'm not actually sure myself how we'd implement setjmp/longjmp in Cranelift (e.g. how to expose it and represent it in clif). Implementation-wise we'd probably want to at least take inspiration from, if not closely study, the SpiderMonkey implementation, since we don't need a general setjmp/longjmp mechanism, only one that works for wasm traps.

Wasmtime GitHub notifications bot (Mar 14 2022 at 16:59):

lukewagner commented on issue #3927:

Agreed that you can get away with something much simpler than a general-purpose setjmp/longjmp implementation.

Just putting out some info and links for how SpiderMonkey does this in case anyone is interested later:

Ultimately, because the sp of the entry trampoline is restored to the same offset expected upon normal (non-exceptional) return, the entry trampoline doesn't need to do much that is special: it simply places the data it needs for returning (exceptionally and normally) in the stack frame and branches right after the call into normal wasm code to detect the exceptional return.

Wasmtime GitHub notifications bot (Mar 14 2022 at 17:13):

cfallin commented on issue #3927:

At a high level, I like this direction. Two major points:

Wasmtime GitHub notifications bot (Mar 14 2022 at 17:25):

bjorn3 commented on issue #3927:

Could we model setjmp as a terminator with two successors? The first successor is jumped to directly, while the second successor is jumped to in case of a longjmp. Also, I think setjmp needs to be marked as clobbering everything too, at least for the second successor.

Wasmtime GitHub notifications bot (Mar 14 2022 at 17:30):

cfallin commented on issue #3927:

Err, sorry, yeah, I meant setjmp above where I wrote longjmp. The longjmp is just an unconditional branch (terminator with zero successors) as far as the CFG is concerned I think.
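
The CFG shape discussed in the last two comments can be sketched as a small data structure. This is a toy model, not Cranelift's IR: `Block`, `Terminator`, and the variant and method names are all hypothetical and exist only to show the successor counts.

```rust
/// Hypothetical basic-block identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Block(pub u32);

/// Hypothetical block terminators for a toy CFG.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Terminator {
    Jump(Block),
    Return,
    /// Two successors: `fallthrough` is entered directly, `on_longjmp` is
    /// entered only when a later `Longjmp` transfers control back here.
    Setjmp { fallthrough: Block, on_longjmp: Block },
    /// Leaves the function as far as the CFG is concerned: no successors.
    Longjmp,
}

impl Terminator {
    /// All CFG successors of this terminator.
    pub fn successors(&self) -> Vec<Block> {
        match self {
            Terminator::Jump(target) => vec![*target],
            Terminator::Return | Terminator::Longjmp => Vec::new(),
            Terminator::Setjmp { fallthrough, on_longjmp } => vec![*fallthrough, *on_longjmp],
        }
    }

    /// Successors that must assume every register was clobbered on entry.
    /// Only the longjmp edge is treated this way in this sketch.
    pub fn clobbered_successors(&self) -> Vec<Block> {
        match self {
            Terminator::Setjmp { on_longjmp, .. } => vec![*on_longjmp],
            _ => Vec::new(),
        }
    }
}
```

With this model, a register allocator walking the CFG would see the longjmp landing block as an ordinary successor of the setjmp block, with the extra constraint that no value can be assumed live in a register on that edge.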

Wasmtime GitHub notifications bot (Mar 23 2022 at 20:11):

alexcrichton labeled issue #3927:



Last updated: Dec 23 2024 at 13:07 UTC