Stream: git-wasmtime

Topic: wasmtime / Issue #1749 Use virtual memory tricks to make ...

Wasmtime GitHub notifications bot (May 23 2020 at 21:05):

Right now, to check for interrupts:

we load the interrupts pointer from the vmctx

we dereference it to get the maybe-interrupted value

we compare that against the interrupt-has-been-requested value

and finally we conditionally trap when the comparison returned true.

There is no fast/slow path split here. Interrupts happen rarely, but we always perform those four steps.
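The four steps above can be sketched in Rust roughly as follows. This is only an illustration of the shape of the check, not Wasmtime's actual code: the VMContext layout, the INTERRUPTED sentinel value, and the Trap type are all hypothetical names chosen for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical sentinel meaning "an interrupt has been requested".
pub const INTERRUPTED: usize = usize::MAX;

/// Hypothetical stand-in for the vmctx; only the field used here is modeled.
pub struct VMContext {
    pub interrupts: Arc<AtomicUsize>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Trap {
    Interrupt,
}

/// What every loop header conceptually executes today.
pub fn check_for_interrupt(vmctx: &VMContext) -> Result<(), Trap> {
    // 1. Load the interrupts pointer from the vmctx.
    let interrupts = &vmctx.interrupts;
    // 2. Dereference it to get the maybe-interrupted value.
    let value = interrupts.load(Ordering::Relaxed);
    // 3. Compare against the interrupt-has-been-requested value.
    if value == INTERRUPTED {
        // 4. Conditionally trap.
        return Err(Trap::Interrupt);
    }
    Ok(())
}
```

All four steps run on every loop iteration, whether or not anyone ever stores INTERRUPTED into the shared word.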

By using virtual memory tricks, we can create a fast path for the common case when no interrupts are requested. We reserve a page of memory as the "interrupt page" and point to it from the vmctx. This replaces the current interrupt pointer on the vmctx. When interrupts are not requested, this page is readable. When an interrupt is requested, we remove the page's read permission via mprotect and wait for the running code to reach its next loop header.
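A minimal sketch of reserving such a page and toggling its protection, assuming 64-bit Linux. The InterruptPage type and its method names are hypothetical; the constants are the Linux values for the mmap/mprotect flags and differ on other platforms. The libc functions are declared by hand so that the example needs nothing beyond std (on the 2024 edition the block would have to be written as unsafe extern "C").

```rust
use std::ffi::c_void;
use std::io;
use std::os::raw::c_int;

// Linux flag values (assumption: 64-bit Linux; other platforms differ).
const PROT_NONE: c_int = 0;
const PROT_READ: c_int = 1;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;
// Assumed page size; a real implementation would query it at runtime.
const PAGE_SIZE: usize = 4096;

extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical owner of the "interrupt page".
pub struct InterruptPage {
    base: *mut c_void,
}

impl InterruptPage {
    /// Reserve one readable, zero-filled anonymous page.
    pub fn new() -> io::Result<Self> {
        let base = unsafe {
            mmap(std::ptr::null_mut(), PAGE_SIZE, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
        };
        // mmap reports failure by returning MAP_FAILED, i.e. (void*)-1.
        if base as isize == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(Self { base })
    }

    /// The pointer that would be stored in the vmctx.
    pub fn as_ptr(&self) -> *const u8 {
        self.base as *const u8
    }

    /// Request an interrupt: the next read of the page will fault.
    pub fn request_interrupt(&self) -> io::Result<()> {
        self.protect(PROT_NONE)
    }

    /// Make the page readable again once the interrupt has been handled.
    pub fn clear_interrupt(&self) -> io::Result<()> {
        self.protect(PROT_READ)
    }

    fn protect(&self, prot: c_int) -> io::Result<()> {
        if unsafe { mprotect(self.base, PAGE_SIZE, prot) } == 0 {
            Ok(())
        } else {
            Err(io::Error::last_os_error())
        }
    }
}

impl Drop for InterruptPage {
    fn drop(&mut self) {
        // The result is ignored: there is nothing useful to do on failure here.
        unsafe {
            munmap(self.base, PAGE_SIZE);
        }
    }
}
```

Reading through as_ptr() between request_interrupt() and clear_interrupt() raises SIGSEGV, so a signal handler that understands this page has to be installed before the sketch is used that way.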

Now, all that our loop headers do is:

load the pointer to the interrupts page from the vmctx

(attempt to) read the first byte from the interrupts page

When the interrupts page is readable and an interrupt is not requested, we just have those two loads as our fast path.
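For illustration, the two-load fast path might look like this in Rust. The VMContext layout and the loop_header_probe name are hypothetical, and in practice this would be emitted by the code generator as two machine loads rather than written by hand. The volatile read is there so the compiler cannot discard a load whose result is unused.

```rust
use std::ptr;

/// Hypothetical stand-in for the vmctx; only the field used here is modeled.
pub struct VMContext {
    pub interrupt_page: *const u8,
}

/// The proposed loop-header check: two loads, no compare, no branch.
///
/// # Safety
/// `vmctx` must be valid, and `interrupt_page` must point at the interrupt
/// page. If the page has been made unreadable, the second load faults and
/// the signal handler takes over.
#[inline(always)]
pub unsafe fn loop_header_probe(vmctx: *const VMContext) -> u8 {
    // Load 1: the pointer to the interrupts page from the vmctx.
    let page = unsafe { (*vmctx).interrupt_page };
    // Load 2: (attempt to) read the first byte of the interrupts page.
    unsafe { ptr::read_volatile(page) }
}
```

The value read is irrelevant; only whether the read succeeds matters.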

When the interrupts page is not readable because an interrupt is requested, the read faults and a signal is generated, so our signal handler needs to recognize and handle this case.
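One way the handler could recognize this case is by checking whether the faulting address lies inside the interrupt page. A rough sketch of just that classification step follows; FaultKind and classify_fault are hypothetical names, and a real handler would obtain the faulting address from the si_addr field of the siginfo it receives for SIGSEGV.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FaultKind {
    /// The fault hit the interrupt page: treat it as a requested interrupt.
    Interrupt,
    /// Anything else: fall through to the existing trap handling.
    Other,
}

/// Decide whether a faulting address belongs to the interrupt page.
pub fn classify_fault(fault_addr: usize, page_base: usize, page_len: usize) -> FaultKind {
    // A single unsigned comparison covers both bounds: if fault_addr is
    // below page_base, the subtraction wraps to a huge value >= page_len.
    let offset = fault_addr.wrapping_sub(page_base);
    if offset < page_len {
        FaultKind::Interrupt
    } else {
        FaultKind::Other
    }
}
```

With a page at 0x1000 of length 4096, addresses 0x1000 through 0x1fff classify as Interrupt, while 0x0fff and 0x2000 classify as Other.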

IIRC, essentially this same trick is used in some JVMs for synchronizing at safepoints for stop-the-world GC phases (e.g. root marking).

The one open question is how to detect stack overflows with this setup, since our interrupt handling and stack overflow code are closely intertwined. Not totally sure here.

+cc @alexcrichton

Wasmtime GitHub notifications bot (May 23 2020 at 21:05):

fitzgen labeled Issue #1749.

Wasmtime GitHub notifications bot (May 23 2020 at 21:05):

fitzgen edited Issue #1749.

Wasmtime GitHub notifications bot (May 26 2020 at 14:14):

alexcrichton commented on Issue #1749:

Seems like a great idea! One thing that would be good to measure before committing to this is the overhead of the current strategy, to see how much of an improvement this trick is. That way, if it's something like a 20% speedup, it seems worthwhile even if we can't figure it out for stack checks.

For stack checks I think they'll fundamentally always need to have a comparison of some kind, but we could still have stack checks and interrupt checks in loops check the same memory location. We could basically "malloc a word for the stack check limit" by allocating a page from the OS, and then on interrupt we unmap it.
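A rough sketch of that shared check, assuming a downward-growing stack. The function name and the Trap type are hypothetical; limit_word stands for the stack-limit word that would live in its own OS-allocated page.

```rust
use std::ptr;

#[derive(Debug, PartialEq, Eq)]
pub enum Trap {
    StackOverflow,
}

/// One memory location serving both checks.
///
/// # Safety
/// `limit_word` must point at the stack-limit word. If its page has been
/// unmapped to request an interrupt, the load itself faults and the signal
/// handler deals with it; otherwise only the stack comparison remains.
pub unsafe fn check_stack_and_interrupt(limit_word: *const usize, sp: usize) -> Result<(), Trap> {
    // Faults here when an interrupt was requested (page unmapped).
    let limit = unsafe { ptr::read_volatile(limit_word) };
    // The stack check still needs an explicit comparison.
    if sp < limit {
        Err(Trap::StackOverflow)
    } else {
        Ok(())
    }
}
```

In this arrangement, loop headers could perform only the load, while function entries perform the load plus the comparison.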

Last updated: Apr 17 2025 at 07:03 UTC