fitzgen opened Issue #1749:
Right now, to check for interrupts:
- we load the interrupts pointer from the vmctx
- we dereference it to get the maybe-interrupted value
- we compare that against the interrupt-has-been-requested value
- and finally we conditionally trap when the comparison returned true.
There is no fast/slow path split here. Interrupts happen rarely, but we always perform those four steps.
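As a rough illustration of those four steps, here is a minimal C sketch. The `struct vmctx` layout, the `INTERRUPT_REQUESTED` sentinel, and the `should_trap` helper are all hypothetical names chosen for this example; they are not the actual Wasmtime definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "an interrupt has been requested". */
#define INTERRUPT_REQUESTED UINTPTR_MAX

/* Hypothetical, heavily simplified stand-in for the real vmctx. */
struct vmctx {
    volatile uintptr_t *interrupts; /* points at the maybe-interrupted value */
};

/* Returns true when the loop header should trap. */
bool should_trap(const struct vmctx *cx) {
    volatile uintptr_t *p = cx->interrupts;      /* 1. load pointer from vmctx */
    uintptr_t value = *p;                        /* 2. dereference it */
    bool requested = (value == INTERRUPT_REQUESTED); /* 3. compare */
    return requested;                            /* 4. caller traps if true */
}
```

Every loop iteration pays for both loads, the compare, and the conditional branch, whether or not an interrupt is ever requested.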
By using virtual memory tricks, we can create a fast path for the common case when no interrupts are requested. We reserve a page of memory as the "interrupt page" and point to it from the vmctx. This replaces the current interrupt pointer on the vmctx. When interrupts are not requested, this page is readable. When an interrupt is requested, we remove the readable bit via mprotect and wait for the running code to hit its next check.
Now, all that our loop headers do is:
- load the pointer to the interrupts page from the vmctx
- (attempt to) read the first byte from the interrupts page
When the interrupts page is readable and an interrupt is not requested, we just have those two loads as our fast path.
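A minimal sketch of the page setup and the two-load loop header, assuming a POSIX system with `mmap` and `mprotect`. The `struct vmctx` fields and the function names are hypothetical and only serve the example.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical, simplified vmctx: it only holds the interrupt page. */
struct vmctx {
    volatile uint8_t *interrupt_page;
    size_t page_size;
};

/* Reserve one readable page. Returns 0 on success, -1 on failure. */
int interrupt_page_init(struct vmctx *cx) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    void *page = mmap(NULL, (size_t)page_size, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    cx->interrupt_page = page;
    cx->page_size = (size_t)page_size;
    return 0;
}

/* Request an interrupt: drop read permission so the next check faults. */
int request_interrupt(struct vmctx *cx) {
    return mprotect((void *)cx->interrupt_page, cx->page_size, PROT_NONE);
}

/* Clear the request: make the page readable again. */
int clear_interrupt(struct vmctx *cx) {
    return mprotect((void *)cx->interrupt_page, cx->page_size, PROT_READ);
}

/* Loop header fast path: two loads, no compare, no branch. */
void loop_header_check(const struct vmctx *cx) {
    volatile uint8_t *page = cx->interrupt_page; /* load 1: pointer from vmctx */
    uint8_t first_byte = *page;                  /* load 2: faults if PROT_NONE */
    (void)first_byte;
}
```

The cost of the interrupt moves from every loop iteration to the rare `mprotect` call and the resulting fault.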
When the interrupts page is not readable because an interrupt is requested, a signal is generated, so our signal handler needs to recognize+handle this case.
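One way the signal handler could recognize this case is to compare the faulting address against the interrupt page's range. The sketch below assumes Linux, where reading a `PROT_NONE` page raises `SIGSEGV`, and recovers with `sigsetjmp`/`siglongjmp`. All names are hypothetical, and the interrupt is requested from inside the loop only to keep the example deterministic; in practice another thread would make the request.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS, sigaction, sigjmp_buf on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *g_page;       /* hypothetical global interrupt page */
static size_t g_page_size;
static sigjmp_buf g_trap_env; /* where to resume after an interrupt trap */

static void on_fault(int sig, siginfo_t *info, void *uctx) {
    (void)uctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_page;
    if (addr >= base && addr < base + g_page_size) {
        siglongjmp(g_trap_env, 1); /* fault on our page: it is an interrupt */
    }
    /* Not our fault: restore the default action; the access re-faults. */
    signal(sig, SIG_DFL);
}

/* Map the page and install the handler. Returns 0 on success. */
int install_interrupt_page(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    g_page_size = (size_t)page_size;
    void *page = mmap(NULL, g_page_size, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    g_page = page;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Stand-in for a guest loop. Returns 1 if interrupted, 0 if it finished.
 * Pass a negative interrupt_at to never request an interrupt. */
int run_guest_loop(int iterations, int interrupt_at) {
    if (sigsetjmp(g_trap_env, 1) != 0) {
        /* Arrived from the handler: re-arm the page and report the trap. */
        mprotect(g_page, g_page_size, PROT_READ);
        return 1;
    }
    for (int i = 0; i < iterations; i++) {
        if (i == interrupt_at) {
            mprotect(g_page, g_page_size, PROT_NONE); /* simulated request */
        }
        volatile uint8_t *page = g_page;
        uint8_t first_byte = *page; /* the loop header check */
        (void)first_byte;
    }
    return 0;
}
```

Passing 1 as the second argument to `sigsetjmp` saves the signal mask, so `SIGSEGV` is unblocked again after the jump and later interrupts can still be delivered.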
IIRC, essentially this same trick is used in some JVMs for synchronizing at safepoints for stop-the-world GC phases (e.g. root marking).
The one open question is how to detect stack overflows with this setup, since our interrupt handling and stack overflow code are very intertwined. Not totally sure here.
+cc @alexcrichton
fitzgen labeled Issue #1749.
fitzgen labeled Issue #1749.
fitzgen edited Issue #1749.
alexcrichton commented on Issue #1749:
Seems like a great idea! One thing that would be good to measure before committing to this is the overhead of the current strategy, to see how much of an improvement this trick is. That way, if it's something like a 20% speedup, it seems worthwhile even if we can't figure it out for stack checks.
For stack checks I think they'll fundamentally always need to have a comparison of some kind, but we could still have stack checks and interrupt checks in loops check the same memory location. We could basically "malloc a word for the stack check limit" by allocating a page from the OS, and then on interrupt we unmap it.
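A rough sketch of that shared-location idea, assuming a POSIX system and a stack that grows downward. The `struct vmctx` layout and all function names are hypothetical. Function prologues compare the stack pointer against the word stored in the page, loop headers only load it, and an interrupt unmaps the page so either kind of check faults.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical vmctx: the stack limit word lives in its own OS page. */
struct vmctx {
    volatile uintptr_t *stack_limit;
    size_t page_size;
};

/* Allocate a page from the OS and store the stack limit in its first word. */
int stack_limit_init(struct vmctx *cx, uintptr_t limit) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    void *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    cx->stack_limit = page;
    cx->page_size = (size_t)page_size;
    *cx->stack_limit = limit;
    return 0;
}

/* Stack check: still needs a comparison against the loaded limit. */
bool stack_overflowed(const struct vmctx *cx, uintptr_t sp) {
    return sp < *cx->stack_limit;
}

/* Loop header: load the same word and discard it; faults once unmapped. */
void loop_header_check(const struct vmctx *cx) {
    uintptr_t unused = *cx->stack_limit;
    (void)unused;
}

/* Interrupt request: unmap the page so the next load of the limit faults. */
int request_interrupt(struct vmctx *cx) {
    return munmap((void *)cx->stack_limit, cx->page_size);
}
```

With this layout the stack check keeps its compare and branch, while the loop header drops to the two loads described in the issue.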
Last updated: Dec 23 2024 at 12:05 UTC