For my OCaml bytecode JIT I'm using the stackmap support for its GC. It works well but is overly conservative because it inserts a safepoint at every call. For things like the GC write barrier (part of the runtime, written in C), which gets called fairly often, this can slow things down because each call tends to spill and restore around it. The GC only runs during allocations, which means many of the runtime primitives don't need to spill and emit a stackmap, but currently do. Inlining the write barrier wouldn't help, as it just pushes the issue down to the functions the write barrier itself calls.
I'd like to somehow have a way to do a call without a safepoint and think it would be a useful thing for the project. I am potentially able to do the work myself (although running out of time to give to this project). However, I'm not sure what the best design would be to represent it in the IR:
1. New instructions: call_no_safepoint and call_indirect_no_safepoint
2. Represent it in the signature, either by
   a. Having special calling conventions (messy but less work)
   b. Adding another metadata field, which could perhaps become something more general like the LLVM 'gc strategy'
3. Something else?
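To make the signature-based option (2b) concrete, here is a minimal Rust sketch. It is a toy model, not Cranelift's actual data structures: `SafepointKind`, `ToySignature`, and `call_needs_stackmap` are hypothetical names invented for illustration. The only point is that the safepoint property travels with the signature, so direct and indirect calls both pick it up without new opcodes.

```rust
/// Hypothetical ABI dimension: does a call through this signature need a stackmap?
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SafepointKind {
    /// The callee may run the GC, so live references must be recorded.
    Safepoint,
    /// The callee is known never to run the GC (e.g. a write barrier).
    NoSafepoint,
}

/// Toy stand-in for an IR function signature (NOT Cranelift's real type).
#[derive(Clone, Debug)]
pub struct ToySignature {
    pub param_count: usize,
    pub safepoint: SafepointKind,
}

impl ToySignature {
    /// The default mirrors the current conservative behaviour:
    /// every call is treated as a safepoint unless stated otherwise.
    pub fn new(param_count: usize) -> Self {
        ToySignature {
            param_count,
            safepoint: SafepointKind::Safepoint,
        }
    }

    /// Opt a signature out of safepoint treatment.
    pub fn no_safepoint(mut self) -> Self {
        self.safepoint = SafepointKind::NoSafepoint;
        self
    }
}

/// Direct and indirect calls would both consult the signature,
/// so no separate `*_no_safepoint` instructions are needed.
pub fn call_needs_stackmap(sig: &ToySignature) -> bool {
    sig.safepoint == SafepointKind::Safepoint
}
```

With this shape, a signature used for an allocating primitive keeps the default, while one used for the write barrier would be built with `.no_safepoint()`.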
For passing this to codegen I think for the old backend it's just a case of extending is_safepoint in inst_predicates.rs, but the new backend (which I actually use) looks more involved. As far as I can tell it might be a case of making gen_call return InstIsSafepoint::No for each ABIMachineSpec impl, but I haven't looked into the effects this has downstream.
Which strategy would be best for adding this to the IR? Am I missing any complications in the implementation?
Greetings @Will Robson -- this is a very good question! You're correct that at the moment we simply decide that all callsites are safepoints; this was driven by the original application (in SpiderMonkey) where there was not a guarantee that a hostcall would not GC, in general.
I do like the idea of an ABI dimension that is "safepoint call" or "no-safepoint call" (your option 2b); this feels like the cleanest approach. You're correct that this basically means returning a different flag from gen_call; the rest should Just Work.
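As a rough illustration of "returning a different flag from gen_call", the Rust sketch below models the idea in isolation. Everything here is a local stand-in: the `InstIsSafepoint` enum is redefined to echo the name used in this thread, and `ToyCall`, this `gen_call` signature, and `stackmap_sites` are hypothetical and do not reflect the real backend's types.

```rust
/// Local stand-in echoing the flag name mentioned in the thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstIsSafepoint {
    Yes,
    No,
}

/// Toy lowered call instruction (hypothetical, for illustration only).
#[derive(Debug)]
pub struct ToyCall {
    pub callee: String,
    pub is_safepoint: InstIsSafepoint,
}

/// Hypothetical lowering hook: instead of always answering `Yes`,
/// the flag is derived from what the signature says about the callee.
pub fn gen_call(callee: &str, callee_may_gc: bool) -> ToyCall {
    let is_safepoint = if callee_may_gc {
        InstIsSafepoint::Yes
    } else {
        InstIsSafepoint::No
    };
    ToyCall {
        callee: callee.to_string(),
        is_safepoint,
    }
}

/// Models the downstream consumer: only instructions flagged as
/// safepoints get a stackmap entry (and the spills that go with it).
pub fn stackmap_sites(calls: &[ToyCall]) -> Vec<&str> {
    calls
        .iter()
        .filter(|c| c.is_safepoint == InstIsSafepoint::Yes)
        .map(|c| c.callee.as_str())
        .collect()
}
```

In this model a call to an allocating primitive still produces a stackmap site, while a call to caml_modify lowered with `callee_may_gc = false` does not.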
I'm happy to review a PR for this if you want to try your hand at it! (I won't be able to get to it for a while otherwise, unfortunately)
Interesting, do your GC barriers never trigger a GC? For something like a generational barrier, you would usually have something like:
fn generational_barrier(obj) {
    if remembered_set.is_full() {
        gc();
    } else {
        remembered_set.insert(obj);
    }
}
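For reference, the pseudocode above can be turned into a small self-contained Rust model. This is only a sketch of that generic barrier shape, not of any particular runtime: `RememberedSet`, `BarrierOutcome`, and the toy `gc` method (which just empties the set and counts the run) are assumptions made for illustration. It shows why a barrier of this kind would have to be treated as a safepoint.

```rust
/// Toy remembered set with a fixed capacity (hypothetical model).
pub struct RememberedSet {
    slots: Vec<usize>,
    capacity: usize,
    pub gc_runs: usize,
}

/// What a single barrier invocation ended up doing.
#[derive(Debug, PartialEq, Eq)]
pub enum BarrierOutcome {
    Recorded,
    TriggeredGc,
}

impl RememberedSet {
    pub fn new(capacity: usize) -> Self {
        RememberedSet {
            slots: Vec::with_capacity(capacity),
            capacity,
            gc_runs: 0,
        }
    }

    pub fn is_full(&self) -> bool {
        self.slots.len() >= self.capacity
    }

    pub fn len(&self) -> usize {
        self.slots.len()
    }

    /// Stand-in for a real collection: drains the set and counts the run.
    fn gc(&mut self) {
        self.slots.clear();
        self.gc_runs += 1;
    }
}

/// Same control flow as the pseudocode: a full set forces a GC,
/// otherwise the object address is recorded.
pub fn generational_barrier(set: &mut RememberedSet, obj: usize) -> BarrierOutcome {
    if set.is_full() {
        set.gc();
        BarrierOutcome::TriggeredGc
    } else {
        set.slots.push(obj);
        BarrierOutcome::Recorded
    }
}
```

Because the third call on a two-slot set runs the collector, any call site of such a barrier needs a stackmap, unlike a barrier that is guaranteed never to collect.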
This is from the existing OCaml GC. I'm not that familiar with how it's implemented (beyond the interface the JIT needs to care about) but this is what it says in the source:
/* You must use [caml_modify] to change a field of an existing shared block,
   unless you are sure the value being overwritten is not a shared block and
   the value being written is not a young block. */
/* [caml_modify] never calls the GC. */
/* [caml_modify] can also be used to do assignment on data structures that are
   in the minor heap instead of in the major heap. In this case, it
   is a bit slower than simple assignment.
   In particular, you can use [caml_modify] when you don't know whether the
   block being changed is in the minor heap or the major heap. */
/* PR#6084 workaround: define it as a weak symbol */
CAMLexport CAMLweakdef void caml_modify (value *fp, value val)
{
  /* The write barrier implemented by [caml_modify] checks for the
     following two conditions and takes appropriate action:
     1- a pointer from the major heap to the minor heap is created
        --> add [fp] to the remembered set
     2- a pointer from the major heap to the major heap is overwritten,
        while the GC is in the marking phase
        --> call [caml_darken] on the overwritten pointer so that the
            major GC treats it as an additional root.
  */
Allocation in the minor heap is a simple pointer bump unless there's not enough space. This is almost the only condition that triggers the GC. I think the basic heuristic is that, since it's a functional language where you're creating new values all the time rather than mutating, most things in the minor heap are garbage. Most stop-the-world GC pauses only touch the minor heap.
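The bump-allocation behaviour described here can be sketched in a few lines of Rust. This is a simplified model under stated assumptions, not the OCaml runtime: `MinorHeap` and its methods are hypothetical, the toy collection simply resets the heap as if every young object were garbage (a real minor GC would promote survivors), and requests larger than the whole minor heap are rejected rather than routed elsewhere.

```rust
/// Toy minor heap measured in words (hypothetical model).
pub struct MinorHeap {
    capacity_words: usize,
    used_words: usize,
    pub minor_collections: usize,
}

impl MinorHeap {
    pub fn new(capacity_words: usize) -> Self {
        MinorHeap {
            capacity_words,
            used_words: 0,
            minor_collections: 0,
        }
    }

    /// Toy minor GC: pretend everything young was garbage and reset.
    fn minor_collect(&mut self) {
        self.used_words = 0;
        self.minor_collections += 1;
    }

    /// Fast path is a pointer bump; the GC only runs when the request
    /// does not fit. Returns the word offset of the new block, or `None`
    /// if the request can never fit in this toy heap.
    pub fn alloc(&mut self, words: usize) -> Option<usize> {
        if words > self.capacity_words {
            return None;
        }
        if self.used_words + words > self.capacity_words {
            self.minor_collect();
        }
        let offset = self.used_words;
        self.used_words += words;
        Some(offset)
    }
}
```

In this model the allocation path is the only place a collection can start, which is why only allocating calls would need to be safepoints.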
this is the actual implementation:
  value old;

  if (Is_young((value)fp)) {
    /* The modified object resides in the minor heap.
       Conditions 1 and 2 cannot occur. */
    *fp = val;
  } else {
    /* The modified object resides in the major heap. */
    CAMLassert(Is_in_heap(fp));
    old = *fp;
    *fp = val;
    if (Is_block(old)) {
      /* If [old] is a pointer within the minor heap, we already
         have a major->minor pointer and [fp] is already in the
         remembered set. Conditions 1 and 2 cannot occur. */
      if (Is_young(old)) return;
      /* Here, [old] can be a pointer within the major heap.
         Check for condition 2. */
      if (caml_gc_phase == Phase_mark) caml_darken(old, NULL);
    }
    /* Check for condition 1. */
    if (Is_block(val) && Is_young(val)) {
      add_to_ref_table (Caml_state->ref_table, fp);
    }
  }
}
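Restating the quoted control flow as a small Rust model may make it easier to see which side effects each path can have. This is a sketch of the branching only: `Value`, `Effects`, and `modify_effects` are hypothetical names, the store itself is omitted, and nothing here models what caml_darken or add_to_ref_table do internally.

```rust
/// Coarse classification of an OCaml value for this model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    /// Not a pointer (an immediate such as an integer).
    Immediate,
    /// Pointer into the minor heap.
    Young,
    /// Pointer into the major heap.
    Major,
}

impl Value {
    fn is_block(self) -> bool {
        self != Value::Immediate
    }
    fn is_young(self) -> bool {
        self == Value::Young
    }
}

/// Side effects a single barrier call performs besides the store.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Effects {
    pub darkened_old: bool,
    pub added_to_ref_table: bool,
}

/// Mirrors the branches of the quoted function body.
pub fn modify_effects(field_is_young: bool, old: Value, val: Value, marking: bool) -> Effects {
    let mut fx = Effects::default();
    if field_is_young {
        // Modified object is in the minor heap: plain store only.
        return fx;
    }
    if old.is_block() {
        if old.is_young() {
            // Field is already remembered: nothing more to do.
            return fx;
        }
        if marking {
            // Condition 2: overwritten major pointer during marking.
            fx.darkened_old = true;
        }
    }
    if val.is_block() && val.is_young() {
        // Condition 1: new major -> minor pointer.
        fx.added_to_ref_table = true;
    }
    fx
}
```

Every path ends in a plain store, a darken, a ref-table insertion, or some combination of these; in this model none of them starts a collection, which matches the source comment that caml_modify never calls the GC.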
Last updated: Nov 22 2024 at 16:03 UTC