Stream: cranelift

Topic: Calling functions without a safepoint


Will Robson (Apr 01 2021 at 11:17):

For my OCaml bytecode JIT I'm using the stackmap support for its GC. It works well, but it is overly conservative because it inserts a safepoint at every call. For frequently called runtime functions such as the GC write barrier (part of the runtime, written in C), this slows things down, since each such call tends to spill and restore live values. The GC only runs during allocation, so many of the runtime's primitives don't need to spill or emit a stackmap, but currently they do. Inlining the write barrier wouldn't help either: it just pushes the problem down to the functions the write barrier itself calls.

I'd like a way to make a call without a safepoint, and I think it would be useful for the project more generally. I may be able to do the work myself (although I'm running out of time to give to this project). However, I'm not sure what the best design would be for representing it in the IR:

  1. Have separate call instructions like call_no_safepoint and call_indirect_no_safepoint
  2. Represent it in the signature either by
    a. Having special calling conventions (messy but less work)
    b. Adding another metadata field, which could perhaps grow into something more general like LLVM's 'GC strategy'

  3. Something else?
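Option 2b could look roughly like the following minimal Rust sketch, assuming a hypothetical `CallSafepoint` field on the signature. `Signature`, `CallSafepoint` and `call_needs_stackmap` are illustrative stand-ins, not Cranelift's actual types. The default stays conservative (every call is a safepoint), and a runtime primitive such as `caml_modify` opts out explicitly.

```rust
// Hypothetical model of option 2b: the safepoint property lives in the call
// signature rather than in a separate opcode. None of these types are
// Cranelift's real API; they only illustrate the shape of the idea.

/// Whether a call may trigger a GC and therefore needs a stackmap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallSafepoint {
    /// The callee may run the GC (e.g. an allocation routine).
    MaySafepoint,
    /// The callee is known never to run the GC (e.g. `caml_modify`).
    NoSafepoint,
}

/// A stripped-down stand-in for an IR function signature.
#[derive(Clone, Debug)]
struct Signature {
    name: &'static str,
    params: usize,
    safepoint: CallSafepoint,
}

impl Signature {
    /// Conservative default, matching today's behaviour: every call is a safepoint.
    fn new(name: &'static str, params: usize) -> Self {
        Signature { name, params, safepoint: CallSafepoint::MaySafepoint }
    }

    /// Explicit opt-out for callees known not to collect.
    fn no_safepoint(mut self) -> Self {
        self.safepoint = CallSafepoint::NoSafepoint;
        self
    }
}

/// Would lowering a call with this signature need to record a stackmap?
fn call_needs_stackmap(sig: &Signature) -> bool {
    sig.safepoint == CallSafepoint::MaySafepoint
}

fn main() {
    let alloc = Signature::new("caml_alloc_small", 2);
    let modify = Signature::new("caml_modify", 2).no_safepoint();
    for sig in [&alloc, &modify] {
        println!("{} ({} params): stackmap = {}", sig.name, sig.params, call_needs_stackmap(sig));
    }
}
```

Keeping the property in the signature, rather than adding `call_no_safepoint`-style opcodes, means both direct and indirect calls pick it up without duplicating instructions.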

For passing this to codegen, I think for the old backend it's just a case of extending is_safepoint in inst_predicates.rs, but the new backend (which is the one I actually use) looks more involved. As far as I can tell, it might be a case of making gen_call return InstIsSafepoint::No for each ABIMachineSpec impl, but I haven't looked into what effects this has downstream.
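On the new-backend side, the change amounts to choosing the flag per call instead of always returning the same one. The sketch below assumes a simplified `gen_call` that only takes a callee name and a may-GC flag, plus a hypothetical `attach_stackmap` pass standing in for the later stackmap-emission step; the real `ABIMachineSpec::gen_call` has a different, larger signature, and only the `InstIsSafepoint::{Yes, No}` idea is taken from the message.

```rust
// Hypothetical sketch of the backend side. `InstIsSafepoint` mirrors the
// two-valued flag mentioned above; everything else is simplified stand-in
// code, not Cranelift's actual ABIMachineSpec trait.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstIsSafepoint {
    Yes,
    No,
}

/// A toy machine instruction: either a plain call, or a call carrying a
/// stackmap that lists the stack slots holding GC references.
#[derive(Debug, PartialEq)]
enum MachInst {
    Call { callee: &'static str },
    CallWithStackmap { callee: &'static str, live_ref_slots: Vec<u32> },
}

/// Stand-in for `gen_call`: the flag now comes from the call's signature
/// (here a plain bool) instead of always being `Yes`.
fn gen_call(callee: &'static str, callee_may_gc: bool) -> (MachInst, InstIsSafepoint) {
    let flag = if callee_may_gc { InstIsSafepoint::Yes } else { InstIsSafepoint::No };
    (MachInst::Call { callee }, flag)
}

/// Stand-in for the later step that attaches stackmaps. It only consults the
/// flag, so calls marked `No` never get one, which is roughly what removes
/// the need to force GC references into stack slots around them.
fn attach_stackmap(inst: MachInst, flag: InstIsSafepoint, live_ref_slots: &[u32]) -> MachInst {
    match (inst, flag) {
        (MachInst::Call { callee }, InstIsSafepoint::Yes) => MachInst::CallWithStackmap {
            callee,
            live_ref_slots: live_ref_slots.to_vec(),
        },
        (other, _) => other,
    }
}

fn main() {
    let live: [u32; 2] = [0, 8];
    let (i, f) = gen_call("caml_alloc_small", true);
    println!("{:?}", attach_stackmap(i, f, &live));
    let (i, f) = gen_call("caml_modify", false);
    println!("{:?}", attach_stackmap(i, f, &live));
}
```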

Which strategy would be best for adding this to the IR? Am I missing any complications in the implementation?

Chris Fallin (Apr 01 2021 at 15:42):

Greetings @Will Robson -- this is a very good question! You're correct that at the moment we simply decide that all callsites are safepoints; this was driven by the original application (in SpiderMonkey) where there was not a guarantee that a hostcall would not GC, in general.

I do like the idea of an ABI dimension that is "safepoint call" or "no-safepoint call" (your option 2b); this feels like the cleanest approach. You're correct that this basically means returning a different flag from gen_call; the rest should Just Work.

I'm happy to review a PR for this if you want to try your hand at it! (I won't be able to get to it for a while otherwise, unfortunately)

fitzgen (he/him) (Apr 01 2021 at 21:36):

Interesting, do your GC barriers never trigger a GC? For something like a generational barrier, you would usually have something like:

fn generational_barrier(obj) {
    if remembered_set.is_full() {
        gc();
    } else {
        remembered_set.insert(obj);
    }
}
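To make the point concrete, here is a runnable Rust version of that pseudocode, assuming a hypothetical remembered set with a fixed capacity and a stub `gc()` that only counts runs and clears the set (object identities are plain integers for illustration). Because one branch can collect, any call to a barrier shaped like this has to be treated as a safepoint.

```rust
use std::collections::HashSet;

// Runnable version of the generational-barrier pseudocode, with a
// hypothetical fixed-capacity remembered set and a stubbed-out `gc()`.

struct Heap {
    remembered_set: HashSet<usize>,
    capacity: usize,
    collections: u32,
}

impl Heap {
    fn new(capacity: usize) -> Self {
        Heap { remembered_set: HashSet::new(), capacity, collections: 0 }
    }

    /// Stub collector: a real minor GC would trace from roots and the
    /// remembered set; here we only count runs and empty the set.
    fn gc(&mut self) {
        self.collections += 1;
        self.remembered_set.clear();
    }

    /// The barrier itself. Since one branch calls `gc()`, every call site of
    /// this function is a potential GC point, i.e. it must be a safepoint.
    fn generational_barrier(&mut self, obj: usize) {
        if self.remembered_set.len() >= self.capacity {
            self.gc();
        } else {
            self.remembered_set.insert(obj);
        }
    }
}

fn main() {
    let mut heap = Heap::new(2);
    for obj in 0..5 {
        heap.generational_barrier(obj);
    }
    println!("collections triggered by the barrier: {}", heap.collections);
}
```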

Will Robson (Apr 02 2021 at 16:51):

This is from the existing OCaml GC. I'm not that familiar with how it's implemented (beyond the interface the JIT needs to care about) but this is what it says in the source:

/* You must use [caml_modify] to change a field of an existing shared block,
   unless you are sure the value being overwritten is not a shared block and
   the value being written is not a young block. */
/* [caml_modify] never calls the GC. */
/* [caml_modify] can also be used to do assignment on data structures that are
   in the minor heap instead of in the major heap.  In this case, it
   is a bit slower than simple assignment.
   In particular, you can use [caml_modify] when you don't know whether the
   block being changed is in the minor heap or the major heap. */
/* PR#6084 workaround: define it as a weak symbol */

CAMLexport CAMLweakdef void caml_modify (value *fp, value val)
{
  /* The write barrier implemented by [caml_modify] checks for the
     following two conditions and takes appropriate action:
     1- a pointer from the major heap to the minor heap is created
        --> add [fp] to the remembered set
     2- a pointer from the major heap to the major heap is overwritten,
        while the GC is in the marking phase
        --> call [caml_darken] on the overwritten pointer so that the
            major GC treats it as an additional root.
  */

Allocation in the minor heap is a simple pointer bump unless there isn't enough space left, and running out of minor-heap space is almost the only condition that triggers the GC. I think the basic heuristic is that, since it's a functional language where you create new values all the time rather than mutating existing ones, most things in the minor heap are garbage. Most stop-the-world GC pauses only touch the minor heap.
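A minimal sketch of the allocation path described here, assuming a hypothetical `MinorHeap` measured in words that bumps a pointer upward (the real OCaml runtime's layout and allocation direction may differ) and a stub minor collection that simply resets the heap. It shows why allocation is the GC point: only the out-of-space slow path calls the collector.

```rust
// Hypothetical model of a bump-allocated minor heap. Sizes are in words;
// the "collection" is a stub that just makes the whole heap free again.

struct MinorHeap {
    ptr: usize,   // next free word
    limit: usize, // one past the last usable word
    minor_gcs: u32,
}

impl MinorHeap {
    fn new(size_words: usize) -> Self {
        MinorHeap { ptr: 0, limit: size_words, minor_gcs: 0 }
    }

    /// Returns the word offset of the new block. The fast path is a pointer
    /// bump; only the slow path (not enough space) runs the collector.
    fn alloc(&mut self, words: usize) -> usize {
        assert!(words <= self.limit, "block larger than the whole minor heap");
        if self.limit - self.ptr < words {
            self.minor_collection();
        }
        let block = self.ptr;
        self.ptr += words;
        block
    }

    /// Stub: a real minor GC would promote survivors to the major heap before
    /// reusing the minor heap from the start.
    fn minor_collection(&mut self) {
        self.minor_gcs += 1;
        self.ptr = 0;
    }
}

fn main() {
    let mut heap = MinorHeap::new(8);
    let offsets: Vec<usize> = (0..5).map(|_| heap.alloc(3)).collect();
    println!("offsets = {:?}, minor GCs = {}", offsets, heap.minor_gcs);
}
```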

Will Robson (Apr 02 2021 at 16:53):

this is the actual implementation:

  value old;

  if (Is_young((value)fp)) {
    /* The modified object resides in the minor heap.
       Conditions 1 and 2 cannot occur. */
    *fp = val;
  } else {
    /* The modified object resides in the major heap. */
    CAMLassert(Is_in_heap(fp));
    old = *fp;
    *fp = val;
    if (Is_block(old)) {
      /* If [old] is a pointer within the minor heap, we already
         have a major->minor pointer and [fp] is already in the
         remembered set.  Conditions 1 and 2 cannot occur. */
      if (Is_young(old)) return;
      /* Here, [old] can be a pointer within the major heap.
         Check for condition 2. */
      if (caml_gc_phase == Phase_mark) caml_darken(old, NULL);
    }
    /* Check for condition 1. */
    if (Is_block(val) && Is_young(val)) {
      add_to_ref_table (Caml_state->ref_table, fp);
    }
  }
}
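The branch structure of `caml_modify` can be modelled in Rust to see why the "never calls the GC" comment holds. This is a hypothetical model, not a binding to the runtime: values are tagged as immediate, young or old, and `caml_modify_actions` returns what the barrier does besides the store itself. Every path ends in a plain store, a remembered-set insertion (`add_to_ref_table`) or a `caml_darken` call, so, assuming `caml_darken` itself never collects, calls to it could be emitted without a safepoint.

```rust
// Hypothetical Rust model of the decision logic in `caml_modify`.
// The store `*fp = val` always happens and is left out of the model.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Value {
    Int(i64),     // immediate: `Is_block` is false
    Young(usize), // block in the minor heap
    Old(usize),   // block in the major heap
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    AddToRefTable, // condition 1: a new major -> minor pointer
    Darken(Value), // condition 2: a major pointer overwritten during marking
}

fn is_block(v: Value) -> bool { !matches!(v, Value::Int(_)) }
fn is_young(v: Value) -> bool { matches!(v, Value::Young(_)) }

/// `field_is_young` plays the role of `Is_young(fp)`, `old` is the value
/// being overwritten, `val` the value written, `marking` the GC phase test.
fn caml_modify_actions(field_is_young: bool, old: Value, val: Value, marking: bool) -> Vec<Action> {
    let mut actions = Vec::new();
    if field_is_young {
        return actions; // plain store: conditions 1 and 2 cannot occur
    }
    if is_block(old) {
        if is_young(old) {
            return actions; // fp is already in the remembered set
        }
        if marking {
            actions.push(Action::Darken(old));
        }
    }
    if is_block(val) && is_young(val) {
        actions.push(Action::AddToRefTable);
    }
    actions
}

fn main() {
    let cases = [
        (true, Value::Old(1), Value::Young(2), true),
        (false, Value::Young(1), Value::Young(2), true),
        (false, Value::Old(1), Value::Young(2), true),
        (false, Value::Int(0), Value::Old(3), false),
    ];
    for (field_is_young, old, val, marking) in cases {
        println!("{:?}", caml_modify_actions(field_is_young, old, val, marking));
    }
}
```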

Last updated: Jan 24 2025 at 00:11 UTC