Stream: cranelift

Topic: Help with Stack Spilling for Garbage Collection


view this post on Zulip Alec Davis (Jun 30 2025 at 03:43):

Hello,
I am working on a garbage-collected language like Java, and I am running into some issues when trying to run my GC on generated code. For the most part, the generated stack maps worked really well until recently. For some reason, one of the function calls in my language doesn't seem to have a stack map, and I am not sure why.
I am trying to figure out if my issue is just user error on my part.

The way that function calls work in my language is that they are fetched based on a few values that tell the runtime where to find the required function. If the function hasn't been seen before, it gets JIT-compiled, and we store the IP locations as keys that map to lists of SP offsets. This process allows me to run a check for GC before calling functions.

Here is the code in my language that I am trying to run that is giving me trouble:

class Main {

    fn main(args: [String]) {
        let printer: Printer = new Printer();
        let n: u64 = 5
        let result: u64 = Main::fib(n); // stack spills correctly here
        printer.println-int(result);
        Main::thrash() // Create a bunch of garbage to collect
        let arr: [u64] = [1, 2, 3, 4];
        while True {
            Main::from-array(arr); // For some reason we have stack spilling around this call but not at this call
        }
    }

    fn fib(n: u64) -> u64 {
        if n < 2 {
            return n;
        }
        return Main::fib(n - 1) + Main::fib(n - 2);
    }

    fn from-array(arr: [u64]) {
        let x: u64 = 0;
    }

    fn thrash() {
        let i: u64 = 0;
        while i < 10 {
            let x: [u64] = new [u64; 1];
            i = i + 1;
        }
    }
}

I know that some of the stack maps are working because I ran the part of my code that collects references and it found a few of them but not all of them.

Any help is appreciated. If you need any other information like a disassembly, please let me know how to do such a thing, and I will try to provide it as fast as I can.

view this post on Zulip bjorn3 (Jun 30 2025 at 10:14):

What is the generated clif ir?

view this post on Zulip Alec Davis (Jun 30 2025 at 14:52):

How can I dump the generated clif ir?

view this post on Zulip bjorn3 (Jun 30 2025 at 15:16):

You can do println!("{func}"); right after you finalize the FunctionBuilder if your Function is stored in a variable called func.

view this post on Zulip fitzgen (he/him) (Jun 30 2025 at 16:01):

what would be even better would be the complete trace logging output, so we could see the CLIF before and after safepoint spilling. that would be something like

RUST_LOG=trace cargo run ...

if you are using env_logger

view this post on Zulip Alec Davis (Jul 01 2025 at 02:24):

Sorry for taking so long. But here is the resulting clif ir. The logs are quite long and too big for pastebin. Should I just put the logs in a separate reply or is there a particular section you want to see?

clif ir:

    function u0:0(i64, i64) system_v {
        ss0 = explicit_slot 8, align = 8
        ss1 = explicit_slot 8, align = 8
        sig0 = (i64) -> i64 system_v
        sig1 = (i64, i64, i64) -> i64 system_v
        sig2 = (i64) -> i8 system_v
        sig3 = (i64, i64) -> i64 system_v
        sig4 = (i64) -> i8 system_v
        sig5 = (i64, i64, i64, i64, i64) -> i64 system_v
        sig6 = (i64) -> i8 system_v
        sig7 = (i64, i64, i64) system_v
        sig8 = (i64) -> i8 system_v
        sig9 = (i64, i64, i64) -> i64 system_v
        sig10 = (i64) -> i8 system_v
        sig11 = (i64) system_v
        sig12 = (i64) -> i8 system_v
        sig13 = (i64) -> i64 system_v
        sig14 = (i64, i64, i64) system_v
        sig15 = (i64, i64, i64, i64) system_v
        sig16 = (i64) -> i8 system_v
        sig17 = (i64, i64, i64, i64) system_v
        sig18 = (i64) -> i8 system_v
        sig19 = (i64, i64, i64, i64) system_v
        sig20 = (i64) -> i8 system_v
        sig21 = (i64, i64, i64, i64) system_v
        sig22 = (i64) -> i8 system_v
        sig23 = (i64, i64, i64) -> i64 system_v
        sig24 = (i64) -> i8 system_v
        sig25 = (i64, i64) system_v
        sig26 = (i64) -> i8 system_v
        sig27 = (i64, i64, i64, i64, i64) -> i64 system_v
        sig28 = (i64) -> i8 system_v
        sig29 = (i64, i64, i64) system_v
        sig30 = (i64) -> i8 system_v
        sig31 = (i64, i64, i64, i64, i64) -> i64 system_v
        sig32 = (i64) -> i8 system_v
        sig33 = (i64, i64, i64) system_v
        sig34 = (i64) -> i8 system_v
        sig35 = (i64) system_v
        fn0 = u0:8 sig0
        fn1 = u0:9 sig1
        fn2 = u0:6 sig2
        fn3 = u0:6 sig4
        fn4 = u0:10 sig5
        fn5 = u0:6 sig6
        fn6 = u0:6 sig8
        fn7 = u0:9 sig9
        fn8 = u0:6 sig10
        fn9 = u0:6 sig12
        fn10 = u0:8 sig13
        fn11 = u0:11 sig14
        fn12 = u0:12 sig15
        fn13 = u0:6 sig16
        fn14 = u0:12 sig17
        fn15 = u0:6 sig18
        fn16 = u0:12 sig19
        fn17 = u0:6 sig20
        fn18 = u0:12 sig21
        fn19 = u0:6 sig22
        fn20 = u0:9 sig23
        fn21 = u0:6 sig24
        fn22 = u0:6 sig26
        fn23 = u0:10 sig27
        fn24 = u0:6 sig28
        fn25 = u0:6 sig30
        fn26 = u0:10 sig31
        fn27 = u0:6 sig32
        fn28 = u0:6 sig34
        fn29 = u0:7 sig35

    block0(v0: i64, v1: i64):
        jump block1

    block1:
        v2 = iconst.i64 35
        v3 = call fn0(v2)  ; v2 = 35
        v99 = stack_addr.i64 ss1
        store notrap v3, v99
        v4 = iconst.i64 5
        v5 = iconst.i64 66
        v6 = iconst.i64 68
        v8 = call fn1(v0, v5, v6), stack_map=[i64 @ ss1+0]  ; v5 = 66, v6 = 68
        v9 = call fn2(v0), stack_map=[i64 @ ss1+0]
        brif v9, block2, block3

    block2:
        return

    block3:
        v11 = call_indirect.i64 sig3, v8(v0, v4), stack_map=[i64 @ ss1+0]  ; v4 = 5
        v13 = call fn3(v0), stack_map=[i64 @ ss1+0]
        brif v13, block4, block5

    block4:
        return

    block5:
        v16 = iconst.i64 35
        v17 = iconst.i64 -1
        v18 = iconst.i64 36
        v20 = call fn4(v0, v3, v16, v17, v18), stack_map=[i64 @ ss1+0]  ; v16 = 35, v17 = -1, v18 = 36
        v21 = call fn5(v0), stack_map=[i64 @ ss1+0]
        brif v21, block6, block7

    block6:
        return

    block7:
        call_indirect.i64 sig7, v20(v0, v3, v11), stack_map=[i64 @ ss1+0]
        v24 = call fn6(v0), stack_map=[i64 @ ss1+0]
        brif v24, block8, block9

    block8:
        return

    block9:
        v25 = iconst.i64 66
        v26 = iconst.i64 70
        v28 = call fn7(v0, v25, v26), stack_map=[i64 @ ss1+0]  ; v25 = 66, v26 = 70
        v29 = call fn8(v0), stack_map=[i64 @ ss1+0]
        brif v29, block10, block11

    block10:
        return

    block11:
        call_indirect.i64 sig11, v28(v0), stack_map=[i64 @ ss1+0]
        v32 = call fn9(v0), stack_map=[i64 @ ss1+0]
        brif v32, block12, block13

    block12:
        return

    block13:
        v33 = iconst.i64 4
        v34 = iconst.i64 19
        v35 = call fn10(v34), stack_map=[i64 @ ss1+0]  ; v34 = 19
        v100 = stack_addr.i64 ss0
        store notrap v35, v100
        v101 = stack_addr.i64 ss0
        v98 = load.i64 notrap v101
        call fn11(v0, v98, v33), stack_map=[i64 @ ss1+0, i64 @ ss0+0]  ; v33 = 4
        v37 = iconst.i64 0
        v38 = iconst.i64 1
        v102 = stack_addr.i64 ss0
        v97 = load.i64 notrap v102
        call fn12(v0, v97, v37, v38), stack_map=[i64 @ ss1+0, i64 @ ss0+0]  ; v37 = 0, v38 = 1
        v39 = call fn13(v0), stack_map=[i64 @ ss1+0, i64 @ ss0+0]
        brif v39, block14, block15

    block14:
        return

    block15:
        v40 = iconst.i64 1
        v41 = iconst.i64 2
        v103 = stack_addr.i64 ss0
        v96 = load.i64 notrap v103
        call fn14(v0, v96, v40, v41), stack_map=[i64 @ ss1+0, i64 @ ss0+0]  ; v40 = 1, v41 = 2
        v43 = call fn15(v0), stack_map=[i64 @ ss1+0, i64 @ ss0+0]
        brif v43, block16, block17

    block16:
        return

    block17:
        v44 = iconst.i64 2
        v45 = iconst.i64 3
        v104 = stack_addr.i64 ss0
        v95 = load.i64 notrap v104
        call fn16(v0, v95, v44, v45), stack_map=[i64 @ ss1+0, i64 @ ss0+0]  ; v44 = 2, v45 = 3
        v47 = call fn17(v0), stack_map=[i64 @ ss1+0, i64 @ ss0+0]
        brif v47, block18, block19

    block18:
        return

    block19:
        v48 = iconst.i64 3
        v49 = iconst.i64 4
        v105 = stack_addr.i64 ss0
        v94 = load.i64 notrap v105
        call fn18(v0, v94, v48, v49), stack_map=[i64 @ ss1+0, i64 @ ss0+0]  ; v48 = 3, v49 = 4
        v51 = call fn19(v0), stack_map=[i64 @ ss1+0, i64 @ ss0+0]
        brif v51, block20, block21

    block20:
        return

    block21:
        v106 = stack_addr.i64 ss0
        v93 = load.i64 notrap v106
        jump block22

    block22:
        v52 = iconst.i8 1
        brif v52, block23, block24  ; v52 = 1

    block23:
        v54 = iconst.i64 66
        v55 = iconst.i64 69
        v57 = call fn20(v0, v54, v55)  ; v54 = 66, v55 = 69
        v58 = call fn21(v0)
        brif v58, block26, block27

    block26:
        return

    block27:
        call_indirect.i64 sig25, v57(v0, v93)
        v61 = call fn22(v0)
        brif v61, block28, block29

    block28:
        return

    block29:
        v64 = iconst.i64 35
        v65 = iconst.i64 -1
        v66 = iconst.i64 36
        v68 = call fn23(v0, v3, v64, v65, v66)  ; v64 = 35, v65 = -1, v66 = 36
        v69 = call fn24(v0)
        brif v69, block30, block31

    block30:
        return

    block31:
        call_indirect.i64 sig29, v68(v0, v3, v11)
        v72 = call fn25(v0)
        brif v72, block32, block33

    block32:
        return

    block33:
        jump block22

    block24:
        v75 = iconst.i64 35
        v76 = iconst.i64 -1
        v77 = iconst.i64 36
        v79 = call fn26(v0, v3, v75, v76, v77)  ; v75 = 35, v76 = -1, v77 = 36
        v80 = call fn27(v0)
        brif v80, block34, block35

    block34:
        return

    block35:
        call_indirect.i64 sig33, v79(v0, v3, v11)
        v83 = call fn28(v0)
        brif v83, block36, block37

    block36:
        return

    block37:
        call fn29(v0)
        return
    }

view this post on Zulip bjorn3 (Jul 01 2025 at 13:56):

It seems like the clif ir you generate contains a couple more calls than the source you are compiling. Can you get some debug information about which fn* corresponds with a call to what function? You could do a println!() at the point you are emitting the code, I would guess.

view this post on Zulip bjorn3 (Jul 01 2025 at 13:56):

Several of the calls don't have stack maps at least.

view this post on Zulip bjorn3 (Jul 01 2025 at 13:56):

Also is the source available somewhere?

view this post on Zulip Alec Davis (Jul 01 2025 at 14:55):

Yeah, I should specify a few details. I'll put the log in a separate reply due to its length.

Basically when you call, you fetch the function and then call it. After calling a function, we test if the context parameter holds an exception (another function call). There are also a few operations that are done through function calls like creating an object, creating an array (one for each size + object + floats)

You can find the source code here, it is not the greatest thing in the world but it could help you. Right now many design ideas have changed and I haven't documented them yet.

But to compile the test file I have you do cargo run --bin rowanc -- -s rowan-test-files rowan-test-files then to run the runtime, you will need to do it on a machine that supports libunwind (this will likely change) and do cargo run --bin rowan -- output/Main.class

A high level object oriented programming language for the 20s - Ki11erRabbit/rowan

view this post on Zulip Alec Davis (Jul 01 2025 at 14:58):

And here is the log output from before and after stack spilling.
https://pastebin.com/V01TyyDf


view this post on Zulip fitzgen (he/him) (Jul 01 2025 at 16:27):

@Alec Davis I feel like I'm not understanding exactly what your issue is, and what you expect to be happening but is not happening or whatever, so I think what would be really helpful would be:

I think providing this level of detail will help clarify things and help us help you -- thanks!

view this post on Zulip Alec Davis (Jul 02 2025 at 01:12):

I'll be sure to do that, thanks for the feedback. I am busy at work for most of the day, so it is hard to respond quickly while people are awake/available. I'll be sure to put the logs in a gist.

view this post on Zulip Alec Davis (Jul 02 2025 at 02:11):

I believe that this is the smallest example in my language (I compile to my own class file definition so it is a little hard to share). I'll annotate what I think is supposed to happen.

class Main {

    static thing: u64 = 66666666666;

    fn main(args: [String]) {
        let printer: Printer = new Printer();
        let n: u64 = 40;
        Main::thrash() // We create some garbage to collect
        while True {
            // Here to call println-int, we first fetch the method, but before trying that we first
            // try to acquire a lock, if we are able to acquire it, then that means the runtime isn't
            // collecting garbage. If we can't, then we run the routine to collect all live rooted
            // memory on the stack which should only include the printer object.
            // However, for some reason, the call to fetch this particular method doesn't get a
            // stack spill but the calls around it do. If we did a print before this loop, it would also find it there
            printer.println-int(n);
        }
    }

    fn thrash() {
        let i: u64 = 0;
        while i < 10 {
            let x: [u64] = new [u64; 1];
            i = i + 1;
        }
    }
}
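
The check described in the comment above (try to take a lock; if it is already held, a collection is in progress and the caller should report its roots) could be sketched roughly as follows. This is only an illustration using the standard library: `Poll` and `poll_safepoint` are hypothetical names, not part of the rowan runtime, and a real runtime would need to coordinate with the collector thread more carefully.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical result of polling at a call site.
#[derive(Debug, PartialEq)]
enum Poll {
    /// No collection in progress; carry on with the call.
    Continue,
    /// The collector holds the lock; walk the stack and report live roots.
    ReportRoots,
}

/// Try to take the GC lock without blocking.
/// Assumption: the collector holds `gc_lock` for the whole collection.
fn poll_safepoint(gc_lock: &Mutex<()>) -> Poll {
    match gc_lock.try_lock() {
        // Someone else (assumed to be the collector) holds the lock.
        Err(TryLockError::WouldBlock) => Poll::ReportRoots,
        // Lock acquired; it is released again when the temporary guard drops.
        // A poisoned lock is ignored in this sketch.
        _ => Poll::Continue,
    }
}
```

The point where `ReportRoots` is returned is the point where a stack map for the current call site has to exist; otherwise there is nothing to report.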

I haven't set up a decent test system yet so to test this example you can overwrite rowan-test-files/main.rowan with the contents above.

Then to run the compiler to generate the associated class file you provide 2 arguments, the path to the stdlib directory and a path to the directory with your project. In this case, we want them to be the same because we don't need the standard library.
So the command to be run will be this:

cargo run --bin rowanc -- -s rowan-test-files rowan-test-files

This will put the newly generated class file in a directory called output.

To then run the runtime with the class file (It should be noted that the runtime only supports x86_64 SystemV systems. I am working on changing this) you run this command:

cargo run --bin rowan -- output/Main.class

What will happen is that the runtime will link the class with itself and call the main method in the Main class.

In the main method above, what should happen is that:

Garbage collection should run every 5 seconds. This is to test that it actually works.

However, what ends up happening is that in the loop, the call to fetch the print method doesn't get a stack spill, so no stack map is generated for it. This then causes the printer to get collected while it is still in scope, which in turn causes a Rust panic because the object's pointer is null.
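
To make the failure mode concrete, here is a toy sketch (not the rowan collector; `collect` and the integer object ids are invented for illustration, and there is no tracing of object fields): whatever is not reported as a root at the safepoint is treated as garbage, even if generated code still holds a pointer to it.

```rust
use std::collections::HashSet;

/// Toy collection: keep only the objects that were reported as roots.
/// `heap` is a set of made-up object ids standing in for allocations.
fn collect(heap: &mut HashSet<u64>, reported_roots: &[u64]) {
    let live: HashSet<u64> = reported_roots.iter().copied().collect();
    // Everything the stack maps did not mention is freed.
    heap.retain(|obj| live.contains(obj));
}
```

If the printer is object `1` and the stack map at the call site omits it, `collect(&mut heap, &[])` frees it while the loop still uses it.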

Then for completeness, here is the clif ir with where I think a stack map should exist but doesn't exist

function u0:0(i64, i64) system_v {
        sig0 = (i64) -> i64 system_v
        sig1 = (i64, i64, i64) -> i64 system_v
        sig2 = (i64) -> i8 system_v
        sig3 = (i64) system_v
        sig4 = (i64) -> i8 system_v
        sig5 = (i64, i64, i64, i64, i64) -> i64 system_v
        sig6 = (i64) -> i8 system_v
        sig7 = (i64, i64, i64) system_v
        sig8 = (i64) -> i8 system_v
        sig9 = (i64) system_v
        fn0 = u0:7 sig0
        fn1 = u0:8 sig1
        fn2 = u0:5 sig2
        fn3 = u0:5 sig4
        fn4 = u0:9 sig5
        fn5 = u0:5 sig6
        fn6 = u0:5 sig8
        fn7 = u0:6 sig9

    block0(v0: i64, v1: i64):
        v7 -> v0
        v11 -> v0
        jump block1

    block1:
        v2 = iconst.i64 35
        v3 = call fn0(v2)  ; v2 = 35
        v4 = iconst.i64 40
        v5 = iconst.i64 66
        v6 = iconst.i64 69
        v8 = call fn1(v7, v5, v6)  ; v5 = 66, v6 = 69
        v9 = call fn2(v7)
        brif v9, block2, block3(v8)

    block2:
        return

    block3(v10: i64):
        call_indirect.i64 sig3, v8(v7)
        v12 = call fn3(v11)
        brif v12, block4, block5

    block4:
        return

    block5:
        jump block6(v4, v3, v11)  ; v4 = 40

    block6(v26: i64, v28: i64, v30: i64):
        v14 -> v26
        v27 -> v26
        v15 -> v28
        v29 -> v28
        v19 -> v30
        v23 -> v30
        v25 -> v30
        v31 -> v30
        v13 = iconst.i8 1
        brif v13, block7, block8  ; v13 = 1

    block7:
        v16 = iconst.i64 35
        v17 = iconst.i64 -1
        v18 = iconst.i64 36
        v20 = call fn4(v19, v15, v16, v17, v18)  ; v16 = 35, v17 = -1, v18 = 36 // Right here we should have a stack map since we are fetching a method.
        v21 = call fn5(v19)
        brif v21, block10, block11(v20)

    block10:
        return

    block11(v22: i64):
        call_indirect.i64 sig7, v20(v19, v15, v14)
        v24 = call fn6(v23)
        brif v24, block12, block13

    block12:
        return

    block13:
        jump block6(v27, v29, v31)

    block8:
        call fn7(v25)
        return
    }

However, when looking at block7 in the clif ir, we don't have a stack map. In fact, the last stack map was in the block where we call thrash.

    block7:
        v16 = iconst.i64 35
        v17 = iconst.i64 -1
        v18 = iconst.i64 36
        v20 = call fn4(v19, v15, v16, v17, v18)  ; v16 = 35, v17 = -1, v18 = 36 // There should be a stack map here but there isn't one.
        v21 = call fn5(v19)
        brif v21, block10, block11(v20)

You can see the more detailed logs in this gist

Hopefully, this was both enough and not too much information. I look forward to hearing your response.


view this post on Zulip Alec Davis (Jul 03 2025 at 14:58):

I wonder if, when using vars, I need to mark them again as requiring a stack map. I tried marking the vars at one point as needing a stack map and that didn't change anything.

view this post on Zulip fitzgen (he/him) (Jul 03 2025 at 15:21):

Alec Davis said:

I wonder if, when using vars, I need to mark them again as requiring a stack map. I tried marking the vars at one point as needing a stack map and that didn't change anything.

if you expect the values that are associated with those variables to be in stack maps, then yes, you need to mark the variables as needing stack maps

view this post on Zulip Alec Davis (Jul 03 2025 at 23:57):

The thing is, I tried that, and it didn't change anything, the same problem occurred. Would I also have to mark the value I get from using the variable as needing a stack map? That is something I haven't tried yet.

view this post on Zulip Alec Davis (Jul 04 2025 at 18:52):

So far, I have given every function call that should get a stack map a stack map but I am still missing some live memory.
I have marked both variables and values as needing stack maps, and I am still having issues with constructing a stack map. I wonder if I am doing something wrong when I convert the stack map into a data structure that my runtime can use.

I construct my own map that I can look up during the gc phase.

This is how I construct a hashmap where the key is the instruction pointer and the value is a list of offsets from the stack pointer.

        let compiled_code = self.context.compiled_code().unwrap();
        let stack_maps = compiled_code.buffer.user_stack_maps();
        let mut object_locations = Vec::new();
        for (location, _, map) in stack_maps {
            let objects = map.entries()
                .map(|(_, offset)| offset)
                .collect::<Vec<_>>();
            object_locations.push((*location, objects));
        }
        let locations = object_locations;

        let code = module.get_finalized_function(*id) as *const ();
        let mut object_locations = HashMap::new();
        locations.into_iter()
            .for_each(|(offset, objects)| {
                object_locations.insert(offset as usize + code as usize, objects);
            });
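
For comparison, a standalone sketch of how a table like this might be consulted during collection. It assumes that each key is the return address of a call (code base plus the recorded code offset) and that each offset is relative to the stack pointer of the paused frame; both are assumptions worth checking against the Cranelift documentation. `StackMapTable`, `rebase`, and `root_slots` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical table: absolute return address -> SP-relative slot offsets.
type StackMapTable = HashMap<usize, Vec<u32>>;

/// Rebase code-relative stack map offsets onto the finalized code address.
fn rebase(code_base: usize, maps: &[(u32, Vec<u32>)]) -> StackMapTable {
    maps.iter()
        .map(|(code_offset, slots)| (code_base + *code_offset as usize, slots.clone()))
        .collect()
}

/// Addresses of the stack slots that hold GC references for a frame paused
/// at `return_addr` with stack pointer `sp`, or `None` if no stack map was
/// recorded for that address.
fn root_slots(table: &StackMapTable, return_addr: usize, sp: usize) -> Option<Vec<usize>> {
    table
        .get(&return_addr)
        .map(|slots| slots.iter().map(|off| sp + *off as usize).collect())
}
```

Returning `None` rather than an empty list keeps "no stack map for this address" distinguishable from "a stack map with no live references", so logging the misses during a stack walk may help tell whether the gap is in the generated CLIF or in the lookup.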

view this post on Zulip Alec Davis (Jul 05 2025 at 23:14):

I think it was just user error on my part. Thank you, everyone, for the help.


Last updated: Dec 06 2025 at 07:03 UTC