Stream: cranelift

Topic: Implementing Tagged Unions


David M (Apr 20 2025 at 14:18):

Hello! I'm creating a programming language using Cranelift JIT as the backend, and I'm currently implementing tagged unions (i.e. Rust's enum), which I'll call just "union" in this post, not to be confused with C-style unions. All values in my language are 64 bits (Int->i64, Float->f64, even the tag is 64 bits which is wasteful of course but it is simple/convenient).

When none of the union variants have any associated data/payload, that's easy, I make a CLIF variable with a type of I64 and store the tag value. When one (or more) of the variants has data of type Int, I create two CLIF variables (both I64), one for the tag and one for the data. Similarly, if one (or more) of the variants has data of type Float, two CLIF variables, I64 for the tag and F64 for the data.

I'm struggling with what to do when one variant has data of type Int and another variant has data of type Float - example code below. Create 3 variables, one for the tag and each possibility of int and float? Or stick with 2 variables (both I64) and use bitcast when I need to set or get a float value from the union?

// this is the motivating example from my language
// translated into Rust for clarity
enum Number {
    Int(i64),
    Float(f64),
}

fn make_number_from_float(f: f64) -> Number {
    Number::Float(f)
}

fn make_number_from_int(i: i64) -> Number {
    Number::Int(i)
}

fn make_number_randomly(i: i64, f: f64) -> Number {
    if random_bool() {
        Number::Int(i)
    } else {
        Number::Float(f)
    }
}

Chris Fallin (Apr 21 2025 at 05:02):

Either could be made to work, but if I were in your shoes I would definitely choose to use just one payload value (so two total, including the tag) -- the alternative (all options always exist and are carried everywhere) will significantly increase register pressure / storage overhead. (And consider what happens if you add more types to your tagged union / sum type! You want the same efficiency here that in-memory unions get by overlapping)
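The two-value layout (tag plus one shared payload) can be modeled in plain Rust to check that no bits are lost. This is only a minimal sketch, not Cranelift API code: `f64::to_bits` and `f64::from_bits` stand in for the CLIF `bitcast` mentioned earlier in the thread, and the tag numbering, the `Lowered` struct, and the helper names are assumptions made for illustration.

```rust
/// Hypothetical tag numbering for the `Number` union from the example.
const TAG_INT: u64 = 0;
const TAG_FLOAT: u64 = 1;

/// A union value as two 64-bit values: the tag and one shared payload.
/// In generated code these would be two I64 variables rather than a struct.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lowered {
    tag: u64,
    payload: u64,
}

/// Build the Int variant: the payload holds the integer's bits unchanged.
fn make_int(i: i64) -> Lowered {
    Lowered { tag: TAG_INT, payload: i as u64 }
}

/// Build the Float variant: same 64 bits, reinterpreted as an integer.
/// This models an F64 -> I64 bitcast; no numeric conversion happens.
fn make_float(f: f64) -> Lowered {
    Lowered { tag: TAG_FLOAT, payload: f.to_bits() }
}

/// Read the payload as an integer only if the tag says it is one.
fn get_int(v: Lowered) -> Option<i64> {
    (v.tag == TAG_INT).then(|| v.payload as i64)
}

/// Read the payload as a float only if the tag says it is one.
/// This models the I64 -> F64 bitcast on the way out.
fn get_float(v: Lowered) -> Option<f64> {
    (v.tag == TAG_FLOAT).then(|| f64::from_bits(v.payload))
}
```

Because the reinterpretation is bit-exact, values such as `-0.0` and NaN survive the round trip, which a numeric int/float conversion would not guarantee.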

bjorn3 (Apr 21 2025 at 08:07):

The other option is to always store tagged unions in memory rather than in registers and then pass around pointers to them in registers. You will almost certainly want to support that anyway if you want complex structs bigger than a couple registers.
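As a rough illustration of the in-memory option, here is a sketch of a two-word cell that is written and read through a reference, the way generated code would use a pointer to a stack slot. The `UnionCell` layout, the tag values, and the helper names are assumptions for this example and are not taken from Cranelift.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical tag numbering.
const TAG_INT: u64 = 0;
const TAG_FLOAT: u64 = 1;

/// Hypothetical memory layout: a 64-bit tag followed by one 64-bit payload
/// slot that all variants share. `repr(C)` fixes the field order.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct UnionCell {
    tag: u64,
    payload: u64,
}

/// Write the Int variant through the pointer (two 8-byte stores).
fn store_int(cell: &mut UnionCell, i: i64) {
    cell.tag = TAG_INT;
    cell.payload = i as u64;
}

/// Write the Float variant through the pointer; the float's bits are stored as-is.
fn store_float(cell: &mut UnionCell, f: f64) {
    cell.tag = TAG_FLOAT;
    cell.payload = f.to_bits();
}

/// Load the tag, then interpret the payload accordingly and widen to f64.
fn load_as_f64(cell: &UnionCell) -> f64 {
    match cell.tag {
        TAG_FLOAT => f64::from_bits(cell.payload),
        _ => cell.payload as i64 as f64,
    }
}

/// Size and alignment a stack slot for this cell would need, in bytes.
fn cell_layout() -> (usize, usize) {
    (size_of::<UnionCell>(), align_of::<UnionCell>())
}
```

Only the pointer travels in a register here, so the cost of passing the union around stays constant no matter how large the widest variant becomes.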

David M (Apr 21 2025 at 12:47):

Thanks!

Chris - Yeah limiting to a max of 2 variables makes sense in this case. I was planning to cap it at 3 variables at most (tag, float, int/int-like thing including pointers), and only in the case where the union has both a float and an int-like variant, but that still seems worse than doing just (Int, Int) with a bitcast for the float, since there's an extra variable always hanging out with a zero or possibly uninitialized value.

bjorn - I haven't added product types to my language yet (I will call them "records") but I'll definitely have to support using stack slots for those (and heap allocation, eventually, when I write a GC :sweat_smile:). I'd probably split something like { x: Float, y: Float } into 2 variables to pass in registers, but nothing larger than that. Actually, thinking about it more, I'd have to do this anyway even if records didn't exist in my language, because I'm going to allow the variants of a tagged union to contain another tagged union, so those could be arbitrarily large. Thanks!
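To make the "arbitrarily large" point concrete, here is a small sketch that counts how many 64-bit slots a type needs when variant payloads overlap, and applies a two-slot cutoff for keeping a value in variables. The `Ty` description, the `slots` function, and the `MAX_REG_SLOTS` threshold are all hypothetical; they only mirror the policy described in this thread.

```rust
/// Hypothetical description of a type in the language discussed here.
enum Ty {
    Int,
    Float,
    /// A tagged union: one entry per variant, `None` for a variant with no payload.
    Union(Vec<Option<Ty>>),
}

/// Number of 64-bit slots a value needs when variant payloads overlap:
/// a scalar takes one slot; a union takes one tag slot plus the widest payload.
fn slots(ty: &Ty) -> usize {
    match ty {
        Ty::Int | Ty::Float => 1,
        Ty::Union(variants) => {
            let widest = variants
                .iter()
                .map(|payload| payload.as_ref().map_or(0, slots))
                .max()
                .unwrap_or(0);
            1 + widest
        }
    }
}

/// Assumed cutoff: values up to this many slots stay in variables,
/// anything larger would go to a stack slot and be passed by pointer.
const MAX_REG_SLOTS: usize = 2;

fn fits_in_registers(ty: &Ty) -> bool {
    slots(ty) <= MAX_REG_SLOTS
}
```

With these assumptions, `Number` (Int or Float) needs 2 slots and stays in variables, while a union with a variant holding a `Number` needs 3 and falls over the cutoff. Each level of nesting adds one tag slot, which is why a fixed number of variables cannot cover every case.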


Last updated: Dec 06 2025 at 07:03 UTC