Stream: cranelift

Topic: ✔ How to properly use ArgumentPurpose::StructArgument?


Cherry (Jun 17 2024 at 20:53):

Note: Using Windows x86_64

From my understanding, I can add a parameter with type I64 and ArgumentPurpose::StructArgument(size), and inside the function I receive a pointer to the struct, which I can read from to get the data.

I've tried this, but the struct I pass in only comes out correctly if I offset the pointer read by 16 bytes.

I know the struct's size ahead of time, but I don't know of any way to read it correctly. Either that, or I'm doing this wrong and missing something.

The test case is quite simple, however: I pass v0 straight into another function and read from it there.

defining function funcid1: function u0:1(i64 sarg(32), i64 sret) windows_fastcall {
    sig0 = (i64, i64, i64) windows_fastcall
    fn0 = u0:0 sig0

block0(v0: i64, v1: i64):
    stack_store v0, ss0
    v2 = iconst.i64 0x01af_9243_d8f0
    call fn0(v0, v2, v1)  ; v2 = 0x01af_9243_d8f0
    return
}

bjorn3 (Jun 17 2024 at 20:54):

What is the function declaration on the caller's side? E.g. if you called it from C, what would the function signature and struct definition be?

bjorn3 (Jun 17 2024 at 20:56):

Also, did you mean to pass v0 to fn0 as a struct or as a pointer?

Cherry (Jun 17 2024 at 20:59):

bjorn3 said:

Also, did you mean to pass v0 to fn0 as a struct or as a pointer?

I meant to pass it as a pointer. My real code actually stores the sarg pointer into an explicit stack slot, passes the slot's address into the function, and reads the pointer back out of it there. The effect appears to be the same either way.
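The pointer-in-a-slot scheme described here can be modeled in plain Rust, without any JIT. This is a minimal sketch, assuming the slot holds the struct pointer at offset 0; struct_ptr_from_slot is a hypothetical helper and the stack slot is simulated by a local variable. It only models the double indirection and cannot reproduce an ABI mismatch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

/// Hypothetical first step of the callback: `args` is the address of a slot
/// whose first 8 bytes hold the struct pointer stored by the JIT code.
///
/// # Safety
/// `args` must point to a readable, aligned `*const u8`.
unsafe fn struct_ptr_from_slot(args: *const ()) -> *const u8 {
    // First dereference: read the pointer stored at offset 0 of the slot.
    unsafe { *args.cast::<*const u8>() }
}

fn main() {
    let data = TestStruct { one: 22, two: 44, three: 55, four: 57 };
    // Simulated stack slot: holds a pointer to `data` at offset 0.
    let slot: *const u8 = (&data as *const TestStruct).cast();
    // The callback only ever sees the slot's address.
    let args: *const () = (&slot as *const *const u8).cast();

    let p = unsafe { struct_ptr_from_slot(args) };
    // Second dereference: read the struct itself through the recovered pointer.
    let got = unsafe { p.cast::<TestStruct>().read() };
    assert_eq!(got, data);
    println!("{got:?}");
}
```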

fn0 has this signature:

pub extern "fastcall" fn __jit_cb(args: *const (), data: &Data, ret: *mut Ret)

My bad for not providing the code. Here is my quick test case. I replaced ret with my JITted function (funcid1) and am doing a test call using get_ret():

#[derive(Debug)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

#[inline(never)]
extern "fastcall" fn ret(data: TestStruct) -> TestStruct {
    println!("ret() got {data:?}");
    data
}

#[inline(never)]
extern "fastcall" fn get_ret() {
    let data = TestStruct {
        one: 22,
        two: 44,
        three: 55,
        four: 57,
    };

    let data = ret(data);
    println!("get_ret() got: {data:?}");
}

bjorn3 (Jun 17 2024 at 21:02):

What is the definition of the function you called from the jitted function?

Jacob Lifshay (Jun 17 2024 at 21:02):

On LLVM, sret arguments generally have to be the first argument, and Cranelift is most likely the same. You have your sret argument as the second argument.

bjorn3 (Jun 17 2024 at 21:03):

Right, of course. Cranelift does require it to be first. We should probably add a verifier check for that.
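A rough sketch of what such a check might look like. Purpose is a hypothetical stand-in for Cranelift's ArgumentPurpose and check_sret_first is a made-up name; this is not Cranelift's actual verifier code.

```rust
/// Hypothetical stand-in for Cranelift's `ArgumentPurpose`; only the
/// variants needed for this check are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Purpose {
    Normal,
    StructReturn,
    StructArgument(u32),
}

/// Returns `Err(index)` for the first `StructReturn` parameter that is not
/// the first parameter, and `Ok(())` otherwise.
fn check_sret_first(params: &[Purpose]) -> Result<(), usize> {
    for (i, p) in params.iter().enumerate() {
        if *p == Purpose::StructReturn && i != 0 {
            return Err(i);
        }
    }
    Ok(())
}

fn main() {
    // Original signature from the thread: (i64 sarg(32), i64 sret) -> rejected.
    let original = [Purpose::StructArgument(32), Purpose::StructReturn];
    // Reordered signature: (i64 sret, i64 sarg(32)) -> accepted.
    let reordered = [Purpose::StructReturn, Purpose::StructArgument(32)];
    // A plain pointer argument after sret is also fine.
    let plain = [Purpose::StructReturn, Purpose::Normal];

    println!("{:?}", check_sret_first(&original)); // Err(1)
    println!("{:?}", check_sret_first(&reordered)); // Ok(())
    println!("{:?}", check_sret_first(&plain)); // Ok(())
}
```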

Cherry (Jun 17 2024 at 21:04):

Jacob Lifshay said:

On LLVM, sret arguments generally have to be the first argument, and Cranelift is most likely the same. You have your sret argument as the second argument.

Thanks for the tip. I'll fix my code to place it first and report back with the result.

Cherry (Jun 17 2024 at 21:25):

OK, I have this now, but I still have the same issue: I need to offset the sarg pointer by 16 bytes to find the data. (You can ignore the explicit stack slot; the same thing happens when passing and using the pointer directly.)

defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
    ss0 = explicit_slot 8, align = 256
    sig0 = (i64, i64, i64) windows_fastcall
    fn0 = u0:0 sig0

block0(v0: i64, v1: i64):
    stack_store v1, ss0
    v2 = iconst.i64 0x023a_d589_0320
    v3 = stack_addr.i64 ss0
    call fn0(v3, v2, v0)  ; v2 = 0x023a_d589_0320
    return
}

bjorn3 (Jun 17 2024 at 21:50):

What is the definition of the function being called by the jitted function?

Cherry (Jun 17 2024 at 22:02):

bjorn3 said:

What is the definition of the function being called by the jitted function?

I've simplified the test case. Here is all the info:

I pass in a stack address at which I stored the sarg pointer (offset 0), and read the pointer back from that address.

The JIT function calls a function with this signature; the stack address is the first pointer argument, args:
pub extern "fastcall" fn __jit_cb(args: *const (), data: &Data, ret: *mut Ret)
Inside the JIT callback, I read the sarg pointer from the stack address:
let src = unsafe { *args.cast::<*const u8>() };
I then copy count bytes (32 in this test case) from this pointer into a Vec with ptr::copy_nonoverlapping, and finally transmute the bytes back into a TestStruct:

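let mut buffer: Vec<u8> = Vec::with_capacity(count);

unsafe {
    // Copy `count` bytes from the sarg pointer into the Vec's spare capacity...
    std::ptr::copy_nonoverlapping(src, buffer.as_mut_ptr(), count);
    // ...then mark those bytes as initialized.
    buffer.set_len(count);
}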
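Separately from the ABI problem, the Vec + copy_nonoverlapping + transmute sequence can be collapsed into a single ptr::read_unaligned, assuming the bytes really hold a #[repr(C)] TestStruct. A sketch; read_test_struct is a hypothetical helper name.

```rust
use std::ptr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

/// Decodes a `TestStruct` from the bytes at `src`.
///
/// # Safety
/// `src` must point to at least `size_of::<TestStruct>()` readable bytes
/// holding a valid `TestStruct`. The pointer does not need to be aligned.
unsafe fn read_test_struct(src: *const u8) -> TestStruct {
    // One unaligned read replaces the Vec buffer and the transmute.
    unsafe { ptr::read_unaligned(src.cast::<TestStruct>()) }
}

fn main() {
    let data = TestStruct { one: 22, two: 44, three: 55, four: 57 };
    let src = (&data as *const TestStruct).cast::<u8>();
    let got = unsafe { read_test_struct(src) };
    assert_eq!(got, data);
    println!("{got:?}");
}
```

This does not change what is read, so with the wrong signature it would still see the shifted fields; it only removes the intermediate allocation.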

The IR:

defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
    ss0 = explicit_slot 8, align = 256
    sig0 = (i64, i64, i64) windows_fastcall ; <-- __jit_cb
    fn0 = u0:0 sig0 ; <-- __jit_cb

block0(v0: i64, v1: i64):
    stack_store v1, ss0
    v2 = iconst.i64 0x01af_78a8_0610
    v3 = stack_addr.i64 ss0
    ; args: *const (), data: &Data, ret: *mut Ret
    call fn0(v3, v2, v0)  ; v2 = 0x01af_78a8_0610
    return
}

After generating the JIT function, I store the resulting function pointer in the static and call get_ret(), which calls the JIT function:

#[derive(Debug)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

use std::sync::OnceLock;

// Holds the pointer to the JIT-compiled function (funcid1), set after compilation.
static TEST_FN: OnceLock<extern "fastcall" fn(TestStruct) -> TestStruct> = OnceLock::new();

#[inline(never)]
extern "fastcall" fn get_ret() {
    let data = TestStruct {
        one: 22,
        two: 44,
        three: 55,
        four: 57,
    };

    let data = TEST_FN.get().unwrap()(data);
    println!("get_ret() got: {data:?}");
}
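For comparison, the OnceLock plumbing can be exercised with an ordinary Rust function standing in for the JIT code. This is only a sketch: identity is a hypothetical stand-in for funcid1, and extern "C" is used instead of "fastcall" so it builds on any host. Because rustc generates both sides of the call here, this version cannot show the misalignment; it only checks that storing and calling the function pointer works.

```rust
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

// Same shape as TEST_FN above, but `extern "C"` so the sketch builds on any host.
static TEST_FN: OnceLock<extern "C" fn(TestStruct) -> TestStruct> = OnceLock::new();

// Hypothetical stand-in for the JIT-compiled funcid1: returns its argument unchanged.
extern "C" fn identity(data: TestStruct) -> TestStruct {
    data
}

fn call_test_fn(data: TestStruct) -> TestStruct {
    // Panics if TEST_FN was never set.
    let f = TEST_FN.get().expect("TEST_FN not initialized");
    f(data)
}

fn main() {
    // In the real program, the JIT function pointer would be stored here instead.
    let _ = TEST_FN.set(identity);
    let data = TestStruct { one: 22, two: 44, three: 55, four: 57 };
    let got = call_test_fn(data);
    assert_eq!(got, data);
    println!("get_ret() got: {got:?}");
}
```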

My output is below. TEST_FN simply reads the sarg bytes and returns them via sret, nothing fancy:

get_ret() got: TestStruct { one: 0, two: 1853154559352, three: 22, four: 44 }

Cherry (Jun 18 2024 at 01:04):

I made a reduced test case here (run it on Windows x64):
https://github.com/MolotovCherry/cranelift-misalignment

I've made this as simple as I can and still see the problem, and I've run out of ideas for solving it. If this really is something I'm doing wrong, I hope someone can point it out.

defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
    sig0 = (i64 sret, i64 sarg(32)) windows_fastcall
    fn0 = u0:0 sig0

block0(v0: i64, v1: i64):
    call fn0(v0, v1)
    return
}

// I passed in
//  TestStruct {
//     one: 22,
//     two: 44,
//     three: 55,
//     four: 57,
//  }
cb got TestStruct { one: 0, two: 22, three: 44, four: 55 }
get_ret got: TestStruct { one: 0, two: 22, three: 44, four: 55 }

bjorn3 (Jun 18 2024 at 08:57):

It looks like, in the Windows calling convention, structs larger than 64 bits are passed by reference rather than at a fixed stack offset. In other words, you have to omit the sarg attribute.

See https://github.com/rust-lang/rust/blob/737e42308c6e957575692965d73b17937f936f28/compiler/rustc_target/src/abi/call/x86_win64.rs#L8-L28 for how to calculate the right ABI.
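The size rule in the linked x86_win64.rs can be sketched as a small classifier for struct (aggregate) arguments and return values. It deliberately ignores scalars and vectors, and Win64Pass and classify_win64_struct are hypothetical names, not rustc's.

```rust
/// How a struct of a given size travels under the Windows x64 convention
/// (aggregates only; scalars and vectors are not modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Win64Pass {
    /// Passed by value as a single integer of this many bits.
    Int(u32),
    /// Passed as a plain pointer to a copy (no `sarg` in Cranelift).
    Indirect,
}

fn classify_win64_struct(size_bytes: u64) -> Win64Pass {
    match size_bytes {
        1 => Win64Pass::Int(8),
        2 => Win64Pass::Int(16),
        4 => Win64Pass::Int(32),
        8 => Win64Pass::Int(64),
        // Any other size, including 3, 16 and the 32-byte TestStruct.
        _ => Win64Pass::Indirect,
    }
}

fn main() {
    for size in [1u64, 3, 8, 16, 32] {
        println!("{size} bytes -> {:?}", classify_win64_struct(size));
    }
}
```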


bjorn3 (Jun 18 2024 at 08:59):

You have to pass a plain pointer when arg.make_indirect() is used, and use sarg when arg.make_indirect_byval(align) is used.
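As a sketch of that mapping, here is a hypothetical PassMode enum loosely mirroring the two rustc calls (these are not rustc's actual types), rendered as Cranelift IR parameter text for a pointer-sized struct parameter:

```rust
/// Hypothetical, simplified mirror of the two rustc pass modes mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PassMode {
    /// Like `make_indirect()`: the caller passes a pointer to its own copy.
    Indirect,
    /// Like `make_indirect_byval(align)`: the struct is copied to the stack.
    IndirectByVal { size: u32 },
}

/// Renders the Cranelift IR parameter text for a struct parameter.
fn clif_param(mode: PassMode) -> String {
    match mode {
        // Plain pointer: no special purpose attribute.
        PassMode::Indirect => "i64".to_string(),
        // By-value copy on the stack: Cranelift's struct-argument purpose.
        PassMode::IndirectByVal { size } => format!("i64 sarg({size})"),
    }
}

fn main() {
    // The linked Windows x64 rules use make_indirect(), so TestStruct is a plain pointer.
    println!("{}", clif_param(PassMode::Indirect));
    println!("{}", clif_param(PassMode::IndirectByVal { size: 32 }));
}
```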

bjorn3 (Jun 18 2024 at 09:00):

Just like LLVM, Cranelift only handles the lower half of the calling convention. The upper half, which lowers C types to Cranelift parameters, has to be implemented by the frontend. You can find rustc's implementations of this for various architectures (shared by the LLVM, GCC and Cranelift backends of rustc) at https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/abi/call
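A minimal sketch of such a frontend step for the case in this thread: lowering a signature whose arguments and return value are structs of given byte sizes into windows_fastcall parameter text. lower_win64 is hypothetical and covers only the Win64 size rule for structs; it puts the hidden sret pointer first, as discussed earlier in the thread.

```rust
/// Hypothetical frontend helper: lowers struct arguments and an optional
/// struct return (sizes in bytes) to Cranelift-style parameter/return text.
fn lower_win64(arg_sizes: &[u64], ret_size: Option<u64>) -> (Vec<String>, Vec<String>) {
    // Structs of 1, 2, 4 or 8 bytes travel as one integer; everything else by pointer.
    fn int_type(size: u64) -> Option<&'static str> {
        match size {
            1 => Some("i8"),
            2 => Some("i16"),
            4 => Some("i32"),
            8 => Some("i64"),
            _ => None,
        }
    }

    let mut params = Vec::new();
    let mut returns = Vec::new();

    // Classify the return value first: an indirect return becomes a hidden
    // `sret` pointer, which has to be the first parameter.
    if let Some(size) = ret_size {
        match int_type(size) {
            Some(ty) => returns.push(ty.to_string()),
            None => params.push("i64 sret".to_string()),
        }
    }

    // Indirect arguments are plain pointers (no `sarg`) on Windows x64.
    for &size in arg_sizes {
        params.push(int_type(size).unwrap_or("i64").to_string());
    }

    (params, returns)
}

fn main() {
    // TestStruct (4 x u64 = 32 bytes) in and out, as in the thread.
    let (params, returns) = lower_win64(&[32], Some(32));
    println!("({}) -> ({})", params.join(", "), returns.join(", "));
}
```

For the 32-byte TestStruct this prints (i64 sret, i64) -> (): the sret pointer first, followed by a plain pointer argument without sarg.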


Cherry (Jun 18 2024 at 16:19):

Thanks! That was just the info I needed. So in the end, structs of 1, 2, 4 or 8 bytes are passed as an I8, I16, I32 or I64 respectively, while any other size (not a power of two, or larger than 8 bytes) is passed as an I64 pointer to the struct. It seems to work properly now!

Notification Bot (Jun 18 2024 at 16:30):

Cherry has marked this topic as resolved.


Last updated: Dec 23 2024 at 13:07 UTC