Note: Using Windows x86_64
From my understanding, I can add this as a param with type I64 and ArgumentPurpose::StructArgument(size). I receive a pointer to the struct which I can read from to get the data.
I've tried doing this, but the struct I passed in does not come out correctly unless I offset the pointer read by 16 bytes.
I know the struct's size ahead of time, but I do not know of any way to read this properly. Either that, or I'm doing this wrong and missing something.
The test case is quite simple, however. I read straight off of the v0 I passed into another function.
defining function funcid1: function u0:1(i64 sarg(32), i64 sret) windows_fastcall {
sig0 = (i64, i64, i64) windows_fastcall
fn0 = u0:0 sig0
block0(v0: i64, v1: i64):
stack_store v0, ss0
v2 = iconst.i64 0x01af_9243_d8f0
call fn0(v0, v2, v1) ; v2 = 0x01af_9243_d8f0
return
}
What is the function declaration on the caller's side? E.g., if you called it from C, what would the function signature and struct definition be?
Also did you mean to pass v0 as struct or as pointer to fn0?
bjorn3 said:
Also did you mean to pass v0 as struct or as pointer to fn0?
I meant to pass it as a ptr. My real code actually is storing sarg into an explicit stack, where I pass the stack addr into the function, and read the ptr out of it. But it appears to be the same effect either way.
fn0 has this signature:
pub extern "fastcall" fn __jit_cb(args: *const (), data: &Data, ret: *mut Ret)
My bad for not providing the code. My quick test case is this. I replaced ret with my JITted fn (funcid1) and I'm doing a test call using get_ret():
#[derive(Debug)]
#[repr(C)]
struct TestStruct {
one: u64,
two: u64,
three: u64,
four: u64,
}
#[inline(never)]
extern "fastcall" fn ret(data: TestStruct) -> TestStruct {
println!("ret() got {data:?}");
data
}
#[inline(never)]
extern "fastcall" fn get_ret() {
let data = TestStruct {
one: 22,
two: 44,
three: 55,
four: 57,
};
let data = ret(data);
println!("get_ret() got: {data:?}");
}
What is the definition of the function you called from the jitted function?
on llvm, sret arguments generally have to be the first argument, cranelift is most likely the same. you have your sret argument as the second argument
Right, of course. Cranelift indeed requires it to be first. Probably should add a verifier check for that.
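A minimal sketch of what such a check could look like, written against a hypothetical `Purpose` enum rather than Cranelift's real `ArgumentPurpose` type (the enum, the function name, and the error shape are all assumptions made for illustration, not Cranelift's verifier API):

```rust
/// Hypothetical stand-in for an argument purpose; only the cases discussed
/// in this thread are modeled. This is NOT Cranelift's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Purpose {
    Normal,
    StructReturn,
    StructArgument(u32),
}

/// Returns `Err(index)` if an sret parameter appears anywhere other than
/// position 0, and `Ok(())` otherwise (including when there is no sret).
fn check_sret_first(params: &[Purpose]) -> Result<(), usize> {
    match params.iter().position(|p| *p == Purpose::StructReturn) {
        Some(idx) if idx != 0 => Err(idx),
        _ => Ok(()),
    }
}
```

With this sketch, the first signature in the thread, `(i64 sarg(32), i64 sret)`, would be rejected with `Err(1)`, while `(i64 sret, i64 sarg(32))` would pass.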
Jacob Lifshay said:
on llvm, sret arguments generally have to be the first argument, cranelift is most likely the same. you have your sret argument as the second argument
Thank you for this tip. I'll go fix my code to place it first and reply back after with the result.
OK, I have this now, but I still have the same issue: I need to offset sarg by 16 to find the data. (You can ignore the explicit stack slot; the same effect happens even when passing and using the pointer directly.)
defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
ss0 = explicit_slot 8, align = 256
sig0 = (i64, i64, i64) windows_fastcall
fn0 = u0:0 sig0
block0(v0: i64, v1: i64):
stack_store v1, ss0
v2 = iconst.i64 0x023a_d589_0320
v3 = stack_addr.i64 ss0
call fn0(v3, v2, v0) ; v2 = 0x023a_d589_0320
return
}
What is the definition of the function being called by the jitted function?
bjorn3 said:
What is the definition of the function being called by the jitted function?
I've simplified the test case. Here is all the info:
I am passing in a stack address, at which I stored sarg's pointer at offset 0. I read the pointer back from the stack address passed in.
The JIT fn calls this signature. The stack address is the first pointer argument, args:
pub extern "fastcall" fn __jit_cb(args: *const (), data: &Data, ret: *mut Ret) {
Inside the jit cb, I read the sarg from the stack addr
let src = unsafe { *args.cast::<*const u8>() };
I then read bytes from this (32 in this test case) into a Vec with nonoverlapping copy, and finally transmute it back to TestStruct
let mut buffer = Vec::with_capacity(count);
unsafe {
ptr::copy_nonoverlapping(src, buffer.as_mut_ptr(), count);
}
unsafe {
buffer.set_len(count);
}
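For reference, a self-contained sketch of the byte-copy step described above, assuming the source pointer really is valid for the requested number of bytes. The helper names `copy_bytes` and `struct_from_bytes` are made up for this illustration and are not from the original code; it uses `ptr::read_unaligned` instead of a transmute because a `Vec<u8>` buffer is only guaranteed 1-byte alignment:

```rust
use std::mem::size_of;
use std::ptr;

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
struct TestStruct {
    one: u64,
    two: u64,
    three: u64,
    four: u64,
}

/// Copies `count` bytes starting at `src` into a fresh Vec.
///
/// # Safety
/// `src` must be valid for reads of `count` bytes.
unsafe fn copy_bytes(src: *const u8, count: usize) -> Vec<u8> {
    let mut buffer: Vec<u8> = Vec::with_capacity(count);
    unsafe {
        // The Vec's allocation is freshly made, so it cannot overlap `src`.
        ptr::copy_nonoverlapping(src, buffer.as_mut_ptr(), count);
        // All `count` bytes are initialized by the copy above.
        buffer.set_len(count);
    }
    buffer
}

/// Reinterprets the bytes as a TestStruct, or returns None on a size mismatch.
fn struct_from_bytes(bytes: &[u8]) -> Option<TestStruct> {
    if bytes.len() != size_of::<TestStruct>() {
        return None;
    }
    // Every bit pattern is a valid TestStruct (four plain u64 fields).
    Some(unsafe { ptr::read_unaligned(bytes.as_ptr().cast::<TestStruct>()) })
}
```

If the pointer handed to `copy_bytes` is off by some number of bytes, the copy still succeeds and the fields simply come out shifted, which matches the symptom reported in this thread.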
The ir is here
defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
ss0 = explicit_slot 8, align = 256
sig0 = (i64, i64, i64) windows_fastcall // <-- __jit_cb
fn0 = u0:0 sig0 // <-- __jit_cb
block0(v0: i64, v1: i64):
stack_store v1, ss0
v2 = iconst.i64 0x01af_78a8_0610
v3 = stack_addr.i64 ss0
// *const (), data: &Data, ret: *mut Ret
call fn0(v3, v2, v0) ; v2 = 0x01af_78a8_0610
return
}
After I generate the JIT fn, I set the resulting fn ptr on the static and call get_ret(), which calls the JIT fn:
#[derive(Debug)]
#[repr(C)]
struct TestStruct {
one: u64,
two: u64,
three: u64,
four: u64,
}
// funcid1 below
static TEST_FN: OnceLock<extern "fastcall" fn(TestStruct) -> TestStruct> = OnceLock::new();
#[inline(never)]
extern "fastcall" fn get_ret() {
let data = TestStruct {
one: 22,
two: 44,
three: 55,
four: 57,
};
let data = TEST_FN.get().unwrap()(data);
println!("get_ret() got: {data:?}");
}
My output is as follows. The TEST_FN simply reads sarg's bytes and returns them in sret, nothing fancy:
get_ret() got: TestStruct { one: 0, two: 1853154559352, three: 22, four: 44 }
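The shifted output lines up with the 16-byte offset mentioned earlier: 16 bytes is exactly two u64 fields. The following sketch only models that arithmetic. The memory image is hypothetical; its first four words are taken from the output above, and the last two (55, 57) are an assumption based on the report that reading 16 bytes further in yields the correct data:

```rust
/// Reads four consecutive u64 fields starting `byte_offset` bytes into `words`.
/// Returns None if the offset is not 8-byte aligned or runs past the end.
fn read_fields(words: &[u64], byte_offset: usize) -> Option<[u64; 4]> {
    if byte_offset % 8 != 0 {
        return None;
    }
    let start = byte_offset / 8;
    let s = words.get(start..start.checked_add(4)?)?;
    Some([s[0], s[1], s[2], s[3]])
}

/// Hypothetical memory image: two unrelated words followed by the struct.
/// The trailing 55 and 57 are assumed, not observed in the output.
fn modeled_memory() -> [u64; 6] {
    [0, 1853154559352, 22, 44, 55, 57]
}
```

In this model, reading at byte offset 0 reproduces the observed fields, and reading at byte offset 16 yields 22, 44, 55, 57.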
I made a reduced test case here (run it on Win x64)
https://github.com/MolotovCherry/cranelift-misalignment
After making this as simple as I can and still seeing it, I have no idea what to do anymore to solve this. If this really is something I'm doing wrong, I hope someone can point it out
defining function funcid1: function u0:1(i64 sret, i64 sarg(32)) windows_fastcall {
sig0 = (i64 sret, i64 sarg(32)) windows_fastcall
fn0 = u0:0 sig0
block0(v0: i64, v1: i64):
call fn0(v0, v1)
return
}
// I passed in
// TestStruct {
// one: 22,
// two: 44,
// three: 55,
// four: 57,
// }
cb got TestStruct { one: 0, two: 22, three: 44, four: 55 }
get_ret got: TestStruct { one: 0, two: 22, three: 44, four: 55 }
Looks like for the Windows calling convention, structs larger than 64 bits are passed by reference rather than at a fixed stack offset. In other words, you have to omit the sarg attribute.
See https://github.com/rust-lang/rust/blob/737e42308c6e957575692965d73b17937f936f28/compiler/rustc_target/src/abi/call/x86_win64.rs#L8-L28 for how to calculate the right ABI.
You have to pass a pointer when arg.make_indirect() is used and sarg when arg.make_indirect_byval(align) is used.
Just like LLVM, Cranelift only handles the lower half of the calling convention. The upper half, which lowers C types, has to be implemented by the frontend. You can find the Rust implementations of this for various architectures at https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/abi/call. This code is shared by the LLVM, GCC and Cranelift backends of rustc.
Thanks! That was just the info I needed. I guess in the end we can use I8, I16, I32, I64 for struct sizes of 1, 2, 4, 8 bytes, while odd sizes, non-power-of-2 sizes, and sizes > 8 bytes become an I64 holding a pointer to data of the respective size. Seems to work properly now!
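The sizing rule discussed here can be sketched roughly as follows. This is an illustration for aggregate (struct) arguments only, loosely modeled on the linked rustc x86_win64.rs code. The `PassMode` enum and the function name are hypothetical, and zero-sized types, vectors, and scalars are deliberately not handled:

```rust
/// Hypothetical summary of how a frontend might pass a struct on Win64.
#[derive(Debug, PartialEq, Eq)]
enum PassMode {
    /// Pass by value, cast to an integer of this many bits (I8..I64).
    Integer(u32),
    /// Pass a plain I64 pointer to the data, without the sarg attribute.
    Indirect,
}

/// Classifies a struct argument by its size in bytes.
/// Assumption: only sizes 1, 2, 4 and 8 are passed by value as integers;
/// everything else goes by reference.
fn classify_win64_aggregate(size_bytes: u64) -> PassMode {
    match size_bytes {
        1 => PassMode::Integer(8),
        2 => PassMode::Integer(16),
        4 => PassMode::Integer(32),
        8 => PassMode::Integer(64),
        _ => PassMode::Indirect,
    }
}
```

Under this sketch, the 32-byte TestStruct from the thread is `Indirect`, so the JIT function takes an ordinary I64 pointer parameter instead of `sarg(32)`.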
Cherry has marked this topic as resolved.
Last updated: Nov 22 2024 at 16:03 UTC