Copying memory larger than a single Value (memcpy?) · cranelift

I'm trying to figure out if there's a built-in way in cranelift to copy a memory region larger than what would fit in a value. For instance, I have a user struct, and a value pointing to its base address, and I want to copy it to a stack slot (for which I also have its base address of course). Do I have to emit individual load / store instructions for each of the primitive fields in the flattened structure? Or is there a more optimal way to express my intent to copy a block of memory with the IR?

Another thing I'm wondering about is the distinction between stack_load / stack_store and load / store. For simplicity, I am currently generating all memory operations as load / store, because it's easy to generate code that way. Every time I have a stack slot, I simply use stack_addr to obtain its address. Will emitting the IR like this lead to less optimization opportunities? Or is stack_load / stack_store just an alias for convenience?

Jamey Sharp (May 01 2023 at 16:12):

I don't think there's anything wrong with always using stack_addr. I can't immediately think of any optimization opportunities we'd miss as a result, but I'm not guaranteeing anything. :laughing: If you find that Cranelift is generating silly code sequences as a result, we might be able to fix that with either mid-end optimizations or better back-end lowering rules.

Cranelift: Introduce memcpy and memset instructions · Issue #5479 · bytecodealliance/wasmtime

Feature Introduce instructions that behave like memcpy and memset. These should lower to repe movsb and repe stosb for memcpy and memset respectively on x86_64 with the ermsb feature. According to ...

Chris Fallin (May 01 2023 at 16:14):

FWIW, stack_load / stack_store are legalized to stack_addr followed by regular loads/stores, so there's no need to worry about suboptimal codegen from that particular aspect

Setzer22 (May 01 2023 at 16:33):

Thanks! That seems just what I needed. I was already halfway implementing something myself, but I'd rather use a builtin solution.

I'm so glad to hear this! :sweat_smile: I wasn't looking forward to juggling two different representations of memory during codegen.

On that note, I was taking a look at emit_small_memory_copy and I see it has some alignment requirements on the alignment of the provided src and dst addresses. When allocating memory by reserving a stack slot, how can I make sure the stack slot will have a specific alignment? create_sized_stack_slot only takes a size parameter, but no alignment

fitzgen (he/him) (May 01 2023 at 17:15):

stack_{load,store} might automatically set the "stack" bit on the mem flags, which might give a benefit for our simple alias analysis, but I'm not totally sure if that happens here or is up to the front ends

Jamey Sharp (May 01 2023 at 17:15):

We had some discussion and consensus in https://github.com/bytecodealliance/wasmtime/issues/5922 that stack slots should have an alignment specifier but I don't think anybody is working on that yet. In that issue bjorn3 noted that "if you ensure that every stack slot is a multiple of 16, you get 16 aligned stack slots on all current backends." So that's a workaround you can use for stack-slot alignment for now.

Cranelift: Segfault with unaligned stack load sunk into `extractlane` on x64 · Issue #5922 · bytecodealliance/wasmtime

👋 Hey, Me and @alexcrichton were discussing this via Zulip. The stack_load gets translated into a load.f64x2 notrap aligned v34, and that's why we do the load sinking in the backend. Is stack_load ...

Setzer22 (May 01 2023 at 18:08):

Setzer22 (May 01 2023 at 18:10):

Chris Fallin (May 01 2023 at 18:12):

16-alignment shows up both in ABIs (x86-64 and aarch64 both require SP to be 16-aligned) and in vector ISAs (some 128-bit-vector loads/stores -- actually usually most, except "unaligned" variants -- require 16-alignment)

Chris Fallin (May 01 2023 at 18:12):

Setzer22 (May 01 2023 at 18:30):

Right, makes sense! Just to make sure I understand: This workaround is just to ensure that my own stack slots are always 16-aligned, but cranelift still ensures the stack pointer is set to a multiple of 16 when I call a function using e.g. the SysV calling convention (or any other that requires it) right? I'm guessing I don't need to manually insert any padding stack slots myself for that

bjorn3 (May 01 2023 at 19:24):

Yeah, cranelift will always ensure that the calling convention is followed with respect to stack alignment. It does so even if you use stack slots of odd sizes. Only if you want your own stack slots to be 16 byte aligned do you need to make them all a multiple of 16 bytes as size.

Stream: cranelift

Topic: Copying memory larger than a single Value (memcpy?)

Setzer22 (May 01 2023 at 10:57):

Jamey Sharp (May 01 2023 at 16:12):

Chris Fallin (May 01 2023 at 16:14):

Setzer22 (May 01 2023 at 16:33):

fitzgen (he/him) (May 01 2023 at 17:15):

Jamey Sharp (May 01 2023 at 17:15):

Setzer22 (May 01 2023 at 18:08):

Setzer22 (May 01 2023 at 18:10):

Chris Fallin (May 01 2023 at 18:12):

Chris Fallin (May 01 2023 at 18:12):

Setzer22 (May 01 2023 at 18:30):

bjorn3 (May 01 2023 at 19:24):