Stream: git-wasmtime

Topic: wasmtime / Issue #1105 Add alloca support


Wasmtime GitHub notifications bot (Jul 24 2020 at 14:30):

jyn514 commented on Issue #1105:

How hard would this be to implement? I'm willing to take a shot at it.

Wasmtime GitHub notifications bot (Jul 26 2020 at 13:15):

tschneidereit commented on Issue #1105:

@bnjbvr, @cfallin, @julian-seward1, can you comment on this?

Wasmtime GitHub notifications bot (Jul 27 2020 at 13:00):

bnjbvr commented on Issue #1105:

I'm assuming the question arises in the context of the new backend.

From looking at LLVM's docs, it seems that alloca always takes a static (= known at compile time) amount of stack space. If that's true, it should be somewhat easy to implement (add amount to SP, adjust the "nominal SP" offset, make sure to deallocate in the return paths).

If one can pass a dynamic input value that's the amount to allocate, it is likely to be much trickier, because we need to be able to track precisely the running SP value within the function's body: that's what the nominal SP offset does in a static manner. It should be implementable, but it might require using a register for this purpose.
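To make the static tracking described above concrete, here is a minimal Rust sketch. It is not Cranelift's actual code: `EmitState`, `SpAdjust`, and the method names are hypothetical, chosen only to illustrate how a compile-time offset can follow real SP, and why a runtime-sized adjustment cannot be folded into that offset.

```rust
/// How real SP moves at some point in the function body (hypothetical model).
#[derive(Debug, Clone, Copy)]
enum SpAdjust {
    /// SP decremented by a compile-time constant (pushed call args, static alloca).
    StaticSub(i64),
    /// SP incremented by a compile-time constant (undoing an earlier decrement).
    StaticAdd(i64),
    /// SP decremented by a value known only at run time (dynamic alloca).
    DynamicSub,
}

/// Hypothetical emission state: tracks how far real SP sits below nominal SP.
#[derive(Debug, Default)]
struct EmitState {
    nominal_sp_offset: i64,
}

impl EmitState {
    /// Record an SP adjustment. Static adjustments update the tracked offset;
    /// a dynamic one cannot be tracked statically and is reported as an error.
    fn adjust(&mut self, adj: SpAdjust) -> Result<(), &'static str> {
        match adj {
            SpAdjust::StaticSub(bytes) => {
                self.nominal_sp_offset += bytes;
                Ok(())
            }
            SpAdjust::StaticAdd(bytes) => {
                self.nominal_sp_offset -= bytes;
                Ok(())
            }
            SpAdjust::DynamicSub => {
                Err("dynamic SP adjustment: offset is unknown at compile time")
            }
        }
    }

    /// Offset from *real* SP at which a slot lives, given its offset from
    /// nominal SP.
    fn slot_offset_from_real_sp(&self, slot_offset: i64) -> i64 {
        self.nominal_sp_offset + slot_offset
    }
}
```

In this model, a slot at nominal offset 16 is addressed at real-SP offset 48 while 32 bytes of call arguments are pushed, and at 16 again once they are popped. The `DynamicSub` case is where a dedicated register would be needed instead.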

Wasmtime GitHub notifications bot (Jul 27 2020 at 13:12):

bjorn3 commented on Issue #1105:

> From looking at LLVM's docs, it seems that alloca always takes a static (= known at compile time) amount of stack space.

No, it also allows a dynamic input. It is just that a static input is equivalent to using stack slots in Cranelift.

Wasmtime GitHub notifications bot (Jul 27 2020 at 18:57):

cfallin commented on Issue #1105:

It's definitely possible to implement this with the new backends. It interacts with the way we address stack slots and spill slots; at least on aarch64, we address function arguments with fp, which stays at the top of the stack frame (invariant to any allocas), but we address stack/spill slots with offsets from sp, because positive offsets are cheaper on aarch64. We track "nominal SP" as an offset from real SP, so we can continue to access this storage while we've temporarily pushed args to set up for a call.

The most straightforward approach would probably be to (i) detect when an alloca (or just a dynamic alloca) is present; then if so, (ii) allocate a separate scratch register in the prologue and copy nominal-SP to that; then (iii) access all stack and spill slots relative to that register. We lose a register in that case but I think that's unavoidable unless we revert to negative offsets from FP (which has a higher cost -- a few percent degradation at least, because it forces add instructions to synthesize addresses when offset more than -0x80, IIRC).

Happy to point out the bits that would need to change in more detail if you would like!

Wasmtime GitHub notifications bot (Jul 27 2020 at 19:00):

cfallin edited a comment on Issue #1105:

It's definitely possible to implement this with the new backends. It interacts with the way we address stack slots and spill slots; at least on aarch64, we address function arguments with fp, which stays at the top of the stack frame (invariant to any allocas), but we address stack/spill slots with offsets from sp, because positive offsets are cheaper on aarch64. We track "nominal SP" as an offset from real SP (statically during codegen), so we can continue to access this storage while we've temporarily pushed args to set up for a call (EDIT: or, with alloca support, after we've decremented real SP to allocate storage).

The most straightforward approach would probably be to (i) detect when an alloca (or just a dynamic alloca) is present; then if so, (ii) allocate a separate scratch register in the prologue and copy nominal-SP to that; then (iii) access all stack and spill slots relative to that register. We lose a register in that case but I think that's unavoidable unless we revert to negative offsets from FP (which has a higher cost -- a few percent degradation at least, because it forces add instructions to synthesize addresses when offset more than -0x80, IIRC).

Happy to point out the bits that would need to change in more detail if you would like!
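The three steps in the comment above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the backend's implementation: `Inst`, `Frame`, `has_dynamic_alloca`, and the 16-byte SP alignment used for rounding are all hypothetical choices made for the example.

```rust
/// Hypothetical, heavily simplified instruction kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Inst {
    DynamicAlloca,
    Other,
}

/// Step (i): detect whether the function contains a dynamic alloca.
fn has_dynamic_alloca(insts: &[Inst]) -> bool {
    insts.iter().any(|i| *i == Inst::DynamicAlloca)
}

/// Simulated run-time frame state.
#[derive(Debug)]
struct Frame {
    /// Current value of the real stack pointer.
    real_sp: u64,
    /// Step (ii): copy of nominal SP held in a reserved register, if any.
    frame_base: Option<u64>,
}

impl Frame {
    /// Prologue: allocate the static frame, and capture nominal SP in a
    /// reserved register when the function needs one.
    fn prologue(entry_sp: u64, static_frame_size: u64, needs_base_reg: bool) -> Frame {
        let sp = entry_sp - static_frame_size;
        Frame {
            real_sp: sp,
            frame_base: if needs_base_reg { Some(sp) } else { None },
        }
    }

    /// Dynamic alloca: move real SP down by a run-time amount, rounded up to
    /// 16 bytes (an assumption for this sketch), and return the new block.
    fn dynamic_alloca(&mut self, bytes: u64) -> u64 {
        let rounded = (bytes + 15) & !15;
        self.real_sp -= rounded;
        self.real_sp
    }

    /// Step (iii): address a stack or spill slot relative to the reserved
    /// register when present, otherwise relative to real SP.
    fn slot_addr(&self, slot_offset: u64) -> u64 {
        match self.frame_base {
            Some(base) => base + slot_offset,
            None => self.real_sp + slot_offset,
        }
    }
}
```

With the reserved register, `slot_addr` returns the same address before and after `dynamic_alloca`. Without it, the SP-relative address shifts by the allocated amount, which is the failure mode the extra register is meant to avoid.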

Wasmtime GitHub notifications bot (Jul 27 2020 at 20:20):

peterhuene commented on Issue #1105:

Related to this, at least for the x86-64 ABIs, I would like to see Cranelift stop using RBP as a "traditional" frame pointer as both DWARF and Windows unwind information encode enough information to properly describe frame layout without having to establish a frame pointer for frames of static size. This would free RBP to be used as a GPR for functions that do not have dynamic stack allocations.

In fact, on Windows x64, a "frame pointer" is supposed to be exactly what you describe the "nominal-SP" register as: a register pointing at the base (or somewhere inside) of the "static" part of the local frame and used to reference args/locals (and CSRs for unwind) by positive offset. For that ABI, a frame pointer is therefore generally only established for frames calling alloca.

Right now the x64 prologue/epilogue instructions relating to the establishment of a traditional frame pointer are simply wasted instructions on Windows.

Wasmtime GitHub notifications bot (Jul 27 2020 at 20:25):

peterhuene edited a comment on Issue #1105:

Related to this, at least for the x86-64 ABIs, I would like to see Cranelift stop using RBP as a "traditional" frame pointer as both DWARF and Windows unwind encode enough information to properly describe frame layout without having to establish a frame pointer for frames of static size. This would free RBP to be used as a GPR for functions that do not have dynamic stack allocations or as the "nominal-SP" register for functions that have dynamic stack allocations.

In fact, on Windows x64, a "frame pointer" is supposed to be exactly what you describe the "nominal-SP" register as: a register pointing at the base (or somewhere inside) of the "static" part of the local frame and used to reference args/locals (and CSRs for unwind) by positive offset. For that ABI, a frame pointer is therefore generally only established for frames calling alloca.

Right now the x64 prologue/epilogue instructions relating to the establishment of a traditional frame pointer are simply wasted instructions on Windows.

Wasmtime GitHub notifications bot (Jul 27 2020 at 20:27):

bjorn3 commented on Issue #1105:

> Related to this, at least for the x86-64 ABIs, I would like to see Cranelift stop using RBP as a "traditional" frame pointer as both DWARF and Windows unwind information encode enough information to properly describe frame layout without having to establish a frame pointer for frames of static size.

This should be an option in my opinion. Using DWARF unwinding for perf profiles as opposed to frame pointers results in much bigger perf.data files and slower perf report, as it requires capturing a big chunk of the stack and then performing the unwinding offline. Online unwinding using DWARF tables is simply too slow.

Wasmtime GitHub notifications bot (Jul 27 2020 at 20:32):

peterhuene commented on Issue #1105:

> This should be an option in my opinion.

Definitely, but I think omitting a traditional frame pointer should be default for these ABIs, at least for optimized compilations. An option to opt-in when they are legitimately needed (like in the case of a tool relying on them for fast stack walks) makes sense to me.
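The policy discussed in the last few comments could be expressed roughly as follows. This is a sketch of the proposal only: `RbpRole`, `rbp_role`, and the `preserve_frame_pointers` flag are hypothetical names for illustration and do not refer to an existing Cranelift setting.

```rust
/// What RBP is used for in a given function (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RbpRole {
    /// Classic frame pointer chain, useful for fast stack walks by profilers.
    TraditionalFramePointer,
    /// Base of the static part of the frame, for functions with dynamic
    /// stack allocations (the "nominal-SP" register role).
    FrameBase,
    /// Free for the register allocator.
    GeneralPurpose,
}

/// Pick a role for RBP. Frame pointers are omitted by default and kept only
/// when the embedder opts in.
fn rbp_role(has_dynamic_alloca: bool, preserve_frame_pointers: bool) -> RbpRole {
    if preserve_frame_pointers {
        RbpRole::TraditionalFramePointer
    } else if has_dynamic_alloca {
        RbpRole::FrameBase
    } else {
        RbpRole::GeneralPurpose
    }
}
```

Under this sketch, a function of static frame size compiled with default options gets RBP back as a general-purpose register, while a profiling build that opts in keeps the traditional frame pointer.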

Wasmtime GitHub notifications bot (Jul 27 2020 at 21:22):

cfallin commented on Issue #1105:

@peterhuene that's a good point -- could you create a separate issue for that? I definitely agree that -fomit-frame-pointer optimizations are something we should look into at some point.

Wasmtime GitHub notifications bot (Jul 27 2020 at 21:29):

peterhuene commented on Issue #1105:

I opened #1149 a while back specific to Windows. Should we create a more general "omit frame pointers when permitted" issue?

Wasmtime GitHub notifications bot (Jul 27 2020 at 22:15):

cfallin commented on Issue #1105:

Sure, I think it makes sense to track with a separate issue; it's a distinct thing that we'd want to do on any platform when we're allowed to (by ABIs and by debug requirements).

Wasmtime GitHub notifications bot (Jul 27 2020 at 23:15):

peterhuene commented on Issue #1105:

I've opened #2073.


Last updated: Dec 23 2024 at 12:05 UTC