Stream: git-wasmtime

Topic: wasmtime / issue #4521 Cranelift AArch64: Migrate `Splat`...


Wasmtime GitHub notifications bot (Jul 25 2022 at 20:00):

cfallin commented on issue #4521:

Happy to merge once merge conflicts are resolved.

Wasmtime GitHub notifications bot (Jul 26 2022 at 15:56):

akirilov-arm commented on issue #4521:

@cfallin I have also noticed that x16 and x17 are reserved as spill temporaries, hence not allocatable - is that still relevant for regalloc2?

Wasmtime GitHub notifications bot (Jul 26 2022 at 16:38):

cfallin commented on issue #4521:

@akirilov-arm yes, unfortunately... regalloc2 actually doesn't need any temporaries anymore, but aarch64 itself does. The reason is that a spillslot may be at a greater offset from sp or fp than we can reach with an imm12, so we need a sequence of instructions to synthesize the address of a spillslot before spilling or reloading. That sequence itself can't require spilling another register if all registers are full (as they are likely to be if we're spilling in the first place), so we need to set aside x16 for that.

If I recall correctly, x17 is used in stack-limit check sequences, but my memory is less clear; I'd have to go look at the use-cases again.
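As a rough illustration of the reachability problem described above: a 64-bit `ldr`/`str` with an unsigned immediate encodes a 12-bit offset scaled by the access size, so it covers only multiples of 8 from 0 to 32760 bytes. The sketch below uses hypothetical names (`fits_scaled_uimm12`, `SpillAccess`, `plan_spill_access`) and is not Cranelift's actual lowering logic, which handles more addressing forms.

```rust
/// Bytes moved by a 64-bit spill or reload.
const ACCESS_SIZE: u32 = 8;

/// True if `offset` fits the scaled, unsigned 12-bit immediate of a 64-bit
/// `ldr`/`str` (a multiple of 8 in the range 0..=32760).
fn fits_scaled_uimm12(offset: u32) -> bool {
    offset % ACCESS_SIZE == 0 && offset / ACCESS_SIZE < (1 << 12)
}

/// How a spill slot at a given byte offset from the base register can be
/// reached. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum SpillAccess {
    /// One instruction, e.g. `str xN, [sp, #offset]`.
    Direct,
    /// The address must first be built in a reserved temporary (x16 in the
    /// discussion above) and then used as the base, e.g. `str xN, [x16]`.
    ViaReservedTemp,
}

/// Decide which form a spill or reload at `offset` would need.
fn plan_spill_access(offset: u32) -> SpillAccess {
    if fits_scaled_uimm12(offset) {
        SpillAccess::Direct
    } else {
        SpillAccess::ViaReservedTemp
    }
}
```

Under these assumptions, `plan_spill_access(16)` is `Direct`, while a slot 1 MiB away from the base needs the reserved temporary.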

Wasmtime GitHub notifications bot (Jul 26 2022 at 16:39):

cfallin commented on issue #4521:

To follow up on that, I guess an alternative approach would be to reserve a small-offset slot to spill another victim to if we need to compute a spillslot address at a large distance away -- so we can bootstrap our way there with no registers initially free. That's a little more complexity but I wouldn't be averse to reviewing a PR if someone wants to attempt that.
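The bootstrap idea above might look roughly like the following sequence. This is only a sketch under stated assumptions: the scratch slot offset, the function name `spill_far_without_temp`, and the placeholder address-materialization step are all hypothetical, and the output is assembly-like text rather than real Cranelift instructions.

```rust
/// Hypothetical: byte offset from sp of a reserved scratch slot that is
/// always reachable with a small immediate.
const SCRATCH_SLOT_OFFSET: u32 = 0;

/// Emit an assembly-like sequence that spills register `x<src>` to a slot
/// `far_offset` bytes from sp without relying on a permanently reserved
/// temporary register.
fn spill_far_without_temp(src: u8, far_offset: u32) -> Vec<String> {
    // Pick any victim register other than the one being spilled.
    let victim: u8 = if src == 0 { 1 } else { 0 };
    vec![
        // 1. Save the victim to the nearby scratch slot, freeing a register.
        format!("str x{}, [sp, #{}]", victim, SCRATCH_SLOT_OFFSET),
        // 2. Build the far address in the victim (placeholder for however
        //    many instructions that takes).
        format!("<materialize sp + {} into x{}>", far_offset, victim),
        // 3. Perform the actual spill through the computed address.
        format!("str x{}, [x{}]", src, victim),
        // 4. Restore the victim's original value.
        format!("ldr x{}, [sp, #{}]", victim, SCRATCH_SLOT_OFFSET),
    ]
}
```

The cost is two extra memory accesses per far spill or reload, in exchange for getting a register back for general allocation.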

Wasmtime GitHub notifications bot (Jul 26 2022 at 16:54):

akirilov-arm commented on issue #4521:

Another alternative is to reserve a vector register, which would give us the same space as 2 GPRs.

One further idea - if the preserve_frame_pointers flag introduced by PR #4469 is true, then we should be able to use lr/x30 as a temporary register instead of making it unallocatable as it is right now.

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:03):

cfallin commented on issue #4521:

> One further idea - if the preserve_frame_pointers flag introduced by PR https://github.com/bytecodealliance/wasmtime/pull/4469 is true, then we should be able to use lr/x30 as a temporary register instead of making it unallocatable as it is right now.

Ah, that's a really interesting idea actually. I guess it's not needed for unwind (that starts from fp), and in the middle of this sequence we won't be doing any calls, and it's otherwise usable as a regular GPR... I like it.

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:08):

akirilov-arm commented on issue #4521:

The optimal solution would be to adjust the set of allocatable registers on a per-function basis, so that we could do the same for non-leaf functions (or leaf functions that use the stack) when preserve_frame_pointers is false, but I don't think the backend plumbing is set to enable this.

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:14):

akirilov-arm edited a comment on issue #4521:

The optimal solution would be to adjust the set of allocatable registers on a per-function basis, so that we could do the same for non-leaf functions (or leaf functions that use the stack) when preserve_frame_pointers is false, but I don't think the backend plumbing is set to enable this.

P.S. Actually I didn't mean to use x30 as a spill temporary, but as a generic temporary register, i.e. put it in the set of preferred registers for allocation; calls would be set up to clobber it. Restricting it to a spill temporary is also an interesting idea because we would be able to reclaim either x16 or x17.

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:16):

akirilov-arm edited a comment on issue #4521:

The optimal solution would be to adjust the set of allocatable registers on a per-function basis, so that we could do the same for non-leaf functions (or leaf functions that use the stack) when preserve_frame_pointers is false, but I don't think the backend plumbing is set to enable this.

P.S. Actually I didn't mean to use x30 as a spill temporary, but as a generic temporary register, i.e. put it in the set of preferred registers for allocation; calls would be set up to clobber it, so that regalloc would do the right thing. Restricting it to a spill temporary is also a viable idea because we would be able to reclaim either x16 or x17; I suppose I am just ambitious and want all of them :smile:.

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:17):

jameysharp commented on issue #4521:

I'm curious about the small-offset spillslot idea. I haven't looked at aarch64 details and don't know enough about Cranelift internals yet, but my assumptions are:

Something along those lines?

Wasmtime GitHub notifications bot (Jul 26 2022 at 17:22):

cfallin commented on issue #4521:

@jameysharp yep, that's more or less it.

@akirilov-arm re:

> I don't think the backend plumbing is set to enable this

we could definitely change that! The only thing I want to hold as a hard requirement is that we don't build it dynamically per-function (because there are lots of tiny functions and that would be a nontrivial cost); right now we build it once when the compiler backend is constructed. We could perhaps build a few versions of it though, and return the right one in the regalloc2::Function trait -- one for leaf functions and one without; and variations based on compiler flags.

Anyway we're getting on quite a tangent here but if you're interested, please feel free to file an issue to "reclaim spilltmp registers on aarch64" as a future enhancement to track this!
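The "build a few versions once, pick one per function" idea above could be sketched as follows. This is a minimal illustration, not regalloc2's real API: `RegEnv`, `Backend`, `env_for`, and the `lr_saved_in_frame` parameter are hypothetical stand-ins, and the register lists are arbitrary.

```rust
/// Simplified stand-in for a register allocator environment.
/// Hypothetical; not regalloc2's actual type.
#[derive(Debug)]
struct RegEnv {
    /// Hardware numbers of the allocatable general-purpose registers.
    allocatable_gprs: Vec<u8>,
}

/// Hypothetical backend that builds its environments once, at construction.
struct Backend {
    env_without_lr: RegEnv,
    env_with_lr: RegEnv,
}

impl Backend {
    fn new() -> Self {
        // x0..=x15 chosen purely for illustration.
        let base: Vec<u8> = (0..=15).collect();
        let mut with_lr = base.clone();
        with_lr.push(30); // x30 (lr) becomes allocatable in this variant.
        Backend {
            env_without_lr: RegEnv { allocatable_gprs: base },
            env_with_lr: RegEnv { allocatable_gprs: with_lr },
        }
    }

    /// Return a prebuilt environment; nothing is allocated per function.
    /// `lr_saved_in_frame` is an assumed per-function property, true when
    /// the prologue has already saved lr so x30 is free to reuse.
    fn env_for(&self, lr_saved_in_frame: bool) -> &RegEnv {
        if lr_saved_in_frame {
            &self.env_with_lr
        } else {
            &self.env_without_lr
        }
    }
}
```

Because `env_for` only hands out a reference to an existing value, compiling many tiny functions pays a branch per function rather than rebuilding the register set each time.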


Last updated: Dec 23 2024 at 12:05 UTC