Kmeakin opened PR #9088 from Kmeakin:km/pulley-packed-binary-operands
to bytecodealliance:main
:
Pack the
dst
,src1
andsrc2
register operands of binary operators into 2 bytes. Solves theTODO
at the top oflib.rs
Kmeakin requested wasmtime-default-reviewers for a review on PR #9088.
Kmeakin requested cfallin for a review on PR #9088.
cfallin requested fitzgen for a review on PR #9088.
cfallin commented on PR #9088:
Reassigned @fitzgen to review, I'm not sure how this intersects with your other plans to add instructions with large offsets etc; happy to review this myself if you'd like otherwise.
Kmeakin updated PR #9088.
Kmeakin edited PR #9088:
Pack the
dst
,src1
andsrc2
register operands of binary operators into 2 bytes. Solves theTODO
at the top oflib.rs
.This required moving the special registers into their own register class, and adding
smov
andspush
/spop
to interact with SP/FP.
Kmeakin updated PR #9088.
fitzgen submitted PR review:
Thanks!
BinaryOperands
is exactly what I was imagining!I'd prefer holding off on merging this until I can land the Cranelift backend for Pulley though, just so that I don't have to chase instruction set updates, and we can start evolving the backend and the instruction set together and keep them in sync.
fitzgen submitted PR review:
Thanks!
BinaryOperands
is exactly what I was imagining!I'd prefer holding off on merging this until I can land the Cranelift backend for Pulley though, just so that I don't have to chase instruction set updates, and we can start evolving the backend and the instruction set together and keep them in sync.
fitzgen created PR review comment:
I was indeed thinking that the special registers would only be accessed from special instructions, but I think we can do even more here by bundling the whole function prologue and epilogue up into instructions that just take the size to decrement the stack frame by:
enter_frame 16 ... exit_frame 16 ret
where
enter_frame N ==> push lr push fp fp = sp sp = sp - N
and the opposite for
exit_frame N
.
fitzgen created PR review comment:
Why all these changes? Seems unrelated?
fitzgen created PR review comment:
Could we split this out into its own PR?
fitzgen created PR review comment:
We shouldn't need this, or any other general instructions for the special registers. They should only be manipulated by instructions that are doing other stuff as well, eg
call
will setlr
. But these registers aren't allocated by our register allocator and are reserved for their special purpose, so we shouldn't need to have general instructions for them.
github-actions[bot] commented on PR #9088:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "pulley"Thus the following users have been cc'd because of the following labels:
- fitzgen: pulley
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
Kmeakin submitted PR review.
Kmeakin created PR review comment:
Encodes the invariant that registers are always in the range
0..32
, so indexing into the registers array can never fail. The other option would be to use the#[rustc_layout_scalar_valid_range_end(31)]
attribute, but that is nightly-only.
fitzgen submitted PR review.
fitzgen created PR review comment:
Are you saying we get better codegen with
enum
s here? That's neat, I wouldn't have expected the LLVM IR that rustc emits to change.
Kmeakin updated PR #9088.
Kmeakin submitted PR review.
Kmeakin created PR review comment:
https://godbolt.org/z/jPrq5bc45
rustc emits calls to
assume
for types that have niches, and starting from LLVM 19 function parameters can haverange
metadata attached directly
Kmeakin submitted PR review.
Kmeakin created PR review comment:
I'm a bit confused about the purpose of the
N
toenter_frame
/exit_frame
. Eachpush
/pop
will itself automatically adjust the sp bysize_of<*u8>()
. So in the case of no other registers being stored on the stack, the argument will be zero. In the case where other registers need to be saved on the stack,xpush{32,64}
andxpop{32,64}
will also adjust the sp by 4/8 bytes. So in either case the argument is not needed? I don't think there is any scenario where you would need to adjust the stack pointer without also reading/writing from/to the stack
Kmeakin updated PR #9088.
Kmeakin updated PR #9088.
Kmeakin updated PR #9088.
fitzgen submitted PR review.
fitzgen created PR review comment:
I'm a bit confused about the purpose of the
N
toenter_frame
/exit_frame
.To allocate the stack frame for this function, including all of its regalloc spills and all that.
That said, the way that Cranelift is organized and the abstraction boundaries it has might make doing the full function prologue in a single instruction difficult. We might have to settle for two instructions:
- a macro instruction for
push lr; push fp; fp = sp
(and the opposite for exiting the function, maybe just fold this intoret
)- an instruction for allocating and deallocating stack space:
sp = sp +/- imm32
See
gen_prologue_frame_setup
,gen_epilogue_frame_restore
, andgen_sp_reg_adjust
in../pulley_shared/abi.rs
in https://github.com/bytecodealliance/wasmtime/pull/9089 (or the equivalent methods in, e.g.,../aarch64/abi.rs
). I'm thinking that each of these should produce a single macro instruction because they otherwise always generate the exact same sequence of instructions modulo immediate constant values.The benefit of a single macro instruction, compared to a sequence of composable and more general instructions, is performance. The expensive part of an interpreter is roundtripping through the opcode loop, by using macro instructions we are amortizing that cost.
Does that make sense?
Kmeakin updated PR #9088.
Kmeakin submitted PR review.
Kmeakin created PR review comment:
Decided the extra argument was not needed after all:
https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Pulley.20registers/near/459533622
Kmeakin updated PR #9088.
Kmeakin updated PR #9088.
fitzgen commented on PR #9088:
Heads up: the Pulley backend for Cranelift landed in #9089 and I think this can be rebased and landed now
Kmeakin updated PR #9088.
Kmeakin updated PR #9088.
Kmeakin requested abrown for a review on PR #9088.
Kmeakin requested wasmtime-compiler-reviewers for a review on PR #9088.
Kmeakin updated PR #9088.
Kmeakin edited PR #9088:
Pack the
dst
,src1
andsrc2
register operands of binary operators into 2 bytes. Solves theTODO
at the top oflib.rs
.~This required moving the special registers into their own register class, and adding
smov
andspush
/spop
to interact with SP/FP.~
I held off on splitting special registers into their ownSReg
class because I'm not sure how to integrate it with ISLE/Regalloc
Kmeakin updated PR #9088.
Kmeakin updated PR #9088.
fitzgen submitted PR review:
Thanks!
fitzgen merged PR #9088.
Last updated: Jan 24 2025 at 00:11 UTC