afonso360 opened PR #6324 from afonso360:riscv-vec-vconst
to bytecodealliance:main
:
:wave: Hey,
This PR does a number of things but the main goal is to enable
vconst
on the RISC-V backend.
- Cleanups to
Store
/Emit
instructions
- We now fallback to
LoadAddr
when we can't directly support theAMode
- Moved the instruction encoding to the
encoding.rs
file- Add Constant Pool and MachLabel based
AMode
's- Cleanup the existing
VecLoad
/VecStore
to avoid adding a 0 offset where possible- Add the
vconst
loweringThis is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.
afonso360 requested jameysharp for a review on PR #6324.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6324.
afonso360 edited PR #6324:
:wave: Hey,
This PR does a number of things but the main goal is to enable
vconst
loading from the constant pool on the RISC-V backend.
- Cleanups to
Store
/Emit
instructions
- We now fallback to
LoadAddr
when we can't directly support theAMode
- Moved the instruction encoding to the
encoding.rs
file- Add Constant Pool and MachLabel based
AMode
's- Cleanup the existing
VecLoad
/VecStore
to avoid adding a 0 offset where possible- Add the
vconst
loweringThis is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.
jameysharp submitted PR review:
I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.
jameysharp submitted PR review:
I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.
jameysharp created PR review comment:
I'm not real familiar with the
AllocationConsumer
usage patterns yet. Why is it necessary to callallocs.next
in the cases where the result is just thrown away? I see that this is the way the existing code worked, so it doesn't need to change in this PR; I'm just curious.
jameysharp created PR review comment:
I think I am destined to be confused about the math for these labels every single time I look at them. The low 12 bits are interpreted as signed, right? Does this handle that correctly?
jameysharp created PR review comment:
I see this is the prevailing style here, so some notes for possible future work:
Is there any reason not to have a little more type-safety for encoding helpers? For example, instead of passing
op.op_code()
andop.funct_3()
separately, passop
and unpack it in the encoder. Or passrd
as aWritableReg
, since the destination register had better always be a valid destination, and avoid usingto_reg()
at the call site.If all the arguments have more specialized types than
u32
, and those types ensure that they only hold values in the appropriate range for the corresponding field in the instruction encoding, then I'd prefer to remove the& 0b111…
masks everywhere. If any bits outside the mask are set, then that's a bug, and if none are set then the mask doesn't do anything, so the only effect of masking is to hide bugs.I'm also not fond of the masks being written out in binary because it's easy to mis-count how many ones there are.
So if we think there might be bugs where high bits are set then I think I'd want something like
bits |= unsigned_field_width(opcode, 7);
with that function defined like this:fn unsigned_field_width(value: u32, width: u8) -> u32 { debug_assert_eq!(value & (!0 << width), 0); value }
jameysharp created PR review comment:
Could this instead be something like
format!("{}", self.clone().with_allocs(allocs))
?
jameysharp created PR review comment:
Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.
jameysharp created PR review comment:
I don't know what other vector address modes there might be in the future so I'm not sure whether it makes sense to assume that if they have a base register and an offset of zero then it's always safe to just use the register. I think I'd prefer nested pattern-matching here, and similarly below:
match from { VecAMode::UnitStride { base } => { if let (Some(base), 0) = (base.get_base_register(), base.get_offset_with_state(state)) { base } else { … } } }
jameysharp created PR review comment:
This is a pre-existing issue, but: it's not clear to me whether it's okay to pass the stack or frame pointer registers to
reg_use
, since they aren't allocatable.
cfallin created PR review comment:
The calls to
AllocationConsumer::next()
need to match one-to-one with the args provided to RA2; otherwise all the assignments for the following instructions will be shifted.However, I do agree with your comment below that SP and FP should not be regalloc args in the first place, since they aren't allocatable registers! I had thought we had an assert for this, but maybe not...
afonso360 created PR review comment:
This turned out really neatly, I've implemented these suggestions in https://github.com/bytecodealliance/wasmtime/pull/6325/commits/fb87186256527cfe5f6cd0301162e9583c9955e7 since Alex had some similar concerns. I'm going to get that PR merged, and do something similar for these.
afonso360 created PR review comment:
This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
https://github.com/bytecodealliance/wasmtime/pull/5951#issuecomment-1459129236The
Call
Relocation in the JIT performs exactly the same arithmetic. And it is different from this one, we only offset the Lo12 by 4 not the Hi20, which makes sense.I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.
afonso360 edited PR review comment.
afonso360 updated PR #6324.
afonso360 created PR review comment:
I think I got this right, but I'm never confident when this sort of relocation math is involved.
We had 2 issues here. The first one you pointed out, only the Lo12 relocation should have the 4byte offset. The second one was the Hi20 relocation, that needs to get a
0x800
offset so that it skips to the next page as soon as the offset goes out of range for Lo12 since it is signed.
jameysharp submitted PR review:
I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?
jameysharp submitted PR review:
I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?
jameysharp created PR review comment:
This feels backwards to me, but I think it's probably correct.
The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address
x
, then Lo12 is at addressx+4
.Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use
x+4
, or Lo12 should usex-4
.Of the two instructions, the one which actually examines the program counter is
auipc
, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.But I guess here we aren't given the address of the instruction, right?
offset
is the distance from this instruction to the address we want, or in other words,target_address - insn_address
. So ifinsn_address
isx-4
, thenoffset
istarget_address - (x - 4)
, which is equivalent to(target_address - x) + 4
.And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.
afonso360 updated PR #6324.
elliottt created PR review comment:
It looks like we're discovering some gaps in capstone :(
elliottt created PR review comment:
Thank you for the additional comment, @afonso360!
elliottt submitted PR review:
This looks good to me, just want to confirm that these offsets are ignored on purpose though :)
elliottt submitted PR review:
This looks good to me, just want to confirm that these offsets are ignored on purpose though :)
elliottt created PR review comment:
Should the offset be used in the
else
branch?
elliottt created PR review comment:
Same offset question here.
afonso360 created PR review comment:
base
here has both the register and the offset inside
elliottt submitted PR review:
Looks good to me, thank you @afonso360!
afonso360 created PR review comment:
Yeah, unfortunately capstone doesn't recognize any V extension instructions :disappointed:
afonso360 merged PR #6324.
Last updated: Dec 23 2024 at 12:05 UTC