Stream: git-wasmtime

Topic: wasmtime / PR #6324 riscv64: Add `vconst` lowerings


view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 09:57):

afonso360 opened PR #6324 from afonso360:riscv-vec-vconst to bytecodealliance:main:

:wave: Hey,

This PR does a number of things but the main goal is to enable vconst on the RISC-V backend.

This is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 09:57):

afonso360 requested jameysharp for a review on PR #6324.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 09:57):

afonso360 requested wasmtime-compiler-reviewers for a review on PR #6324.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 10:25):

afonso360 edited PR #6324:

:wave: Hey,

This PR does a number of things but the main goal is to enable vconst loading from the constant pool on the RISC-V backend.

This is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp submitted PR review:

I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp submitted PR review:

I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

I'm not real familiar with the AllocationConsumer usage patterns yet. Why is it necessary to call allocs.next in the cases where the result is just thrown away? I see that this is the way the existing code worked, so it doesn't need to change in this PR; I'm just curious.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

I think I am destined to be confused about the math for these labels every single time I look at them. The low 12 bits are interpreted as signed, right? Does this handle that correctly?

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

I see this is the prevailing style here, so some notes for possible future work:

Is there any reason not to have a little more type-safety for encoding helpers? For example, instead of passing op.op_code() and op.funct_3() separately, pass op and unpack it in the encoder. Or pass rd as a WritableReg, since the destination register had better always be a valid destination, and avoid using to_reg() at the call site.

If all the arguments have more specialized types than u32, and those types ensure that they only hold values in the appropriate range for the corresponding field in the instruction encoding, then I'd prefer to remove the & 0b111… masks everywhere. If any bits outside the mask are set, then that's a bug, and if none are set then the mask doesn't do anything, so the only effect of masking is to hide bugs.

I'm also not fond of the masks being written out in binary because it's easy to mis-count how many ones there are.

So if we think there might be bugs where high bits are set then I think I'd want something like bits |= unsigned_field_width(opcode, 7); with that function defined like this:

fn unsigned_field_width(value: u32, width: u8) -> u32 {
    debug_assert_eq!(value & (!0 << width), 0);
    value
}

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

Could this instead be something like format!("{}", self.clone().with_allocs(allocs))?

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

I don't know what other vector address modes there might be in the future so I'm not sure whether it makes sense to assume that if they have a base register and an offset of zero then it's always safe to just use the register. I think I'd prefer nested pattern-matching here, and similarly below:

match from {
    VecAMode::UnitStride { base } => {
        if let (Some(base), 0) = (base.get_base_register(), base.get_offset_with_state(state)) {
            base
        } else {
            
        }
    }
}

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 18:04):

jameysharp created PR review comment:

This is a pre-existing issue, but: it's not clear to me whether it's okay to pass the stack or frame pointer registers to reg_use, since they aren't allocatable.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2023 at 20:34):

cfallin created PR review comment:

The calls to AllocationConsumer::next() need to match one-to-one with the args provided to RA2; otherwise all the assignments for the following instructions will be shifted.

However, I do agree with your comment below that SP and FP should not be regalloc args in the first place, since they aren't allocatable registers! I had thought we had an assert for this, but maybe not...

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2023 at 21:14):

afonso360 created PR review comment:

This turned out really neatly, I've implemented these suggestions in https://github.com/bytecodealliance/wasmtime/pull/6325/commits/fb87186256527cfe5f6cd0301162e9583c9955e7 since Alex had some similar concerns. I'm going to get that PR merged, and do something similar for these.

view this post on Zulip Wasmtime GitHub notifications bot (May 04 2023 at 09:42):

afonso360 created PR review comment:

This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
https://github.com/bytecodealliance/wasmtime/pull/5951#issuecomment-1459129236

The Call Relocation in the JIT performs exactly the same arithmetic. And it is different from this one, we only offset the Lo12 by 4 not the Hi20, which makes sense.

I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.

view this post on Zulip Wasmtime GitHub notifications bot (May 04 2023 at 09:45):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (May 05 2023 at 14:03):

afonso360 updated PR #6324.

view this post on Zulip Wasmtime GitHub notifications bot (May 05 2023 at 14:13):

afonso360 created PR review comment:

I think I got this right, but I'm never confident when this sort of relocation math is involved.

We had 2 issues here. The first one you pointed out, only the Lo12 relocation should have the 4byte offset. The second one was the Hi20 relocation, that needs to get a 0x800 offset so that it skips to the next page as soon as the offset goes out of range for Lo12 since it is signed.

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2023 at 18:01):

jameysharp submitted PR review:

I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2023 at 18:01):

jameysharp submitted PR review:

I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2023 at 18:01):

jameysharp created PR review comment:

This feels backwards to me, but I think it's probably correct.

The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address x, then Lo12 is at address x+4.

Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use x+4, or Lo12 should use x-4.

Of the two instructions, the one which actually examines the program counter is auipc, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.

But I guess here we aren't given the address of the instruction, right? offset is the distance from this instruction to the address we want, or in other words, target_address - insn_address. So if insn_address is x-4, then offset is target_address - (x - 4), which is equivalent to (target_address - x) + 4.

And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.

view this post on Zulip Wasmtime GitHub notifications bot (May 08 2023 at 20:37):

afonso360 updated PR #6324.

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 21:44):

elliottt created PR review comment:

It looks like we're discovering some gaps in capstone :(

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 21:55):

elliottt created PR review comment:

Thank you for the additional comment, @afonso360!

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:40):

elliottt submitted PR review:

This looks good to me, just want to confirm that these offsets are ignored on purpose though :)

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:40):

elliottt submitted PR review:

This looks good to me, just want to confirm that these offsets are ignored on purpose though :)

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:40):

elliottt created PR review comment:

Should the offset be used in the else branch?

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:40):

elliottt created PR review comment:

Same offset question here.

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:45):

afonso360 created PR review comment:

base here has both the register and the offset inside

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:48):

elliottt submitted PR review:

Looks good to me, thank you @afonso360!

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 22:52):

afonso360 created PR review comment:

Yeah, unfortunately capstone doesn't recognize any V extension instructions :disappointed:

view this post on Zulip Wasmtime GitHub notifications bot (May 09 2023 at 23:29):

afonso360 merged PR #6324.


Last updated: Dec 23 2024 at 12:05 UTC