afonso360 opened PR #7123 from afonso360:riscv-zcb to bytecodealliance:main:
:wave: Hey,
This PR adds the compressed instructions from the `Zcb` extension.

`Zcb` adds a few extra compressed instructions to the base C extension. Here's a rough outline:

- `c.lbu` / `c.lhu` / `c.lh` / `c.sb` / `c.sh`
  - These are compressed loads and stores of `i16` and `i8` types. Unlike the regular compressed loads and stores, they only have a 1- or 2-bit offset field.
- `c.{z,s}ext.*`
  - These are compressed versions of the various sign- and zero-extend instructions. They are defined in a new instruction format (CSZN) that has a single source/dest register. Additionally, some of these instructions are only available in conjunction with other extensions (`Zba` or `Zbb`), since the instruction that they compress is only defined in those extensions.
- `c.mul` / `c.not`
  - These are just normal instructions. `c.not` uses the new CSZN format; `c.mul` uses the existing CA format.

It looks like capstone does not recognize the instructions in this extension. That's not totally surprising, since it's a fairly new extension, and capstone also doesn't recognize the uncompressed versions of some of these instructions (i.e. `Zba` or `Zbb` ops).

I've also run the fuzzgen fuzzer on this for a couple of hours, and nothing has popped up yet.
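To make the outline above concrete, here is a minimal Rust sketch of how a few of these 16-bit encodings could be assembled. The bit layouts come from my reading of the RISC-V Zcb specification rather than from this PR's code, and every name here (`creg`, `encode_c_lbu`, `encode_c_lh`, `CsznOp`, `encode_cszn`, `encode_c_mul`) is hypothetical. It also shows why these forms are so restrictive: only registers `x8` through `x15` fit the 3-bit register fields, and `c.lh` can only encode offsets 0 and 2.

```rust
// Hypothetical helpers sketching a few Zcb encodings, written from a reading
// of the RISC-V Zc* spec. They are NOT Cranelift's emitter API.

/// Map x8..=x15 to the 3-bit register field used by compressed formats.
fn creg(reg: u8) -> Option<u16> {
    if (8..=15).contains(&reg) {
        Some(u16::from(reg - 8))
    } else {
        None // register is not addressable from a compressed instruction
    }
}

/// c.lbu rd', uimm(rs1'): funct6 = 100000, 2-bit unsigned byte offset (0..=3).
/// Bit 6 holds uimm[0], bit 5 holds uimm[1], op = 00.
fn encode_c_lbu(rd: u8, rs1: u8, uimm: u8) -> Option<u16> {
    if uimm > 3 {
        return None;
    }
    let (rd, rs1, uimm) = (creg(rd)?, creg(rs1)?, u16::from(uimm));
    Some((0b100000u16 << 10) | (rs1 << 7) | ((uimm & 1) << 6) | ((uimm >> 1) << 5) | (rd << 2))
}

/// c.lh rd', uimm(rs1'): funct6 = 100001 and only uimm[1] is encoded (bit 5),
/// so the offset must be 0 or 2. Bit 6 = 1 selects c.lh (bit 6 = 0 is c.lhu).
fn encode_c_lh(rd: u8, rs1: u8, uimm: u8) -> Option<u16> {
    if uimm != 0 && uimm != 2 {
        return None;
    }
    let (rd, rs1, uimm) = (creg(rd)?, creg(rs1)?, u16::from(uimm));
    Some((0b100001u16 << 10) | (rs1 << 7) | (1 << 6) | ((uimm >> 1) << 5) | (rd << 2))
}

/// Single-register ops in what the PR calls the CSZN format.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum CsznOp { ZextB, SextB, ZextH, SextH, ZextW, Not }

/// funct6 = 100111, rd'/rs1' in bits 9:7, a 5-bit sub-opcode in bits 6:2, op = 01.
fn encode_cszn(op: CsznOp, rd: u8) -> Option<u16> {
    let sub: u16 = match op {
        CsznOp::ZextB => 0b11000,
        CsznOp::SextB => 0b11001, // requires Zbb
        CsznOp::ZextH => 0b11010, // requires Zbb
        CsznOp::SextH => 0b11011, // requires Zbb
        CsznOp::ZextW => 0b11100, // requires Zba (RV64 only)
        CsznOp::Not => 0b11101,
    };
    Some((0b100111u16 << 10) | (creg(rd)? << 7) | (sub << 2) | 0b01)
}

/// c.mul rd', rs2' in the existing CA format: funct2 = 10 in bits 6:5.
/// (Requires M or Zmmul.)
fn encode_c_mul(rd: u8, rs2: u8) -> Option<u16> {
    let (rd, rs2) = (creg(rd)?, creg(rs2)?);
    Some((0b100111u16 << 10) | (rd << 7) | (0b10 << 5) | (rs2 << 2) | 0b01)
}

fn main() {
    // a0 = x10, a1 = x11; both fit in the 3-bit compressed register field.
    let samples = [
        ("c.lbu a0, 1(a1)", encode_c_lbu(10, 11, 1)),
        ("c.lh a0, 2(a1)", encode_c_lh(10, 11, 2)),
        ("c.lh a0, 1(a1)", encode_c_lh(10, 11, 1)), // odd offset: not encodable
        ("c.not a0", encode_cszn(CsznOp::Not, 10)),
        ("c.mul a0, a1", encode_c_mul(10, 11)),
    ];
    for (asm, bits) in samples {
        match bits {
            Some(b) => println!("{asm:<16} => {b:#06x}"),
            None => println!("{asm:<16} => not encodable as Zcb"),
        }
    }
}
```

A caller would presumably try the compressed form first and fall back to the 32-bit instruction whenever a helper like these returns `None`.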
afonso360 requested cfallin for a review on PR #7123.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #7123.
afonso360 submitted PR review.
afonso360 created PR review comment:
This is a really weird sequence of instructions; it looks like we could delete the first move here! I'm not totally sure why this happens.
afonso360 updated PR #7123.
afonso360 updated PR #7123.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
It might be worth looking at the vcode for this via `RUST_LOG=trace` (or at least that's what I use), since it may be that the lowering layer is creating this rather than the regalloc layer. If it's not present in the vcode after lowering, though, then it'd be regalloc, and I'm not sure why that would be.
afonso360 submitted PR review.
afonso360 created PR review comment:
I've compiled the following function on the current `main` branch, just so that it's slightly easier to reproduce:

```
function %c_lh(i64) -> i16, i16 {
block0(v0: i64):
    v1 = load.i16 v0+0
    v2 = load.i16 v0+2
    return v1, v2
}
```
The vcode I get before regalloc is:
```
VCode {
  Entry block: 0
v193 := v196
v194 := v195
Block 0:
    (original IR block: block0)
    (instruction range: 0 .. 4)
  Inst 0: args v192=a0
  Inst 1: lh v196,0(v192)
  Inst 2: lh v195,2(v192)
  Inst 3: rets v193=a0 v194=a1
}
```
It doesn't look like there's anything out of the ordinary to me, but I'm not too familiar with reading these sorts of regalloc inputs.
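In the vcode above, the `v193 := v196` and `v194 := v195` lines appear to be virtual-register aliases rather than real move instructions. Resolving them, `Inst 3` returns the result of `Inst 1` in `a0` and the result of `Inst 2` in `a1` directly, which is consistent with the extra move being introduced later by regalloc rather than by lowering. The sketch below resolves such aliases with a plain `HashMap`; `resolve` is a hypothetical helper for illustration, not Cranelift's implementation.

```rust
use std::collections::HashMap;

/// Follow a chain of vreg aliases (e.g. `v193 := v196`) to its final target.
fn resolve(aliases: &HashMap<u32, u32>, mut v: u32) -> u32 {
    // Bound the walk so a malformed, cyclic alias map cannot loop forever.
    for _ in 0..=aliases.len() {
        match aliases.get(&v) {
            Some(&next) => v = next,
            None => return v,
        }
    }
    v
}

fn main() {
    // Aliases copied from the VCode dump for %c_lh.
    let aliases = HashMap::from([(193, 196), (194, 195)]);
    // `Inst 3: rets v193=a0 v194=a1`, rewritten through the aliases.
    let rets = [(193, "a0"), (194, "a1")];
    for (v, preg) in rets {
        println!("rets v{}={}", resolve(&aliases, v), preg);
    }
}
```

Running it prints `rets v196=a0` and `rets v195=a1`: each returned value is exactly the destination of one `lh`, so no move is required by the vcode itself.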
<details>
<summary> Here's the full trace log: </summary>

```
afonso@DESKTOP-1AHKMV2:~/git/wasmtime/cranelift$ RUST_LOG=trace cargo run -- test ./lmao.clif
Blocking waiting for file lock on build directory
Finished dev [unoptimized + debuginfo] target(s) in 45.38s
Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao.clif`
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
INFO file_per_thread_logger > Set up logging; filename prefix is cranelift.dbg.
DEBUG cranelift_codegen::timing > timing: Starting Processing test file, (during <no pass>)
INFO cranelift_filetests::runone > --- File: ./lmao.clif
DEBUG cranelift_codegen::timing > timing: Starting Parsing textual Cranelift IR, (during Processing test file)
DEBUG cranelift_codegen::timing > timing: Ending Parsing textual Cranelift IR: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Verify Cranelift IR, (during Processing test file)
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Ending Verify Cranelift IR: 0ms
INFO cranelift_filetests::subtest > Test: compile(%c_lh) riscv64
DEBUG cranelift_codegen::timing > timing: Starting Compilation passes, (during Processing test file)
DEBUG cranelift_codegen::timing > timing: Starting Verify Cranelift IR, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Ending Verify Cranelift IR: 0ms
DEBUG cranelift_codegen::context > Number of CLIF instructions to optimize: 3
DEBUG cranelift_codegen::context > Number of CLIF blocks to optimize: 1
TRACE cranelift_codegen::context > Optimizing (opt level None): function %c_lh(i64) -> i16, i16 fast { block0(v0: i64): v1 = load.i16 v0 v2 = load.i16 v0+2 return v1, v2 }
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
TRACE cranelift_codegen::legalizer > Pre-legalization function: function %c_lh(i64) -> i16, i16 fast { block0(v0: i64): v1 = load.i16 v0 v2 = load.i16 v0+2 return v1, v2 }
TRACE cranelift_codegen::legalizer > Post-legalization function: function %c_lh(i64) -> i16, i16 fast { block0(v0: i64): v1 = load.i16 v0 v2 = load.i16 v0+2 return v1, v2 }
DEBUG cranelift_codegen::timing > timing: Starting Verify Cranelift IR, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Ending Verify Cranelift IR: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Remove unreachable blocks, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Ending Remove unreachable blocks: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Verify Cranelift IR, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Ending Verify Cranelift IR: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Remove constant phi-nodes, (during Compilation passes)
DEBUG cranelift_codegen::remove_constant_phis > do_remove_constant_phis: done, 1 iters. 0 formals, of which 0 const.
DEBUG cranelift_codegen::timing > timing: Ending Remove constant phi-nodes: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Verify Cranelift IR, (during Compilation passes)
DEBUG cranelift_codegen::timing > timing: Starting Control flow graph, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Control flow graph: 0ms
DEBUG cranelift_codegen::timing > timing: Starting Dominator tree, (during Verify Cranelift IR)
DEBUG cranelift_codegen::timing > timing: Ending Dominator tree: 0ms
DEBUG cranelift_codegen::timing > timing: Ending Verify Cranelift IR: 0ms
TRACE cranelift_codegen::machinst::abi > ABISig: sig Signature { params: [AbiParam { value_type: types::I64, purpose: Normal, extension: None }], returns: [AbiParam { value_type: types::I16, purpose: Normal, extension: None }, AbiParam { value_type: types::I16, purpose: Normal, extension: None }], call_conv: Fast } => args end = 3 rets end = 2 arg stack = 0 ret stack = 0 stack_ret_arg = false
TRACE cranelift_codegen::machinst::abi > ABI: func signature Signature { params: [AbiParam { value_type: types::I64, purpose: Normal, extension: None }], returns: [AbiParam { value_type: types::I16, purpose: Normal, extension: None }, AbiParam { value_type: types::I16, purpose: Normal, extension: None }], call_conv: Fast }
TRACE cranelift_codegen::machinst::blockorder > BlockLoweringOrder: function body function %c_lh(i64) -> i16, i16 fast { block0(v0: i64): v1 = load.i16 v0 v2 = load.i16 v0+2 return v1, v2 }
TRACE cranelift_codegen::machinst::blockorder > BlockLoweringOrder: BlockLoweringOrder { lowered_order: [ Orig { block: block0, }, ], lowered_succ_indices: [], lowered_succ_ranges: [ ( None, 0..0, ), ], cold_blocks: {}, indirect_branch_targets: {}, }
TRACE cranelift_codegen::machinst::lower > bb block0 param v0: regs ValueRegs { parts: [v192, v2097151] }
TRACE cranelift_codegen::machinst::lower > bb block0 inst inst0 (Load { opcode: Load, arg: v0, flags: MemFlags { bits: 0 }, offset: Offset32(0) }): result v1 regs ValueRegs { parts: [v193, v2097151] }
TRACE cranelift_codegen::machinst::lower > bb block0 inst inst1 (Load { opcode: Load, arg: v0, flags: MemFlags { bits: 0 }, offset: Offset32(2) }): result v2 regs ValueRegs { parts: [v194, v2097151] }
TRACE cranelift_codegen::machinst::lower > bb block0 inst inst0 has color 1
TRACE cranelift_codegen::machinst::lower > -> side-effecting; incrementing color for next inst
TRACE cranelift_codegen::machinst::lower > bb block0 inst inst1 has color 2
TRACE cranelift_codegen::machinst::lower > -> side-effecting; incrementing color for next inst
TRACE cranelift_codegen::machinst::lower > bb block0 inst inst2 has color 3
TRACE cranelift_codegen::machinst::lower > -> side-effecting; incrementing color for next inst
TRACE cranelift_codegen::machinst::lower
```
[message truncated]
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah yeah, alas that looks "good" to me, so I think the behavior causing this comes down to regalloc, which I'm much less familiar with, so I won't be able to help much.
afonso360 merged PR #7123.
Last updated: Jan 24 2025 at 00:11 UTC