bjorn3 commented on issue #5951:
Are you sure this is correct and not a case where two functions end up more than the max relocation distance away by chance? We don't handle that case correctly right now. See https://github.com/bytecodealliance/wasmtime/issues/4000.
afonso360 edited a comment on issue #5951:
I don't think so, I gave this a pretty good run in the fuzzer (~24h) with these changes and it stopped complaining. It also fixed the same bug that was previously reported by the fuzzer.
Also, could that happen with 4-5 functions in the test case? I'll try to get it again, but I think that was how many there were.
bjorn3 commented on issue #5951:
> Also, could that happen with 4-5 functions in the test case? I'll try to get it again, but I think that was how many there were.
If those functions don't fit in a single page and you are very unlucky, yes it can happen.
jameysharp commented on issue #5951:
@elliottt and I just spent a while puzzling over this. We believe that something like this PR is necessary for correctness, and that switching to `wrapping_sub` is the only correct solution. But we really had to think about it.
The point of this calculation is that `hi20` should end up being close enough to `pcrel` that a _signed_ 12-bit offset can hold the difference. (That it's signed wasn't clear from either the ABI doc or this code, and it'd be nice to have a comment here saying so.) This explains why subtracting one from the other underflows sometimes: the result is actually expected to be negative sometimes. In fact it should happen for exactly half the possible values of `pcrel`.
Trevor had a suggestion I really liked: Do all the intermediate arithmetic for `pcrel` on `i32`, and only convert to `u32` at the end, when patching the instructions. This felt right intuitively, since `lo12` is meant to be interpreted as signed. Unfortunately, if we do that without also changing to `wrapping_sub` like you've done here, it's still possible to overflow the subtraction when `pcrel` is greater than `i32::MAX-0x800`.
Since `wrapping_sub` is required either way, I guess we might as well stick with unsigned arithmetic, and merge this PR as-is.
One thing that confused us, if you want to add a comment into this PR: Unlike the ABI documentation linked in this code, `hi20` isn't right-shifted 12 bits. I believe that's because you would otherwise have to left-shift it again to place it in `auipc`'s immediate field, right?
I also thought about defining `lo12` with `... << 20` instead of `... & 0xFFF`, so both values are aligned in the appropriate immediate-operand fields. (The bit-masking isn't strictly necessary since the masked-out bits get shifted out anyway.) But I think that is less clear than the way you have it now.
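To make the arithmetic in this comment concrete, here is a minimal Rust sketch of the split. It is not the code from the PR: the function names (`split_pcrel`, `sign_extend_lo12`, `split_pcrel_i32_checked`) are hypothetical, and the sketch only assumes the definitions described in this thread, namely that `hi20` is `pcrel` rounded to the nearest multiple of 0x1000 and left unshifted, and that `lo12` is the low 12 bits of the wrapped difference.

```rust
/// Hypothetical helper: split a 32-bit PC-relative offset into the
/// `hi20` and `lo12` parts discussed above.
fn split_pcrel(pcrel: u32) -> (u32, u32) {
    // Round to the nearest multiple of 0x1000. The upper 20 bits stay in
    // place (no right shift), matching the position of auipc's immediate.
    let hi20 = pcrel.wrapping_add(0x800) & 0xFFFF_F000;
    // The true difference lies in [-2048, 2047]. It is negative whenever the
    // low 12 bits of `pcrel` are >= 0x800, i.e. for half of all inputs, so a
    // plain `-` on u32 would panic in debug builds.
    let lo12 = pcrel.wrapping_sub(hi20) & 0xFFF;
    (hi20, lo12)
}

/// Interpret a 12-bit field as a signed value.
fn sign_extend_lo12(lo12: u32) -> i32 {
    ((lo12 << 20) as i32) >> 20
}

/// The signed variant with a checked subtraction, to show that switching to
/// i32 alone does not remove the need for wrapping arithmetic.
fn split_pcrel_i32_checked(pcrel: i32) -> Option<(i32, i32)> {
    let hi20 = pcrel.wrapping_add(0x800) & !0xFFF;
    // For pcrel > i32::MAX - 0x800 the rounding wraps hi20 around to
    // i32::MIN, and `pcrel - hi20` no longer fits in an i32.
    let lo12 = pcrel.checked_sub(hi20)?;
    Some((hi20, lo12))
}
```

For example, `split_pcrel(0x1800)` rounds up to `hi20 = 0x2000`, so the difference is -0x800 and `lo12` holds the bit pattern `0x800`, while `split_pcrel_i32_checked(i32::MAX)` returns `None` because the subtraction overflows.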
afonso360 commented on issue #5951:
I've left the fuzzer running since yesterday on riscv64 (took about 18 hours!) to try and find this again since I lost the original case.
<details>
<summary>Testcase</summary>;; Run test case test interpret test run set enable_alias_analysis=false set use_egraphs=false set enable_simd=true set enable_safepoints=true set enable_llvm_abi_extensions=true set unwind_info=false set machine_code_cfg_info=true set enable_jump_tables=false set enable_heap_access_spectre_mitigation=false set enable_table_access_spectre_mitigation=false target riscv64gc function %d() system_v { ss0 = explicit_slot 126 ss1 = explicit_slot 126 ss2 = explicit_slot 0 sig0 = (f32) -> f32 system_v sig1 = (f64) -> f64 system_v sig2 = (f32) -> f32 system_v sig3 = (f64) -> f64 system_v sig4 = (f32) -> f32 system_v sig5 = (f64) -> f64 system_v fn0 = %CeilF32 sig0 fn1 = colocated %CeilF64 sig1 fn2 = colocated %FloorF32 sig2 fn3 = colocated %FloorF64 sig3 fn4 = colocated %TruncF32 sig4 fn5 = colocated %TruncF64 sig5 block0: v0 = iconst.i8 0 v1 = iconst.i16 0 v2 = iconst.i32 0 v3 = iconst.i64 0 v4 = uextend.i128 v3 ; v3 = 0 stack_store v4, ss0 stack_store v4, ss0+16 stack_store v4, ss0+32 stack_store v4, ss0+48 stack_store v4, ss0+64 stack_store v4, ss0+80 stack_store v4, ss0+96 stack_store v3, ss0+112 ; v3 = 0 stack_store v2, ss0+120 ; v2 = 0 stack_store v1, ss0+124 ; v1 = 0 stack_store v4, ss1 stack_store v4, ss1+16 stack_store v4, ss1+32 stack_store v4, ss1+48 stack_store v4, ss1+64 stack_store v4, ss1+80 stack_store v4, ss1+96 stack_store v3, ss1+112 ; v3 = 0 stack_store v2, ss1+120 ; v2 = 0 stack_store v1, ss1+124 ; v1 = 0 return } function %c() system_v { sig0 = () system_v sig1 = (f32) -> f32 system_v sig2 = (f64) -> f64 system_v sig3 = (f32) -> f32 system_v sig4 = (f64) -> f64 system_v sig5 = (f32) -> f32 system_v sig6 = (f64) -> f64 system_v fn0 = %d sig0 fn1 = %CeilF32 sig1 fn2 = %CeilF64 sig2 fn3 = %FloorF32 sig3 fn4 = %FloorF64 sig4 fn5 = %TruncF32 sig5 fn6 = %TruncF64 sig6 block0: v0 = iconst.i8 0 v1 = iconst.i16 0 v2 = iconst.i32 0 v3 = iconst.i64 0 v4 = uextend.i128 v3 ; v3 = 0 return } function %b(i32 sext, i8 sext, i8 sext, i128) 
system_v { sig0 = () system_v sig1 = () system_v sig2 = (f32) -> f32 system_v sig3 = (f64) -> f64 system_v sig4 = (f32) -> f32 system_v sig5 = (f64) -> f64 system_v sig6 = (f32) -> f32 system_v sig7 = (f64) -> f64 system_v fn0 = %d sig0 fn1 = %c sig1 fn2 = colocated %CeilF32 sig2 fn3 = %CeilF64 sig3 fn4 = %FloorF32 sig4 fn5 = %FloorF64 sig5 fn6 = %TruncF32 sig6 fn7 = %TruncF64 sig7 block0(v0: i32, v1: i8, v2: i8, v3: i128): v4 = iconst.i8 0 v5 = iconst.i16 0 v6 = iconst.i32 0 v7 = iconst.i64 0 v8 = uextend.i128 v7 ; v7 = 0 return } function %a(i64 sext, f32, i32 uext, i16 sext, f32, i16 sext, i64 uext, f64, i128 sext, i8 sext) -> f64, i128 sext, i8 sext, i8 sext, i16 sext, i64 sext, f64, i32 sext, i64 sext, i64 sext, i64 sext, i64 sext system_v { ss0 = explicit_slot 26 ss1 = explicit_slot 26 sig0 = () system_v sig1 = () system_v sig2 = (i32 sext, i8 sext, i8 sext, i128) system_v sig3 = (f32) -> f32 system_v sig4 = (f64) -> f64 system_v sig5 = (f32) -> f32 system_v sig6 = (f64) -> f64 system_v sig7 = (f32) -> f32 system_v sig8 = (f64) -> f64 system_v fn0 = colocated %d sig0 fn1 = colocated %c sig1 fn2 = colocated %b sig2 fn3 = colocated %CeilF32 sig3 fn4 = colocated %CeilF64 sig4 fn5 = colocated %FloorF32 sig5 fn6 = colocated %FloorF64 sig6 fn7 = colocated %TruncF32 sig7 fn8 = colocated %TruncF64 sig8 block0(v0: i64, v1: f32, v2: i32, v3: i16, v4: f32, v5: i16, v6: i64, v7: f64, v8: i128, v9: i8): v10 = iconst.i16 0xffff_ffff_ffff_9b9b v11 = iconst.i16 0xffff_ffff_ffff_9b9b v12 = iconst.i8 0 v13 = iconst.i16 0 v14 = iconst.i32 0 v15 = iconst.i64 0 v16 = uextend.i128 v15 ; v15 = 0 stack_store v16, ss0 stack_store v15, ss0+16 ; v15 = 0 stack_store v13, ss0+24 ; v13 = 0 stack_store v16, ss1 stack_store v15, ss1+16 ; v15 = 0 stack_store v13, ss1+24 ; v13 = 0 v46 = fcmp ne v4, v4 v47 = f32const -0x1.000000p0 v48 = f32const 0x1.000000p32 v49 = fcmp le v4, v47 ; v47 = -0x1.000000p0 v50 = fcmp ge v4, v48 ; v48 = 0x1.000000p32 v51 = bor v49, v50 v52 = bor v46, v51 v53 = 
f32const 0x1.000000p0 v54 = select v52, v53, v4 ; v53 = 0x1.000000p0 v17 = fcvt_to_uint.i32 v54 v42 = iconst.i16 0 v43 = iconst.i16 1 v44 = icmp eq v3, v42 ; v42 = 0 v45 = select v44, v43, v3 ; v43 = 1 v18 = urem v11, v45 ; v11 = 0xffff_ffff_ffff_9b9b v19 = bxor v5, v18 v20 = select_spectre_guard v9, v17, v17 v55 = fcmp ne v7, v7 v56 = f64const -0x1.0000000000000p0 v57 = f64const 0x1.0000000000000p32 v58 = fcmp le v7, v56 ; v56 = -0x1.0000000000000p0 v59 = fcmp ge v7, v57 ; v57 = 0x1.0000000000000p32 v60 = bor v58, v59 v61 = bor v55, v60 v62 = f64const 0x1.0000000000000p0 v63 = select v61, v62, v7 ; v62 = 0x1.0000000000000p0 v21 = fcvt_to_uint.i32 v63 v64 = fcmp ne v7, v7 v65 = f64const -0x1.0000000000000p0 v66 = f64const 0x1.0000000000000p32 v67 = fcmp le v7, v65 ; v65 = -0x1.0000000000000p0 v68 = fcmp ge v7, v66 ; v66 = 0x1.0000000000000p32 v69 = bor v67, v68 v70 = bor v64, v69 v71 = f64const 0x1.0000000000000p0 v72 = select v70, v71, v7 ; v71 = 0x1.0000000000000p0 v22 = fcvt_to_uint.i32 v72 v73 = fcmp ne v7, v7 v74 = f64const -0x1.0000000000000p0 v75 = f64const 0x1.0000000000000p32 v76 = fcmp le v7, v74 ; v74 = -0x1.0000000000000p0 v77 = fcmp ge v7, v75 ; v75 = 0x1.0000000000000p32 v78 = bor v76, v77 v79 = bor v73, v78 v80 = f64const 0x1.0000000000000p0 v81 = select v79, v80, v7 ; v80 = 0x1.0000000000000p0 v23 = fcvt_to_uint.i32 v81 v24 = bxor v4, v4 v82 = fcmp ne v7, v7 v83 = f64const -0x1.0000000000000p0 v84 = f64const 0x1.0000000000000p32 v85 = fcmp le v7, v83 ; v83 = -0x1.0000000000000p0 v86 = fcmp ge v7, v84 ; v84 = 0x1.0000000000000p32 v87 = bor v85, v86 v88 = bor v82, v87 v89 = f64const 0x1.0000000000000p0 v90 = select v88, v89, v7 ; v89 = 0x1.0000000000000p0 v25 = fcvt_to_uint.i32 v90 v91 = fcmp ne v7, v7 v92 = f64const -0x1.0000000000000p0 v93 = f64const 0x1.0000000000000p32 v94 = fcmp le v7, v92 ; v92 = -0x1.0000000000000p0 v95 = fcmp ge v7, v93 ; v93 = 0x1.0000000000000p32 v96 = bor v94, v95 v97 = bor v91, v96 v98 = f64const 0x1.0000000000000p0 v99 = 
select v97, v98, v7 ; v98 = 0x1.0000000000000p0 v26 = fcvt_to_uint.i32 v99 v27 = stack_addr.i64 ss1+20 v28 = load.i16 notrap v27 v29 = iadd v8, v8 v30 = iadd v29, v29 call fn0() v100 = fcmp ne v24, v24 v101 = f32const -0x1.000000p0 v102 = f32const 0x1.000000p32 v103 = fcmp le v24, v101 ; v101 = -0x1.000000p0 v104 = fcmp ge v24, v102 ; v102 = 0x1.000000p32 v105 = bor v103, v104 v106 = bor v100, v105 v107 = f32const 0x1.000000p0 v108 = select v106, v107, v24 ; v107 = 0x1.000000p0 v31 = fcvt_to_uint.i32 v108 v109 = fcmp ne v24, v24 v110 = f32const -0x1.000000p0 v111 = f32const 0x1.000000p32 v112 = fcmp le v24, v110 ; v110 = -0x1.000000p0 v113 = fcmp ge v24, v111 ; v111 = 0x1.000000p32 v114 = bor v112, v113 v115 = bor v109, v114 v116 = f32const 0x1.000000p0 v117 = select v115, v116, v24 ; v116 = 0x1.000000p0 v32 = fcvt_to_uint.i32 v117 v118 = fcmp ne v24, v24 v119 = f32const -0x1.000000p0 v120 = f32const 0x1.000000p32 v121 = fcmp le v24, v119 ; v119 = -0x1.000000p0 v122 = fcmp ge v24, v120 ; v120 = 0x1.000000p32 v123 = bor v121, v122 v124 = bor v118, v123 v125 = f32const 0x1.000000p0 v126 = select v124, v125, v24 ; v125 = 0x1.000000p0 v33 = fcvt_to_uint.i32 v126 v34 = fcvt_from_sint.f32 v0 v127 = fcmp ne v34, v34 v128 = f32const -0x1.000000p0 v129 = f32const 0x1.000000p32 v130 = fcmp le v34, v128 ; v128 = -0x1.000000p0 v131 = fcmp ge v34, v129 ; v129 = 0x1.000000p32 v132 = bor v130, v131 v133 = bor v127, v132 v134 = f32const 0x1.000000p0 v135 = select v133, v134, v34 ; v134 = 0x1.000000p0 v35 = fcvt_to_uint.i32 v135 v136 = fcmp ne v34, v34 v137 = f32const -0x1.000000p0 v138 = f32const 0x1.000000p32 v139 = fcmp le v34, v137 ; v137 = -0x1.000000p0 v140 = fcmp ge v34, v138 ; v138 = 0x1.000000p32 v141 = bor v139, v140 v142 = bor v136, v141 v143 = f32const 0x1.000000p0 v144 = select v142, v143, v34 ; v143 = 0x1.000000p0 v36 = fcvt_to_uint.i32 v144 v145 = fcmp ne v34, v34 v146 = f32const -0x1.000000p0 v147 = f32const 0x1.000000p32 v148 = fcmp le v34, v146 ; v146 = 
-0x1.000000p0 v149 = fcmp ge v34, v147 ; v147 = 0x1.000000p32 v150 = bor v148, v149 v151 = bor v145, v150 v152 = f32const 0x1.000000p0 v153 = select v151, v152, v34 ; v152 = 0x1.000000p0 v37 = fcvt_to_uint.i32 v153 v154 = fcmp ne v34, v34 v155 = f32const -0x1.000000p0 v156 = f32const 0x [message truncated]
bjorn3 commented on issue #5951:
If it fits in a single page then it is likely not the issue I was talking about. Or does the fuzzer call finalize_definitions in between two function compilations? That would force them to end up in separately allocated pages.
afonso360 edited a comment on issue #5951:
No, we define and declare all of them (including trampolines), and call `finalize_definitions` once in `TestFileCompiler::compile`.
afonso360 commented on issue #5951:
> I believe that's because you would otherwise have to left-shift it again to place it in `auipc`'s immediate field, right?
Yes, but also we need it without the left shifts to calculate `lo12`, and at that point there was a bunch of shifting going on which seemed even more confusing. I've added that remark as a comment though.
Could you double check if the comments match what you expected?
jameysharp commented on issue #5951:
You've covered almost everything that confused me. Thanks! The remaining bit is that `lo12` is also a signed offset, +/- 2kB, relative to PC+hi20.
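As a rough illustration of that last point, the sketch below patches an `auipc`/`jalr` pair and then computes the call target the way the hardware would, with `lo12` sign-extended and added to PC+hi20. The helper names (`patch_call`, `call_target`) and the choice of `ra` as the register are assumptions for the example and are not taken from the PR; the instruction encodings are the standard RISC-V ones.

```rust
const AUIPC_RA: u32 = 0x0000_0097; // auipc ra, 0
const JALR_RA: u32 = 0x0000_80E7; // jalr ra, 0(ra)

/// Hypothetical helper: fill in the immediates of an auipc+jalr pair.
fn patch_call(pcrel: u32) -> (u32, u32) {
    let hi20 = pcrel.wrapping_add(0x800) & 0xFFFF_F000;
    let lo12 = pcrel.wrapping_sub(hi20) & 0xFFF;
    // hi20 already sits in bits 31:12, where auipc keeps its immediate.
    // jalr keeps its 12-bit immediate in bits 31:20, hence the shift.
    (AUIPC_RA | hi20, JALR_RA | (lo12 << 20))
}

/// Compute the address the patched pair jumps to on RV64, given the
/// address `pc` of the auipc instruction.
fn call_target(pc: u64, auipc: u32, jalr: u32) -> u64 {
    // auipc adds its immediate, sign-extended from 32 bits, to the PC.
    let upper = (auipc & 0xFFFF_F000) as i32 as i64;
    // An arithmetic shift of the whole word sign-extends jalr's immediate,
    // so lo12 acts as a signed offset in [-2048, 2047].
    let lower = ((jalr as i32) >> 20) as i64;
    // jalr clears the lowest bit of the computed target.
    pc.wrapping_add(upper as u64).wrapping_add(lower as u64) & !1
}
```

With `pc = 0x1_0000` and `pcrel = 0x1800`, the pair encodes `hi20 = 0x2000` and `lo12 = -0x800`, and `call_target` returns `0x1_1800`. A backward offset such as `(-0x1234i32) as u32` round-trips the same way.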
Last updated: Nov 22 2024 at 16:03 UTC