Stream: git-wasmtime

Topic: wasmtime / issue #12197 Cranelift: Tracking Issue: Missin...


view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 07:00):

abc767234318 opened issue #12197:

As suggested in #12171, I am aggregating the missing lowering rules and internal errors I've encountered into a single tracking issue. I will update this list as I discover more cases.

X86_64

aarch64

S390x

view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 07:00):

abc767234318 added the bug label to Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 07:00):

abc767234318 added the cranelift label to Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 16:27):

alexcrichton added the cranelift:area:aarch64 label to Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 16:27):

alexcrichton added the cranelift:area:x86 label to Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 22 2025 at 16:27):

alexcrichton added the cranelift:area:s390x label to Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 16:29):

jgraef commented on issue #12197:

insertlane on i32x2 returns an error as well:

should be implemented in ISLE: inst = `v5 = insertlane.i32x2 v4, v3, 1`, type = `Some(types::I32X2)`

<details>
<summary>Click to view reproduction (.clif)</summary>

function %main() -> f32x4, i32x2 system_v {
    ss0 = explicit_slot 4, align = 4, key = 0
    ss1 = explicit_slot 4, align = 4, key = 1

block0:
    v0 = iconst.i32 1
    stack_store v0, ss0  ; v0 = 1
    v1 = iconst.i32 2
    stack_store v1, ss1  ; v1 = 2
    v2 = stack_load.i32 ss0
    v3 = stack_load.i32 ss1
    v4 = splat.i32x2 v2
    v5 = insertlane v4, v3, 1
    v6 = f32const 0.0
    v7 = splat.f32x4 v6  ; v6 = 0.0
    return v7, v5
}

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 16:46):

jgraef edited a comment on issue #12197:

insertlane on i32x2 returns an error as well:

should be implemented in ISLE: inst = `v5 = insertlane.i32x2 v4, v3, 1`, type = `Some(types::I32X2)`

<details>
<summary>Click to view reproduction (.clif)</summary>

function %main() -> f32x4, i32x2 system_v {
    ss0 = explicit_slot 4, align = 4, key = 0
    ss1 = explicit_slot 4, align = 4, key = 1

block0:
    v0 = iconst.i32 1
    stack_store v0, ss0  ; v0 = 1
    v1 = iconst.i32 2
    stack_store v1, ss1  ; v1 = 2
    v2 = stack_load.i32 ss0
    v3 = stack_load.i32 ss1
    v4 = splat.i32x2 v2
    v5 = insertlane v4, v3, 1
    v6 = f32const 0.0
    v7 = splat.f32x4 v6  ; v6 = 0.0
    return v7, v5
}

</details>

Edit: error occurs when compiling to x86_64
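
For reference, the value the reproducer above should return in `v5` once the lowering exists can be modeled in plain Rust. This is only a scalar sketch of the `splat` and `insertlane` semantics, not Cranelift code; the helper names `splat_i32x2` and `insertlane_i32x2` are hypothetical.

```rust
/// Scalar model of `splat.i32x2`: copy one value into both lanes.
fn splat_i32x2(x: i32) -> [i32; 2] {
    [x; 2]
}

/// Scalar model of `insertlane` on an i32x2: replace a single lane.
/// The lane index is an immediate in CLIF; here it is checked at run time.
fn insertlane_i32x2(mut v: [i32; 2], x: i32, lane: usize) -> [i32; 2] {
    assert!(lane < v.len(), "lane index out of range for i32x2");
    v[lane] = x;
    v
}

fn main() {
    let v2 = 1; // value loaded back from ss0
    let v3 = 2; // value loaded back from ss1
    let v4 = splat_i32x2(v2);
    let v5 = insertlane_i32x2(v4, v3, 1);
    println!("{:?}", v5);
}
```

Under this model the second return value of the reproducer is lane 0 = 1 and lane 1 = 2.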

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 19:39):

theotherjimmy commented on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 20:01):

theotherjimmy edited a comment on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer. I noticed that the reproducer actually does not use v10 at all, using v11 only.


It looks like this affects all binary operations on floating point numbers. I was able to get almost the same failure with the following combinations:

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 20:01):

theotherjimmy edited a comment on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer. I noticed that the reproducer actually does not use v10 at all, using v11 only.


It looks like this affects all binary operations on floating point numbers. I was able to get almost the same failure with the following combinations:

I think this will be a bit more of a lift, since these operations require moving the floating point numbers into GPRs, then moving the result back.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 20:04):

theotherjimmy edited a comment on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer. I noticed that the reproducer actually does not use v10 at all, using v11 only.


It looks like this affects all binary operations on floating point numbers. I was able to get almost the same failure with the following combinations:

That implies that the following reproducer would be simpler:

function %bnot_f32() -> f32 fast {
block0:
  v10 = f32const -0x1.fffffep127
  v21 = bnot v10
  return v21
}

I think this will be a bit more of a lift, since these operations require moving the floating point numbers into GPRs, then moving the result back.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 20:04):

theotherjimmy edited a comment on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer. I noticed that the reproducer actually does not use v10 at all, using v11 only.


It looks like this affects all binary operations on floating point numbers. I was able to get almost the same failure with the following combinations:

That implies that the following reproducer would be simpler:

function %bnot_f32(f32) -> f32 fast {
block0(v0: f32):
  v1 = bnot v0
  return v1
}

I think this will be a bit more of a lift, since these operations require moving the floating point numbers into GPRs, then moving the result back.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 23 2025 at 20:11):

theotherjimmy edited a comment on issue #12197:

I'll handle the s390x panic. Thanks for the reproducer. I noticed that the reproducer actually does not use v10 at all, using v11 only.


It looks like this affects all binary operations on floating point numbers. I was able to get almost the same failure with the following combinations:

That implies that the following reproducer would be simpler:

function %bnot_f32(f32) -> f32 fast {
block0(v0: f32):
  v1 = bnot v0
  return v1
}

I think this will be a bit more of a lift, since these operations require moving the floating point numbers into GPRs, then moving the result back. Alternatively, it's possible to use the lower 16 vector register aliases with vector bit operations to avoid the moves. Making use of the floating point-vector register overlay would require the register allocator to understand that this happens. Is this the case?
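
The "move into a GPR, operate, move back" approach can be illustrated with a scalar reference model in plain Rust. This is a sketch of the expected result of `bnot` on an `f32`, not the s390x lowering itself; the function name `bnot_f32` is borrowed from the reproducer, and the input is the constant `-0x1.fffffep127` (that is, `-f32::MAX`) used in an earlier revision of this comment.

```rust
/// Scalar reference model of `bnot` applied to an `f32`:
/// reinterpret the float as its 32-bit pattern, invert every bit,
/// and reinterpret the result as a float again.
fn bnot_f32(x: f32) -> f32 {
    let bits: u32 = x.to_bits(); // corresponds to the FPR -> GPR move
    let inverted = !bits;        // the integer bitwise NOT
    f32::from_bits(inverted)     // corresponds to the GPR -> FPR move
}

fn main() {
    let input = -f32::MAX; // bit pattern 0xff7fffff
    let out = bnot_f32(input);
    println!("{:#010x} -> {:#010x}", input.to_bits(), out.to_bits());
}
```

Inverting `0xff7fffff` gives `0x00800000`, which is the smallest positive normal `f32`. The vector-register alternative mentioned above would have to produce the same bit pattern without the two moves.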

view this post on Zulip Wasmtime GitHub notifications bot (Dec 27 2025 at 13:34):

abc767234318 edited issue #12197:

As suggested in #12171, I am aggregating the missing lowering rules and internal errors I've encountered into a single tracking issue. I will update this list as I discover more cases.

X86_64

aarch64

S390x

view this post on Zulip Wasmtime GitHub notifications bot (Jan 08 2026 at 13:16):

theotherjimmy commented on issue #12197:

The s390x issue is fixed in #12232. Can we remove the s390x tag, or is something else needed?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 08 2026 at 15:25):

alexcrichton removed the cranelift:area:s390x label from Issue #12197.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2026 at 09:43):

abc767234318 edited issue #12197:

As suggested in #12171, I am aggregating the missing lowering rules and internal errors I've encountered into a single tracking issue. I will update this list as I discover more cases.

X86_64

aarch64

S390x

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2026 at 10:22):

abc767234318 edited issue #12197:

As suggested in #12171, I am aggregating the missing lowering rules and internal errors I've encountered into a single tracking issue. I will update this list as I discover more cases.

X86_64

aarch64

RISCV64

S390x

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2026 at 12:47):

bungcip commented on issue #12197:

I found an ISLE panic when using TLS:

    Unsupported feature: should be implemented in ISLE: inst = `v0 = tls_value.i64 gv0`, type = `Some(types::I64)`

clif file:

test compile
target x86_64

function %tls_repro() -> i32 system_v {
    gv0 = symbol colocated tls userextname0

block0:
    v0 = global_value.i64 gv0
    v1 = load.i32 v0
    return v1
}

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2026 at 13:18):

bjorn3 commented on issue #12197:

You need to set the tls_model option to elf_gd, macho or coff depending on the target object file for TLS to work. Different object file formats require different sequences to perform TLS accesses.
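
The mapping described above can be sketched as a small Rust helper. The `ObjectFormat` enum and both function names are hypothetical and are not part of Cranelift; only the setting name `tls_model` and the values `elf_gd`, `macho` and `coff` come from the comment. The rendered line assumes the usual `set name=value` directive syntax of `.clif` test files, placed before the `target` line.

```rust
/// Object file formats named in the comment (hypothetical enum,
/// not a Cranelift type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ObjectFormat {
    Elf,
    MachO,
    Coff,
}

/// Value of the `tls_model` setting for a given object file format.
fn tls_model_for(format: ObjectFormat) -> &'static str {
    match format {
        ObjectFormat::Elf => "elf_gd",
        ObjectFormat::MachO => "macho",
        ObjectFormat::Coff => "coff",
    }
}

/// Render the directive line for a .clif test file (assumed syntax).
fn clif_set_line(format: ObjectFormat) -> String {
    format!("set tls_model={}", tls_model_for(format))
}

fn main() {
    // For an ELF target such as x86_64 Linux:
    println!("{}", clif_set_line(ObjectFormat::Elf));
}
```

When driving Cranelift from Rust rather than from a test file, the same name and value would be passed to the shared settings builder instead.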

view this post on Zulip Wasmtime GitHub notifications bot (Mar 30 2026 at 09:13):

bungcip commented on issue #12197:

Thanks @bjorn3 for the info. I finally found the necessary docs for setting tls_model, and the crash is gone.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2026 at 11:43):

ahqsoftwares commented on issue #12197:

Also - I think insertlane.i8x8 for index 1 is not implemented

thread '<unnamed>' (6032) panicked at savm\src\acaot\native\cranelift\mod.rs:318:25:
called `Result::unwrap()` on an `Err` value: Unsupported("should be implemented in ISLE: inst = `v57 = insertlane.i8x8 v55, v56, 1`, type = `Some(types::I8X8)`")

<details>
<summary>Click to view IR</summary>

function u0:0(i64 vmctx) windows_fastcall {
    ss0 = explicit_dynamic_slot 192, align = 64
    ss1 = explicit_dynamic_slot 64, align = 64
    gv0 = vmctx

block0(v0: i64):
    v75 = iconst.i64 0
    v24 -> v75
    v74 -> v75
    v73 = iconst.i64 0
    v13 -> v73
    v72 -> v73
    v1 = global_value.i64 gv0
    v2 = load.i64 notrap aligned v1+72
    v3 = load.i64 notrap aligned v1+88
    v4 = iconst.i64 4
    v5 = band v3, v4  ; v4 = 4
    brif v5, block2, block1

block1:
    v6 = iconst.i64 100
    v66 -> v6
    v7 = iconst.i64 200
    v67 -> v7
    v8 = bitcast.i64 little v6  ; v6 = 100
    v9 = iconst.i64 0
    v10 = bitcast.i64 little v7  ; v7 = 200
    v11 = iconst.i64 0
    v12 = iadd v8, v10
    v68 -> v12
    v14 = bitcast.i64 little v13  ; v13 = 0
    v15 = iconst.i64 0x00c8_001e_c81e
    v69 -> v15
    v16 = iconst.i64 0x0320_0384_381e
    v70 -> v16
    v17 = bitcast.i32x2 little v15  ; v15 = 0x00c8_001e_c81e
    v18 = iconst.i32 0
    v19 = extractlane v17, 1
    v20 = bitcast.i32x2 little v16  ; v16 = 0x0320_0384_381e
    v21 = iconst.i32 0
    v22 = extractlane v20, 1
    v23 = iadd v19, v22
    v25 = bitcast.i32x2 little v24  ; v24 = 0
    v26 = insertlane v25, v23, 1
    v27 = bitcast.i64 little v26
    v28 = bitcast.i16x4 little v15  ; v15 = 0x00c8_001e_c81e
    v29 = iconst.i16 0
    v30 = extractlane v28, 1
    v31 = bitcast.i16x4 little v16  ; v16 = 0x0320_0384_381e
    v32 = iconst.i16 0
    v33 = extractlane v31, 1
    v34 = iadd v30, v33
    v35 = bitcast.i16x4 little v27
    v36 = insertlane v35, v34, 1
    v37 = bitcast.i64 little v36
    v38 = bitcast.i8x8 little v15  ; v15 = 0x00c8_001e_c81e
    v39 = iconst.i8 0
    v40 = scalar_to_vector.i8x2 v39  ; v39 = 0
    v41 = extractlane v38, 0
    v42 = insertlane v40, v41, 0
    v43 = extractlane v38, 1
    v44 = insertlane v42, v43, 1
    v45 = bitcast.i8x8 little v16  ; v16 = 0x0320_0384_381e
    v46 = iconst.i8 0
    v47 = scalar_to_vector.i8x2 v46  ; v46 = 0
    v48 = extractlane v45, 0
    v49 = insertlane v47, v48, 0
    v50 = extractlane v45, 1
    v51 = insertlane v49, v50, 1
    v52 = iadd v44, v51
    v53 = bitcast.i8x8 little v37
    v54 = extractlane v52, 0
    v55 = insertlane v53, v54, 0
    v56 = extractlane v52, 1
    v57 = insertlane v55, v56, 1
    v58 = bitcast.i64 little v57
    v71 -> v58
    jump block5

block2:
    v59 = load.i64 notrap aligned v1+96
    jump block3

block3:
    trap user30

block4:
    v60 = stack_addr.i64 ss0
    v61 = load.i64 aligned can_move v1+64
    v62 = load.i64x2 aligned v60
    store aligned v62, v61
    v63 = load.i64x2 aligned v60+16
    store aligned v63, v61+16
    v64 = load.i64x2 aligned v60+32
    store aligned v64, v61+32
    v65 = load.i64x2 aligned v60+48
    store aligned v65, v61+48
    return

block5:
    store.i64 notrap aligned v66, v1  ; v66 = 100
    store.i64 notrap aligned v67, v1+8  ; v67 = 200
    store.i64 notrap aligned v68, v1+16
    store.i64 notrap aligned v69, v1+24  ; v69 = 0x00c8_001e_c81e
    store.i64 notrap aligned v70, v1+32  ; v70 = 0x0320_0384_381e
    store.i64 notrap aligned v71, v1+40
    return
}

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2026 at 12:42):

bjorn3 commented on issue #12197:

That is not valid ir. Does the ir verifier catch that? If not, a rule for it and extractlane should be added to the verifier.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2026 at 12:45):

ahqsoftwares commented on issue #12197:

The IR verifier does not actually catch it.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 26 2026 at 13:00):

bjorn3 edited a comment on issue #12197:

That is not valid ir. Does the ir verifier catch that? If not, a rule for it and extractlane should be added to the verifier.


Last updated: May 03 2026 at 22:13 UTC