Stream: git-wasmtime

Topic: wasmtime / Issue #1306 Cranelift: Register allocator ente...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:56):

abrown opened Issue #1306:

As I was attempting to compile some SIMD Wasm code, I discovered that wasmtime would enter an infinite loop in the register allocator. With logging enabled, you can see the register allocator get stuck when it runs out of global registers:

...
 DEBUG cranelift_codegen::regalloc::solver     > add_killed_var(v6985:FPR, from=%xmm0)
 DEBUG cranelift_codegen::regalloc::solver     > -> new var: v6985(FPR, from %xmm0, in)
 DEBUG cranelift_codegen::regalloc::solver     > real_solve for Solver { inputs_done: true,
  in:  [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  assignments: []
  vars: [v191(FPR, out, global, def), v6985(FPR, from %xmm0, in, 1)]
  moves: []
}

 DEBUG cranelift_codegen::regalloc::coloring   > Not enough global registers for v191, trying as local
 DEBUG cranelift_codegen::regalloc::solver     > real_solve for Solver { inputs_done: true,
  in:  [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  assignments: []
  vars: [v191(FPR, out, def, 1), v6985(FPR, from %xmm0, in, 1)]
  moves: []
}

 DEBUG cranelift_codegen::regalloc::coloring   >     color v191 -> %xmm0 (global to be replaced)
 DEBUG cranelift_codegen::regalloc::coloring   > Replacing global defs on v191 = copy.i64x2 v6985
 DEBUG cranelift_codegen::regalloc::coloring   >   + v191 = copy.i64x2 v6986 with v6986 in %xmm0
 DEBUG cranelift_codegen::regalloc::coloring   > Done: v6986 = copy.i64x2 v6985
 DEBUG cranelift_codegen::regalloc::coloring   > Coloring v191 = copy.i64x2 v6986
    from [ GPR: ---b--sd89012345 FPR32: -----------------789012345678901 FPR: ---------------- FLAG: f ]
 DEBUG cranelift_codegen::regalloc::coloring   >     kill v6986 in %xmm0 (local FPR)
 DEBUG cranelift_codegen::regalloc::coloring   >     glob [ GPR: a--b--sd89012345 FPR32: ----------------6789012345678901 FPR: ---------------- FLAG: f ]
...

You can see above how v6986 replaces v6985; this is the pattern that repeats forever. You may also notice that the log shows 16 FPR registers, which I temporarily added at https://github.com/abrown/wasmtime/blob/7b0463a24cdcf525057349c53c6a46a436c21a80/cranelift/codegen/meta/src/isa/x86/encodings.rs#L346 to see whether relieving register pressure would get rid of the issue. It doesn't.
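Reading these logs is easier with a small decoder for the register-set strings. In the format shown, each position stands for one register of the class; a character marks the register as available and `-` marks it as taken. This reading is inferred from the panic reported later in this thread, where `%xmm5` is "already free" exactly where the FPR string shows `5`. The helpers below (`free_indices`, `field`) are hypothetical, not part of cranelift:

```rust
/// Hypothetical helper (not a cranelift API): given one register-class field
/// from a regalloc log line, e.g. "0---------------", return the indices of
/// the registers shown as available (any character other than '-').
fn free_indices(field: &str) -> Vec<usize> {
    field
        .chars()
        .enumerate()
        .filter(|&(_, c)| c != '-')
        .map(|(i, _)| i)
        .collect()
}

/// Hypothetical helper: return the field that follows `class:` in a line
/// such as "[ GPR: ... FPR: 0--------------- FLAG: f ]".
fn field<'a>(line: &'a str, class: &str) -> Option<&'a str> {
    let key = format!("{}:", class);
    let mut words = line.split_whitespace();
    while let Some(w) = words.next() {
        if w == key {
            return words.next();
        }
    }
    None
}

fn main() {
    // The `in:` set from the first real_solve call in the log above.
    let line = "[ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]";
    let fpr = field(line, "FPR").expect("FPR field present");
    // Only index 0 (%xmm0) is shown as available in the 16-entry FPR class.
    println!("FPR available: {:?}", free_indices(fpr));
}
```

Read this way, the solver has exactly one FPR (`%xmm0`) to work with, and it is already occupied by the input v6985, which is consistent with the "Not enough global registers for v191" message.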

I extracted the function that I believe is causing the issue into CLIF, attached as [clif.txt](https://github.com/bytecodealliance/wasmtime/files/4326538/clif.txt). Below are a couple of different ways of observing the problem (bugpoint loops forever, compile fails). I can attach the original Wasm if that would be helpful, but that introduces even more functions to worry about.

clif-util bugpoint scratch.clif loops forever.

To trigger the panic, run clif-util compile -dDpv scratch.clif.

https://github.com/abrown/wasmtime/tree/fix-simd-locals/cranelift

rustc 1.41.1, cargo 1.41.0, Fedora 31 on kernel 5.5.7

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:56):

abrown labeled Issue #1306.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:56):

abrown labeled Issue #1306.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:56):

abrown edited Issue #1306.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:57):

abrown edited Issue #1306.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:58):

abrown edited Issue #1306.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 20:59):

abrown edited Issue #1306:

As I was attempting to compile some SIMD Wasm code, I discovered that wasmtime would enter an infinite loop in the register allocator. With logging enabled, you can see the register allocator get stuck when it runs out of global registers:

...
 DEBUG cranelift_codegen::regalloc::solver     > add_killed_var(v6985:FPR, from=%xmm0)
 DEBUG cranelift_codegen::regalloc::solver     > -> new var: v6985(FPR, from %xmm0, in)
 DEBUG cranelift_codegen::regalloc::solver     > real_solve for Solver { inputs_done: true,
  in:  [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  assignments: []
  vars: [v191(FPR, out, global, def), v6985(FPR, from %xmm0, in, 1)]
  moves: []
}

 DEBUG cranelift_codegen::regalloc::coloring   > Not enough global registers for v191, trying as local
 DEBUG cranelift_codegen::regalloc::solver     > real_solve for Solver { inputs_done: true,
  in:  [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
  assignments: []
  vars: [v191(FPR, out, def, 1), v6985(FPR, from %xmm0, in, 1)]
  moves: []
}

 DEBUG cranelift_codegen::regalloc::coloring   >     color v191 -> %xmm0 (global to be replaced)
 DEBUG cranelift_codegen::regalloc::coloring   > Replacing global defs on v191 = copy.i64x2 v6985
 DEBUG cranelift_codegen::regalloc::coloring   >   + v191 = copy.i64x2 v6986 with v6986 in %xmm0
 DEBUG cranelift_codegen::regalloc::coloring   > Done: v6986 = copy.i64x2 v6985
 DEBUG cranelift_codegen::regalloc::coloring   > Coloring v191 = copy.i64x2 v6986
    from [ GPR: ---b--sd89012345 FPR32: -----------------789012345678901 FPR: ---------------- FLAG: f ]
 DEBUG cranelift_codegen::regalloc::coloring   >     kill v6986 in %xmm0 (local FPR)
 DEBUG cranelift_codegen::regalloc::coloring   >     glob [ GPR: a--b--sd89012345 FPR32: ----------------6789012345678901 FPR: ---------------- FLAG: f ]
...

You can see above how v6986 replaces v6985; this is the pattern that repeats forever. You may also notice that the log shows 16 FPR registers, which I temporarily added at https://github.com/abrown/wasmtime/blob/7b0463a24cdcf525057349c53c6a46a436c21a80/cranelift/codegen/meta/src/isa/x86/encodings.rs#L346 to see whether relieving register pressure would get rid of the issue. It doesn't.

I extracted the function that I believe is causing the issue into the attached clif.txt. Below are a couple of different ways of observing the problem (bugpoint loops forever, compile fails). I can attach the original Wasm if that would be helpful, but that introduces even more functions to worry about.

clif-util bugpoint clif.txt loops forever.

To trigger the panic, run clif-util compile -dDpv clif.txt:

$ target/debug/clif-util compile -dDpv clif.txt
thread 'main' panicked at 'FPR8:%xmm5 is already free in [ GPR: -------d89012345 FPR32: -----5---------56789012345678901 FPR: -----5---------5 FLAG: f ]', cranelift/codegen/src/regalloc/register_set.rs:73:9
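The panic message suggests an invariant check in register_set.rs: a register being returned to the available set must not already be available, so this looks like the same register being freed twice. The simplified model below (`AvailSet` is a hypothetical type, not cranelift's `RegisterSet`) shows that invariant for a 16-entry class:

```rust
/// Simplified model (not cranelift's actual type): bit `r` set means
/// %xmm`r` is currently available for allocation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AvailSet(u16);

impl AvailSet {
    fn all() -> Self {
        AvailSet(u16::MAX)
    }

    fn is_avail(self, r: u8) -> bool {
        self.0 & (1u16 << r) != 0
    }

    /// Mark `r` as in use; it must currently be available.
    fn take(&mut self, r: u8) {
        assert!(self.is_avail(r), "%xmm{} is already taken", r);
        self.0 &= !(1u16 << r);
    }

    /// Return `r` to the pool; it must currently be in use.
    fn free(&mut self, r: u8) {
        assert!(!self.is_avail(r), "%xmm{} is already free", r);
        self.0 |= 1u16 << r;
    }
}

fn main() {
    let mut fpr = AvailSet::all();
    fpr.take(5); // %xmm5 assigned to some value
    fpr.free(5); // the value is killed: %xmm5 goes back to the pool
    // Calling fpr.free(5) again here would trip the assertion, which has
    // the same shape as the reported "%xmm5 is already free" panic.
    println!("%xmm5 available: {}", fpr.is_avail(5));
}
```

Under this reading, the bug is in the bookkeeping that decides when a value dies (or which register it lives in), not in the set itself.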

https://github.com/abrown/wasmtime/tree/fix-simd-locals/cranelift

rustc 1.41.1, cargo 1.41.0, Fedora 31 on kernel 5.5.7

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 23:48):

abrown commented on Issue #1306:

I tried removing the FPR32 register class entirely, but that doesn't seem to solve the problem: FPR32 is gone from the logging, but I still get a FPR8:%xmm2 is already free in... error. Also, without FPR32 the endless loop seems to be gone, and I can run bugpoint to get a 172-instruction version that reproduces the problem: clif-reduced.txt.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 12 2020 at 23:49):

abrown edited a comment on Issue #1306:

I tried removing the FPR32 register class entirely, but that doesn't seem to totally solve the problem: FPR32 is gone from the logging, but I still get a FPR8:%xmm2 is already free in... error when I attempt to compile. On the bright side, without FPR32 the endless loop seems to be gone, and I can run bugpoint to get a 172-instruction version that reproduces the problem: clif-reduced.txt.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2020 at 22:06):

abrown commented on Issue #1306:

To reduce this to something comprehensible, I started experimenting with small test cases:

test compile
set enable_simd
target x86_64 skylake

function u0:35() system_v {
block0:
    ; v0 = vconst.i32x4 [0 1 2 3]
    v0 = iconst.i64 0xdeadbeef
    v1 = load.i32x4 v0
    v2 = load.i32x4 v0
    v3 = load.i32x4 v0
    v4 = load.i32x4 v0
    v5 = load.i32x4 v0
    v6 = load.i32x4 v0
    v7 = load.i32x4 v0
    v8 = load.i32x4 v0
    v9 = load.i32x4 v0
    v10 = load.i32x4 v0
    v11 = load.i32x4 v0
    v12 = load.i32x4 v0
    v13 = load.i32x4 v0
    v14 = load.i32x4 v0
    v15 = load.i32x4 v0
    v16 = load.i32x4 v0
    ;; uncommenting this line causes the error
    ;; v17 = load.i32x4 v0

    store v1, v0
    store v2, v0
    store v3, v0
    store v4, v0
    store v5, v0
    store v6, v0
    store v7, v0
    store v8, v0
    store v9, v0
    store v10, v0
    store v11, v0
    store v12, v0
    store v13, v0
    store v14, v0
    store v15, v0
    store v16, v0
    return
}
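A quick way to see why the 17th load matters: every i32x4 value needs an XMM register, and x86-64 without AVX-512 has 16 (%xmm0 through %xmm15). The sketch below counts peak register pressure in straight-line code, assuming each value is live from its definition to its last use and that an unused value still needs a register at its definition. It ignores fixed-register and encoding constraints, and `max_pressure` / `loads_then_stores` are illustrative names only:

```rust
/// Sketch of register pressure in straight-line code: value `i` is defined
/// at instruction `defs[i]` and stays live until `last_use[i]`. A value with
/// no use (`None`) still occupies a register at its definition.
fn max_pressure(defs: &[usize], last_use: &[Option<usize>]) -> usize {
    let end = defs
        .iter()
        .zip(last_use)
        .map(|(&d, &u)| u.unwrap_or(d))
        .max()
        .unwrap_or(0);
    (0..=end)
        .map(|pc| {
            defs.iter()
                .zip(last_use)
                .filter(|&(&d, &u)| d <= pc && pc <= u.unwrap_or(d))
                .count()
        })
        .max()
        .unwrap_or(0)
}

/// Models the test case: `n` vector loads at instructions 1..=n, followed by
/// stores of the first `stored` values at instructions n+1..=n+stored.
/// (The i64 address v0 lives in a GPR and is left out.)
fn loads_then_stores(n: usize, stored: usize) -> (Vec<usize>, Vec<Option<usize>>) {
    let defs = (1..=n).collect();
    let uses = (1..=n)
        .map(|i| if i <= stored { Some(n + i) } else { None })
        .collect();
    (defs, uses)
}

fn main() {
    for n in [16usize, 17] {
        let (defs, uses) = loads_then_stores(n, 16);
        println!("{} loads -> peak of {} live i32x4 values", n, max_pressure(&defs, &uses));
    }
}
```

With 16 loads the peak is exactly 16, so everything fits; with 17 the peak is 17, so at least one value must be spilled, which matches the spill of v1 in the compiled output.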

On the remove-fpr32 branch (see #1318), I run cargo run -p cranelift-tools -- compile -dDpv scratch-tiny.clif. This fails with:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `68`,
 right: `64`: Invalid registers for REX-less Op2 encoding', cranelift/codegen/src/isa/x86/binemit.rs:119:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

When I turn on logging, I see the following right before the failure:

 DEBUG cranelift_codegen::context              > Compiled:
function u0:35(i64 fp [%rbp]) -> i64 fp [%rbp] system_v {
    ss0 = spill_slot 16, offset -32
    ss1 = incoming_arg 16, offset -16

                                block0(v20: i64 [%rbp]):
[RexOp1pushq#50]                    x86_push v20
[RexOp1copysp#8089]                 copy_special %rsp -> %rbp
[RexOp1adjustsp_ib#d083]            adjust_sp_down_imm 16
[RexOp1pu_id#b8,%rax]               v0 = iconst.i64 0xdead_beef
[DynRexOp2fld#410,%xmm0]            v18 = load.i32x4 v0
[Op2fspillSib32#411,ss0]            v1 = spill v18
[DynRexOp2fld#410,%xmm0]            v2 = load.i32x4 v0
[DynRexOp2fld#410,%xmm1]            v3 = load.i32x4 v0
[DynRexOp2fld#410,%xmm2]            v4 = load.i32x4 v0
[DynRexOp2fld#410,%xmm3]            v5 = load.i32x4 v0
[DynRexOp2fld#410,%xmm4]            v6 = load.i32x4 v0
[DynRexOp2fld#410,%xmm5]            v7 = load.i32x4 v0
[DynRexOp2fld#410,%xmm6]            v8 = load.i32x4 v0
[DynRexOp2fld#410,%xmm7]            v9 = load.i32x4 v0
[DynRexOp2fld#410,%xmm8]            v10 = load.i32x4 v0
[DynRexOp2fld#410,%xmm9]            v11 = load.i32x4 v0
[DynRexOp2fld#410,%xmm10]           v12 = load.i32x4 v0
[DynRexOp2fld#410,%xmm11]           v13 = load.i32x4 v0
[DynRexOp2fld#410,%xmm12]           v14 = load.i32x4 v0
[DynRexOp2fld#410,%xmm13]           v15 = load.i32x4 v0
[DynRexOp2fld#410,%xmm14]           v16 = load.i32x4 v0
[DynRexOp2fld#410,%xmm15]           v17 = load.i32x4 v0
[Op2frmov#428]                      regmove v2, %xmm0 -> %xmm15
[Op2ffillSib32#410,%xmm0]           v19 = fill v1
[DynRexOp2fst#411]                  store v19, v0
[DynRexOp2fst#411]                  store v2, v0
[DynRexOp2fst#411]                  store v3, v0
[DynRexOp2fst#411]                  store v4, v0
[DynRexOp2fst#411]                  store v5, v0
[DynRexOp2fst#411]                  store v6, v0
[DynRexOp2fst#411]                  store v7, v0
[DynRexOp2fst#411]                  store v8, v0
[DynRexOp2fst#411]                  store v9, v0
[DynRexOp2fst#411]                  store v10, v0
[DynRexOp2fst#411]                  store v11, v0
[DynRexOp2fst#411]                  store v12, v0
[DynRexOp2fst#411]                  store v13, v0
[DynRexOp2fst#411]                  store v14, v0
[DynRexOp2fst#411]                  store v15, v0
[DynRexOp2fst#411]                  store v16, v0
[RexOp1adjustsp_ib#8083]            adjust_sp_up_imm 16
[RexOp1popq#58,%rbp]                v21 = x86_pop.i64
[Op1ret#c3]                         return v21
}

The immediate spilling of v1 makes sense, and so does the regmove v2, %xmm0 -> %xmm15 that frees %xmm0 so v1 can be filled into v19: v17/%xmm15 is never used again, so %xmm15 should be available for v2's value. But when I debug, the regmove turns out to be the instruction causing the failure. Because load and store have the infer_rex() meta-property, they can access all 16 FPR registers, but regmove, which has neither infer_rex() nor rex(), can only encode FPR8 (%xmm0 through %xmm7). When I give regmove a rex() prefix, the snippet compiles. We already have an issue, #1127, to track adding REX prefixes to a bunch of SIMD instructions, so I think I will submit a PR for that and then revisit this.
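The numbers in the assertion line up with the x86-64 REX layout `0100WRXB`, where R extends ModRM.reg and B extends ModRM.rm to reach registers 8 through 15. Opcode `0F 28` (the `#428` in `Op2frmov#428`) is MOVAPS. Assuming the frmov recipe puts the destination in ModRM.reg, moving into %xmm15 sets REX.R, giving 0x44 = 68 instead of the bare 0x40 = 64 that a REX-less encoding requires. `rex_rr` and `NO_EXTENSION_REX` are illustrative names, not cranelift's:

```rust
/// x86-64 REX prefix for a register-register instruction: 0100WRXB.
/// R extends ModRM.reg and B extends ModRM.rm to reach registers 8-15;
/// X (SIB index) is unused for register-register forms.
fn rex_rr(w: bool, reg: u8, rm: u8) -> u8 {
    0x40 | ((w as u8) << 3) | (((reg >> 3) & 1) << 2) | ((rm >> 3) & 1)
}

/// The prefix value with no extension bits set: an instruction can only be
/// emitted without a REX byte if its computed prefix equals this.
const NO_EXTENSION_REX: u8 = 0x40;

fn main() {
    // regmove v2, %xmm0 -> %xmm15 as MOVAPS (0F 28 /r).
    // Assumption: reg = destination (%xmm15), rm = source (%xmm0).
    let rex = rex_rr(false, 15, 0);
    println!("rex = {} (0x{:02x}); REX-less encoding requires {}", rex, rex, NO_EXTENSION_REX);
}
```

This prints `rex = 68 (0x44); REX-less encoding requires 64`, the same pair of values as in the `left`/`right` assertion.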

Regardless of whether I fix #1127, though, regalloc should know better than to move a value into a register the instruction can't encode. For SIMD and floats, regmove uses the frmov recipe, which takes a single FPR as input. I would have expected Cranelift to infer from the absence of both infer_rex() and rex() that regmove is limited to FPR8, but it does not. Perhaps it assumes that the source and destination registers are in the same register class.
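What "knowing better" could look like: when choosing a destination for the move, intersect the free registers with the set the move's encoding can address, and only fall back to something else (e.g. a spill) if the intersection is empty. This is a hypothetical sketch; `Class`, `mask` and `pick_dst` are illustrative names, not cranelift's register-class API:

```rust
/// Illustrative register classes for XMM moves.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Class {
    Fpr8, // %xmm0-%xmm7: encodable without a REX prefix
    Fpr,  // %xmm0-%xmm15: needs REX for 8-15
}

impl Class {
    /// Bitmask of registers the encoding can address (bit r = %xmm`r`).
    fn mask(self) -> u16 {
        match self {
            Class::Fpr8 => 0x00ff,
            Class::Fpr => 0xffff,
        }
    }
}

/// Pick the lowest-numbered register that is both free and encodable.
/// `None` means the move cannot be placed without some other fix-up.
fn pick_dst(free: u16, class: Class) -> Option<u8> {
    let candidates = free & class.mask();
    if candidates == 0 {
        None
    } else {
        Some(candidates.trailing_zeros() as u8)
    }
}

fn main() {
    // A free set containing only %xmm15 (v17's dead register).
    let free: u16 = 1 << 15;
    println!("FPR  -> {:?}", pick_dst(free, Class::Fpr));
    println!("FPR8 -> {:?}", pick_dst(free, Class::Fpr8));
}
```

With the FPR class the sketch picks `%xmm15`, reproducing the bad choice; with FPR8 it returns `None`, which is the signal that a REX-less regmove cannot be used there.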

My plan for now is:

view this post on Zulip Wasmtime GitHub notifications bot (Mar 16 2020 at 22:07):

abrown edited a comment on Issue #1306:

To reduce this to something comprehensible, I started experimenting with small test cases:

test compile
set enable_simd
target x86_64 skylake

function u0:35() system_v {
block0:
    v0 = iconst.i64 0xdeadbeef
    v1 = load.i32x4 v0
    v2 = load.i32x4 v0
    v3 = load.i32x4 v0
    v4 = load.i32x4 v0
    v5 = load.i32x4 v0
    v6 = load.i32x4 v0
    v7 = load.i32x4 v0
    v8 = load.i32x4 v0
    v9 = load.i32x4 v0
    v10 = load.i32x4 v0
    v11 = load.i32x4 v0
    v12 = load.i32x4 v0
    v13 = load.i32x4 v0
    v14 = load.i32x4 v0
    v15 = load.i32x4 v0
    v16 = load.i32x4 v0
    ;; this causes the error
    v17 = load.i32x4 v0

    store v1, v0
    store v2, v0
    store v3, v0
    store v4, v0
    store v5, v0
    store v6, v0
    store v7, v0
    store v8, v0
    store v9, v0
    store v10, v0
    store v11, v0
    store v12, v0
    store v13, v0
    store v14, v0
    store v15, v0
    store v16, v0
    return
}

On the remove-fpr32 branch (see #1318), I run cargo run -p cranelift-tools -- compile -dDpv scratch-tiny.clif. This fails with:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `68`,
 right: `64`: Invalid registers for REX-less Op2 encoding', cranelift/codegen/src/isa/x86/binemit.rs:119:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

When I turn on logging, I see the following right before the failure:

 DEBUG cranelift_codegen::context              > Compiled:
function u0:35(i64 fp [%rbp]) -> i64 fp [%rbp] system_v {
    ss0 = spill_slot 16, offset -32
    ss1 = incoming_arg 16, offset -16

                                block0(v20: i64 [%rbp]):
[RexOp1pushq#50]                    x86_push v20
[RexOp1copysp#8089]                 copy_special %rsp -> %rbp
[RexOp1adjustsp_ib#d083]            adjust_sp_down_imm 16
[RexOp1pu_id#b8,%rax]               v0 = iconst.i64 0xdead_beef
[DynRexOp2fld#410,%xmm0]            v18 = load.i32x4 v0
[Op2fspillSib32#411,ss0]            v1 = spill v18
[DynRexOp2fld#410,%xmm0]            v2 = load.i32x4 v0
[DynRexOp2fld#410,%xmm1]            v3 = load.i32x4 v0
[DynRexOp2fld#410,%xmm2]            v4 = load.i32x4 v0
[DynRexOp2fld#410,%xmm3]            v5 = load.i32x4 v0
[DynRexOp2fld#410,%xmm4]            v6 = load.i32x4 v0
[DynRexOp2fld#410,%xmm5]            v7 = load.i32x4 v0
[DynRexOp2fld#410,%xmm6]            v8 = load.i32x4 v0
[DynRexOp2fld#410,%xmm7]            v9 = load.i32x4 v0
[DynRexOp2fld#410,%xmm8]            v10 = load.i32x4 v0
[DynRexOp2fld#410,%xmm9]            v11 = load.i32x4 v0
[DynRexOp2fld#410,%xmm10]           v12 = load.i32x4 v0
[DynRexOp2fld#410,%xmm11]           v13 = load.i32x4 v0
[DynRexOp2fld#410,%xmm12]           v14 = load.i32x4 v0
[DynRexOp2fld#410,%xmm13]           v15 = load.i32x4 v0
[DynRexOp2fld#410,%xmm14]           v16 = load.i32x4 v0
[DynRexOp2fld#410,%xmm15]           v17 = load.i32x4 v0
[Op2frmov#428]                      regmove v2, %xmm0 -> %xmm15
[Op2ffillSib32#410,%xmm0]           v19 = fill v1
[DynRexOp2fst#411]                  store v19, v0
[DynRexOp2fst#411]                  store v2, v0
[DynRexOp2fst#411]                  store v3, v0
[DynRexOp2fst#411]                  store v4, v0
[DynRexOp2fst#411]                  store v5, v0
[DynRexOp2fst#411]                  store v6, v0
[DynRexOp2fst#411]                  store v7, v0
[DynRexOp2fst#411]                  store v8, v0
[DynRexOp2fst#411]                  store v9, v0
[DynRexOp2fst#411]                  store v10, v0
[DynRexOp2fst#411]                  store v11, v0
[DynRexOp2fst#411]                  store v12, v0
[DynRexOp2fst#411]                  store v13, v0
[DynRexOp2fst#411]                  store v14, v0
[DynRexOp2fst#411]                  store v15, v0
[DynRexOp2fst#411]                  store v16, v0
[RexOp1adjustsp_ib#8083]            adjust_sp_up_imm 16
[RexOp1popq#58,%rbp]                v21 = x86_pop.i64
[Op1ret#c3]                         return v21
}

The immediate spilling of v1 seems to make sense, and so does the regmove v2, %xmm0 -> %xmm15 that frees %xmm0 so that v1 can be filled into v19: v17/%xmm15 is never used again, so %xmm15 should be available for v2's value. But when I debug, the regmove is the instruction causing the failure. Because load and store have the infer_rex() meta-property, they can access all 16 FPR registers; regmove, which has neither infer_rex() nor rex(), can only access FPR8 (%xmm0-%xmm7). When I give regmove a rex() prefix, the snippet compiles. We already have an issue, #1127, tracking that I need to add REX prefixes to a number of SIMD instructions, so I think I will submit a PR for that and then revisit this.
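The assertion values in the panic above (left 68, right 64) can be decoded with a little REX arithmetic. An x86-64 REX prefix has the bit layout 0100WRXB, so the base value is 0x40 = 64, and 68 = 0x44 is that base plus the R bit, which extends the ModRM.reg field to reach registers 8-15. The sketch below assumes the check compares the computed REX byte against the base value (the `left`/`right` numbers are consistent with that); the function names `rex2` and `encodable_without_rex` are hypothetical and are not Cranelift's API.

```rust
// Sketch of x86-64 REX prefix computation for a two-register instruction.
// REX layout is 0100WRXB: bit 2 (R) extends ModRM.reg, bit 0 (B) extends
// ModRM.rm. Registers 8..=15 need the corresponding extension bit.

/// Base REX value with no extension bits set (0b0100_0000 = 64).
const BASE_REX: u8 = 0x40;

/// Hypothetical helper: REX byte for an instruction whose ModRM.reg field
/// holds `reg` and whose ModRM.rm field holds `rm`. Register numbers are
/// hardware encodings 0..=15 (e.g. %xmm15 = 15).
fn rex2(reg: u8, rm: u8) -> u8 {
    let r = (reg >> 3) & 1; // 1 if reg is in 8..=15
    let b = (rm >> 3) & 1; // 1 if rm is in 8..=15
    BASE_REX | (r << 2) | b
}

/// A REX-less encoding is only valid when no extension bit is needed,
/// i.e. both operands are in the low eight registers (FPR8 for XMM).
fn encodable_without_rex(reg: u8, rm: u8) -> bool {
    rex2(reg, rm) == BASE_REX
}

fn main() {
    // Both operands in %xmm0..%xmm7: no extension bits, prints 64.
    println!("{}", rex2(0, 7));
    // %xmm15 in the ModRM.reg slot sets REX.R, prints 68.
    println!("{}", rex2(15, 0));
    // So a REX-less encoding of that pair is invalid, prints false.
    println!("{}", encodable_without_rex(15, 0));
}
```

With %xmm15 in the reg slot and %xmm0 in the rm slot the result is 68, matching `left`; had %xmm15 been in the rm slot instead, only the B bit would be set, giving 65.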

Regardless of whether I fix #1127, though, regalloc should know better than to move a value into a register that the move instruction cannot encode. For SIMD and floats, regmove uses the frmov recipe, which takes a single FPR as input. I would have expected Cranelift to infer from the absence of both infer_rex() and rex() that regmove is limited to FPR8, but it does not. Perhaps it assumes that the source and destination registers are in the same register class.
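The expectation that regalloc should only pick destinations the move's encoding can reach can be illustrated with a toy bitmask model. This is a minimal sketch, assuming register sets are 16-bit masks where bit n means %xmm<n>; `pick_dest` is a hypothetical helper and does not reflect how Cranelift's solver is actually structured.

```rust
// Toy model: choose a destination register for a move while respecting the
// register class its encoding can reach. Bit n of a mask means %xmm<n>.

/// All sixteen XMM registers (reachable by a REX-capable encoding).
const FPR: u16 = 0xffff;
/// Only %xmm0..%xmm7 (reachable by a REX-less encoding such as the
/// regmove/frmov encoding in this report).
const FPR8: u16 = 0x00ff;

/// Hypothetical helper: pick the lowest-numbered register that is both free
/// and encodable. None means no legal destination exists, which is where an
/// allocator would need to spill or choose another encoding instead of
/// emitting an unencodable move.
fn pick_dest(free: u16, allowed: u16) -> Option<u8> {
    let candidates = free & allowed;
    if candidates == 0 {
        None
    } else {
        Some(candidates.trailing_zeros() as u8)
    }
}

fn main() {
    // At the regmove in the listing, %xmm1..%xmm14 hold live values and
    // v17 in %xmm15 is dead, so %xmm15 is the only free destination for v2.
    let free = 1u16 << 15;
    println!("{:?}", pick_dest(free, FPR)); // Some(15)
    println!("{:?}", pick_dest(free, FPR8)); // None
}
```

Intersecting the free set with the encoding's class before choosing is the check the comment above suggests is missing: with the FPR8 constraint applied, %xmm15 is never offered as a regmove destination.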

My plan for now is:

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2020 at 01:30):

abrown commented on Issue #1306:

#1318 and #1335, when merged, should clarify this a bit more.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 03 2021 at 18:25):

bjorn3 commented on Issue #1306:

The new backend framework uses a different register allocator.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 03 2021 at 18:29):

abrown closed Issue #1306:



Last updated: Oct 23 2024 at 20:03 UTC