abrown opened Issue #1306:
As I was attempting to compile some SIMD Wasm code, I discovered that wasmtime would enter an infinite loop in the register allocator. With logging enabled, the register allocator gets stuck with the number of global registers:
    ...
    DEBUG cranelift_codegen::regalloc::solver > add_killed_var(v6985:FPR, from=%xmm0)
    DEBUG cranelift_codegen::regalloc::solver > -> new var: v6985(FPR, from %xmm0, in)
    DEBUG cranelift_codegen::regalloc::solver > real_solve for Solver { inputs_done: true,
      in: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      assignments: []
      vars: [v191(FPR, out, global, def), v6985(FPR, from %xmm0, in, 1)]
      moves: [] }
    DEBUG cranelift_codegen::regalloc::coloring > Not enough global registers for v191, trying as local
    DEBUG cranelift_codegen::regalloc::solver > real_solve for Solver { inputs_done: true,
      in: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      assignments: []
      vars: [v191(FPR, out, def, 1), v6985(FPR, from %xmm0, in, 1)]
      moves: [] }
    DEBUG cranelift_codegen::regalloc::coloring > color v191 -> %xmm0 (global to be replaced)
    DEBUG cranelift_codegen::regalloc::coloring > Replacing global defs on v191 = copy.i64x2 v6985
    DEBUG cranelift_codegen::regalloc::coloring > + v191 = copy.i64x2 v6986 with v6986 in %xmm0
    DEBUG cranelift_codegen::regalloc::coloring > Done: v6986 = copy.i64x2 v6985
    DEBUG cranelift_codegen::regalloc::coloring > Coloring v191 = copy.i64x2 v6986
      from [ GPR: ---b--sd89012345 FPR32: -----------------789012345678901 FPR: ---------------- FLAG: f ]
    DEBUG cranelift_codegen::regalloc::coloring > kill v6986 in %xmm0 (local FPR)
    DEBUG cranelift_codegen::regalloc::coloring > glob [ GPR: a--b--sd89012345 FPR32: ----------------6789012345678901 FPR: ---------------- FLAG: f ]
    ...

You can see how `v6986` is replacing `v6985` above and this is the pattern that will repeat forever. You may notice as well that the code above is using 16 FPR registers which I temporarily added at https://github.com/abrown/wasmtime/blob/7b0463a24cdcf525057349c53c6a46a436c21a80/cranelift/codegen/meta/src/isa/x86/encodings.rs#L346 to see if relieving register pressure would get rid of the issue. It doesn't.

I extracted the function that I believe is causing the issue into [clif.txt](https://github.com/bytecodealliance/wasmtime/files/4326538/clif.txt) (attached) and you can see below a couple of different ways of looking at the problem (`bugpoint` loops forever, `compile` fails). I can attach the original Wasm if that would be helpful but that introduces even more functions to worry about.

- What are the steps to reproduce the issue? Can you include a CLIF test case, ideally reduced with the `bugpoint` clif-util command?
  `clif-util bugpoint scratch.clif` loops forever.
- What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?
  To panic, run `clif-util compile -dDpv scratch.clif`.
- Which Cranelift version / commit hash / branch are you using?
  https://github.com/abrown/wasmtime/tree/fix-simd-locals/cranelift
- If relevant, can you include some extra information about your environment? (Rust version, operating system, architecture...)
  rustc 1.41.1, cargo 1.41.0, Fedora 31 on kernel 5.5.7
abrown labeled Issue #1306.
abrown labeled Issue #1306.
abrown edited Issue #1306.
abrown edited Issue #1306.
abrown edited Issue #1306.
abrown edited Issue #1306:
As I was attempting to compile some SIMD Wasm code, I discovered that wasmtime would enter an infinite loop in the register allocator. With logging enabled, the register allocator gets stuck with the number of global registers:
    ...
    DEBUG cranelift_codegen::regalloc::solver > add_killed_var(v6985:FPR, from=%xmm0)
    DEBUG cranelift_codegen::regalloc::solver > -> new var: v6985(FPR, from %xmm0, in)
    DEBUG cranelift_codegen::regalloc::solver > real_solve for Solver { inputs_done: true,
      in: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      assignments: []
      vars: [v191(FPR, out, global, def), v6985(FPR, from %xmm0, in, 1)]
      moves: [] }
    DEBUG cranelift_codegen::regalloc::coloring > Not enough global registers for v191, trying as local
    DEBUG cranelift_codegen::regalloc::solver > real_solve for Solver { inputs_done: true,
      in: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      out: [ GPR: ---b--sd89012345 FPR32: 0----------------789012345678901 FPR: 0--------------- FLAG: f ]
      assignments: []
      vars: [v191(FPR, out, def, 1), v6985(FPR, from %xmm0, in, 1)]
      moves: [] }
    DEBUG cranelift_codegen::regalloc::coloring > color v191 -> %xmm0 (global to be replaced)
    DEBUG cranelift_codegen::regalloc::coloring > Replacing global defs on v191 = copy.i64x2 v6985
    DEBUG cranelift_codegen::regalloc::coloring > + v191 = copy.i64x2 v6986 with v6986 in %xmm0
    DEBUG cranelift_codegen::regalloc::coloring > Done: v6986 = copy.i64x2 v6985
    DEBUG cranelift_codegen::regalloc::coloring > Coloring v191 = copy.i64x2 v6986
      from [ GPR: ---b--sd89012345 FPR32: -----------------789012345678901 FPR: ---------------- FLAG: f ]
    DEBUG cranelift_codegen::regalloc::coloring > kill v6986 in %xmm0 (local FPR)
    DEBUG cranelift_codegen::regalloc::coloring > glob [ GPR: a--b--sd89012345 FPR32: ----------------6789012345678901 FPR: ---------------- FLAG: f ]
    ...

You can see how `v6986` is replacing `v6985` above and this is the pattern that will repeat forever. You may notice as well that the code above is using 16 FPR registers which I temporarily added at https://github.com/abrown/wasmtime/blob/7b0463a24cdcf525057349c53c6a46a436c21a80/cranelift/codegen/meta/src/isa/x86/encodings.rs#L346 to see if relieving register pressure would get rid of the issue. It doesn't.

I extracted the function that I believe is causing the issue into the attached clif.txt and you can see below a couple of different ways of looking at the problem (`bugpoint` loops forever, `compile` fails). I can attach the original Wasm if that would be helpful but that introduces even more functions to worry about.

- What are the steps to reproduce the issue? Can you include a CLIF test case, ideally reduced with the `bugpoint` clif-util command?
  `clif-util bugpoint clif.txt` loops forever.
- What do you expect to happen? What does actually happen? Does it panic, and if so, with which assertion?
  To panic, run `clif-util compile -dDpv clif.txt`:

      $ target/debug/clif-util compile -dDpv clif.txt
      thread 'main' panicked at 'FPR8:%xmm5 is already free in [ GPR: -------d89012345 FPR32: -----5---------56789012345678901 FPR: -----5---------5 FLAG: f ]', cranelift/codegen/src/regalloc/register_set.rs:73:9

- Which Cranelift version / commit hash / branch are you using?
  https://github.com/abrown/wasmtime/tree/fix-simd-locals/cranelift
- If relevant, can you include some extra information about your environment? (Rust version, operating system, architecture...)
  rustc 1.41.1, cargo 1.41.0, Fedora 31 on kernel 5.5.7
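
For readers unfamiliar with the "is already free" panic: it is raised by the bookkeeping that tracks which registers are still available. The following is only a rough model of that idea, not Cranelift's actual register set code; the `RegSet` type and its method names are hypothetical, and it tracks a single 16-register bank as a bitmask where a set bit means "available".

```rust
/// Hypothetical, simplified availability mask: bit `i` set means register `i` is free.
/// This is an illustration only, not the type used in cranelift-codegen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RegSet {
    avail: u16,
}

impl RegSet {
    /// All 16 registers start out available.
    fn new() -> Self {
        RegSet { avail: u16::MAX }
    }

    /// Is register number `reg` (0..=15) currently available?
    fn is_avail(&self, reg: u8) -> bool {
        self.avail & (1 << reg) != 0
    }

    /// Mark `reg` as in use; it must currently be available.
    fn take(&mut self, reg: u8) {
        assert!(self.is_avail(reg), "%xmm{} is not available", reg);
        self.avail &= !(1 << reg);
    }

    /// Return `reg` to the pool; freeing it twice is a bookkeeping bug.
    fn free(&mut self, reg: u8) {
        assert!(!self.is_avail(reg), "%xmm{} is already free", reg);
        self.avail |= 1 << reg;
    }
}
```

In this model, calling `take(5)` and then `free(5)` is fine, while a second `free(5)` panics. A panic of this shape suggests the allocator released a register it no longer considered occupied, which would be consistent with two overlapping register classes disagreeing about the same unit, though that is a guess about the cause here.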
abrown commented on Issue #1306:
I attempted removing the `FPR32` register class entirely but that doesn't seem to solve the problem. I mean, the `FPR32` is gone from the logging but I still get a `FPR8:%xmm2 is already free in...` error. Also, without `FPR32` the endless loop seems to be gone and I can run bugpoint to get a 172-instruction version to reproduce: clif-reduced.txt.
abrown edited a comment on Issue #1306:
I attempted removing the `FPR32` register class entirely but that doesn't seem to totally solve the problem. I mean, the `FPR32` is gone from the logging but I still get a `FPR8:%xmm2 is already free in...` error when I attempt to `compile`. On the bright side, without `FPR32` the endless loop seems to be gone and I can run `bugpoint` to get a 172-instruction version to reproduce: clif-reduced.txt.
abrown commented on Issue #1306:
In an attempt to reduce this down to something comprehensible, I started playing around with small test cases:

    test compile
    set enable_simd
    target x86_64 skylake

    function u0:35() system_v {
    block0:
        ; v0 = vconst.i32x4 [0 1 2 3]
        v0 = iconst.i64 0xdeadbeef
        v1 = load.i32x4 v0
        v2 = load.i32x4 v0
        v3 = load.i32x4 v0
        v4 = load.i32x4 v0
        v5 = load.i32x4 v0
        v6 = load.i32x4 v0
        v7 = load.i32x4 v0
        v8 = load.i32x4 v0
        v9 = load.i32x4 v0
        v10 = load.i32x4 v0
        v11 = load.i32x4 v0
        v12 = load.i32x4 v0
        v13 = load.i32x4 v0
        v14 = load.i32x4 v0
        v15 = load.i32x4 v0
        v16 = load.i32x4 v0
        ;; this causes the error
        ;; v17 = load.i32x4 v0
        store v1, v0
        store v2, v0
        store v3, v0
        store v4, v0
        store v5, v0
        store v6, v0
        store v7, v0
        store v8, v0
        store v9, v0
        store v10, v0
        store v11, v0
        store v12, v0
        store v13, v0
        store v14, v0
        store v15, v0
        store v16, v0
        return
    }

On the `remove-fpr32` branch (see #1318), I run `cargo run -p cranelift-tools -- compile -dDpv scratch-tiny.clif`. This fails with:

    thread 'main' panicked at 'assertion failed: `(left == right)`
      left: `68`,
     right: `64`: Invalid registers for REX-less Op2 encoding', cranelift/codegen/src/isa/x86/binemit.rs:119:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

When I turn on logging, I see the following right before the failure:

    DEBUG cranelift_codegen::context > Compiled:
    function u0:35(i64 fp [%rbp]) -> i64 fp [%rbp] system_v {
        ss0 = spill_slot 16, offset -32
        ss1 = incoming_arg 16, offset -16

    block0(v20: i64 [%rbp]):
        [RexOp1pushq#50] x86_push v20
        [RexOp1copysp#8089] copy_special %rsp -> %rbp
        [RexOp1adjustsp_ib#d083] adjust_sp_down_imm 16
        [RexOp1pu_id#b8,%rax] v0 = iconst.i64 0xdead_beef
        [DynRexOp2fld#410,%xmm0] v18 = load.i32x4 v0
        [Op2fspillSib32#411,ss0] v1 = spill v18
        [DynRexOp2fld#410,%xmm0] v2 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm1] v3 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm2] v4 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm3] v5 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm4] v6 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm5] v7 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm6] v8 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm7] v9 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm8] v10 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm9] v11 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm10] v12 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm11] v13 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm12] v14 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm13] v15 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm14] v16 = load.i32x4 v0
        [DynRexOp2fld#410,%xmm15] v17 = load.i32x4 v0
        [Op2frmov#428] regmove v2, %xmm0 -> %xmm15
        [Op2ffillSib32#410,%xmm0] v19 = fill v1
        [DynRexOp2fst#411] store v19, v0
        [DynRexOp2fst#411] store v2, v0
        [DynRexOp2fst#411] store v3, v0
        [DynRexOp2fst#411] store v4, v0
        [DynRexOp2fst#411] store v5, v0
        [DynRexOp2fst#411] store v6, v0
        [DynRexOp2fst#411] store v7, v0
        [DynRexOp2fst#411] store v8, v0
        [DynRexOp2fst#411] store v9, v0
        [DynRexOp2fst#411] store v10, v0
        [DynRexOp2fst#411] store v11, v0
        [DynRexOp2fst#411] store v12, v0
        [DynRexOp2fst#411] store v13, v0
        [DynRexOp2fst#411] store v14, v0
        [DynRexOp2fst#411] store v15, v0
        [DynRexOp2fst#411] store v16, v0
        [RexOp1adjustsp_ib#8083] adjust_sp_up_imm 16
        [RexOp1popq#58,%rbp] v21 = x86_pop.i64
        [Op1ret#c3] return v21
    }

The immediate spilling of `v1` seems to make sense, and the `regmove v2, %xmm0->%xmm15` in order to fill `v1` into `v19` does too: `v17/%xmm15` is never used again so we should be able to use `%xmm15` for `v2`'s value. But when I debug, the `regmove` is the instruction causing the failure. Because `load` and `store` have the `infer_rex()` meta-property, they can access all 16 FPR registers but `regmove`, which has neither `infer_rex()` nor `rex()`, can only access FPR8. When I give `regmove` a `rex()` prefix the snippet compiles. We have an issue to track that I need to add REX prefixes to a bunch of SIMD instructions, #1127, so I think I will submit a PR for that and then revisit this.

Regardless of whether I fix #1127 or not, though, regalloc should know better than to try to move to a register it can't encode. For SIMD and floats, `regmove` uses the `frmov` recipe, which takes a single FPR as an input. I would have expected cranelift to know that the lack of `infer_rex()` or `rex()` meant that `regmove` would be limited to FPR8, but no. Perhaps it assumes that the source and destination registers are in the same register class.

My plan for now is:
- fix #1127 by adding REX prefixes in a bunch of places
- see if the original clif.txt will compile with that fix and without FPR32 support
- if that works, try again with FPR32 support
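
To make the `left: 68, right: 64` values in the assertion above easier to read, here is a small sketch of how an x86-64 REX prefix is derived from two register numbers in a register-to-register ModR/M encoding. The function names are made up for illustration, and which operand of `regmove` lands in which ModR/M field is an assumption; only the REX bit layout itself (`0100WRXB`, so the base value is `0x40`) is standard.

```rust
/// Base value of an x86-64 REX prefix: 0100WRXB with W, R, X and B all clear.
const BASE_REX: u8 = 0x40;

/// Compute the REX prefix for a register-to-register ModR/M encoding.
/// `rm` and `reg` are hardware register numbers in 0..=15. The ModR/M fields
/// hold only 3 bits each, so bit 3 of `rm` goes into REX.B (bit 0) and
/// bit 3 of `reg` goes into REX.R (bit 2).
/// (Hypothetical helper, written for this illustration.)
fn rex_for(rm: u8, reg: u8) -> u8 {
    let b = (rm >> 3) & 1;
    let r = (reg >> 3) & 1;
    BASE_REX | b | (r << 2)
}

/// An encoding that never emits a REX prefix can only be used when neither
/// extension bit is needed, i.e. both registers are numbered 0..=7.
fn encodable_without_rex(rm: u8, reg: u8) -> bool {
    rex_for(rm, reg) == BASE_REX
}
```

Under these assumptions `rex_for(0, 15)` evaluates to `0x44`, which is 68 in decimal, while any pair of registers below 8 gives `0x40`, which is 64. That matches the two numbers in the panic and is consistent with `%xmm15` needing the REX.R bit that a REX-less recipe cannot supply.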
abrown edited a comment on Issue #1306:
In an attempt to reduce this down to something comprehensible, I started playing around with small test cases:
test compile set enable_simd target x86_64 skylake function u0:35() system_v { block0: v0 = iconst.i64 0xdeadbeef v1 = load.i32x4 v0 v2 = load.i32x4 v0 v3 = load.i32x4 v0 v4 = load.i32x4 v0 v5 = load.i32x4 v0 v6 = load.i32x4 v0 v7 = load.i32x4 v0 v8 = load.i32x4 v0 v9 = load.i32x4 v0 v10 = load.i32x4 v0 v11 = load.i32x4 v0 v12 = load.i32x4 v0 v13 = load.i32x4 v0 v14 = load.i32x4 v0 v15 = load.i32x4 v0 v16 = load.i32x4 v0 ;; this causes the error v17 = load.i32x4 v0 store v1, v0 store v2, v0 store v3, v0 store v4, v0 store v5, v0 store v6, v0 store v7, v0 store v8, v0 store v9, v0 store v10, v0 store v11, v0 store v12, v0 store v13, v0 store v14, v0 store v15, v0 store v16, v0 return }On the
remove-fpr32
branch (see #1318), I runcargo run -p cranelift-tools -- compile -dDpv scratch-tiny.clif
. This fails with:thread 'main' panicked at 'assertion failed: `(left == right)` left: `68`, right: `64`: Invalid registers for REX-less Op2 encoding', cranelift/codegen/src/isa/x86/binemit.rs:119:5 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.When I turn on logging, I see the following right before the failure:
DEBUG cranelift_codegen::context > Compiled: function u0:35(i64 fp [%rbp]) -> i64 fp [%rbp] system_v { ss0 = spill_slot 16, offset -32 ss1 = incoming_arg 16, offset -16 block0(v20: i64 [%rbp]): [RexOp1pushq#50] x86_push v20 [RexOp1copysp#8089] copy_special %rsp -> %rbp [RexOp1adjustsp_ib#d083] adjust_sp_down_imm 16 [RexOp1pu_id#b8,%rax] v0 = iconst.i64 0xdead_beef [DynRexOp2fld#410,%xmm0] v18 = load.i32x4 v0 [Op2fspillSib32#411,ss0] v1 = spill v18 [DynRexOp2fld#410,%xmm0] v2 = load.i32x4 v0 [DynRexOp2fld#410,%xmm1] v3 = load.i32x4 v0 [DynRexOp2fld#410,%xmm2] v4 = load.i32x4 v0 [DynRexOp2fld#410,%xmm3] v5 = load.i32x4 v0 [DynRexOp2fld#410,%xmm4] v6 = load.i32x4 v0 [DynRexOp2fld#410,%xmm5] v7 = load.i32x4 v0 [DynRexOp2fld#410,%xmm6] v8 = load.i32x4 v0 [DynRexOp2fld#410,%xmm7] v9 = load.i32x4 v0 [DynRexOp2fld#410,%xmm8] v10 = load.i32x4 v0 [DynRexOp2fld#410,%xmm9] v11 = load.i32x4 v0 [DynRexOp2fld#410,%xmm10] v12 = load.i32x4 v0 [DynRexOp2fld#410,%xmm11] v13 = load.i32x4 v0 [DynRexOp2fld#410,%xmm12] v14 = load.i32x4 v0 [DynRexOp2fld#410,%xmm13] v15 = load.i32x4 v0 [DynRexOp2fld#410,%xmm14] v16 = load.i32x4 v0 [DynRexOp2fld#410,%xmm15] v17 = load.i32x4 v0 [Op2frmov#428] regmove v2, %xmm0 -> %xmm15 [Op2ffillSib32#410,%xmm0] v19 = fill v1 [DynRexOp2fst#411] store v19, v0 [DynRexOp2fst#411] store v2, v0 [DynRexOp2fst#411] store v3, v0 [DynRexOp2fst#411] store v4, v0 [DynRexOp2fst#411] store v5, v0 [DynRexOp2fst#411] store v6, v0 [DynRexOp2fst#411] store v7, v0 [DynRexOp2fst#411] store v8, v0 [DynRexOp2fst#411] store v9, v0 [DynRexOp2fst#411] store v10, v0 [DynRexOp2fst#411] store v11, v0 [DynRexOp2fst#411] store v12, v0 [DynRexOp2fst#411] store v13, v0 [DynRexOp2fst#411] store v14, v0 [DynRexOp2fst#411] store v15, v0 [DynRexOp2fst#411] store v16, v0 [RexOp1adjustsp_ib#8083] adjust_sp_up_imm 16 [RexOp1popq#58,%rbp] v21 = x86_pop.i64 [Op1ret#c3] return v21 }The immediate spilling of
v1 seems to make sense, and the regmove v2, %xmm0 -> %xmm15 in order to fill v1 into v19 does too: v17/%xmm15 is never used again, so we should be able to use %xmm15 for v2's value. But when I debug, the regmove is the instruction causing the failure. Because load and store have the infer_rex() meta-property, they can access all 16 FPR registers, but regmove, which has neither infer_rex() nor rex(), can only access FPR8. When I give regmove a rex() prefix the snippet compiles. We have an issue to track that I need to add REX prefixes to a bunch of SIMD instructions, #1127, so I think I will submit a PR for that and then revisit this.

Regardless of whether I fix #1127 or not, though, regalloc should know better than to try to move to a register it can't encode. For SIMD and floats, regmove uses the frmov recipe, which takes a single FPR as an input. I would have expected cranelift to know that the lack of both infer_rex() and rex() meant that regmove would be limited to FPR8, but no. Perhaps it assumes that the source and destination registers are in the same register class.

My plan for now is:
- fix #1127 by adding REX prefixes in a bunch of places
- see if the original clif.txt will compile with that fix and without FPR32 support
- if that works, try again with FPR32 support
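
For anyone following along, the encoding limit described above can be sketched in a few lines of Rust. This is only an illustration, not Cranelift's actual code: the names rex_for, RegClass, FPR8, FPR and move_ok are made up for this example, and the sketch assumes that the destination of the move lands in the ModRM reg field. The underlying fact is that a ModRM byte has only 3 bits per register field, so registers 8 to 15 need the REX.R or REX.B extension bit.

```rust
/// Base REX prefix with no extension bits set (0b0100_0000 = 64).
const BASE_REX: u8 = 0x40;

/// Hypothetical helper: the REX byte a register-to-register ModRM form needs.
/// `rm` and `reg` are hardware register numbers in 0..16.
fn rex_for(rm: u8, reg: u8) -> u8 {
    debug_assert!(rm < 16 && reg < 16);
    let b = (rm >> 3) & 1; // REX.B supplies bit 3 of ModRM.rm
    let r = (reg >> 3) & 1; // REX.R supplies bit 3 of ModRM.reg
    BASE_REX | (r << 2) | b
}

/// A register class modeled as a bitmask over unit numbers (bit i = %xmm<i>).
/// This is a simplification for illustration, not Cranelift's data structure.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegClass(u16);

/// %xmm0..%xmm7: reachable without any REX prefix.
const FPR8: RegClass = RegClass(0x00ff);
/// %xmm0..%xmm15: needs a REX prefix for the upper eight registers.
const FPR: RegClass = RegClass(0xffff);

impl RegClass {
    /// True if register unit `unit` belongs to this class.
    fn contains(self, unit: u8) -> bool {
        unit < 16 && (self.0 >> unit) & 1 == 1
    }
}

/// Hypothetical check an allocator could make before emitting a move:
/// both ends of the move must be in the class the recipe can encode.
fn move_ok(allowed: RegClass, src: u8, dst: u8) -> bool {
    allowed.contains(src) && allowed.contains(dst)
}
```

Under these assumptions, rex_for(0, 15) evaluates to 0x44 (decimal 68), whereas a REX-less recipe can only emit the base value 0x40 (64). That lines up with the left: 68, right: 64 in the panic above. Likewise, move_ok(FPR8, 0, 15) is false while move_ok(FPR, 0, 15) is true, which is the kind of check I would have expected to stop the %xmm0 -> %xmm15 move before it reached the emitter.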
abrown commented on Issue #1306:
#1318 and #1335, when merged, should clarify this a bit more.
bjorn3 commented on Issue #1306:
The new backend framework uses a different register allocator.
abrown closed Issue #1306.
Last updated: Oct 23 2024 at 20:03 UTC