mmcloughlin opened issue #12368:
The following `lower_fmla` rules enable user-controlled recursion.

These rules will peel away an arbitrary number of `fneg` arguments to an `fma` instruction, alternating between a fused-multiply-add and fused-multiply-subtract operation.

.clif Test Case

Generate a CLIF function with a stack of `fneg` arguments using a script like `fneg.py`:

```python
import sys


def generate_fneg_fma_rec(count):
    print("function %fma_fneg_rec(f32x4, f32x4, f32x4) -> f32x4 {")
    print("block0(v1: f32x4, v2: f32x4, v3: f32x4):")
    n = 4
    for _ in range(count):
        print(f"    v{n} = fneg v{n - 2}")
        print(f"    v{n + 1} = fneg v{n - 1}")
        n += 2
    print(f"    v{n} = fma v{n - 2}, v{n - 1}, v1")
    print(f"    return v{n}")
    print("}")


def main():
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    generate_fneg_fma_rec(count)


if __name__ == "__main__":
    main()
```

Steps to Reproduce

Generate a large instance and compile with `clif-util` (at `v40.0.2`):

```
$ python3 fneg.py 100000 >fneg100000.clif
$ cargo run --bin clif-util -- compile --target aarch64 -p --disasm fneg100000.clif
...
```

Expected Results

Expect the function to compile and execute successfully. Ideally, it would be optimized to a single `fma`.

Actual Results

Observe that rule recursion leads to a stack overflow for a sufficiently large instance.

```
thread 'main' (2897042) has overflowed its stack
fatal runtime error: stack overflow, aborting
fish: Job 1, 'cargo run --bin clif-util -- co…' terminated by signal SIGABRT (Abort)
```

Versions and Environment
Cranelift version or commit: v40.0.2
Operating system: Mac OSX
Architecture: AArch64
Extra Info
Related #12333
mmcloughlin added the bug label to Issue #12368.
mmcloughlin added the cranelift label to Issue #12368.
mmcloughlin edited issue #12368:
mmcloughlin edited issue #12368:
The following `lower_fmla` rules enable user-controlled recursion.

```
;; Special case: if one of the multiplicands is `fneg` then peel that away,
;; reverse the operation being performed, and then recurse on `lower_fmla`
;; again to generate the actual instruction.
;;
;; Note that these are the highest priority cases for `lower_fmla` to peel
;; away as many `fneg` operations as possible.
(rule 5 (lower_fmla op (fneg x) y z size)
      (lower_fmla (neg_fmla op) x y z size))
(rule 6 (lower_fmla op x (fneg y) z size)
      (lower_fmla (neg_fmla op) x y z size))
```

These rules will peel away an arbitrary number of `fneg` arguments to an `fma` instruction, alternating between a fused-multiply-add and fused-multiply-subtract operation.

.clif Test Case

Generate a CLIF function with a stack of `fneg` arguments using a script like `fneg.py`:

```python
import sys


def generate_fneg_fma_rec(count):
    print("function %fma_fneg_rec(f32x4, f32x4, f32x4) -> f32x4 {")
    print("block0(v1: f32x4, v2: f32x4, v3: f32x4):")
    n = 4
    for _ in range(count):
        print(f"    v{n} = fneg v{n - 2}")
        print(f"    v{n + 1} = fneg v{n - 1}")
        n += 2
    print(f"    v{n} = fma v{n - 2}, v{n - 1}, v1")
    print(f"    return v{n}")
    print("}")


def main():
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    generate_fneg_fma_rec(count)


if __name__ == "__main__":
    main()
```

Steps to Reproduce

Generate a large instance:

```
python3 fneg.py 100000 >fneg100000.clif
```

Compile with `clif-util` (at `v40.0.2`):

```
cargo run --bin clif-util -- compile --target aarch64 -p --disasm fneg100000.clif
```

Expected Results

Expect the function to compile and execute successfully. Ideally, it would be optimized to a single `fma`.

Actual Results

Observe that rule recursion leads to a stack overflow for a sufficiently large instance.

```
thread 'main' (2897042) has overflowed its stack
fatal runtime error: stack overflow, aborting
fish: Job 1, 'cargo run --bin clif-util -- co…' terminated by signal SIGABRT (Abort)
```

Versions and Environment
Cranelift version or commit: v40.0.2
Operating system: Mac OSX
Architecture: AArch64
Extra Info
Related #12333
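To illustrate why the recursion depth tracks the input, here is a small Python model of the two `fneg`-peeling rules. It is only a sketch, not Cranelift or ISLE code: `Var`, `Fneg`, `nest`, `strip_fnegs`, and both `lower_fmla_*` functions are hypothetical names, and a boolean `negated` flag stands in for the op that `neg_fmla` flips. The recursive version takes one call per peeled `fneg`, so its depth grows with the length of the chain. The iterative version counts the `fneg`s in a loop and keeps only the parity, which is one possible way to bound the depth and is not necessarily the fix that was adopted.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fneg:
    arg: "Expr"


Expr = Union[Var, Fneg]


def nest(e, count):
    """Wrap `e` in `count` fneg nodes (built iteratively)."""
    for _ in range(count):
        e = Fneg(e)
    return e


def lower_fmla_recursive(negated, x, y):
    """Model of the peeling rules: one recursive call per peeled fneg."""
    if isinstance(y, Fneg):  # same shape as the rule matching (fneg y)
        return lower_fmla_recursive(not negated, x, y.arg)
    if isinstance(x, Fneg):  # same shape as the rule matching (fneg x)
        return lower_fmla_recursive(not negated, x.arg, y)
    return (negated, x, y)


def strip_fnegs(e):
    """Strip fneg wrappers in a loop; return (base, odd_number_stripped)."""
    flips = 0
    while isinstance(e, Fneg):
        e = e.arg
        flips += 1
    return e, flips % 2 == 1


def lower_fmla_iterative(negated, x, y):
    """Same result as the recursive model, with constant call depth."""
    x, x_odd = strip_fnegs(x)
    y, y_odd = strip_fnegs(y)
    return (negated ^ x_odd ^ y_odd, x, y)


if __name__ == "__main__":
    depth = 5000  # deeper than Python's default recursion limit
    x, y = nest(Var("x"), depth), nest(Var("y"), depth)
    print(lower_fmla_iterative(False, x, y))
    try:
        lower_fmla_recursive(False, x, y)
    except RecursionError:
        print("recursive model exceeded the recursion limit")
```

In the generated test case both multiplicands carry the same number of `fneg`s, so the total number of flips is even and the model ends on the un-negated operation, consistent with the expectation of a single `fma`.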
alexcrichton closed issue #12368:
Last updated: Jan 29 2026 at 13:25 UTC