bjorn3 opened Issue #1310:
- What is the feature or code improvement you would like to do in Cranelift?
  `udiv_imm v0, 4` should be optimized to `shr_imm v0, 2`, while `urem_imm v0, 4` should be optimized to `band_imm v0, 0b11`.
- What is the value of adding this in Cranelift?
  This improves the runtime performance of the generated code. The modulo operator is, for example, used to test pointers for alignment in libcore.
- Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
  N/A
- Have you considered alternative implementations? If so, how are they better or worse than your proposal?
  I first considered doing this optimization during codegen of the CLIF IR. Unfortunately the required alignment is stored in a variable, so `cranelift_frontend` makes it an ebb param until I finalize the `FunctionBuilder`.
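The rewrite requested here rests on a standard identity for unsigned integers: when the divisor is a power of two, 2^k, division equals a logical right shift by k and the remainder equals a bitwise AND with 2^k - 1. The following is a minimal Rust sketch of that identity only, not Cranelift's implementation; the helper names `pow2_shift_and_mask` and `matches_division` are hypothetical and chosen for illustration.

```rust
/// Hypothetical helper (not part of Cranelift): if `divisor` is a power of
/// two, return the shift amount and the mask that can replace an unsigned
/// division and an unsigned remainder by that divisor.
fn pow2_shift_and_mask(divisor: u64) -> Option<(u32, u64)> {
    if divisor.is_power_of_two() {
        // For divisor = 2^k: the shift is k, the mask is 2^k - 1.
        Some((divisor.trailing_zeros(), divisor - 1))
    } else {
        // Zero and non-powers of two are not covered by this identity.
        None
    }
}

/// Check that the shift/mask forms agree with `/` and `%` for one input.
/// Returns false when `divisor` is not a power of two.
fn matches_division(x: u64, divisor: u64) -> bool {
    match pow2_shift_and_mask(divisor) {
        Some((shift, mask)) => (x / divisor == x >> shift) && (x % divisor == x & mask),
        None => false,
    }
}
```

For the divisor 4 used in the issue, `pow2_shift_and_mask(4)` yields a shift of 2 and a mask of `0b11`, which corresponds to the shift and `band_imm` immediates named above. The identity holds for unsigned operands only; signed division rounds toward zero and needs extra handling.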
bnjbvr commented on Issue #1310:
This optimization has been implemented already, and it's run if `opt_level` is not `None` (as part of `simple_preopt`). The following test case:

    set opt_level=speed
    target x86_64

    function %f(i64) -> i64 {
    ebb0(v0: i64):
        v1 = udiv_imm v0, 4
        return v1
    }

is compiled into this:

    function %f(i64 [%rdi], i64 fp [%rbp]) -> i64 [%rax], i64 fp [%rbp] fast {
        ss0 = incoming_arg 16, offset -16

    ebb0(v0: i64 [%rdi], v2: i64 [%rbp]):
        [RexOp1pushq#50]             x86_push v2
        [RexOp1copysp#8089]          copy_special %rsp -> %rbp
        [DynRexOp1r_ib#d0c1,%rdi]    v1 = ushr_imm v0, 2
        [RexOp1rmov#8089]            regmove v1, %rdi -> %rax
        [RexOp1popq#58,%rbp]         v3 = x86_pop.i64
        [Op1ret#c3]                  return v1, v3
    }

Are you running into a case where it should kick in but it doesn't, while `opt_level` is set to `speed` or `speed+size`?
bnjbvr labeled Issue #1310.
bjorn3 commented on Issue #1310:
I can't use `opt_level=speed` as jump tables don't work with LICM. I searched for this optimization in `cranelift_preopt`, but I should have looked for it in `cranelift_codegen` itself too.
bjorn3 closed Issue #1310.
Last updated: Dec 23 2024 at 13:07 UTC