cranelift / Issue #1310 [preopt] Optimize udiv and urem w... · git-cranelift

What is the feature or code improvement you would like to do in Cranelift? udiv_imm v0, 4 should be optimized to ushr_imm v0, 2, and urem_imm v0, 4 should be optimized to band_imm v0, 0b11.
What is the value of adding this in Cranelift? It improves the runtime performance of the generated code. For example, libcore uses the modulo operator to test pointers for alignment.
Do you have an implementation plan, and/or ideas for data structures or algorithms to use? N/A
Have you considered alternative implementations? If so, how are they better or worse than your proposal? I first considered doing this optimization during codegen of the clif ir. Unfortunately, the required alignment is stored in a variable, so cranelift_frontend makes it an ebb param until I finalize the FunctionBuilder.
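
For readers who want to see why the two rewrites are safe, here is a minimal Rust sketch of the arithmetic identities behind them. It is not Cranelift code: the function names udiv_pow2, urem_pow2 and is_aligned are made up for illustration, and the sketch assumes an unsigned 64-bit value and a power-of-two divisor given as its base-2 logarithm (below 64).

```rust
/// Unsigned division by a power-of-two constant, written as a right shift.
/// `log2` is the base-2 logarithm of the divisor (2 for a divisor of 4)
/// and is assumed to be below 64. Hypothetical helper, not a Cranelift API.
pub fn udiv_pow2(x: u64, log2: u32) -> u64 {
    x >> log2
}

/// Unsigned remainder by a power-of-two constant, written as a bit mask.
/// For a divisor of 4 the mask is 0b11. Hypothetical helper.
pub fn urem_pow2(x: u64, log2: u32) -> u64 {
    x & ((1u64 << log2) - 1)
}

/// Alignment test of the kind the issue mentions (`addr % align == 0`),
/// expressed with the mask form. `align` must be a power of two.
pub fn is_aligned(addr: u64, align: u64) -> bool {
    debug_assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}
```

The identities only hold for unsigned operands; signed division rounds toward zero, so it cannot be replaced by a plain arithmetic shift without a fix-up.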

This optimization has already been implemented; it runs as part of simple_preopt whenever opt_level is not none. The following test case:

set opt_level=speed
target x86_64

function %f(i64) -> i64 {
ebb0(v0: i64):
    v1 = udiv_imm v0, 4
    return v1
}

is compiled into this:

function %f(i64 [%rdi], i64 fp [%rbp]) -> i64 [%rax], i64 fp [%rbp] fast {
    ss0 = incoming_arg 16, offset -16

                                ebb0(v0: i64 [%rdi], v2: i64 [%rbp]):
[RexOp1pushq#50]                    x86_push v2
[RexOp1copysp#8089]                 copy_special %rsp -> %rbp
[DynRexOp1r_ib#d0c1,%rdi]           v1 = ushr_imm v0, 2
[RexOp1rmov#8089]                   regmove v1, %rdi -> %rax
[RexOp1popq#58,%rbp]                v3 = x86_pop.i64
[Op1ret#c3]                         return v1, v3
}

Are you running into a case where it should kick in but it doesn't, while opt_level is set to speed or speed+size?
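
As a rough illustration of the rewrite rule discussed in this thread, the following Rust sketch models it on a toy opcode type. It is an assumption-laden model, not the simple_preopt implementation: the Op enum and the strength_reduce function are hypothetical names that only mirror the CLIF opcodes mentioned here.

```rust
/// Hypothetical, simplified stand-ins for the CLIF opcodes in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    UdivImm,
    UremImm,
    UshrImm,
    BandImm,
}

/// Returns the cheaper (opcode, immediate) pair when `imm` is a power of
/// two, and `None` when the rewrite does not apply.
pub fn strength_reduce(op: Op, imm: u64) -> Option<(Op, u64)> {
    // Zero and non-powers-of-two are left alone in this sketch.
    if !imm.is_power_of_two() {
        return None;
    }
    match op {
        // x / 2^k  ==  x >> k
        Op::UdivImm => Some((Op::UshrImm, u64::from(imm.trailing_zeros()))),
        // x % 2^k  ==  x & (2^k - 1)
        Op::UremImm => Some((Op::BandImm, imm - 1)),
        // Other opcodes are not covered by this rule.
        _ => None,
    }
}
```

Under this model, udiv_imm v0, 4 maps to ushr_imm v0, 2, which matches the compiled output shown above, and urem_imm v0, 4 maps to band_imm v0, 3.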

I can't use opt_level=speed, because jump tables don't work with LICM. I searched for this optimization in cranelift_preopt, but I should have looked in cranelift_codegen itself too.

Last updated: Apr 16 2025 at 09:03 UTC

GitHub (Dec 24 2019 at 12:20):

GitHub (Jan 06 2020 at 14:02):

GitHub (Jan 06 2020 at 14:03):

GitHub (Jan 06 2020 at 14:30):

GitHub (Jan 06 2020 at 14:30):