avanhatt opened issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
avanhatt labeled issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
avanhatt labeled issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
jameysharp commented on issue #5903:
I agree that this is a bug and I think you've diagnosed the root cause correctly. Thanks for the example inputs, which illustrate the bug very clearly! These would make great precise-output filetests in
cranelift/filetests/filetests/isa/aarch64/
, if you'd like to open a PR for that: we'd test for the current sub-optimal behavior for now, and then when this is fixed we'll see the tests "fail", highlighting that the fix worked.I guess
imm12_from_negated_value
should be a partial constructor, not an extractor, so it can usei64_sextend_imm64
and pass the value's type through. If we add ani64_neg
constructor to the prelude, then I think we can get rid ofimm12_from_negated_u64
and write this instead:(decl partial imm12_from_negated_value (Value) Imm12) (rule (imm12_from_negated_value (has_type ty (iconst n))) (if-let (imm12_from_u64 imm) (i64_as_u64 (i64_neg (i64_sextend_imm64 ty n)))) imm)
I think we also need to implement #3059 so we can have some confidence that we're treating high bits consistently everywhere.
If anyone wants to pick this up I'm happy to discuss in more detail how to fix it!
jameysharp labeled issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
jameysharp labeled issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
avanhatt commented on issue #5903:
I'll open a PR for adding these as file tests, and try out the change @jameysharp suggested to use
i64_sextend_imm64
.@jameysharp, do you think that the change to the rule shouldn't land until https://github.com/bytecodealliance/wasmtime/issues/3059 changes? Or just that this is more motivation to address #3059?
jameysharp commented on issue #5903:
I had to think about that question, but my conclusion is that this is only more motivation to address #3059 and maybe find more cases like this. Since the
i64_sextend_imm64
constructor entirely ignores the upper bits, if we start using that in these rules then these rules won't care whether the CLIF verifier checks this property or not.
cfallin closed issue #5903:
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64
.
aarch64
has the following rules that check if the _negated_ value of a constant fits in an immediate (Imm12
), doing a corresponding switch between add and subtract:;; Same as the previous special case, except we can switch the subtraction to an ;; addition if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y)))) (add_imm ty x y)) ;; Same as the previous special cases, except we can switch the addition to a ;; subtraction if the negated immediate fits in 12 bits. (rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y)))) (sub_imm ty x y)) (rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y))) (sub_imm ty y x))
These rules should work for all
fits_in_64
types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12
).In particular, our prototype verifier found that for i16 and i32, these rules _only_ apply if the operand extracted with
imm12_from_negated_value
is exactly 0, rather than values in the range Cranelift lowers toImm12
: constants with negated valuesval
:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0))
.For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
sub
for i8 and i64 in this case and otherwise keeps amov
andadd
.What I think is the bug
As discussed in https://github.com/bytecodealliance/wasmtime/issues/5700, Clif semantics are that narrow (i8, i64, i32)
iconst
are stored in an underlyingu64
that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value
negates this zero-extended value as an i64 then checks whether that fits into an immediate:fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> { Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64) }
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfff
or< 0xfff_000
.When handling narrow iconst values,
imm12_from_negated_value
should negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64
may need to see theu32
,u16
, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value
.
.clif
Test Case
imm12-64.clif
test interpret test run target aarch64 function %a(i64) -> i64 { block0(v0: i64): v1 = iconst.i64 -1 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Because the
.clif
parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret test run target aarch64 function %a(i32) -> i32 { block0(v0: i32): v1 = iconst.i32 4294967295 v3 = iadd v0, v1 return v3 } ; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif cargo run -- compile --target aarch64 -D imm12-32.clif cargo run -- compile --target aarch64 -D imm12-16.clif cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
sub x0, x0, #1 ret
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif .byte 0, 4, 0, 209, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate 4: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-32.clif .byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: 02 00 80 12 mov w2, #-1 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-16.clif .byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214 Disassembly of 12 bytes: 0: e2 ff 9f 52 mov w2, #0xffff 4: 00 00 02 0b add w0, w0, w2 <-- no immediate 8: c0 03 5f d6 ret cargo run -- compile --target aarch64 -D imm12-8.clif .byte 0, 252, 3, 17, 192, 3, 95, 214 Disassembly of 8 bytes: 0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small 4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e81893a41b363ba135b9fd79fb77eb0531
Operating system: macOS
Architecture: aarch64
Last updated: Dec 23 2024 at 13:07 UTC