jameysharp commented on issue #5423:
The current codegen suppresses some useful Rust compiler diagnostics by emitting expressions like `18446744073709551615i128 as u64`. Is that constant being silently truncated to 64 bits? If it were written as just `18446744073709551615` and used in a context where its type is inferred to be `u64`, then Rust would tell us if the constant was nonsense. For anyone reading the generated code, emitting `0xffffffffffffffff` instead would help answer that question. But we don't generally expect people to do that, so getting the Rust compiler to check for us is better.

I can confirm we don't have any constants that are getting silently truncated right now. As you guessed, I didn't implement the extra type-casting in my new codegen, and even so Rust didn't complain about the magnitude of any constants in the current backends. It wouldn't be hard to emit the same kind of casts that ISLE currently uses, but I think it's worth having a compile-time check for this kind of mistake.
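To make the difference in diagnostics concrete, here's a small self-contained Rust sketch (the constants are illustrative, not taken from any backend):

```rust
fn main() {
    // With an explicit cast, rustc stays silent: `as` truncates, so even a
    // genuinely out-of-range i128 constant would be cut down to 64 bits
    // without any warning.
    let a = 18446744073709551615i128 as u64; // == u64::MAX

    // A bare literal whose type is inferred to be u64 *is* checked: an
    // out-of-range value trips the deny-by-default `overflowing_literals`
    // lint and fails the build.
    let b: u64 = 18446744073709551615; // exactly u64::MAX, so this is fine
    // let c: u64 = 18446744073709551616; // error: literal out of range for `u64`

    println!("{a} {b}");
}
```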
This PR exists because, in addition to checking for too-big constants, Rust also rejects negation on unsigned constants. I don't think that's so useful to check at compile time, but negation doesn't look like a feature that's been used much at all in backends, so I figured it wouldn't be a big deal to lose it.
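For contrast, a sketch of the negation behavior being discussed: Rust simply has no unary `-` for unsigned integer types, so this is a hard error rather than a lint:

```rust
fn main() {
    // error[E0600]: cannot apply unary operator `-` to type `u64`
    // let x: u64 = -1u64;

    // The wrapping equivalent has to be spelled out explicitly:
    let x: u64 = 0u64.wrapping_sub(1); // == u64::MAX, the bit pattern of -1
    println!("{x}");
}
```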
I did initially write this patch with `0xFFFF_FFFF_FFFF_FFFF` and `0xFFFF_FF80` and so on. I liked that writing `u64_sub(0, 128)` instead makes explicit that the result is supposed to be a `u64`. But there are other ways to fix this.
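For reference, a minimal sketch of what a helper like `u64_sub` boils down to on the Rust side (the name matches the term used above; the exact signature here is an assumption, not the actual cranelift helper):

```rust
/// Sketch: wrapping subtraction, so u64_sub(0, 128) yields the
/// two's-complement encoding of -128 as a u64.
fn u64_sub(a: u64, b: u64) -> u64 {
    a.wrapping_sub(b)
}
```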
`signed_min` and `signed_max` are used exactly once each, like `(imm $I32 (ImmExtend.Sign) (signed_max out_ty))`. How would you feel about pushing the call to `imm` inside `signed_min`, so the sign-extension becomes part of constructing the immediate?

```
(rule (signed_min $I8) (imm $I8 (ImmExtend.Sign) 0x80))
(rule (signed_min $I16) (imm $I16 (ImmExtend.Sign) 0x8000))
```
Similarly, the immediate constructed from -1 could be something like:

```
(imm $I8 (ImmExtend.Sign) 0xFF)
```
cfallin commented on issue #5423:
That all makes sense, thanks -- I agree that finding a way to get compile-time checks on constant magnitudes would be ideal. re: this:
> This PR exists because, in addition to checking for too-big constants, Rust also rejects negation on unsigned constants. I don't think that's so useful to check at compile time, but negation doesn't look like a feature that's been used much at all in backends, so I figured it wouldn't be a big deal to lose it.
I actually do think that allowing negative constants is important, and papering over their lack by defining certain external constructors is not ideal -- do we do that each time we need another negative value? The fact that we do use negative constants in a few places now is evidence to me that they're useful, and they may come up in other situations later as well...
I wonder whether another approach might be better: why not give ISLE types a notion of bit-width, and check for out-of-bounds literals at the ISLE level? That seems strictly better in another way: it lifts the check to an explicitly-defined one in our DSL, rather than a side effect of our target language. So something like

```
(type u64 (primitive u64 integer 64))
```

then when lowering a `ConstInt` to sema (at which point we know the type), we can check that (i) the expected type is actually an integer primitive, and not some other primitive; and (ii) it's within range. Then we can decide what the acceptable range is: e.g., we could say that we accept -2^(bits-1) .. 2^bits, or the union of the ranges of signed and unsigned values of the given width. Thoughts?
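A hedged Rust sketch of that check, assuming the lowering holds the literal as an `i128` and knows the declared bit width (names are illustrative, not islec's):

```rust
/// Accept the union of the signed and unsigned ranges for `bits` bits:
/// -2^(bits-1) ..= 2^bits - 1.
fn const_int_in_range(value: i128, bits: u32) -> bool {
    assert!((1..=64).contains(&bits));
    let min = -(1i128 << (bits - 1)); // -2^(bits-1)
    let max = (1i128 << bits) - 1;    // 2^bits - 1
    (min..=max).contains(&value)
}
```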
cfallin commented on issue #5423:
> e.g., we could say that we accept -2^(bits-1) .. 2^bits, or the union of the ranges of signed and unsigned values of the given width.

Although now that I've written this out, I'm starting to dislike it more; the asymmetry just feels weird and wrong.
Alternative: lift a bit more of the integer type system into ISLE. Namely:

```
(type u64 (primitive u64 integer 64 unsigned))
(type i64 (primitive i64 integer 64 signed))
```

such that in a `u64` context we accept literals in 0..2^64 and in an `i64` context we accept -2^63..2^63. Then we also add some syntax:

```
(imm $I64 (u64 -1:i64)) ;; sketching some syntax, please bikeshed something better
```

where `u64` used in term position is a cast (it compiles to `as u64`, but also works with typechecking at the ISLE level) and `:i64` on a constant sets the type of the constant.

This is all a bit uncertain, but I feel there's a better system for constants, and for integer types, buried in here somewhere...
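To make that concrete, a sketch of the signedness-aware check (representation and names here are assumptions, not actual islec code):

```rust
enum Signedness {
    Signed,
    Unsigned,
}

/// With signedness recorded on the primitive, each context checks exactly
/// its own range instead of the asymmetric union above.
fn literal_in_range(value: i128, bits: u32, sign: Signedness) -> bool {
    match sign {
        // u64 context: 0 .. 2^bits
        Signedness::Unsigned => value >= 0 && value < (1i128 << bits),
        // i64 context: -2^(bits-1) .. 2^(bits-1)
        Signedness::Signed => {
            let half = 1i128 << (bits - 1);
            (-half..half).contains(&value)
        }
    }
}
```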
jameysharp commented on issue #5423:
I think it'd be great to give ISLE a little more knowledge about primitive numeric types. My codegen prototype is currently checking whether the name of an integer constant's type starts with "i" to decide how to treat negative constants, which certainly isn't ideal.
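A tiny sketch of that stopgap, just for concreteness (the real prototype's code isn't shown here, so the shape is assumed):

```rust
/// Hypothetical: guess the signedness of a primitive from its Rust type
/// name, e.g. "i64" -> signed, "u64" -> unsigned. Works, but fragile.
fn type_is_signed(type_name: &str) -> bool {
    type_name.starts_with('i')
}
```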
But also, I don't want to build that out right now. What do you think of the alternative patch I've just pushed here?