abrown opened issue #10200:
This issue outlines two problems I encountered adding new assembler instructions:
- to match `capstone`'s pretty-printing, we must distinguish between signed and unsigned immediates, both of which can be sign-extended (!)
- to avoid a semantic mismatch at the ISLE level, the assembler must clearly differentiate between signed and unsigned immediates with the same representation (@alexcrichton suggested using different types).
Taken together, these two problems make it difficult to find a solution that satisfies both requirements. Let me explain:
`capstone` pretty-prints immediates differently per instruction. The x64 `add` and `and` groups both have instructions that sign-extend a 32-bit immediate into a 64-bit one before the operation. The `add` output prints like a signed integer, but the `and` prints like an unsigned integer:
    let add = inst::addq_i_sxl::new(Imm32::new(0xd7f247b5));
    println!("{add}");
    > addq $-0x280db84b, %rax
    let and = inst::andq_i_sxl::new(Imm32::new(0xd7f247b5));
    println!("{and}");
    > andq $0xffffffffd7f247b5, %rax
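To make the difference concrete, here is a minimal standalone sketch of how one 32-bit immediate yields both printed forms. The helper names (`sign_extend`, `fmt_signed`, `fmt_unsigned`) are hypothetical and are not part of the assembler or of `capstone`; they only reproduce the arithmetic behind the two outputs.

```rust
/// Sign-extend a 32-bit immediate to 64 bits (hypothetical helper).
fn sign_extend(imm: u32) -> i64 {
    imm as i32 as i64
}

/// Print the sign-extended value as a signed hex literal (the `add` style).
fn fmt_signed(imm: u32) -> String {
    let v = sign_extend(imm);
    if v < 0 {
        // `unsigned_abs` avoids overflow when v == i64::MIN.
        format!("$-{:#x}", v.unsigned_abs())
    } else {
        format!("${:#x}", v)
    }
}

/// Print the same sign-extended bits as an unsigned hex literal (the `and` style).
fn fmt_unsigned(imm: u32) -> String {
    format!("${:#x}", sign_extend(imm) as u64)
}

fn main() {
    let imm = 0xd7f247b5u32;
    println!("addq {}, %rax", fmt_signed(imm));
    println!("andq {}, %rax", fmt_unsigned(imm));
}
```

Both strings describe the same 64-bit operand; only the presentation differs.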
This is probably due to `capstone` understanding that `add` is arithmetic and `and` is logical (makes sense, right?). One solution to properly match what `capstone` prints is to add a new `simm*` form to the DSL: for sign-extending instructions, `add` would get the `simm*` form and print the signed integer (`$-0x...`), `and` would get the current `imm*` form and print the unsigned integer (`$0xffff...`)... just extended to the right width. (There are other solutions here, like switching to XED which prints both forms as unsigned integers, but we may not be ready for that just yet).
But what about problem 2? @alexcrichton was concerned that if we don't differentiate the immediate type that the assembly instruction takes, we could try to pass in bit-equivalent values to these sign-extending instructions but then have unexpected effects when they are sign-extended; e.g., we pass in `254u8` to one of these instructions but it gets treated as `-2i8` and sign-extended to `-2i64`. We added a comment to track this.
Problem 1 and problem 2 interfere: if we choose to represent the `add` operand with `simm*` as suggested above, the instruction can accept a new `Simm*` type at the CLIF level that makes it clear that we accept a signed integer and that this will be sign-extended; all is well. But the `and` operand would still be `imm*`, accepting an `Imm*` type, and still confusing the user at the CLIF level, as @alexcrichton was worried would happen. There are several solutions here, but none that I really like, so I'll just describe the problem for now.
alexcrichton commented on issue #10200:
Personally I feel like we should prioritize the representation of the types of the immediates to ensure it minimizes errors and is easy to use. Matching capstone exactly seems like something where we might want to instead engineer the test suite/fuzzing to remove that necessity.
One possible option with that is to rework tests to (a) generate an arbitrary Inst, (b) convert Inst to binary, (c) print the Inst and use a different assembler to convert to binary (maybe `llvm-as`? maybe just `as`?), and finally (d) assert the binary is the same. That means that our exact printed format won't necessarily be the same as any other tool, but it does mean that what we print is accepted by a tool.
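Steps (b) through (d) could be sketched roughly as follows. Everything here is an assumption made for illustration: `check_round_trip`, the toy `Nop` instruction, and `toy_reference_assemble` are hypothetical names, and the toy assembler stands in for invoking a real external tool. Step (a), generating an arbitrary instruction, is left out.

```rust
/// Encode the instruction, print it, re-assemble the printed text with a
/// reference assembler, and compare the two byte sequences.
fn check_round_trip<I>(
    inst: &I,
    encode: impl Fn(&I) -> Vec<u8>,
    print: impl Fn(&I) -> String,
    reference_assemble: impl Fn(&str) -> Result<Vec<u8>, String>,
) -> Result<(), String> {
    let expected = encode(inst); // step (b)
    let text = print(inst); // step (c), printing
    let actual = reference_assemble(&text)?; // step (c), external assembly
    if expected == actual {
        Ok(()) // step (d)
    } else {
        Err(format!("`{text}`: ours {expected:02x?}, reference {actual:02x?}"))
    }
}

/// Toy instruction used only to exercise the harness.
struct Nop;

fn encode_nop(_: &Nop) -> Vec<u8> {
    vec![0x90] // x86 single-byte `nop`
}

fn print_nop(_: &Nop) -> String {
    "nop".to_string()
}

/// Placeholder for a real assembler; it understands only `nop`.
fn toy_reference_assemble(text: &str) -> Result<Vec<u8>, String> {
    match text.trim() {
        "nop" => Ok(vec![0x90]),
        other => Err(format!("unsupported instruction: {other}")),
    }
}

fn main() {
    let result = check_round_trip(&Nop, encode_nop, print_nop, toy_reference_assemble);
    println!("{result:?}");
}
```

With this shape, the printed text only needs to be accepted by the reference tool; it does not need to match any disassembler's output byte for byte.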
abrown closed issue #10200.
Last updated: Feb 28 2025 at 03:10 UTC