abrown opened issue #10200:
This issue outlines two problems I encountered adding new assembler instructions:
- to match `capstone`'s pretty-printing, we must distinguish between signed and unsigned immediates, both of which can be sign-extended (!)
- to avoid a semantic mismatch at the ISLE level, the assembler must clearly differentiate between signed and unsigned immediates with the same representation (@alexcrichton suggested using different types).

Taken together, these two problems make it difficult to find a solution that satisfies both requirements. Let me explain:

`capstone` pretty-prints immediates differently per instruction. The x64 `add` and `and` groups both have instructions that sign-extend a 32-bit immediate into a 64-bit one before the operation. The `add` output prints like a signed integer, but the `and` prints like an unsigned integer:

    let add = inst::addq_i_sxl::new(Imm32::new(0xd7f247b5));
    println!("{add}");
    > addq $-0x280db84b, %rax
    let and = inst::andq_i_sxl::new(Imm32::new(0xd7f247b5));
    println!("{and}");
    > andq $0xffffffffd7f247b5, %rax

This is probably due to `capstone` understanding that `add` is arithmetic and `and` is logical; makes sense, right? One solution to properly match what `capstone` prints is to add a new `simm*` form to the DSL: for sign-extending instructions, `add` would get the `simm*` form and print the signed integer (`$-0x...`), while `and` would get the current `imm*` form and print the unsigned integer (`$0xffff...`), just extended to the right width. (There are other solutions here, like switching to XED, which prints both forms as unsigned integers, but we may not be ready for that just yet.)

But what about problem 2? @alexcrichton was concerned that if we don't differentiate the immediate type that the assembly instruction takes, we could try to pass in bit-equivalent values to these sign-extending instructions but then have unexpected effects when they are sign-extended; e.g., we pass in `254u8` to one of these instructions but it gets treated as `-2i8` and sign-extended to `-2i64`. We added this comment to track this:

Problem 1 and problem 2 interfere: if we choose to represent the `add` operand with `simm*` as suggested above, the instruction can accept a new `Simm*` type at the CLIF level that makes it clear that we accept a signed integer and that this will be sign-extended; all is well. But the `and` operand would still be `imm*`, accepting an `Imm*` type, and still confusing the user at the CLIF level, as @alexcrichton was worried would happen. There are several solutions here, but none that I really like, so I'll just describe the problem for now.
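
For readers who want to reproduce the two effects described above, here is a minimal sketch using only plain Rust integers. It does not use the assembler's own `Imm32` or proposed `Simm*` types; the helper names `sign_extend_32_to_64`, `fmt_signed`, and `fmt_unsigned` are hypothetical and exist only for illustration. It shows that the same sign-extended 64-bit value can be rendered in either style, and that a `u8` reinterpreted as `i8` changes meaning once it is widened.

```rust
// Illustrative sketch only: plain integers stand in for the assembler's
// immediate types, and every function name here is hypothetical.

/// Sign-extend a 32-bit immediate to 64 bits.
fn sign_extend_32_to_64(bits: u32) -> i64 {
    bits as i32 as i64
}

/// Signed rendering, in the style shown for `addq` above.
fn fmt_signed(value: i64) -> String {
    if value < 0 {
        format!("$-{:#x}", value.unsigned_abs())
    } else {
        format!("${:#x}", value)
    }
}

/// Unsigned rendering, in the style shown for `andq` above.
fn fmt_unsigned(value: i64) -> String {
    format!("${:#x}", value as u64)
}

fn main() {
    // Problem 1: one sign-extended value, two possible renderings.
    let extended = sign_extend_32_to_64(0xd7f2_47b5);
    println!("{}", fmt_signed(extended));
    println!("{}", fmt_unsigned(extended));

    // Problem 2: 254u8 has the same bits as -2i8, so sign extension
    // yields -2 rather than 254.
    let byte: u8 = 254;
    let widened = byte as i8 as i64;
    println!("{byte} -> {widened}");
}
```

Both renderings describe identical bits; the only difference is how the 64-bit value is interpreted when it is printed, which is why the choice of immediate type matters more than the printed form.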
alexcrichton commented on issue #10200:
Personally I feel like we should prioritize the representation of the types of the immediates to ensure it minimizes errors and is easy to use. Matching capstone exactly seems like something where we might want to instead engineer the test suite/fuzzing to remove that necessity.
One possible option with that is to rework tests to (a) generate an arbitrary `Inst`, (b) convert `Inst` to binary, (c) print the `Inst` and use a different assembler to convert to binary (maybe `llvm-as`? maybe just `as`?), and finally (d) assert the binary is the same. That means that our exact printed format won't necessarily be the same as any other tool, but it does mean that what we print is accepted by a tool.
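
The steps (b) through (d) above could be sketched roughly as follows. This is a toy model under stated assumptions, not the project's test harness: `ToyInst`, its made-up opcode byte, and the `roundtrips` helper are all hypothetical, and `toy_reassemble` merely stands in for invoking an external assembler. Step (a), generating arbitrary instructions, is left out.

```rust
// Toy model of the round-trip idea; all names and the encoding are made up.

struct ToyInst {
    imm: i32,
}

/// Step (b): our own encoder (a made-up opcode byte plus a little-endian immediate).
fn toy_encode(inst: &ToyInst) -> Vec<u8> {
    let mut bytes = vec![0xAA];
    bytes.extend_from_slice(&inst.imm.to_le_bytes());
    bytes
}

/// Step (c), first half: our own pretty-printer.
fn toy_print(inst: &ToyInst) -> String {
    format!("toy ${}", inst.imm)
}

/// Step (c), second half: a stand-in for an external assembler that
/// parses the printed text and produces bytes, or rejects it.
fn toy_reassemble(text: &str) -> Option<Vec<u8>> {
    let imm: i32 = text.strip_prefix("toy $")?.parse().ok()?;
    let mut bytes = vec![0xAA];
    bytes.extend_from_slice(&imm.to_le_bytes());
    Some(bytes)
}

/// Step (d): the printed form must be accepted and must assemble to the
/// same bytes that our encoder produced.
fn roundtrips<I>(
    inst: &I,
    encode: impl Fn(&I) -> Vec<u8>,
    print: impl Fn(&I) -> String,
    reassemble: impl Fn(&str) -> Option<Vec<u8>>,
) -> bool {
    let ours = encode(inst);
    let text = print(inst);
    reassemble(&text) == Some(ours)
}

fn main() {
    for imm in [0, 1, -2, i32::MIN, i32::MAX] {
        let inst = ToyInst { imm };
        assert!(roundtrips(&inst, toy_encode, toy_print, toy_reassemble));
    }
    println!("all toy instructions round-trip");
}
```

With a check of this shape, the printed text only has to be something the other tool accepts and encodes identically; it does not have to match any tool's output character for character.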
abrown closed issue #10200.
Last updated: Dec 13 2025 at 19:03 UTC