wasmtime / issue #10200 asm: pretty-printing signed immed... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #10200 asm: pretty-printing signed immed...

Wasmtime GitHub notifications bot (Feb 06 2025 at 18:26):

This issue outlines two problems I encountered adding new assembler instructions:

1. To match capstone's pretty-printing, we must distinguish between signed and unsigned immediates, both of which can be sign-extended (!)

2. To avoid a semantic mismatch at the ISLE level, the assembler must clearly differentiate between signed and unsigned immediates with the same representation (@alexcrichton suggested using different types).

Taken together, these two problems make it difficult to find a solution that satisfies both requirements. Let me explain: capstone pretty-prints immediates differently per instruction. The x64 `add` and `and` groups both have instructions that sign-extend a 32-bit immediate to 64 bits before the operation. Capstone prints the `add` immediate as a signed integer, but the `and` immediate as an unsigned integer:
let add = inst::addq_i_sxl::new(Imm32::new(0xd7f247b5));
println!("{add}");
> addq $-0x280db84b, %rax

let and = inst::andq_i_sxl::new(Imm32::new(0xd7f247b5));
println!("{and}");
> andq $0xffffffffd7f247b5, %rax
This is probably because capstone understands that `add` is arithmetic and `and` is logical, which makes sense. One way to match what capstone prints is to add a new `simm*` form to the DSL: among the sign-extending instructions, `add` would get the `simm*` form and print the signed integer (`$-0x...`), while `and` would keep the current `imm*` form and print the unsigned integer (`$0xffff...`), just extended to the right width. (There are other solutions here, like switching to XED, which prints both forms as unsigned integers, but we may not be ready for that just yet.)
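To make the two printing styles concrete, here is a minimal sketch of how one sign-extended 32-bit immediate can be rendered either way. The helper names (`sign_extend_imm32`, `print_signed`, `print_unsigned`) are made up for illustration and are not part of the assembler or its DSL.

```rust
/// Sign-extend a 32-bit immediate to 64 bits, as the `sxl` instruction forms do.
fn sign_extend_imm32(bits: u32) -> i64 {
    bits as i32 as i64
}

/// Render the value the way the `add` example prints: a signed hex integer.
fn print_signed(value: i64) -> String {
    if value < 0 {
        // `unsigned_abs` avoids overflow for `i64::MIN`.
        format!("$-{:#x}", value.unsigned_abs())
    } else {
        format!("${:#x}", value)
    }
}

/// Render the value the way the `and` example prints: the full 64-bit
/// two's-complement pattern as an unsigned hex integer.
fn print_unsigned(value: i64) -> String {
    format!("${:#x}", value as u64)
}
```

Both helpers receive the same 64-bit value; only the rendering differs, which is why the printed form alone does not say which immediate type the instruction should accept.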

But what about problem 2? @alexcrichton was concerned that if we don't differentiate the immediate type that the assembly instruction takes, we could pass bit-equivalent values to these sign-extending instructions and get unexpected results once they are sign-extended; e.g., we pass in `254u8` to one of these instructions but it gets treated as `-2i8` and sign-extended to `-2i64`. We added this comment to track this:

https://github.com/bytecodealliance/wasmtime/blob/d943d57e78950da21dd430e0847f3b8fd0ade073/cranelift/codegen/src/isa/x64/lower/isle.rs#L965-L978
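The `254u8` hazard can be reproduced with plain integer casts. This is only an illustrative sketch; the function names are hypothetical and do not correspond to anything in Cranelift.

```rust
/// What a sign-extending instruction does with an 8-bit immediate:
/// reinterpret the bits as signed, then widen to 64 bits.
fn sign_extend_imm8(bits: u8) -> i64 {
    bits as i8 as i64
}

/// What a caller holding an unsigned value might have expected instead:
/// widen to 64 bits with zeros.
fn zero_extend_imm8(bits: u8) -> i64 {
    i64::from(bits)
}
```

For `254u8` the first helper yields `-2` and the second yields `254`, even though the encoded byte (`0xfe`) is identical in both cases.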

Problem 1 and problem 2 interfere: if we represent the `add` operand with `simm*` as suggested above, the instruction can accept a new `Simm*` type at the CLIF level that makes it clear that we accept a signed integer and that it will be sign-extended; all is well. But the `and` operand would still be `imm*`, accepting an `Imm*` type, and would still confuse the user at the CLIF level, as @alexcrichton was worried would happen. There are several solutions here, but none that I really like, so I'll just describe the problem for now.

Wasmtime GitHub notifications bot (Feb 06 2025 at 20:49):

alexcrichton commented on issue #10200:

Personally I feel like we should prioritize the representation of the types of the immediates to ensure it minimizes errors and is easy to use. Matching capstone exactly seems like something where we might want to instead engineer the test suite/fuzzing to remove that necessity.

One possible option with that is to rework tests to (a) generate an arbitrary `Inst`, (b) convert the `Inst` to binary, (c) print the `Inst` and use a different assembler to convert the text to binary (maybe `llvm-as`? maybe just `as`?), and finally (d) assert that the two binaries are the same. That means our exact printed format won't necessarily match any other tool's, but it does mean that what we print is accepted by a tool.
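A rough sketch of steps (b) through (d), under heavy assumptions: `AddqRaxImm32` is a toy stand-in for a generated `Inst`, and `toy_assemble` is a hand-written parser that stands in for invoking a real external assembler, so nothing here reflects the actual test harness. Step (a), generating arbitrary instructions, is left to the caller.

```rust
use std::convert::TryFrom;

/// Toy instruction `addq $imm32, %rax` (hypothetical stand-in for an `Inst`).
#[derive(Clone, Copy, Debug)]
struct AddqRaxImm32(i32);

impl AddqRaxImm32 {
    /// (b) Encode: REX.W prefix, opcode 0x05, little-endian imm32.
    fn encode(self) -> Vec<u8> {
        let mut bytes = vec![0x48, 0x05];
        bytes.extend_from_slice(&self.0.to_le_bytes());
        bytes
    }

    /// (c), first half: print in AT&T syntax with a signed immediate.
    fn print(self) -> String {
        let value = i64::from(self.0);
        let sign = if value < 0 { "-" } else { "" };
        format!("addq ${}{:#x}, %rax", sign, value.unsigned_abs())
    }
}

/// (c), second half: stand-in for an external assembler. It accepts only
/// the one syntax printed above and returns `None` for anything else.
fn toy_assemble(text: &str) -> Option<Vec<u8>> {
    let imm_text = text.strip_prefix("addq $")?.strip_suffix(", %rax")?;
    let (negative, digits) = match imm_text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, imm_text),
    };
    let magnitude = i64::from_str_radix(digits.strip_prefix("0x")?, 16).ok()?;
    let value = if negative { -magnitude } else { magnitude };
    let imm = i32::try_from(value).ok()?;
    let mut bytes = vec![0x48, 0x05];
    bytes.extend_from_slice(&imm.to_le_bytes());
    Some(bytes)
}

/// (d) The printed text must assemble to the bytes we encoded ourselves.
fn round_trips(inst: AddqRaxImm32) -> bool {
    toy_assemble(&inst.print()) == Some(inst.encode())
}
```

With a check of this shape, the printer is free to choose signed or unsigned rendering, as long as the other assembler accepts the text and produces identical bytes.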

Wasmtime GitHub notifications bot (Feb 13 2025 at 22:38):

abrown closed issue #10200.

Last updated: Apr 18 2025 at 04:04 UTC