elliottt commented on issue #5780:
The only additional feature I'd like to suggest is that I think we can get the blockN: labels back, and maybe add a comment with their byte offsets so we can more easily see where jumps refer to. MachBuffer::label_offsets is, if I'm reading this correctly, an array indexed by block number and holding the code offset of the first instruction of that block. Since branches may have been removed there can be multiple labels at the same offset, so before each instruction you can scan that array for all blocks at that offset.
As we were discussing, we don't have access to
label_offsets
inMachBufferFinalized
, which is unfortunate. Perhaps we can plumb that through in a future PR, as it would be really nice to get the block labels back.
uweigand commented on issue #5780:
I've been going through the remaining s390x changes. Looking at broad categories I see:
- One actual encoding bug in cranelift: the last byte of
ahy
is 0x7a, not 0x4a! This is wrong code-gen, I'll work on a fix.- A few places where the cranelift disassembly output doesn't match the official format (operands swapped etc.). Does not affect code-gen, but should still be fixed as long as we have our own disassembler.
capstone
apparently does not support any z15 or later instructions right now - this will need to be fixed there.- Even with all the wrong and missing encodings fixed, we still have inline constants. I think it would be better to use an out-of-line constant pool. This used not to be possible, but I think now we have the required infrastructure. The one thing I'm not certain is whether we can put constants with relocations in the pool right now.
- As mentioned above already, branch targets are now shown as plain offsets, but it is basically impossible to find out which instruction this refers to.
capstore
seems to print all immediates in hex, while the current disassembler uses decimal ...- We are no longer seeing any of cranelift-specific "virtual" instructions like
virtual_sp_offset_adjust
. Not sure if this is problem.- For some of the jump sequences, the new disassembler appears to display an optimized form, where the old one would show an intermediate form, e.g.
; clgr %r2, %r3 -; jge label1 ; jg label2 -; block1: -; jg label3 -; block2: -; jg label3 -; block3: ; lghi %r2, 1 ; br %r14
I guess my primary question would be to what extent we're willing to rely on the quality of the
capstone
library here. I'm not sure exactly who is writing / maintaining this -- looking at the web site, the last official release is as of nearly three years ago, and even in the github repo (assuming I'm looking at the correct one?) the last significant change to theSystemZ
directory is five years old, which presumably explains why it doesn't supportz15
yet. EvenX86
doesn't appear to be much more recent.The actual disassembly implementation in
capstone
seems to be done primarily via files copied from LLVM, so that should likely be correct as far as it goes. But if this isn't regularly updated, we could run into issues with using recent instructions on other architectures as well ...
bjorn3 commented on issue #5780:
Zydis is a disassembler for x86 with a focus on correctness (even knows the difference between how Intel and AMD cpus decode insts) and performance. It has official rust bindings: https://github.com/zyantific/zydis-rs (licensed under MIT)
BinaryNinja has an AArch64 disassembler generated from the actual ISA manual: https://binary.ninja/2021/04/05/groundup-aarch64.html, https://github.com/Vector35/arch-arm64/tree/master/disassembler (licensed under Apache-2.0) Not sure how easy it is reusable outside of BinaryNinja though.
elliottt commented on issue #5780:
After the discussion today, I've added a section to each test output that also includes the printed vcode. This will help spot differences between what we're assuming we're producing and what we're actually producing, as well as help identify where
capstone
is falling short.
Last updated: Jan 24 2025 at 00:11 UTC