abrown opened PR #10110 from abrown:assembler-upstream
to bytecodealliance:main
:
This is a first step to providing an external assembler for
cranelift-codegen
as described in https://github.com/bytecodealliance/wasmtime/issues/41. Each commit has further details, but the summary is that this direction would eventually move all assembler logic out into crates designed for easier checking (e.g., fuzzing).
abrown updated PR #10110.
abrown updated PR #10110.
abrown updated PR #10110.
abrown updated PR #10110.
abrown updated PR #10110.
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
@cfallin, I'm not a big fan of this
KnownOffsetTable
approach: it seems like we may want to just add appropriateCodeSink
/MachBuffer
methods to propagate this kind of thing.
abrown submitted PR review.
abrown created PR review comment:
Looking at this CLIF diffs, I see some potential problems: removed instructions, different immediates, potential
and
->or
miscompilations... This bears another look.
abrown updated PR #10110.
abrown updated PR #10110.
abrown edited PR review comment.
abrown created PR review comment:
(Pretty sure those
add
->or
"miscompilations" were just bits changing in the constant pool... not sure why they're changing, though).
abrown submitted PR review.
abrown requested cfallin for a review on PR #10110.
abrown has marked PR #10110 as ready for review.
abrown requested wasmtime-compiler-reviewers for a review on PR #10110.
abrown requested wasmtime-default-reviewers for a review on PR #10110.
cfallin submitted PR review:
This looks really really good -- thanks for spawning this idea and working to build it out!
Most of my feedback you already got offline as this was being built and the general shape is therefore what I'd expect and am overall quite happy with. A bunch of nits below but nothing major -- let's get this merged and we can iterate and start the migration process.
Thanks again!
cfallin created PR review comment:
Could we add doc-comments here? Most are pretty obvious (
name
is indeed the name) but our general style is for all pub types to have docs. I guess documenting some of the specifics could be helpful too: does "name" mean what you would write in an assembler, or is it a unique name for a particular variant of an instruction?
cfallin created PR review comment:
It looks like we're a little inconsistent across our crates but I think we can do:
version.workspace = true edition.workspace = true rust-version.workspace = true
to inherit a bit more from the top-level.
cfallin created PR review comment:
earlier than... Cranelift execution? compiling generated Rust code from the DSL?
cfallin created PR review comment:
Is there a part of the manual we can add a reference to that describes the validity conditions here? (I guess some are implicit e.g. in the description of
r
anddigit
implying they're mutually exclusive?)
cfallin created PR review comment:
slightly confusing error message -- I first read it as "the bits (data content) we are encoding must match some operand's actual value", rather than "bits" in the sense of "width". Maybe "for an immediate, the encoding's expected immediate width must match the declared operand's width"?
cfallin created PR review comment:
Can we have doc-comments on the builder methods? Some of them are fairly obvious (e.g.
prefix
) but it would be good to have one-sentence reminders on the others (w
is theREX.W
bit, etc) if only for LSP tooltips/hints as folks write the DSL...
cfallin created PR review comment:
Is this how the Intel manual writes the prefix? (It wouldn't surprise me if so but...) it's a bit confusing in a little-endian world: naively I'd expect
0x66F0
to serialize asF0 66 ...
when it's the other way around. If we're not otherwise constrained, can we write it bytewise, e.g. `0x66 0xF0 + "?
cfallin created PR review comment:
I was surprised not to see
0F
when I got here -- so my earlier thought about how two-byte instructions are encoded was wrong. Is the plan to add0F
when we eventually need to encode e.g.0F 1B
, or will we expand the opcode to a u16 or u32 with length?
cfallin created PR review comment:
Could we put this const slice alongside the
Feature
enum definition so it's easier to update/keep in sync?
cfallin created PR review comment:
We should note here that "multi-byte opcodes" (e.g.
0F 1B
forud2
) are handled by defining0F
to be a prefix and then calling the following byteopcode
(not obvious to a casual observer otherwise how this might work).
cfallin created PR review comment:
Are we planning to support encoding 32-bit x86 with this assembler as well eventually? (I think that'd be great if so, not that we need it in the short term) If so, do we need to make any of the descriptions over in
Rex
more general regarding size prefixes, availability of the REX prefix byte itself, etc? Also are we going to gate register availability (e.g. the top 8 of 16) on features?
cfallin created PR review comment:
Can we add a sentence here describing how this fits into the instruction definitions -- e.g., every instruction has a format, and the format must correspond to the specific instruction encoding's expected operands?
cfallin created PR review comment:
minor pedantry I guess but does the manual describe it as "default"? My 2025-era x86 CPU still boots up thinking it's an 8086 in 16-bit real mode...
"standard mode" or "expected mode of operation" maybe?
cfallin created PR review comment:
You've probably thought about this already but just to check: is there any way we could share this code directly? E.g. depend on
cranelift-codegen-meta
, or factor out another crate (cranelift/meta-utils
), or something like that?
cfallin created PR review comment:
minor thing but could we put these on separate lines?
cfallin created PR review comment:
What's the reason for the commented-out variant here -- could we add a comment describing why in the code? Or does it still need to be disabled?
cfallin created PR review comment:
Is this for testing? If so can we add a comment saying so? (The specificity of only 2-elem arrays otherwise is somewhat surprising)
cfallin created PR review comment:
Oh no! I think we all prefer Intel syntax and are kind of stuck with AT&T only because of legacy/inertia. It might be worth listing here or somewhere else why we're doing this for now and what the plan is: e.g. will we remove this swap and update all the compile test expectations once we've migrated all instructions to the new DSL?
cfallin created PR review comment:
Egads -- is there a way to configure Capstone to always print in decimal or hex? (Necessary evil otherwise, not your fault, just ugly)
cfallin created PR review comment:
Can we make this a newtype wrapper (
pub struct KnownOffset(usize)
) instead?
cfallin created PR review comment:
MinusRsp
evokes some sort of subtraction operation in my mind; maybeNonRspGpr
or something like that?
cfallin created PR review comment:
"from is" and ending with "are..." -- incomplete sentence?
cfallin created PR review comment:
Totally fine to do
Err(PccError::UnimplementedInst)
for now -- I'll need to go and update this once we've moved instructions over that PCC cares about, but PCC isn't fuzzed at the moment anyway until I can make an update/fix pass...
cfallin created PR review comment:
Perhaps! Were you thinking we'd have specific trait methods for e.g.
incoming_arg
andslot_offset
? Or that we'd still be generic over the set of known offsets but have a query method?I'm totally fine with the latter, in fact it's probably even cleaner; for the former I'd be concerned about baking details of Cranelift's ABI impl into the lower-level library and would still want to follow the analogy of a real assembler letting one define symbolic constants and use them with specific values plugged in. Happy to look at proposed diff if you have one.
abrown submitted PR review.
abrown created PR review comment:
I don't think any of the Cranelift crates use
version.workspace
; I assume that's because they exist within the overall Wasmtime workspace?
cfallin submitted PR review.
cfallin created PR review comment:
Ah right, I forgot about the Wasmtime-vs-Cranelift-version distinction; you're right, thanks!
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
This condition is in section 2.1.2 of the "Instruction Format" chapter of volume 2 of, etc., etc.: "an additional 3-bit opcode field is sometimes encoded..." That chapter may be a good start and I'll link it here but, honestly, there's a lot there and it isn't always easy to pull out all the various conditions. I just started with some pretty standard ones here to catch trivial mistakes early; the real safety is in fuzzing against a known-good disassembler.
abrown submitted PR review.
abrown created PR review comment:
There are no hex hints: e.g.,
66 0F 54 /r ANDPD xmm1, xmm2/m128
. I'll switch this, retaining the0x
hints.
abrown submitted PR review.
abrown created PR review comment:
Yeah, I just didn't need it at the moment. I would say that it could be completely valid to also expand the opcode width, but let's see what we think when we get to those instructions.
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
Yeah, true 32-bit support requires a bit more work. Since I had received the request to at least consider it, though, I thought that the first and quite crucial step is to set up instructions to be gated on 32-bit support.
abrown submitted PR review.
abrown created PR review comment:
I agree, we should avoid Cranelift-izing this assembler as much as possible. I guess things I don't like about the known offset table approach is that (a) we rebuild this table for every emissions and (b) building it the way I'm doing it here feels quite fragile. How long before things get out of sync with the keys used during amode conversion?!
abrown created PR review comment:
Leaving it as a
type
enables us to just tag on theIndex<KnownOffset, ...>
to the trait and no implementation is needed for vectors and slices. Since I wasn't really in love with this part (see my initial comments above) I took a pretty minimal approach. Let's figure out the big picture for this and then we can decide whether to flesh this out more.
abrown submitted PR review.
abrown updated PR #10110.
abrown created PR review comment:
I don't think so... it all seems to be hidden away behind a variant in
ArchSyntax
. I suspect there's more configurability possible with a more x86-specific tool like XED but we're just not ready for that yet. The upside to the fuzzing approach here, though, is that the minute we switch to another disassembler that doesn't have this representation we will see fuzz bugs and be able to remove this code.
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
I tried that initially but I see test failures?
FAIL filetests/filetests/pcc/succeed/imul-fuzzbug.clif: compile Caused by: Proof-carrying-code validation error: UnimplementedInst
To reproduce with the
Err(...)
version:cd cranelift && cargo test --test filetests
.
abrown submitted PR review.
abrown created PR review comment:
No
and
in that test but it's probably used in lowering some other instruction...
cfallin submitted PR review.
cfallin created PR review comment:
Oh, I had forgotten we also had that test-suite still active -- OK, no worries, this is fine (and leave the TODO) and I'll get to it in my update pass!
cfallin submitted PR review.
cfallin created PR review comment:
Yep, sounds good, it's at least well-tested!
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
Well, due to the ISLE bits we need
cranelift-codegen-meta
to depend oncranelift-assembler-x64-meta
, but I guess we could pull the formatting bits into a third crate they both depend on. I think at this point theFormatter
andfmtln!
logic has diverged a bit since on the assembler side I both simplified and then added interesting bits like the "append the location as a comment" logic.I'm fine with trying to re-synchronize this formatter code into a third crate though I think I should commit to doing that in a future PR to avoid re-working everything here. But another option is to use a pre-existing crate: a casual glance online suggests
quote
+prettyplease
or evengenco
could be an alternative. I feel like we're in an in-between place where we don't need the full power of something likequote
but now we would be re-inventing parts of it into a new crate...
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Question on this: how does this know that an unsigned value is desired? For example if you have
(a: u8) & -2
that means it would failis_imm8
, right? But the x64 instruction should be able to encode that?With the assembler immediates all annotated with if they're sign/zero-extended should there be
Simm8
andUimm8
and the extractors here would beis_simm8
andis_uimm8
?(also as a side note I find
u8::try_from(...)
handy in cases of conversion like this so you don't have to do anyas
casts yourself)
alexcrichton created PR review comment:
(sorry if this was already covered elsewhere) but how come these are changing from a 32-bit and to an 8-bit and?
alexcrichton created PR review comment:
This one looks like 3 bytes were shaved off the encoding (the 0x19 => 0x16 in the diff), are the new assembler methods perhaps slightly more optimal in encoding size?
alexcrichton created PR review comment:
If possible could the
Arbitrary
methods be left off? And theArbitrary
impl for each instruction havewhere R::ReadGpr: Arbitrary
?
alexcrichton commented on PR #10110:
(feel free to ignore my comments and respond after landing)
alexcrichton commented on PR #10110:
Oh one blocking thing if that's ok: this adds a hard dependency of cranelift on capstone which I believe wasn't previously there. Would it be possible to move that to a dev-dependency or a feature'd dependency?
abrown submitted PR review.
abrown created PR review comment:
There's a couple of issues that caused us to leave this disabled. There are instructions under the
and
family that, when aREX
prefix is prepended, end up sign-extending their operands. When this encoding is used, they have limitations on their registers: "In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH." Since we won't be using the high 8-bit slots here, we can probably disregard that, but that was one of the initial concerns. The second concern is that, in documenting these instructions, it is not clear if there are mistakes: some of theseREX + ...
indicate sign extension for certain instructions but not others (e.g., compareand
'sRM
format andor
'sRM
format. The third is that, with both Capstone and XED, we don't really see a difference in the disassembled string, so our fuzzing cannot check that we correctly emit one instruction and not the other. Currently thinking about what to do...
alexcrichton commented on PR #10110:
(I sent a PR for that change as https://github.com/abrown/wasmtime/pull/10)
abrown submitted PR review.
abrown created PR review comment:
Yeah, that's what I concluded.
abrown submitted PR review.
abrown created PR review comment:
I think I need to ask some questions internally about this part of the reference manual. I'll add a TODO here with a comment.
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
I'm a bit mystified: CLIF's
icmp
should return a truthyInt
which reduces down to an 8-bit value here:So those
band
instructions should always have lowered to anandb
? Perhaps the original encoding just modified a larger width...
abrown submitted PR review.
abrown created PR review comment:
Ok, I see what you mean. The assembler doesn't care too much about the sign, it just wants 8 bits, so I chose to use
u8
to pass those along. What I was trying to do here (incorrectly) is check that we can fit whatever was in thesimm32: i32
into those 8 bits.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
(I pushed up a commit for this over at https://github.com/abrown/wasmtime/pull/10)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I talked a bit more about this with @abrown and the reason for this is that Cranelift today clamps operand sizes to 32-bits in the
alu_rmi_r
constructor. That means that all 8-bitand
s are actually 32-bitand
s when emitted to x64. Andrew's PR is (IMO correctly) changing the 8-bit and to useandb
instead.@cfallin do you remember what
operand_size_of_type_32_64
was doing there? For example why clamp it to 32-bit? (and if so, are said reasons still applicable today?)
abrown submitted PR review.
abrown created PR review comment:
Thanks!
abrown updated PR #10110.
abrown updated PR #10110.
abrown updated PR #10110.
cfallin submitted PR review.
cfallin created PR review comment:
I suspect this is an artifact of the development path: initially we only had the 32- and 64-bit versions of the instruction available, and it's safe to implement 8-bit AND with 32-bit AND, so that's what we did.
I don't know about performance implications -- I do recall vaguely something about "partial-register stalls" if one updates only a part of the destination register but maybe it's fine if the instruction already depends on the register as an input. Assuming that's not a concern (@abrown confirm?) then this seems fine with me.
abrown submitted PR review.
abrown created PR review comment:
Hm, that _could_ be an issue. Maybe I should just leave the 32-bit versions in place in
inst.isle
just to be safe. I'll put a note there.
abrown updated PR #10110.
abrown submitted PR review.
abrown created PR review comment:
Ok, see 2c8a276.
abrown updated PR #10110.
abrown updated PR #10110.
abrown commented on PR #10110:
This latest CI failure is under QEMU-emulated s390x. Have we ever observed a segfault here before? I can't imagine this has much to do with the new
cranelift-assembler-x64
crate since that is excluded here...
abrown edited a comment on PR #10110:
This latest CI failure is under QEMU-emulated s390x. Have we ever observed a segfault here before? I can't imagine this has much to do with the new
cranelift-assembler-x64
crate since that is excluded here... I'll try re-running the failed jobs.
abrown edited a comment on PR #10110:
This latest CI failure is under QEMU-emulated s390x. Have we ever observed a segfault here before? I can't imagine this has much to do with the new
cranelift-assembler-x64
crate since that is excluded in that CI check... I'll try re-running the failed jobs.
abrown merged PR #10110.
Last updated: Feb 28 2025 at 02:27 UTC