bnjbvr opened PR #1956 from x64
to main
:
This makes it possible to run a basic C hello world with wasmtime. It's a large PR but the commit splitting should make it easier to review. I'll add emit tests for div/idiv before landing, but reviewing shouldn't be blocked on this.
bnjbvr requested cfallin and julian-seward1 for a review on PR #1956.
bnjbvr requested cfallin and julian-seward1 for a review on PR #1956.
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Maybe rename
JmpRel32
toPCRel32
(since this is not a jump)?
cfallin created PR Review Comment:
Perhaps
u32::try_from(offset).expect("rip-relative can't hold >= U32_MAX values")
?
cfallin created PR Review Comment:
Maybe add a clarifying note here that
off_into_table
is an addend and the label-use resolution adds the target-relative-to-this-entry to the addend so that the result is target-relative-to-JT-start?(I neglected to document this logic in the analogous over on the aarch64 side; feel free to do a drive-by edit with the same comment over there...)
cfallin created PR Review Comment:
Maybe add a note here that we don't need to worry about islands, despite exceeding the "worst-case inst size" (with the potentially large inline targets array), because on x64 our only label-use type right now has a 2GB range? If we later consider using shorter-range label references, we'll have to revisit this.
cfallin created PR Review Comment:
I think there might be a slightly shorter way to test for zero/nonzero:
test Rdivisor, Rdivisor
/jnz ...
. Then we don't need the zero-immediate. Saves a byte, anyway...
cfallin created PR Review Comment:
Suggestion: could we make the
ShowWithRRU
pretty-print format here use assembler-parseable syntax, i.e.,lea 0(%rip), %rdi
? This makes it slightly more mechanical (hence easier to do by hand, and easier to possibly automate later) to verify that the golden bytes match the instruction.
cfallin created PR Review Comment:
It might be worthwhile to go ahead and box up / out-of-line some of this, like we did with
JTSequenceInfo
, to keep theInst
size down.
cfallin created PR Review Comment:
Perhaps we should have an enum for this, like the
InstSize
enum over in aarch64? (We could make it#[repr(u8)]
and assign the arms the above numeric values.)
cfallin created PR Review Comment:
Use the
test
instruction (as above) to save an immediate byte?
cfallin created PR Review Comment:
I wonder if there's a way to merge these signed cases with the above unsigned to reduce duplication a bit. Especially if the unsigned checks (and divide itself) also become a meta-instruction, the lowering is more similar above and below, I think?
cfallin created PR Review Comment:
On a little further thought after our conversation yesterday: I think we should actually probably avoid this fixed offset, because there is a chance that instructions are inserted between the use (jump) and destination (after trap). In particular, we might get spills/reloads; or a later optimization pass, if we eventually have some on the VCode, could rearrange things. So I think we should do here what you did for the signed case below, and wrap up all of the checks into one VCode meta-instruction.
cfallin created PR Review Comment:
Yes, I'd suggest something like the
NarrowValueMode
stuff, but deferring that till later is definitely OK!
cfallin created PR Review Comment:
It seems in some places the backend measures size in bits, while in others (e.g. size field in a divide VCode instruction) it is in bytes. Might be good to standardize on bits (given that all the types, e.g.
I8
, are named as such), and use the enum suggested above here as well; it could have afrom_bits
as onInstSize
in aarch64.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Interestingly, there's also a
PCRel32
in addition toJmpRel32
: the former adds the value read at the offset before patching, while the latter does subtract 4 to the read value before patching (because control flow instructions are relative to the start of the next instruction on x64). This is one of these cases here. Should i renameJmpRel32
toPCRel32Disp4
or something like this instead?
bnjbvr created PR Review Comment:
Yep, it'll probably be part of a greater refactoring though; some instructions use a boolean to distinguish 32 bits encodings from 64 bits encodings, not implementing all the other ones. Maybe cg_clif wants the smaller sizes encodings? But if not, implementing only 32 and 64 bits might be sufficient in general.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Added to the todo list, will probably make more sense once all the instructions have been implemented.
bnjbvr updated PR #1956 from x64
to main
:
This makes it possible to run a basic C hello world with wasmtime. It's a large PR but the commit splitting should make it easier to review. I'll add emit tests for div/idiv before landing, but reviewing shouldn't be blocked on this.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
As a general comment, I think we should try to avoid generating 8- and 16- bit instructions as much as possible, and instead generate 32-bit insns where we ignore the upper 24 or 16 bits of the result. The reason is to avoid partial register stalls. These are documented in the Intel Opt manual, in my copy Sec 3.5.2.4 (Partial Register Stalls), with similar comments in following sections (Partial XMM Register Stalls, Partial Flag Register Stalls).
Also, 16-bit instructions which contain an immediate field are subject to the stalls described in Sec 3.4.2.3 (Length Changing Prefixes).
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Maybe
LabelUse
s need to be architecture-specific, then, if they are not already?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I agree; but I also don't think this is critical. We could quite reasonably change
cmp $0, ..
intotst ..
(and maybe some other patterns) in future work to optimise the x64 isel, so we don't delay landing this one further.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Or have a naming scheme that makes the units explicit? For example,
offset_szW
-- offset in words.offset_szB
-- offset in bytes.offset_sz
-- offset in bits. Also, useful variants --offset_szBlg2
-- log 2 of size in bytes, etc.
julian-seward1 submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
also for and, or, xor, mul
julian-seward1 created PR Review Comment:
I think this needs to show the inverse of
trap_code
(viz,trap_code ^ 1
) since what this currently displays is "skip the trap iftrap_code
is true".
julian-seward1 created PR Review Comment:
Please could you add here, a comment showing what instructions are actually emitted? It's pretty hard to infer from reading the code. Same for SignExtendRaxRdx et al.
julian-seward1 created PR Review Comment:
Given that
cwd
is 6690,cdq
is 90 andcqo
is 4890, (or something like that), and there's only three possibilities, I think it would be clearer to the reader just to write the relevant sink.putN calls directly. These widening insns are ancient 1970s-era stuff and don't fit in the "standard" RM->R scheme.
julian-seward1 created PR Review Comment:
Per comments I added to one of Chris's comments, I think we should generate 32-bit insns instead of 8- or 16-bit insns except where we absolutely can't avoid them (eg, stores), so as to avoid partial register stall hazards.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
They are architecture-specific, already: this is defined in inst/mod.rs.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Good point, I'll revisit this by re-checking all the lowering.
bnjbvr updated PR #1956 from x64
to main
:
This makes it possible to run a basic C hello world with wasmtime. It's a large PR but the commit splitting should make it easier to review. I'll add emit tests for div/idiv before landing, but reviewing shouldn't be blocked on this.
bnjbvr merged PR #1956.
Last updated: Jan 24 2025 at 00:11 UTC