bnjbvr opened PR #1605 from rebase-x64
to master
:
This is a draft-ish PR containing a rebase of Julian's x64 work on the new isel backend, including quite a few refactorings of mine. During today's Cranelift meeting, we agreed that it might make sense to add the incomplete backend here, allowing incremental development by more contributors.
I've tried to split the files in a way that was similar to the aarch64 backend, but some things might still be wobbly; I might check this a bit further more tomorrow. Of course, that's also something we could do as follow-up PRs.
The new x64 backend is controlled by a new Cargo feature
x64
(which also requiresx86
). Then, users ofclif-util
can use the--target x86_64
argument, and:
- get the code generated by the current isel (with legalization, Recipes, Encodings, etc.) and regalloc if nothing more is provided.
- or add
--set use_new_backend
and this will use the new isel backend. (Expect panics, unimplemented errors, etc.)Second (small) commit is what allows passing ISA-specific flags to
--set
, which wasn't implemented yet.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
fn xmm(num: u8) -> Reg { fpr(num, 14 + num) }
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Please return the original triple as passed to
isa_builder
.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
if isa_flags.use_new_backend() { #[cfg(feature = "x64")] { use super::x64; return x64::isa_builder(triple).finish(shared_flags) } #[cfg(not(feature = "x64"))] { panic!("new backend x86 support not included by cargo features!"); } } else { Box::new(Isa { triple, isa_flags, shared_flags, cpumode: level1, }) }
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
There is already a way to have target specific settings:
target riscv32 supports_m=1
bjorn3 created PR Review Comment:
/// `simm32` is its sign-extension out to 64 bits.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// the signless, non-extending (N x N -> N, for N in {32,64}) variant Mul,
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// These indicate ways of extending (widening) a value, using the Intel /// naming: B(yte) = u8, W(ord) = u16, L(ong)word = u32, Q(uad)word = u64
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// byte -> long word BL, /// byte -> quad word BQ, /// word -> long word WL, /// word -> quad word WQ, /// long -> quad word LQ,Also is there no byte -> word variant?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// overflow O = 0, /// no overflow NO = 1, /// <u B = 2, /// >= u NB = 3, /// zero Z = 4, /// not zero NZ = 5, /// <=u BE = 6, /// >u NBE = 7, /// negative S = 8, /// not negative NS = 9, /// <s L = 12, /// >=s NL= 13, /// <=s LE = 14, /// >s NLE = 15,
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
self as u8
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
#[derive(Copy, Clone)] #[repr(u8)]
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
CC::O => CC::NO, CC::NO => CC::O, CC::B => CC::NB, CC::NB => CC::B, CC::Z => CC::NZ, CC::NZ => CC::Z, CC::BE => CC::NBE, CC::NBE => CC::BE, CC::S => CC::NS, CC::NS => CC::S, CC::L => CC::NL, CC::NL => CC::L, CC::LE => CC::NLE, CC::NLE => CC::LE,
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// Don't build these directly. Instead use the `Inst::*` functions to create them.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Ret,
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Why? You can just emit multiple nop instructions, right?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
s + &format!("{nil: <width$}", nil = "", width = need)
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
_ => panic!("Inst(x64).show.suffixBWLQ (size={})", size),
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// For registers that appear in a read role, apply `pre_map`. For those // that appear in a write role, apply `post_map`. For registers that
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// Note that this function must be closely coordinated with `fn regs`, in // the sense that each arm "agrees" with the one in `fn regs` about which
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
apply_map(dst, pre_map); // yes, really `pre_map`
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// conceivably use `movl %reg, %reg` to zero out the top 32 bits of
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
collector.add_use(*dst); // yes, really `add_use`
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// serious error in the allocator or in our `get_regs` function.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// For an instruction that has as operands a register `encG` and a memory // address `memE`, create and emit, first the REX prefix, then caller-supplied // opcode byte(s) (`opcodes` and `numOpcodes`), then the MOD/RM byte, then // optionally, a SIB byte, and finally optionally an immediate that will be // derived from the `memE` operand. For most instructions up to and including // SSE4.2, that will be the whole instruction.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Get the encoding number from something which we sincerely hope is a real /// register of class I64.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Emit the REX prefix byte even if it appears to be redundant (== 0x40).
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Set the W bit in the REX prefix to zero. By default it will be set to 1, /// indicating a 64-bit operation.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Add an 0x66 (operand-size override) prefix. This is necessary to indicate /// a 16-bit operation. Normally this will be used together with F_CLEAR_REX_W.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// then the caller should pass `opcodes` == 0xF3_0F_27 and `numOpcodes` == 3.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// The register operand is represented here not as a `Reg` but as its hardware // encoding, `encG`. `flags` can specify special handling for the REX prefix.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This should all be a doc comment.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// General comment for this function: the registers in `memE` must be
bjorn3 created PR Review Comment:
// expression. But `encG` can be derived from a register of any class.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// This is the core 'emit' function for instructions that do not reference /// memory. /// /// This is conceptually the same as /// emit_REX_OPCODES_MODRM_SIB_IMM_encG_memE, except it is for the case /// where the E operand is a register rather than memory. Hence it is much /// simpler.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// These are merely wrappers for the above two functions that facilitate passing /// actual `Reg`s rather than their encodings.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// JRS FIXME 2020Feb07: this should really just be `regEnc` not `iregEnc`
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// JRS FIXME 2020Feb07: these should really just be `regEnc` not `iregEnc`
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Write a suitable number of bits from an imm64 to the sink.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// The top-level emit function. /// /// Important! Do not add improved (shortened) encoding cases to existing /// instructions without also adding tests for those improved encodings. That /// is a dangerous game that leads to hard-to-track-down errors in the emitted /// code. /// /// For all instructions, make sure to have test coverage for all of the /// following situations. Do this by creating the cross product resulting from /// applying the following rules to each operand: /// /// (1) for any insn that mentions a register: one test using a register from /// the group [rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi] and a second one /// using a register from the group [r8, r9, r10, r11, r12, r13, r14, r15]. /// This helps detect incorrect REX prefix construction. /// /// (2) for any insn that mentions a byte register: one test for each of the /// four encoding groups [al, cl, dl, bl], [spl, bpl, sil, dil], /// [r8b .. r11b] and [r12b .. r15b]. This checks that /// apparently-redundant REX prefixes are retained when required. /// /// (3) for any insn that contains an immediate field, check the following /// cases: field is zero, field is in simm8 range (-128 .. 127), field is /// in simm32 range (-0x8000_0000 .. 0x7FFF_FFFF). This is because some /// instructions that require a 32-bit immediate have a short-form encoding /// when the imm is in simm8 range. /// /// Rules (1), (2) and (3) don't apply for registers within address expressions /// (`Addr`s). Those are already pretty well tested, and the registers in them /// don't have any effect on the containing instruction (apart from possibly /// require REX prefix bits). /// /// When choosing registers for a test, avoid using registers with the same /// offset within a given group. For example, don't use rax and r8, since they /// both have the lowest 3 bits as 000, and so the test won't detect errors /// where those 3-bit register sub-fields are confused by the emitter. Instead /// use (eg) rax (lo3 = 000) and r9 (lo3 = 001). Similarly, don't use (eg) cl /// and bpl since they have the same offset in their group; use instead (eg) cl /// and sil. /// /// For all instructions, also add a test that uses only low-half registers /// (rax .. rdi, xmm0 .. xmm7) etc, so as to check that any redundant REX /// prefixes are correctly omitted. This low-half restriction must apply to /// _all_ registers in the insn, even those in address expressions. /// /// Following these rules creates large numbers of test cases, but it's the /// only way to make the emitter reliable. /// /// Known possible improvements: /// /// * there's a shorter encoding for shl/shr/sar by a 1-bit immediate. (Do we /// care?)
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// See comments at the top of `fn x64_emit` for advice on how to create
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// (cd cranelift/codegen && \
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// This insn is 6 bytes long. Currently `offset` is relative to
iximeow submitted PR Review.
iximeow created PR Review Comment:
encoding-wise,
movzx
andmovsx
have such an encoding with660f{b6,be}
. For example,movzx ax, cl // 66 0f b6 c1
. I dunno if we're just not using them though.
iximeow submitted PR Review.
iximeow created PR Review Comment:
// Baldrdash expects the stack to take at least the number of words set inOr maybe I'm misreading :)
iximeow created PR Review Comment:
just making sure I'm following the intended direction: this will eventually have an
if call_conv.extends_windows_fastcall()
arm like the existing x86 abi code, correct?
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
sink.put1(mkModRegRM(1, encG & 7, 5)); // could/?should? be encE & 7 for consistency sink.put1(0x25); sink.put1((simm32 & 0xFF) as u8);and similar for
!low8WillSXto32()
would encode theRBP
/R13
variants that are currently missing here right? is there a wider reason this ispanic!
, or just "not gotten to it yet"?
iximeow submitted PR Review.
iximeow created PR Review Comment:
This means we can't encode
SHR 0, %rax
(sure fine) but more importantly that attempting to emit aSHR 0, %rax
gets youSHR %cl, %rax
instead! My understanding is that this is a level below anything a user would see from building a function in Cranelift IR, so it's still unlikely to trip across this, correct?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I propose not to generate 16 bit operations (addw, etc) to the extent that it's possible to avoid them. They put us at risk of partial register stalls. Instead I plan to do as much 16-bit work as possible at 32 bit width, hence avoiding the stalls and also the need for the 66 prefix.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Is this really a win? IMO it wastes a lot of vertical screen space for minimal gain. Sure, we can rustdoc
CC
itself, but I think the benefit of documenting the meaning of each individual item is not worth the loss of screen space.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I'd prefer to retain the original. That makes it easy to figure out the encoding for each condition. With
self as u8
you have to count through the definition to find the encoding.
iximeow submitted PR Review.
iximeow created PR Review Comment:
that sounds reasonable to me (pending if/when we might also optimize for size at least)
Do you know if decode stalls from 66-prefixed instructions with immediates are still a concern? At the least it might be helpful to note what is _intentionally_ not implemented, for future readers.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Hmm, I thought I had exhaustively tested this in the encoding test cases (there are 100+ amode encoding tests). So I'm not convinced there's a missing case (am not convinced there isn't a missing case, either). It's simple enough to shake out, by checking the encoding test cases.
iximeow submitted PR Review.
iximeow created PR Review Comment:
Ah, you're right - the initial checks against
R{BP,13}
had me looking for an else mentioning them, but they're encoded through theif low8willSXto32(*simm32) && encE != regs::ENC_RSP && encE != regs::ENC_R12
case. There are a few tests for these such as for127(%rbp)
.How do you feel about upgrading this to an
unreachable!
?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Yes, it's the instruction selector's responsibility not to try to produce
shr $0, %reg
, but that's fine -- it can just emit no-instruction if it needs to do that -- which would in itself reflect incomplete cleanup at the CLIR level. I guess it would be conceptually cleaner and safer change this so that the shift amount is anOption<u8>
though; maybe we should do that.
julian-seward1 edited PR Review Comment.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Multiple single-byte nop insns, you mean? Sure, you can do that, but that wastes front end resources. Intel defines "official" no-ops with lengths at least 1 to 14 and maybe more (I forget).
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I don't know the details (Benjamin surely will), but no doubt the intent exists to make it possible to support windows-specific ABIs as necessary.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I would be the first to admit that the logic is pretty hard to follow there. Er, if
unreachable!
is the right thing there, then use that. Just so long as it really is unreachable.
julian-seward1 edited PR Review Comment.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I don't know about decode stalls offhand; we'd have to consult the latest incarnation of the Intel "Optimisation Reference Manual". Note though that I was speaking above of dispatch stalls, not decode stalls -- I forgot there might also be decode limitations till you mentioned it.
Yes, documenting the general intention to avoid generating 16-bit operations as much as possible would be a good thing to do.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
The previous suggestion explicitly assigns a discriminant to every variant. Otherwise this wouldn't even be correct, as 10 and 11 are skipped.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
The function I'm modifying is called by the CLI flags parser, though; is there a way to pass target-specific flags through the CLI that I missed?
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Agreed with Julian, there's an official selection of nop instructions for different sizes.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Yes, that's the idea! We'd pass the
CallConv
, or we'd select different impl in the caller according to the givenCallConv
here, as done inx86/abi.rs
.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I'll punt on this one, it does break the symmetry with GPR, and it may require a bunch of changes all over the place.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I'll assume no, then :)
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I heard screen space is free, so I'll take it!
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
nice
bnjbvr updated PR #1605 from rebase-x64
to master
:
This is a draft-ish PR containing a rebase of Julian's x64 work on the new isel backend, including quite a few refactorings of mine. During today's Cranelift meeting, we agreed that it might make sense to add the incomplete backend here, allowing incremental development by more contributors.
I've tried to split the files in a way that was similar to the aarch64 backend, but some things might still be wobbly; I might check this a bit further more tomorrow. Of course, that's also something we could do as follow-up PRs.
The new x64 backend is controlled by a new Cargo feature
x64
(which also requiresx86
). Then, users ofclif-util
can use the--target x86_64
argument, and:
- get the code generated by the current isel (with legalization, Recipes, Encodings, etc.) and regalloc if nothing more is provided.
- or add
--set use_new_backend
and this will use the new isel backend. (Expect panics, unimplemented errors, etc.)Second (small) commit is what allows passing ISA-specific flags to
--set
, which wasn't implemented yet.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
it may require a bunch of changes all over the place.
These functions are not exported from this module.
I'll punt on this one, it does break the symmetry with GPR
Fair enough
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// Actually codegens an instruction's results into registers.
bjorn3 deleted PR Review Comment.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
What about a
return
in the middle of a function? The current code would just fall through to the next instruction without ever inserting areturn
, right?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
// `VMContext` is `r14` in Baldrdash.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
pub(crate) fn get_enc(self) -> u8 { self as u8
CC
is copy
abrown submitted PR Review.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I'm sure this is handled correctly, although I am not exactly sure how. The new backend and register allocators are block-order invariant; they can be presented with the blocks in any order; they don't care. After regalloc is done, a final block order is chosen, branches which would be fallthroughs are removed, empty blocks are removed, and the prologue/epilogue machinery identify the entry/exit blocks and add the relevant extra insns.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
I mean how does it know that this is an exit block if there is no
ret
emitted?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
You'd have to ask Chris or Benjamin for a definite answer, but .. I'm pretty sure it's because the higher level steering machinery in the new BE knows which CLIR-level blocks are exit blocks, so it knows which machine code blocks are exit blocks.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Ha, interesting. Indeed it lacks the emission of
Ret
. This bug wouldn't be observable in spidermonkey since it uses the FallthroughRet, which means wasm returns in the middle of a function are translated into a jump to a single return block at the end of the function.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Taking this back: the actual
Ret
opcode for this is generated inmachinst/lower.rs
, ingen_retval_setup
.
bnjbvr updated PR #1605 from rebase-x64
to master
:
This is a draft-ish PR containing a rebase of Julian's x64 work on the new isel backend, including quite a few refactorings of mine. During today's Cranelift meeting, we agreed that it might make sense to add the incomplete backend here, allowing incremental development by more contributors.
I've tried to split the files in a way that was similar to the aarch64 backend, but some things might still be wobbly; I might check this a bit further more tomorrow. Of course, that's also something we could do as follow-up PRs.
The new x64 backend is controlled by a new Cargo feature
x64
(which also requiresx86
). Then, users ofclif-util
can use the--target x86_64
argument, and:
- get the code generated by the current isel (with legalization, Recipes, Encodings, etc.) and regalloc if nothing more is provided.
- or add
--set use_new_backend
and this will use the new isel backend. (Expect panics, unimplemented errors, etc.)Second (small) commit is what allows passing ISA-specific flags to
--set
, which wasn't implemented yet.
bnjbvr merged PR #1605.
Last updated: Nov 22 2024 at 16:03 UTC