abrown opened Issue #2290:
Many instruction formats in the new x64 backend accept a
RegMem
parameter, implying that they may load from memory. This may cause a trap if the memory address is out-of-bounds and, in order to return the correct trap code, we should be registering the source location of the instruction, e.g.:if let Some(srcloc) = *srcloc { sink.add_trap(srcloc, TrapCode::HeapOutOfBounds); }
Currently, some formats do accept a
RegMem
parameter but do not correctly register the possible trap location. This currently passes the spec tests because these formats are not used in theirMem
variant, but for correctness and completeness we should register the trap locations any time it is possible that an instruction may load from memory.The current convention for doing so appends a
srcloc: Option<SourceLoc>
field to the format and expects the programmer to correctly choose betweenNone
andSome(ctx.srcloc(insn))
when instatiating a new instruction format. I propose we change this: instead, attach theOption<SourceLoc>
, or perhaps even a plainSourceLoc
, toSyntheticAmode
. This approach has advantages:
- any programmer attempting to do a load/store will be forced to decide how the trap will be reported (but not when only accessing registers)
- perhaps the helper functions in
lower.rs
can automatically attach aSourceLoc
to anySyntheticAmode
s created- the
Inst
formats no longer need an extra field filled withNone
in many casesIf we agree that the above approach is a good one, this issue should be closed not only
- when that refactoring is complete but
- also when any instruction formats that can load/store properly registers traps for their
RegMem::Mem
variants
abrown commented on Issue #2290:
cc: @julian-seward1, @bnjbvr, @cfallin
abrown edited Issue #2290:
Many instruction formats in the new x64 backend accept a
RegMem
parameter, implying that they may load from memory. This may cause a trap if the memory address is out-of-bounds and, in order to return the correct trap code, we should be registering the source location of the instruction, e.g.:if let Some(srcloc) = *srcloc { sink.add_trap(srcloc, TrapCode::HeapOutOfBounds); }
Currently, some formats do accept a
RegMem
parameter but do not correctly register the possible trap location. This passes the spec tests because these formats are not used in theirRegMem::Mem
variant, but for correctness and completeness we should register the trap locations any time it is possible that an instruction may load from memory.The current convention for doing so appends a
srcloc: Option<SourceLoc>
field to the format and expects the programmer to correctly choose betweenNone
andSome(ctx.srcloc(insn))
when instatiating a new instruction format. I propose we change this: instead, attach theOption<SourceLoc>
, or perhaps even a plainSourceLoc
, toSyntheticAmode
. This approach has advantages:
- any programmer attempting to do a load/store will be forced to decide how the trap will be reported (but not when only accessing registers)
- perhaps the helper functions in
lower.rs
can automatically attach aSourceLoc
to anySyntheticAmode
s created- the
Inst
formats no longer need an extra field filled withNone
in many casesIf we agree that the above approach is a good one, this issue should be closed not only
- when that refactoring is complete but
- also when any instruction formats that can load/store properly registers traps for their
RegMem::Mem
variants
bnjbvr commented on Issue #2290:
I agree that there's an issue with the current design, and indeed the
RegMem::Mem
variant could have its source location. (No need forOption<SourceLoc>
in fact, sinceSourceLoc
already has a default value meaning invalid that we could reuse to mean "undefined".) This would become a large issue wheninput_to_reg_mem
can actually embed a load in an address-mode operand.Unfortunately, this also means that the SyntheticAmode would then become even much bigger, going against the general plan of making Vcode's inst data fit into 32 bytes or fewer. It would be nice to find a way to properly pack here without inflating the RegMem too much.
To wit: these source locations are also only useful when not using implicit bounds checking (via signal handling and the 4GB memory page trick). When using implicit bounds checking, this data information has no use at all. We could imagine adding a trait to lowering, to decide whether we're using explicit bounds checks or not, and that trait would select a different type for
RegMem
and/orSyntheticAmode
, but that lead to a lot of binary code duplication in the Cranelift-embedding binary, I expect...
cfallin commented on Issue #2290:
So at a high level, I think this issue implies several sub-problems to solve:
- We want to design the API for creating
MachInst
s such that it is not possible to accidentally create an inst that should have aSourceLoc
attached without thatSourceLoc
. I think that we can probably achieve this by auditing the helpers that we create.- We want to be sure to register the trap-codes and trap-locations at the appropriate offsets in the sink
- We want to store the
SourceLoc
s efficiently, that is, ideally not inflate the size of everyRegMem
if we can help it.It occurred to me that we could solve the last two issues simultaneously by keeping source-location / trap info in a side-table, indexed by instruction (i.e., a sparse instruction-index to info map), and emitting this metadata to the sink as we iterate over the insts, outside of the machine backend's particular
MachInstEmit
method. We already have a notion of metadata outside the instruction, in the form of the "is this a safepoint" bit; perhaps this is another.(In a sense, both of these bits are orthogonal to, and outside the scope of, the machine instruction itself; the
MachInst
is some instruction/operation that updates machine state, and the safepoint bit and trap info are metadata carried by the emission pipeline out to side-tables on the machine code.)So perhaps we have a notion of
InsnMetadata
records; each instruction has zero or more such records; they are held by the VCode container in a sparse manner, sorted by instruction index (i.e., kept in the same order as the insts); and the VCode container promises to keep the order in sync, if e.g. it reorders or removes insts (e.g., move elision after RA).Then we have some trait that the
InsnMetadata
implements that "emits" the metadata to theMachBuffer
sink; these emit-calls are interleaved with instruction emit calls as appropriate.Thoughts? I can sketch this in more detail if there's general agreement that we should have a framework something like this.
abrown commented on Issue #2290:
When we call
Inst::foo(...)
, how wouldInst
have a way to stuff the metadata into intoVCode
? I don't see how they reach one another.(As an aside about
Inst
size, I have wondered why we are not using#[repr(u8)]
more liberally to fit stuff into 32 bytes: in the longest case I can think of, we would have anInst
with the following conceptual fields,{bool, opcode, source loc, dst reg, src from shifted amode}
. If we applied#[repr(u8)]
, we could get this case down toinst tag/1 + bool/1 + opcode/2 + source loc/4 + dst reg/4 + src tag/1 + SytheticAMode tag/1 + Amode tag/1 + simm32/4, + base reg/4 + index reg/4 + shift/1
which equals 28 bytes. I'm not saying we shouldn't store stuff in a side table, that may still make sense, but why not pack things tighter? Or perhaps Rust won't pack things this way?)
cfallin commented on Issue #2290:
Re:
#[repr(u8)]
: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in onestruct
, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)When we call Inst::foo(...), how would Inst have a way to stuff the metadata into into VCode? I don't see how they reach one another.
Indeed, the idea is that the creator of the
Inst
would attach the metadata as it passes the instruction to theLowerCtx
. Right now we have (slightly hackishly)ctx.emit()
andctx.emit_safepoint()
; perhaps instead we could havectx.emit_with_metadata()
, and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build()
maybe, and.is_safepoint()
as well to set the safepoint bit?).Anyway, just a thought; perhaps it's too much infrastructure for just a few things; but it at least enables sparsity and gives the ability to only optionally include the data (with zero cost if not), which is good :-)
abrown commented on Issue #2290:
Makes sense to me. @bnjbvr, @julian-seward1?
bnjbvr commented on Issue #2290:
Re: #[repr(u8)]: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in one struct, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)
Yes, exactly this: each different enum adds a new word for the discriminant (not sure if it applies under certain conditions like "when the enum may be heap-allocated" or not; this matches what I've observed with the Inst enum, though). Even if we flattened all the enums into a single one, and then used a newtype as suggested by @fitzgen, we'd not have enough space; the ideal packing would need the RegMem SIB shift offset (3 bits) to be on the same byte as the overall emulated discriminant, if I remember correctly my back-of-the-envelope computation.
Indeed, the idea is that the creator of the Inst would attach the metadata as it passes the instruction to the LowerCtx. Right now we have (slightly hackishly) ctx.emit() and ctx.emit_safepoint(); perhaps instead we could have ctx.emit_with_metadata(), and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build() maybe, and .is_safepoint() as well to set the safepoint bit?).
The Metadata struct sounds good to me! If it has only two fields (is_safepoint + trap's sourceloc), and in no cases both are needed at the same time, the Builder pattern may be overhead, and it may increase Rustc compile times for the already-quite-large
lower
functions (but that's still another issue, that can be handled separately).
cfallin closed Issue #2290:
Many instruction formats in the new x64 backend accept a
RegMem
parameter, implying that they may load from memory. This may cause a trap if the memory address is out-of-bounds and, in order to return the correct trap code, we should be registering the source location of the instruction, e.g.:if let Some(srcloc) = *srcloc { sink.add_trap(srcloc, TrapCode::HeapOutOfBounds); }
Currently, some formats do accept a
RegMem
parameter but do not correctly register the possible trap location. This passes the spec tests because these formats are not used in theirRegMem::Mem
variant, but for correctness and completeness we should register the trap locations any time it is possible that an instruction may load from memory.The current convention for doing so appends a
srcloc: Option<SourceLoc>
field to the format and expects the programmer to correctly choose betweenNone
andSome(ctx.srcloc(insn))
when instatiating a new instruction format. I propose we change this: instead, attach theOption<SourceLoc>
, or perhaps even a plainSourceLoc
, toSyntheticAmode
. This approach has advantages:
- any programmer attempting to do a load/store will be forced to decide how the trap will be reported (but not when only accessing registers)
- perhaps the helper functions in
lower.rs
can automatically attach aSourceLoc
to anySyntheticAmode
s created- the
Inst
formats no longer need an extra field filled withNone
in many casesIf we agree that the above approach is a good one, this issue should be closed not only
- when that refactoring is complete but
- also when any instruction formats that can load/store properly registers traps for their
RegMem::Mem
variants
Last updated: Oct 23 2024 at 20:03 UTC