cfallin opened PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
sunfishcode created PR Review Comment:
Is this a temporary workaround? If so, could you add a comment here explaining what the real fix is? If not, could you add a comment explaining why probing isn't needed? :-)
sunfishcode submitted PR Review.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
This suggests you're generating an
OutOfBounds
trap code rather than aHeapOutOfBounds
trap code. Is that a temporary limitation, or something more fundamental? If it's temporary, could you add a TODO comment here or so?
sunfishcode created PR Review Comment:
pp
isn't something I can guess the meaning of without more context. What would you think about renaming it topretty-print.rs
or so?
sunfishcode created PR Review Comment:
all-arch
was deliberately removed from the default in https://github.com/bytecodealliance/cranelift/pull/853. Is there a new need for it to be enabled here, or is this something that could be enabled by downstream users of cranelift?
sunfishcode created PR Review Comment:
Could you put this under a
cfg!(target_arch = "aarch64")
so that we don't register handlers for things we don't need to on x86?
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Actually, a few days ago we found an alternative opcode that does deliver
SIGILL
on arm64, and I hadn't updated this yet -- so we can just remove it.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Hmm, I need to look into this one more. I remember making the edit thinking it was just a message change that a merge had missed, but I think you're correct that we're generating a different trap code now. We have some other failing tests on arm64 (i.e. only when running
cargo test
on arm64 hardware), so I'll revert and this just joins the club to be resolved later.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Renamed!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Sorry, that was added for development convenience only; reverted.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Addressed -- it seems that stack probes are only supported on x86 / x86-64. I added a link to Rust's compiler-builtins impl of
__rust_probestack
where a comment describes the situation. Modified the code here to refer toempty_probestack
only on non-x86, non-x86-64 architectures.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
FWIW with https://github.com/bytecodealliance/wasmtime/pull/1490 we won't need this anymore anyway.
alexcrichton created PR Review Comment:
FWIW I'm adding the necessary Rust definitions here in https://github.com/rust-lang/libc/pull/1728, but this I think is the way to go for now.
alexcrichton submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This assertion fires when there are multiple
return
instructions, right?
bjorn3 created PR Review Comment:
Wildcard patterns make it easy to forget updating a place when a new instruction format is added.
bjorn3 created PR Review Comment:
Can this be stored in
cranelift_codegen::Context
or some other context to prevent continuous reallocation.
bjorn3 created PR Review Comment:
Use accessors instead? This type has several invariants that are easily broken by modifying
insts
.
bjorn3 created PR Review Comment:
/// Pretty-printing with `RealRegUniverse` context.Rustdoc supports doc comments on
impl
's
bjorn3 created PR Review Comment:
write!(s,You may need to import
std::fmt::Write
though.
bjorn3 created PR Review Comment:
s.push_str("VCode_ShowWithRRU {{");
bjorn3 created PR Review Comment:
Why is this printing two
{
by the way.
bjorn3 created PR Review Comment:
Maybe use something like
let [a, b] = value.to_be_bytes(); self.put1(a); self.put1(b);
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Same as above
bjorn3 created PR Review Comment:
s.replace_range(0..1, "w");
bjorn3 created PR Review Comment:
s.push('w');
bjorn3 created PR Review Comment:
s.replace_range(0..1, prefix); s
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This can take
self
by value.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
pub fn value(self) -> u8 {
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
The
aligned
memflag is never checked. Are stores always lowered to unaligned stores, or will it just emit a signal on unaligned stores?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Assert that the triple has
aarch64
as isa?
bjorn3 created PR Review Comment:
constructor: |_, shared_flags, _| { let backend = Arm64Backend::new_with_flags(shared_flags); Box::new(TargetIsaAdapter::new(backend)) },If a closure doesn't capture any variables, it can coerce to a function pointer.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Previously you could just reuse the allocations inside
Function, but this will discard all allocations inside
Function`.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Why are these clears no longer done for the new backend?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Please keep this for the new backend.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This doesn't seem to be used.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
use std::fmt::Write; let mut s = String::new(); for b in &self.bytes { write!(s, "{:02X}, b); } s
bjorn3 edited PR Review Comment.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Why not use the existing
target aarch64
directive?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
The
probestack
has an ABI that is not compatible withC
. This means that this will only work correctly if the compiler happened to decide that only aret
should be emitted. If it does anything else, like writing to a caller saved register that was used as argument for the caller ofprobestack
, you will likely get a crash.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Actually, I think it'd be even better to defer this choice to later; LSRA isn't properly working yet (I'll switch back when i'm done with the Spidermonkey integration), so we shouldn't give a false sentiment of safety to people by allowing them to use it. So I'd strongly recommend to just always select the backtracking allocator.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Maybe name the LSRA option
experimental_lsra
?
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This should be the triple passed to
isa_builder
. This value will result in a binary format of unknown, which means thatcranelift_object
will not work.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Also
target-lexicon
calls itaarch64
.
sunfishcode submitted PR Review.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
Reusing this predicate logic is a good idea, but we should move it (and its dependencies) out of dce.rs so that dce.rs can focus on being a DCE pass. Perhaps we could start a new file, inst_predicates.rs or so?
sunfishcode created PR Review Comment:
This may be an artifact of development, but we should avoid adding warning overrides like this in the main tree unless there's a practical reason we can't just fix the warning.
sunfishcode created PR Review Comment:
AArch64 is indeed the more official name for the architecture, and the name rustc uses, though target-lexicon and other tools do recognize "arm64" as an alias for "aarch64". I don't think it's necessary, but it would be kind of nice to rename the whole "arm64" target to "aarch64" while we have the chance. Anyone have opinions?
sunfishcode created PR Review Comment:
The optimization here is converting a comparison opcode that produces an integer result and a branch that consumes an integer result into a comparison that produces a flags result and a branch that consumes the flags result. Currently it does this by rewriting the comparison in place, so there could be instructions between the compare and the branch. If they clobber the flags, the optimization isn't correct. So it isn't safe to disable this check. A possible alternative though would be to change the optimization to move the comparison just before the branch, to ensure that the flags value isn't clobbered.
sunfishcode created PR Review Comment:
Can you comment on why it makes sense to disable the FlagsVerifier with the mach backend?
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Added comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done, thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Eh, I think that's probably unnecessary complexity -- the
ABIBody
is specialized to, and only lives for the duration of, one function compilation anyway.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Also unnecessary complexity IMHO; we'd then need to store it twice to allow the outer layer (VCode) to return information upward.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Simply reverted to
RegAllocAlgorithm::Backtracking
for now; we can keep an env var backdoor as part of a local development-setup patch if needed, but for now the only stable/fast choice for mainline is BT, without checker, I think (Ben/Julian: correct me if wrong, if you think we should enable checker by default).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I'm not sure it's reasonable to list out all
InstructionData
variants here and in the other helpers, as there are (at present) 50 of them; that's quite a lot of verbosity. Ideally we would add new accessors to the generated code. I'm not as familiar with the DSL mechanisms for that; thoughts?
cfallin submitted PR Review.
cfallin created PR Review Comment:
In a subsequent commit on our dev branch, it becomes a function of the flags. I think this addresses both of your comments (for the former, we can't return a static instance because it might be dynamically computed per the settings).
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Instructions are lowered in reverse order, so I think this is actually correct (albeit counterintuitive!). I'll add a comment to explain this better.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Perhaps, but
TargetIsa
requires the information to come from somewhere. If we eventually remove more vestigial code when all backends migrate to the new framework, I think we'll be able to remove this altogether.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The intent is to allow multiple sections to be constructed simultaneously; this rules out a streaming approach. (We've currently folded the constants and jumptables in as inline data on ARM64, but we previously actually build the sections this way, and could change again when we have proper support for long-range PC-relative addressing.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Did one better and removed the
pub
qualifiers (only used for access by theBlockRPO
computation).
cfallin submitted PR Review.
cfallin created PR Review Comment:
This is @julian-seward1's debug-printing format; I'm not entirely sure but I am sure there is a good reason for it :-) Perhaps it allows for easier log-grepping. I'll resolve this for now but we can always come back and refine the debug-printing format later if it needs to be more concise, or machine-readable, or whatever.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I'm not sure what you mean by "continuous" -- this is one reallocation that happens per function compilation (
replace_insns_from_regalloc
is called once in the compilation pipeline). Holding a persistent vector to reuse inContext
wouldn't really work with parallel compilation, if I understand your proposal correctly.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I learn something new about the standard library every day -- thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
OK, I haven't used rust-analyzer before; but for basic source-code readability, I'd prefer to keep the argument-name comment here.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, but this only happens when the "fallthrough return" mode is used (grep for
GenerateReturn::No
inmachinst/lower.rs
,gen_retval_setup()
), which comes from SpiderMonkey (the Baldrdash ABI). In that case, a return in the middle of the function doesn't make sense anyway.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I was going to make this change at first, but it's quite a refactor to a bunch of debug-printing code (and more in the x64 backend which is not included here) which I'm not inclined to think is worth the effort. The
show_rru()
impls are meant to help debug-logging and are never called in production, so I think it's probably OK to keep this interface as-is, unless Dan/Ben/Julian disagree?
cfallin submitted PR Review.
cfallin created PR Review Comment:
The comment here is wrong, on closer inspection -- removed. (If
final_block_order
is empty, that line of the output just shows an empty list.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Apparently, aarch64 supports unaligned accesses with the ordinary load/store instructions. I just tested this on real hardware and it seems to work; [1] also confirms.
[1] https://stackoverflow.com/questions/38535738/does-aarch64-support-unaligned-access
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, that's a theoretical concern. In practice, the empty function
{}
compiles to just aret
in both debug and release mode on both x86-64 and aarch64 (just tested all four combinations). Do you have a suggestion for a better approach (compiler attributes, etc)?
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I agree, we might as well standardize on the official name. Renamed the backend to
aarch64
. There are still a few vestigial remnants ofArm64
around, e.g. theArm64Call
reloc type which is exposed in the public API.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, OK -- the arm64 backend at least will correctly handle an
icmp
+brz
/brnz
sequence, so we don't need this optimization -- I've just altered it to skip this mutation entirely when a MachBackend is in use.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The FlagsVerifier as written uses the encoding info to determine whether instructions clobber the flags. We could certainly build an equivalent with the new infrastructure, or else add a target-independent notion of "clobbers flags" to the CLIF opcodes themselves, but it would be some extra plumbing. What do you think?
cfallin submitted PR Review.
cfallin created PR Review Comment:
The simple answer is that I couldn't work out how to fetch it. It doesn't appear to be included in the
preamble_comments
of theContext
. It doesn't seem too important to me; but I'm happy to change it if you can point out the proper plumbing to make use of here.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Actually, upon second look, worked this out -- the tests now use the
target
directive. Resolving!
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
With continuous I mean that it reallocates for every function, instead of once per
Context
. (Apart reallocations necessary resize the vec.) TheContext
holds allocations that are reused between compilations of different functions. It works fine with parallel compilation, as you can just allocate a singleContext
per thread.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Maybe use Cranelift to create the function?
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
/// A location for an argument or return value.
bjorn3 created PR Review Comment:
/// In a real register.
bjorn3 created PR Review Comment:
Legalizations are only supposed to run when there is no encoding for an instruction. The generic legalization group contains a lot of legalizations for instructions that can be implemented more efficiently using native instructions. For example 8 and 16 bit operators are legalized using
ireduce
/uextend
/sextend
. Is it possible to only run legalizations during lowering when there is no encoding?
bjorn3 created PR Review Comment:
/// Arguments only: on stack, at given offset from SP at entry.
bjorn3 created PR Review Comment:
/// (first and only) return value only: in memory pointed to by x8 on entry.
bjorn3 created PR Review Comment:
Should this be removed?
bjorn3 created PR Review Comment:
It is allowed to use them as long as it restores them before returning, right?
bjorn3 created PR Review Comment:
Please use doc comments.
bjorn3 created PR Review Comment:
Maybe check that the call conv is either
Fast
,Cold
,SystemV
orBaldrdashSystemV
and notWindowsFastCall
orProbestack
? The later two are not yet implemented.
bjorn3 created PR Review Comment:
_ => unimplemented!("load_stack({})", ty),
bjorn3 created PR Review Comment:
And check for call conv here too.
bjorn3 created PR Review Comment:
If arg is
Copy
you can do:for &arg in &self.sig.args {
bjorn3 created PR Review Comment:
Please take
u64
ori64
instead. This is platform dependent, so won't work nice for cross-compilation.
bjorn3 edited PR Review Comment.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Is there an assertion that
MOV
doesn't havex31
as register just in case?
bjorn3 created PR Review Comment:
let fp_off: i64 = -(self.stackslots_size as i64) - (8 * islot) - i64::from(ty_size);
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
#[derive(Clone, Copy, Debug)] #[repr(u8)] pub enum ShiftOp { LSL = 0b00, LSR = 0b01, ASR = 0b10, ROR = 0b11, } impl ShiftOp { /// Get the encoding of this shift op. pub fn bits(self) -> u8 { self as u8 } }
bjorn3 created PR Review Comment:
u32::try_from( self.frame_size .expect("frame size not computed before prologue generation" ).expect("overflow")
bjorn3 created PR Review Comment:
self.op
ShiftOp
isCopy
bjorn3 created PR Review Comment:
Can you move this down to just above the impl block for this?
bjorn3 created PR Review Comment:
This doc comment is attached to
SPOffset
not both.
bjorn3 created PR Review Comment:
If
offset
is zero, this returnsSome(SImm9::zero())
, right?
bjorn3 created PR Review Comment:
reg_plus_reg
?
bjorn3 created PR Review Comment:
This doesn't mention
op
.
bjorn3 created PR Review Comment:
#[derive(Clone, Copy, Debug)] #[repr(u8)] pub enum ExtendOp { UXTB = 0b000, UXTH = 0b001, UXTW = 0b010, UXTX = 0b011, SXTB = 0b100, SXTH = 0b101, SXTW = 0b110, SXTX = 0b111, } impl ExtendOp { /// Encoding of this op. pub fn bits(self) -> u8 { self as u8 } }
bjorn3 created PR Review Comment:
Maybe put a empty line between every pair of conditions inverting each other?
bjorn3 created PR Review Comment:
Can you add a doc comment describing what
PostIndexed
andPreIndexed
mean?
bjorn3 created PR Review Comment:
/// Get the offset as a 19-bit offset, or `None` if overflow.
bjorn3 created PR Review Comment:
let n = block_index_map[usize::try_from(*bix).unwrap()];
bjorn3 created PR Review Comment:
//// RRR ops.Typo?
bjorn3 created PR Review Comment:
let data = bits.to_le_bytes();This is little endian, right?
bjorn3 created PR Review Comment:
_ => panic!("unknown type {}", ty),
bjorn3 created PR Review Comment:
(u32::from(bits_31_21) << 21) | (u32::from(bits_15_10) << 10)
bjorn3 created PR Review Comment:
let bix = usize::try_from(bix).unwrap();The unwrap will never fire, as
bix
is au32
and Cranelift likely doesn't work on 16 bit machines.
bjorn3 created PR Review Comment:
Does
to_real_reg()
contain theis_real()
assertion?
bjorn3 created PR Review Comment:
/// signed saturating add SQAddScalar, /// unsigned saturating add UQAddScalar, /// signed saturating subtract SQSubScalar, /// unsigned saturating subtract UQSubScalar,
bjorn3 created PR Review Comment:
panic!("Bad ALUOp {:?} in RRR form!", alu_op);
bjorn3 created PR Review Comment:
/// Multiply add MAdd32, /// Multiply add MAdd64, /// Multiply sub MSub32, /// Multiply sub MSub64,
bjorn3 created PR Review Comment:
value
is shifted to the right not left to formbits
.
bjorn3 created PR Review Comment:
This is attached only to
FpuLoad32
.
bjorn3 created PR Review Comment:
Ret,
bjorn3 created PR Review Comment:
((self.N as u16) << 12) | (u16::from(self.R) << 6) | u16::from(self.S)
bjorn3 created PR Review Comment:
/// Round to integer.
bjorn3 created PR Review Comment:
/// Bit reverse RBit32, /// Bit reverse RBit64,
bjorn3 created PR Review Comment:
u32::from(m.to_real_reg().get_hw_encoding())I assume
get_hw_encoding
returns a integer type smaller thanu32
.
bjorn3 created PR Review Comment:
count_zero_half_words
?
bjorn3 created PR Review Comment:
Does AArch64 have native 128bit integer operations? Otherwise these should be legalized to 64bit integer operations and the
I128
andB128
types should have no register class assigned to them.
bjorn3 created PR Review Comment:
What does the
H
mean?
bjorn3 created PR Review Comment:
Is this the arm name for XOR?
bjorn3 created PR Review Comment:
EpiloguePlaceholder,
bjorn3 created PR Review Comment:
Maybe do something like
match inst { Inst::AluRRR { alu_op: _, rd, rn, rm } => { map_wr(d, rd); map(u, rn); map(u, rm); }and let the map functions take the register by mutable reference?
bjorn3 created PR Review Comment:
Who came up with this order? It is even worse than
eax
,ecx
,edx
,ebx
on x86.
bjorn3 created PR Review Comment:
Nop0,
bjorn3 created PR Review Comment:
BitOp::RBit32 | BitOp::Clz32 | BitOp::Cls32 => true, BitOp::RBit64 | BitOp::Clz64 | BitOp::Cls64 => false,
bjorn3 created PR Review Comment:
let scale = i64::from(scale);
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
&InstructionData::UnaryIeee32 { opcode: _, imm } => Some(u64::from(imm.bits())),
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This is only implemented for
Lower
. Why not just passLower
everywhere?
bjorn3 created PR Review Comment:
What about
narrow_mode == NarrowValueMode::None && out_bits < 64
?
bjorn3 created PR Review Comment:
The domtree already knows this.
bjorn3 created PR Review Comment:
A function without entry block is not allowed, right?
bjorn3 created PR Review Comment:
(NarrowValueMode::ZeroExtend64, 64) | (NarrowValueMode::SignExtend64, 64) => in_reg,
bjorn3 created PR Review Comment:
(NarrowValueMode::ZeroExtend32, 32) | (NarrowValueMode::SignExtend32, 32) => {
bjorn3 created PR Review Comment:
You can use
bbs.iter().rev()
below instead to prevent moving everything around.
bjorn3 created PR Review Comment:
What does cf-inst mean?
bjorn3 created PR Review Comment:
for &block in &self.final_block_order {
bjorn3 created PR Review Comment:
Maybe inline this at the two use sites to prevent a possible allocation of a temporary vec?
bjorn3 created PR Review Comment:
What if nops are always say 4 bytes, but there needs to be a 3 byte offset to align to the required 4 bytes. That would mean
I::gen_nop
will always need to return a 0 byte nop, which means that the loop will never terminate.
bjorn3 created PR Review Comment:
Also I think 16 elements is more than likely will be used most of the cases.
bjorn3 created PR Review Comment:
for &dest in f.jump_tables[table].as_slice() { ret.push(dest);
sunfishcode submitted PR Review.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
Is the logic to do DCE the fly required for correctness of the algorithm, or is it just an optimization? If the latter, can you comment on the tradeoffs of doing an eager num_uses pass and DCE on the fly vs just doing a separate DCE pass afterwards?
sunfishcode created PR Review Comment:
Is this using reverse postorder (of a DFS)? Can you comment on the differences with
domtree.cfg_postorder()
?
sunfishcode created PR Review Comment:
Could you remove these few remaining
unused_imports
directives in a few files?
sunfishcode created PR Review Comment:
Minor nit: use a
//
-style comment here.
sunfishcode created PR Review Comment:
Several files have
allow(non_snake_case)
andallow(non_camel_case_types)
. Could you either remove these and change the names to follow the regular Rust conventions, or add comments explaining why exceptions to the rules are justified in these files?
sunfishcode created PR Review Comment:
There are a lot of
is32
parameters being passed around in a lot of code. Would it be worth adding a dedicated enum type to use in place of bool, to catch errors and make the code more self-documenting?
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
The flags verifier can still check target-independent properties -- the
encinfo
is anOption
to allow it to run without this information, so I think we can still enable the verifier, just by passingNone
for theencinfo
argument when the mach backend is enabled.
cfallin submitted PR Review.
cfallin created PR Review Comment:
That seems like quite a bit of overhead, and potentially fragile and complex code to maintain (a new "probestack" ABI to implement on ARM64, a codegen step at startup, a way to lazily initialize the
PROBESTACK
global, etc.) for a theoretical concern that does not actually occur in practice. I'll defer to @sunfishcode 's judgment here, but my inclination is to leave it as-is.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yes, that makes sense -- done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
The new backend takes a different approach. We want to get away from the DSL-based pattern/legalization/recipe world and write the backends in a more explicit way. This should lead to more simplicity and maintainability (one of the main motivations of the new backend design). E.g., in the ARM64 backend, we handle different operand widths explicitly. It still makes sense to do some expansions with the existing code, such as for
heap_addr
orglobal_value
, which is wheresimple_legalize()
comes in.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yeah, actually, probably better to remove the partial impl of
RetMem
until we actually need it. Thanks.
cfallin submitted PR Review.
cfallin created PR Review Comment:
@bnjbvr would be the person to answer this (this is his SpiderMonkey-integration code); Ben, comments?
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done; added check to toplevel
new()
instead.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Added check to toplevel
new()
.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Many legalizations are shared between backends. Also legalizations may enable optimizations that would otherwise have to be rewritten for each backend.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Will use
u32
, actually; we don't expect a stackframe to be larger than 4GB.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Any particular reason (and if so, why not throughout the entire codebase for every
as
cast)?
cfallin submitted PR Review.
cfallin created PR Review Comment:
Just added to
emit.rs
, thanks.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This prevents accidential truncation of integers.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Using
u32
in stack-frame layout code instead; as above, we don't expect >4GB frames.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, yes, this was left over from an earlier version that used a different mode; removed redundant case.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Eh, perhaps it's just a style thing, but I don't like these casts that require trait imports and this
unwrap()
; it complexifies the code for no good reason, and makes it harder to audit the code for potentially panic'ing operations.In fact, the
bix as usize
does one better if we do ever try to build on a 16-bit host: it will statically fail to compile, becauseu32 as u16
is not a valid cast. I'd prefer that over anunwrap()
that fires at runtime.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Not making this change for same reason as above; would prefer a static compilation failure to a dynamic panic.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
u32 as u16
is actually valid. It truncates.
cfallin submitted PR Review.
cfallin created PR Review Comment:
No, this one is intentional; the RRRR ops are the 3-source, 1-dest ops.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Holding off on this for now, as above. I'm curious if there's an idiomatic-Rust reason for this -- is there some downside to using the
x as T
operator built into the language?
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ambiguous subject, I suppose; the result is
bits
shifted to the left. Edited to clarify.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yep, it's an ARM-ism. In general we tried to maintain consistent naming w.r.t. the instruction manual unless there was a good reason otherwise. Added a doc comment to clarify.
cfallin submitted PR Review.
cfallin created PR Review Comment:
High word result of multiply (e.g., bits 127-64 for the i64 case), either signed or unsigned variant. Added doc comment.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Removed. I'm not sure if NEON has a plain 128-bit load or not, but we plan to handle vector ops after our initial merge and evaluation, so I'll defer the proper implementation of this to later.
cfallin edited PR Review Comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
I am glad we could provide some entertainment :-) It's actually intentional: the register allocator library (regalloc.rs) expects all allocatable registers in a contiguous index range. Because the ABI and codegen constraints poke random holes in this range, the mapping has to be a piecewise-linear thing. It's ugly, but it works...
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah! I had been under the impression that it was a type error to cast to narrower integers. I'll go back and audit these now.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
As it turns out, @alexcrichton's https://github.com/bytecodealliance/wasmtime/pull/1490 will eliminate the need for this call entirely, which avoids this question entirely.
Beyond that, if we want to reintroduce stack probes, we should add a
__rust_probestack
for AArch64 and any other platform we need it on here rather than having our own version in Cranelift.
cfallin submitted PR Review.
cfallin created PR Review Comment:
OK -- we can incorporate legalizations as required, then, into the new pipeline, I think. The issue with the original request that I see is that "when there is no encoding" is no longer meaningful: we are no longer taking the approach of directly mapping CLIF instructions to hardware encodings and editing in place otherwise. Pulling in some of the existing legalizations here is meant to be an intermediate state, in place until we transition fully to the new backend and then can rewrite the expansions as a more straightforward pass. @sunfishcode and I have talked about various ways of having a "pattern-matching" approach to this.
cfallin submitted PR Review.
cfallin created PR Review Comment:
(But in this particular case, we're casting from
u32
toi64
, so truncation should be an issue.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yup, just realized this -- thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
(Actually making this change now, as per realization above.)
cfallin submitted PR Review.
cfallin created PR Review Comment:
Updated now; thanks for the education on Rust casts!
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Unreachable because of surrounding conditional's
narrow_mode != NarrowValueMode::None
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
The intention is to also eventually have other
Lower
impls. For one, there has been some conversation about verification efforts that plug in a different impl to get symbolic lowerings out (this is very handwavy now, but we want to leave the door open). For another, we might eventually want to drive the lowering logic with a different IR, and so we're trying to keep as much behind this API as possible (it's not perfect, e.g.Opcode
type, but this will help with that goal). Everything is monomorphized, so the only cost now should be a little extra type verbosity.
cfallin submitted PR Review.
cfallin created PR Review Comment:
We didn't want to rely on the domtree here, for the same reason as above (portability); though perhaps a followup PR could plumb the domtree info through the
LowerCtx
.
cfallin submitted PR Review.
cfallin created PR Review Comment:
It might trigger errors elsewhere, but I had wanted to provide a robust impl here; the code should still do the right thing (i.e., nothing) when presented with an empty function body.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Control-flow instruction; expanded.
cfallin submitted PR Review.
cfallin created PR Review Comment:
At first I had thought this wouldn't work, because we initialize the vcode BB map with
bbs
just below this; but it turns out that that code is invariant to the block order (we'll just get differentBlockIndex
es in the vcode, but this is arbitrary anyway). So, done!
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done by reworking as
visit_branch_targets()
(internal iteration, accepts a closure, monomorphized at each call site).
cfallin submitted PR Review.
cfallin created PR Review Comment:
The invariant that the machine backend must satisfy is that
gen_nop
returns an instruction with nonzero size, and must be smaller than the requested size, so by that invariant, this loop will always terminate. Here's the doc comment fromgen_nop()
for reference:/// Generate a NOP. The `preferred_size` parameter allows the caller to /// request a NOP of that size, or as close to it as possible. The machine /// backend may return a NOP whose binary encoding is smaller than the /// preferred size, but must not return a NOP that is larger. However, /// the instruction must have a nonzero size.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Done!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Removed! None of the names were actually unconventional; I think this was left over from an earlier prototype.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Nope, not required; the reasoning behind including it here is that it is intertwined with the operand-matching/merging behavior. For example, an
iadd
can incorporate aniconst
on an input; it then callsctx.merged(input)
which indicates that the use was incorporated directly, updating the refcounts (use counts). This could just as well be handled by a later inst-level DCE, but it seemed cheaper to avoid the lowering logic in the first place (Julian's measurement earlier today was ~4K host x86 instructions spent to lower one target instruction; that should improve but still). If you think we should factor these apart, I'm happy to do that, though!
cfallin submitted PR Review.
cfallin created PR Review Comment:
It's actually a BFS (
find_reachable_bbs()
further up in this file), but the order doesn't matter here, as we also compute RPO later in blockorder.rs :-)My thinking here was that I wanted to avoid creating a dependency on the
domtree
in order to remain as portable as possible, i.e., given later alternative implementations of the frontend driving the lowering logic, and that theO(blocks)
cost would hopefully not be too bad. (I reserve the right to change my mind later in the week when I start looking at profiling results in detail!) But I can plumb this through if you'd prefer.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Good idea -- done!
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
@sunfishcode, that sounds perfect. Would you like us to wait for #1490 to land, or would it be acceptable to land this with the temporary "empty function" stack probe in the meantime (if the timing works out that way)?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I would just leave it as it is. CL will fail at run time the first time someone implementing a new instruction format tries to run any test case, so it'll be obvious it needs to be fixed.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Leave it as is, I say; this will never be used in production.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Is that really an improvement? (1) It requires the reader to know (or look up)
to_be_bytes
; the original was quite clear. (2) In order for it to not be a performance loss, it relies onrustc
to inline away the call, the array construction and the array use, which is quite a big ask.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
As a general comment -- I have been profiling the new backend (Chris' stuff and the new regalloc) extensively with DHAT and plan to keep doing so. That is very effective at finding pointless or low-value repeated reallocation -- in the week 6-13 Apr I reduced the number of allocation calls by a factor of 2 to 3, and there is much more to come. Given that such allocation-avoiding restructurings potentially make code harder to understand, I'd prefer to restrict them to cases where the profile numbers show there's a clear benefit to be had, or that the change is very local and so doesn't impact readability -- eg, replacing a
Vec
with aSmallVec
.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Just because I like these top level debug markers to be easy to spot in the thousands of lines of debug output. No other reason.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
In cg_clif I write the clif ir before and after compilation to a file in debug mode. This makes debugging miscompilations much easier. It is unfortunately responsible for a nontrivial amount of compilation time, which means that I have to disable it in release mode. I would prefer if the new backend counter part to after compilation clif ir (vcode) could be printed as fast as possible.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
The code would fallthrough to the next function when there are no blocks, which shouldn't be allowed.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This applies only to the
use crate::binemit::CodeOffset;
.
bjorn3 created PR Review Comment:
You could use the same
#[repr(u8)
+Eq = 0
+self as u8
here asShiftOp
andExtendOp
.
bjorn3 created PR Review Comment:
f.debug_struct("MachBackend") .field("name", self.backend.name()) .field("triple", self.backend.triple()) .field("flags", self.backend.flags()) .finish()
bjorn3 created PR Review Comment:
This should be removed if you accept the suggestion above.
bjorn3 created PR Review Comment:
let encinfo = if isa.is_none() || isa.unwrap().get_mach_backend().is_some() {
bjorn3 created PR Review Comment:
#[derive(Clone, Copy, Debug, Eq, PartialEq)] pub enum Size { Size32, Size64, }
Eq
can be used as replacement foris32
andis64
if you think that isn't worse.If you have a better suggestion for the name of the variants, feel free to use that instead.
bjorn3 created PR Review Comment:
/// If `ireg` denotes an I64-classed reg, make a best-effort attempt to show
bjorn3 created PR Review Comment:
// This is pretty lame, but whatever ..
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
Please remove this.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
as
casts silently pick between widening, narrowing, and value-changing casts, while::from
casts get a type error on any potentially value-changing cast, so it's nice to use::from
when applicable.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
I think it's fine to land it with this temporary workaround.
sunfishcode submitted PR Review.
sunfishcode submitted PR Review.
sunfishcode created PR Review Comment:
Makes sense!
cfallin submitted PR Review.
cfallin created PR Review Comment:
That's a reasonable goal, but I suspect writing to the file will vastly outweigh allocation speeds here. In any case, the immediate costs are large -- a refactor, and a painful rebase of other work -- and the benefit is unclear and not driven by actual data. So let's leave this be for now, I think.
cfallin edited PR Review Comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Resolving this for now, as Ben is on top of ABI issues and has done further work on caller/callee-saves that will come in another PR.
cfallin submitted PR Review.
cfallin created PR Review Comment:
It seems that rustc actually does this; I just tested (with
-O
). Happy to do a little work to be idiomatic, as long as the complexity cost is not too high!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yes, perhaps, if the verifier allows this; but it seems that the old backend would have the same problem, no? (There would be no instruction to have an encoding, thus no return would be emitted.) It seems this would be good to follow up in a separate PR if it is indeed allowed, by updating the verifier instead. (As a general code-style thing I don't want to distribute implicit assumptions about data structure state throughout the code; if the entry is an
Option<Block>
then let's write code everywhere that supports that, and if it should always be aSome
, then let's just make it aBlock
instead.)
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
bjorn3 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
This can take &Function I think.
bjorn3 created PR Review Comment:
Same here
bjorn3 created PR Review Comment:
Same here
cfallin updated PR #1494 from arm64-merge
to master
:
This PR contributes a new backend framework built around
MachInst
s (machine instructions) contained inVCode
(virtual-registerized code) containers. Furthermore, it contains a working ARM64 backend built in this new framework. The new framework is properly connected into the main Cranelift compilation pipeline, and the appropriate integrations have been made so thatwasmtime
can run basic Wasm tests correctly on ARM64 hardware.This patch sequence contains code written by Julian Seward <jseward@acm.org> and Benjamin Bouvier <public@benj.me>, originally developed on a side-branch before rebasing and condensing into this patch series. See the
arm64
branch athttps://github.com/cfallin/wasmtime
for original development history.This patch also contains code written by Joey Gouly <joey.gouly@arm.com> and contributed to the above branch. These contributions are "Copyright (c) 2020, Arm Limited."
Closes #1174 and #1075.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yup, this was left over from an earlier version that did mutate the function; much nicer now to take an immutable input. Thanks!
bjorn3 submitted PR Review.
cfallin merged PR #1494.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Unfortunately I just caught this. I believe this accidentally moved the
#[cfg(feature = "unwind")]
.I have a PR that this merged conflicted with, so I'll correct it there.
peterhuene edited PR Review Comment.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Yikes, that was likely a rebase failure on my part. Sorry about that. (I suppose I got lucky as
unwind
is enabled by default.) Thanks for fixing!
Last updated: Jan 24 2025 at 00:11 UTC