rfcs / PR #28 RFC: Baseline compilation in Wasmtime · rfc-notifications

Stream: rfc-notifications

Topic: rfcs / PR #28 RFC: Baseline compilation in Wasmtime

RFC notifications bot (Aug 03 2022 at 14:06):

saulecabrera opened PR #28 from wasmtime-baseline-compilation to main:

Rendered

This RFC proposes the addition of a new WebAssembly (Wasm) compiler to Wasmtime:
a single-pass or “baseline” compiler. A baseline compiler works to improve
overall compilation performance, yielding faster startup times, at the cost of
less optimized code.

This RFC describes:

The motivation for a baseline compiler

A high-level overview and description of the structure of the baseline
compiler

A high-level overview of its integration with Wasmtime

cc @cfallin @fitzgen @alexcrichton @bnjbvr

RFC notifications bot (Aug 03 2022 at 15:19):

lqd submitted PR review.

RFC notifications bot (Aug 03 2022 at 15:19):

lqd created PR review comment:

Tiny linebreak suggestion:

let add = cranelift_asm::x64::AluRmiR { op: AluRmiR::Add, … };
let mut buf = MachBuffer::new();
add.emit(&mut buf, …);

RFC notifications bot (Aug 03 2022 at 15:19):

lqd created PR review comment:

wasmtime run --compiler=<baseline|cranelift> file.wasm

RFC notifications bot (Aug 03 2022 at 15:19):

lqd submitted PR review.

RFC notifications bot (Aug 03 2022 at 16:54):

fitzgen submitted PR review.

RFC notifications bot (Aug 03 2022 at 16:54):

fitzgen created PR review comment:

I'd like to echo @alexcrichton's suggestion about building out (at least a realistic kernel of) support for a second architecture during the initial implementation. Because Winch is, by design, skipping an architecture-agnostic mid-end and IR, it has higher risk of baking in architecture-specific assumptions into the core design. Building support for a second architecture should mitigate and nullify this risk.

RFC notifications bot (Aug 03 2022 at 16:54):

fitzgen submitted PR review.

RFC notifications bot (Aug 03 2022 at 16:54):

fitzgen created PR review comment:

I'm not convinced that making this a compile time feature will be overly complex, and even if it turns out that it is, I would strongly prefer that we start by attempting to do it that way, and only fall back to runtime-only if we determine it isn't practical.

RFC notifications bot (Aug 03 2022 at 16:54):

fitzgen created PR review comment:

// request a general purpose register;
// spill if none available.
let imm = self.value_stack.pop();
let rd = self.gpr();
masm.add(rd, imm);

RFC notifications bot (Aug 03 2022 at 17:13):

cfallin submitted PR review.

RFC notifications bot (Aug 03 2022 at 17:13):

cfallin created PR review comment:

I see the appeal of compiled-out-by-default for minimizing impact, but I'll offer a few reasons from experience that might push in the other direction:

Dealing with features in general is a speedbump for users. When developing the new x64 backend we had it cfg'd off behind a feature and it was a constant fight to get tools (LSP etc) to use the right flags, and meant that we couldn't have just one cargo test invocation to test the thing; we had to have a separate one with the feature enabled.

Differential testing across the two configurations becomes impossible. For the x64 backend, I eventually had to refactor to use a runtime setting and allow both configs to co-exist so that I could build a differential fuzz target. In this case I guess the feature would be strictly additive (include another compiler backend) but I could imagine tests at the Wasmtime level might still run into issues.

In general, features make for more confusing code; see e.g. @alexcrichton 's push for us to use very coarse-grained features in memfd, and remove the cfg-logic from the inner implementations; and also the difficulties arising from the feature-controlled uffd backend.

On the other side, I guess the main concern is compile time? Or are you worried about e.g. users prematurely opting into using it and that opt-in being easier if it's a programmatic configuration vs. a feature flag in the dependency?

RFC notifications bot (Aug 03 2022 at 18:27):

fitzgen submitted PR review.

RFC notifications bot (Aug 03 2022 at 18:27):

fitzgen created PR review comment:

My concerns are (1) compile time and (2) factoring code such that it isn't entangled with the rest of Cranelift/Wasmtime.

Differential testing should definitely be possible. I'm _not_ saying that we should have a compile time config to either use Winch xor cranelift (like the experimental_x64 cargo feature did). I'm saying we should have a compile time config which when enabled adds support for and exposes the Winch compilation strategy variant. So both Winch and Cranelift would be available for differential testing or what have you when the cargo feature was enabled.

I don't think this is an onerous imposition on development, and we can point to the ongoing component model work as something that is using this compile-time-cargo-feature-to-make-available-a-runtime-option approach successfully.

RFC notifications bot (Aug 03 2022 at 18:38):

cfallin created PR review comment:

Fair enough; I think much of my experience is with the "xor" variant of feature (so a bunch of #[cfg(not(feature = ...))]) and that may have given me a darker view of how this could go.

I do like the idea of ensuring proper layering/separation by having a build with the feature off, as well.

So I guess this is fine; as you say we can always abandon the approach and go runtime-only if needed later.

RFC notifications bot (Aug 03 2022 at 18:38):

cfallin submitted PR review.

RFC notifications bot (Aug 03 2022 at 19:22):

saulecabrera created PR review comment:

I think I didn't do a good job expressing myself during the initial draft of the doc when I introduced the idea of/open questions around the compile time feature, which probably hinted the "xor" approach. I'm aligned with this though, my main goal is to allow the usage of both compilers, for ease of testing, but also I can definitely see cases in which an embedder might want to use Winch for fast startup and then compile with Cranelift in the background for subsequent requests; and according to the discussion here, such goal is still achievable with a compile time feature.

RFC notifications bot (Aug 03 2022 at 19:22):

saulecabrera submitted PR review.

RFC notifications bot (Aug 03 2022 at 19:38):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 03 2022 at 19:40):

saulecabrera submitted PR review.

RFC notifications bot (Aug 03 2022 at 19:40):

saulecabrera created PR review comment:

I've bundled all the typo corrections in https://github.com/bytecodealliance/rfcs/pull/28/commits/d8d934ec8a16c6d0ae823c9d5ba499f472bfab99, thanks!

RFC notifications bot (Aug 03 2022 at 19:42):

saulecabrera submitted PR review.

RFC notifications bot (Aug 03 2022 at 19:42):

saulecabrera created PR review comment:

I've updated the doc to specify that changes will be introduced behind a compile-time feature.

RFC notifications bot (Aug 04 2022 at 17:11):

bjorn3 submitted PR review.

RFC notifications bot (Aug 04 2022 at 17:11):

bjorn3 created PR review comment:

If MacroAssembler isn't used by Cranelift, I think it should not be put in cranelift_asm, but either in wince or a separate crate.

RFC notifications bot (Aug 04 2022 at 17:11):

bjorn3 created PR review comment:

This is not valid rust.

RFC notifications bot (Aug 04 2022 at 17:11):

bjorn3 submitted PR review.

RFC notifications bot (Aug 04 2022 at 17:13):

bjorn3 edited PR review comment.

RFC notifications bot (Aug 04 2022 at 17:13):

bjorn3 submitted PR review.

RFC notifications bot (Aug 04 2022 at 17:13):

bjorn3 created PR review comment:

Probably best left for the future, but it would be nice to support tiered compilation to compile with winch by default and with cranelift when a function turns out to be hot.

RFC notifications bot (Aug 04 2022 at 17:14):

bjorn3 submitted PR review.

RFC notifications bot (Aug 04 2022 at 17:14):

bjorn3 created PR review comment:

I personally prefer Winch over Baseline for consistency. Or maybe BaselineWinch.

RFC notifications bot (Aug 04 2022 at 17:46):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 05 2022 at 12:08):

saulecabrera created PR review comment:

It feels natural to have the MacroAssembler at this level in my opinion, given that cranelift_asm's objective is to provide the functionality and building blocks for code emission from different scenarios. That said, I don't think that this is a one-way door decision, if it's proven to cause any issues, this is easily reversible.

RFC notifications bot (Aug 05 2022 at 12:08):

saulecabrera submitted PR review.

RFC notifications bot (Aug 05 2022 at 12:15):

bjorn3 submitted PR review.

RFC notifications bot (Aug 05 2022 at 12:15):

bjorn3 created PR review comment:

My main worry is about compilation time. Adding MacroAssembler to cranelift_asm would require compiling more code even though Cranelift doesn't need it.

RFC notifications bot (Aug 05 2022 at 12:40):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 05 2022 at 12:42):

saulecabrera created PR review comment:

Yes, tiering and its heuristics are intentionally out of scope for this RFC.

RFC notifications bot (Aug 05 2022 at 12:42):

saulecabrera submitted PR review.

RFC notifications bot (Aug 05 2022 at 12:59):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 05 2022 at 18:53):

saulecabrera submitted PR review.

RFC notifications bot (Aug 05 2022 at 18:53):

saulecabrera created PR review comment:

That's a fair concern; on the other hand, I'm confident that we can control exposing the MacroAssembler with a compile time feature, making it opt-in to consumers.

RFC notifications bot (Aug 05 2022 at 19:03):

saulecabrera edited PR review comment.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr submitted PR review.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

As a source of inspiration for design (edit: now I see the design principles are very close to what's in this blog post), as well as numbers showing the speed of a baseline compiler, I recommend this great post by Andy Wingo: https://wingolog.org/archives/2020/03/25/firefoxs-low-latency-webassembly-compiler

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

buffering the `MachInst` structs in memory, which we do not wish to do.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr submitted PR review.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

This is really mostly a nitpick from myself, but I think we're confusing two different levels of abstraction here. Provided the examples below, namely the higher-level API call, I think this is really an Assembler library. From my understanding of Spidermonkey's code base, a MacroAssembler tries to hide some extra complexity by only having "operands" (and details about memory addresses/registers are hidden by the abstraction). Or imbuing some other specific semantics of operators that don't exist at the assembler layer; think for instance about having some wasm_div method, that would trap if the divisor is zero; there's no assembly instructions to do so, but a macro-assembler would combine different assembler method calls to generate the code that implements this behavior. As a matter of fact, the macro-assembler is one layer of abstraction above an assembler, and what's described here rather looks like a plain assembler to me, unless I'm mistaken.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

Will it spill _all_ the registers, or only the conflicting ones / enough that it can continue doing its work? Spidermonkey's wasm baseline compiler would spill all, if I remember correctly, as it's much simpler to reason about.

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

Wasmtime, similar to what is present in Wasm engines in Web browsers. This RFC

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

abstraction, per ISA, along with their availability. It will also hold

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

Should this be winch instead of baseline? (alternatively, should we change the cranelift option to optimizing?)

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

let imm = self.value_stack.pop();
// request a general purpose register;
// spill if none available
let rd = self.gpr();

RFC notifications bot (Aug 08 2022 at 14:20):

bnjbvr created PR review comment:

There are many variants of linear scan register allocation, and I'm afraid that referring to a textbook implementation may be confusing: here it's really allocating registers on the fly, and spilling aggressively whenever none is available, rather than doing any kind of textbook analysis to figure out liveness intervals and all of that.

RFC notifications bot (Aug 08 2022 at 16:20):

saulecabrera submitted PR review.

RFC notifications bot (Aug 08 2022 at 16:20):

saulecabrera created PR review comment:

Indeed a great post! I've used it as a source of inspiration and as a guide to navigate SpiderMonkey's code.

RFC notifications bot (Aug 08 2022 at 18:28):

cfallin created PR review comment:

Agree, this reference should probably be removed @saulecabrera ; what we describe here isn't true linear-scan but something more like "single-pass register allocation" (proper linear scan requires lookahead to know which register is used furthest in the future, to choose which to evict).

RFC notifications bot (Aug 08 2022 at 18:28):

cfallin submitted PR review.

RFC notifications bot (Aug 08 2022 at 19:39):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 08 2022 at 19:40):

saulecabrera submitted PR review.

RFC notifications bot (Aug 08 2022 at 19:40):

saulecabrera created PR review comment:

Thanks for flagging this, I should've updated it in https://github.com/bytecodealliance/rfcs/pull/28/commits/141a3c6e3df27b1a9dc2d70580e021ab2ec58d00. For consistency I'd like to keep this as winch

RFC notifications bot (Aug 08 2022 at 19:42):

saulecabrera created PR review comment:

Yeah, the current intention is to spill all the registers.

RFC notifications bot (Aug 08 2022 at 19:42):

saulecabrera submitted PR review.

RFC notifications bot (Aug 08 2022 at 19:52):

saulecabrera submitted PR review.

RFC notifications bot (Aug 08 2022 at 19:52):

saulecabrera created PR review comment:

Agreed, I've removed the reference. For my own knowledge: are there any papers that describe each of the linear scan variants? Or are variations purely a result of implementation differences?

RFC notifications bot (Aug 08 2022 at 19:55):

cfallin submitted PR review.

RFC notifications bot (Aug 08 2022 at 19:55):

cfallin created PR review comment:

I'm not aware of any that describe the SpiderMonkey-style single pass allocator, but I haven't looked too hard either; at the level we care about it tends to be more instructive to look at other implementations and academics tend not to study the fiddly details that matter here. But if I find one I'll let you know (and vice-versa) :-)

RFC notifications bot (Aug 09 2022 at 00:24):

saulecabrera submitted PR review.

RFC notifications bot (Aug 09 2022 at 00:24):

saulecabrera created PR review comment:

From my understanding of Spidermonkey's code base, a MacroAssembler tries to hide some extra complexity by only having "operands" (and details about memory addresses/registers are hidden by the abstraction).

My understanding is a bit different here. As far as I can tell, the MacroAssembler aside from operating on top of the concept of operands, also operates on top of other abstractions like Register, Imm, and Address; still encapsulating most of the architecture-specific details.

As far as encapsulating complexity, my mental model doesn't diverge from what you've outlined: emitting a set of assembly assembly instructions representing a given wasm instruction, with the difference that we're not strictly relying on/introducing the presence of an Assembler for lowering; but instead potentially relying on helper methods.

From a strict point of view, at least taking in to account SpiderMonkey's implementation, maybe MacroAssembler is not the right term, but eventually it could grow to be the right term. For the initial implementation introducing an Assembler abstraction seems a bit premature IMO. That said, I wouldn't be opposed to naming this abstraction Assembler instead.

@cfallin do you have any further thoughts that you'd like to share according to your experience with SpiderMonkey?

RFC notifications bot (Aug 09 2022 at 00:48):

saulecabrera edited PR review comment.

RFC notifications bot (Aug 09 2022 at 00:48):

saulecabrera edited PR review comment.

RFC notifications bot (Aug 09 2022 at 00:51):

saulecabrera edited PR review comment.

RFC notifications bot (Aug 09 2022 at 04:01):

cfallin submitted PR review.

RFC notifications bot (Aug 09 2022 at 04:01):

cfallin created PR review comment:

Yes, I agree generally with Ben here that what we want initially is not-so-macro; actually I think as a first approximation, one method call per MachInst is about right. At that level, we'll need different wasm-opcode-to-assembler-API code for each architecture. Details like addressing modes, and two-register (x86) vs three-register (aarch64) form, are best dealt with explicitly at this level, rather than being papered over; for example an alternative approach to the 2/3-register-form difference would be to emit a mov/add sequence for every add and emulate 3-reg form on x86, but that's a significant pessimization.

But I also think there's a lot of room to evolve a general set of abstractions on top of this and share implementations where necessary, especially for the more complex lowerings like (for example) reference-type read/write barrier fastpaths, and other ops built from primitives. We could implement something like a CommonAssembler trait that provides load/store/move/add/sub/compare/branch/... and then write another layer that is generic over that.

I think that starting point and evolutionary path is the important point and the names aren't so important; so Assembler rather than MacroAssembler is fine, if that's the way others prefer!

RFC notifications bot (Aug 09 2022 at 11:22):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

RFC notifications bot (Aug 09 2022 at 11:53):

saulecabrera edited PR review comment.

RFC notifications bot (Sep 02 2022 at 14:45):

saulecabrera updated PR #28 from wasmtime-baseline-compilation to main.

Last updated: Apr 15 2025 at 00:13 UTC