peterhuene opened PR #2791 from compile-command
to main
:
This PR adds a
compile
command to the Wasmtime CLI.The command can be used to Ahead-Of-Time (AOT) compile WebAssembly modules.
With the
all-arch
feature enabled, AOT compilation can be performed for
non-native architectures (i.e. cross-compilation).The
Module::compile
method has been added to perform AOT compilation.A few of the CLI flags relating to "on by default" Wasm features have been
changed to be "--disable-XYZ" flags.A simple example of using the
wasmtime compile
command:$ wasmtime compile input.wasm $ wasmtime input.cwasm
peterhuene updated PR #2791 from compile-command
to main
.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
*entries
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
const COMPILED_MODULE_HEADER: &[u8] = b"\0wasmtimeaot";
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Are you sure I didn't mean uncastrated male horse? :scream:
Good eye! I selected the wrong word in the spell checker drop-down menu it seems.
peterhuene updated PR #2791 from compile-command
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I think that this can be done with
cranelift_other_flag(flag, "true")
, right? (not sure if we needed an extra method for this)
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
FWIW this method can probably be removed since we can just hash the return value of
enabled_isa_flags
I think?
alexcrichton created PR Review Comment:
I've seen this in a few places, could this perhaps be refactored to a common function in
src/lib.rs
?
alexcrichton created PR Review Comment:
We may actually want to avoid this to avoid duplicating the logic below, and instead we could initialize
ret
likewasm_backtrace_details
is initialized perhaps?
alexcrichton created PR Review Comment:
Instead of being a separate constructor could this be a configuration method?
We could document
Config::new
as matching whatever the native machine is and then a.target(..)
method which resets that internally?
alexcrichton created PR Review Comment:
FWIW I was just thinking about this recently and how we might have to do this. It seems kinda bad that we're guaranteed to break anyone using these flags as soon as things stabilize, and it feels kinda odd having a mishmash of enable/disable.
To help reduce the clutter in the CLI options as well as to make this a bit more future proof, what do you think of something like:
$ wasmtime --wasm-features=+simd $ wasmtime --wasm-features=-bulk-memory $ wasmtime --wasm-features=+multi-value,-bulk-memory,+threads $ wasmtime --wasm-features=+all
(similar to LLVM features and
-Ctarget-feature
in Rust)This also seems fine for me to file a follow-up too
alexcrichton created PR Review Comment:
This sort of makes me think that we should lean towards
-C
flags which affect codegen. In any case though this might be good to lift intoCommonOptions
perhaps?
alexcrichton created PR Review Comment:
Could this method perhaps be written as
Module::new(engine, bytes)?.serialize_into(output)
?I'd ideally prefer to avoid having this method since it seems like it should be a trivial combination of a constructor plus a
serialize
invocation, and splitting it up shouldn't be all that much less ergonomic. (I'd imagine it's fine to add the compiled module header to serialized modules too and we could even support them inModule::new
to automaticallydeserialize
as well)
alexcrichton created PR Review Comment:
I think it'd be best if we didn't have to list out flags like this. I find it pretty unwieldy for the CLI since
wasmtime --help
becomes super long in not always so interesting ways.I thought that this could be done with
--cranelift-extra-flag has_znver1=true
, but that's naturally a whole lot more wordy than--znver1
. Could we perhaps take some liberties and make this a bit more compact though? Similar to what I'm thinking below with wasm features we could do something like:
- Backends have the ability to return all possible flags (removing the need to keep the backend in sync with this file)
- We have a general
--codegen-features=...
or something similar to Rust's-C
flag where we do-Ccranelift=...
- The CLI would parse everything received and then match that against what the backend says is available, returning errors for anything enabled that the backend doesn't support
- We could implement something like
--codegen-features=help
to print out a helpful listing of all codegen features (similar to-Ctarget-feature=help
)I think that'd help reduce the need to keep this file in sync with the cranelift backends and also clean up the CLI UI perhaps?
alexcrichton created PR Review Comment:
FWIW using
wat::parse_file
was done before because it automatically gives better error messages if the input file is*.wat
. We can give rustc-style errors with filenames/line numbers and snippets outlined.
alexcrichton created PR Review Comment:
FWIW for this to be a bit more robust I think we'd want to move this outside of the bincode-encoded payload. Bincode may change formats (perhaps?) or we could change the structure here which means that if you're mismatching versions you're unlikely to get the error about different versions, that's only if the binary formats happen to align well.
alexcrichton created PR Review Comment:
With these various
check_*
methods we can probably remove theHash
implementation forCompiler
since IIRC it's only really used for this?
alexcrichton created PR Review Comment:
I always get this confused with the
isa_flags
below. Do you think we could use different names for these flags? IIRC there's like a two-level table of flags for cranelift ISAs, the "this applies to all targets" table and the "this applies to only one target" table. I think this is checking the target-independent table, right? And thecheck_isa_flags
below is checking the target-specific table?I basically found it a bit confusing to have this and
check_isa_flags
below.
alexcrichton created PR Review Comment:
Do you think it's worth ignoring this? Out of the codegen options this is definitely one where it doesn't actually affect the output in a super meaningful way. An unoptimized
Engine
could easily take an optimized precompiled wasm module
alexcrichton created PR Review Comment:
I wondering, instead of hashing all flags could we alter cranelift to return an iterator over enabled configuration settings and their values? That way we could store the full map here and have better error messages perhaps
alexcrichton created PR Review Comment:
Similar to above using a destructuring statement to exhaustively list all fields I think might work well here too?
alexcrichton created PR Review Comment:
Could this use named fields along the lines of:
let Tunables { static_memory_bound, /* ... */ } = self.tunables; // .. check all the fields
so when we add a field to
Tunables
we're sure to add a case here too?
alexcrichton created PR Review Comment:
(ideally the same could be done for
opt_level
too)
alexcrichton created PR Review Comment:
Also I've personally never been really that sold on the
impl Write
output for this sort of use case. It's hard for me to imagine where the output is so large it must be streamed rather than being held resident in memory.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Unfortunately not;
Configurable::set
andConfigurable::enable
have different functionality relating to presets: the former will error for a preset whereas the latter applies it.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That sounds good :+1: I'll make that change.
peterhuene created PR Review Comment:
This is still used for the compiler's hash implementation to generate the unique-for-this-system hash used by the code cache.
Given that implementation is based on
Hash
, I felt it easier to just keep as-is than implement something else that could do the less-specific checks that deserializing the AOT artifact now does.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
:+1: I'll make this change to keep the two in sync.
peterhuene created PR Review Comment:
I'll extract this out.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That makes sense to me. I'll move this out and validate it from there.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed, there's no reason this can't be done. The intention behind this was to keep the previous requirement of having the common flags match 1:1, but loosen the requirement behind the ISA flags as they are (currently) just CPU features and having the feature present on the host running the code (e.g. "I have AVX") while not targeted by AOT compilation (i.e. "I don't use AVX") should be allowed.
I can remove this check and allow the optimization level to differ (although I think it's a bit weird to say "I want to run this module as 'none' when it was already compiled optimized").
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed, it is confusing.
flags_hash
here is the hash of the common/shared Cranelift flags and theisa_flags
are those enabled by the target ISA (currently these are 1:1 with CPU features).I can see about a better name.
peterhuene created PR Review Comment:
Excellent suggestion, I'll add that and we can then remove the assert on hashes as that was the intention for at least catching this case at runtime.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Definitely will add :+1:
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll see what I can do to improve this.
peterhuene created PR Review Comment:
It's still used for the code cache, which remains
Hash
-based.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Do you know how it does that as I'm looking at the implementation of
_parse_file
inwat
and it seems to just read the file to aVec<u8>
and callparse_bytes
, effectively what we're doing here?
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I like that idea.
I propose that we keep the
--enable-XXX
as deprecated, undocumented flags for now for backcompat and to clean up--help
, but perhaps break the--enable-XXX false
options that only exist for the on-by-default ones by turning them back to bools; those users we'd steer to--wasm-features=-XXX
immediately.Re the name: might
wasm-features
be slightly redundant given this is _wasm_time? Perhaps justfeatures
?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
There's one big reason we _don't_ call
Module::new
fromModule::compile
: this explicitly does not want to callCompiledModule::from_artifacts_list
as that will register the JIT'd code for the platform.The compilation might be for a target completely foreign to the host, so creating a
Module
, even a short-lived one that gets immediately serialized and dropped, doesn't make much sense.In fact, given that
Config
can now have a foreign target, I propose thatModule
's constructors should be verifying that the given config's target matches the host before even attempting JIT compilation. I'll add that.Re:
impl Write
I don't see much point in buffering unbounded data entirely in memory to immediately write it to disk and discard it like we'd do for the compile command. I think, from an API-design perspective, thatimpl Write
offers maximum flexibility for users of the API to do as they see fit: they can always just pass aVec<u8>
and be done.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Lifting it into
CommonOptions
meanswasmtime run --interuptable foo.cwasm
would be possible, which to me doesn't make nearly as much sense as the--wasm-timeout
option specific to the run command that inherently enables interruption in a useful way.I think
wasmtime compile --interruptable foo.wasm
and thenwasmtime run --wasm-timeout 60s foo.cwasm
would make sense to users (the latter errors if--interruptable
wasn't given to the compile command).That said, though, I could be convinced that
wasmtime run --interruptable foo.cwasm
is a way to run a AOT module that was compiled with interruption support without specifying a timeout, similar to--wasm-timeout <some-reaaaaaally-long-timeout>
.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Oh whoops, I missed the
set_path
call in the error handling. I'll change this back.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
As I'd like to avoid opening the file twice (once to check if the AOT header is present before handing it off to
wat::parse_file
which opens the file in turn), do you think we could add awat::Error::set_path
method that sets the path if the underlying error kind isWast
?
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I like the idea of a
--codegen-features
option. This will require a bit of refactoring in the generated code for Cranelift's settings as the metadata about the settings isn't suitable for describing the settings to users this way.I'll see what I can make happen.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I've opened https://github.com/bytecodealliance/wasm-tools/pull/252 that would allow us to set the path in the error without having to resort to opening the file twice.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I've added a
--wasm-features
option with the latest commit.
peterhuene updated PR #2791 from compile-command
to main
.
alexcrichton created PR Review Comment:
:+1:
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Perhaps we could just outright remove this method now? With
Config::target
it doesn't seem too useful any more
alexcrichton created PR Review Comment:
Could this method (the one below that I can't actually comment on) be renamed to
cranelift_flag_set
to mirror thecranelift_flag_enable
?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Yeah that was kind of my thinking where you could evaluate the overhead of interruptability with
wasmtime run --interruptable
, but I don't think it's super important so seems fine to stay here for now.
alexcrichton created PR Review Comment:
Nah seems fine to just leave this until it trips someone up.
alexcrichton created PR Review Comment:
To bikeshed a bit, could this perhaps be called
cranelift_enable
and the above method is calledcranelift_set
? (I figure that if cranelift wants two different operations we could probably mostly just try to mirror them here well enough
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Bah ok, well in that case we should probably lean into this naming in all the various places, either
set
orenable
I think to be consistent about what maps to what
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah ok seems fine to leave for now in that case. I think this means, though, that we're ripe to refactor that part now too and use the same serialization/deserialization system at some point, but that's fine to happen later.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah I see, those are good points. I think though I'd ideally still like to push on a small set of constructors for
Module
instead of a standalonecompile
function. Could we perhaps defer the registration of JIT'd code for the platform to instantiation? I'm imagining we could have aMutex
of some kind which mutates the state ofModule
the first time it's instantiated or something like that.Personally I think it'd be a bummer if, when using cross-compilation, you're forced to never touch
Module::new
and friends and you instead have to useModule::compile
, but there's no real fundamental reason to do so per se.Another possible alternative is that if the module looks cross-compiled we just never do the registration bits (and somehow we should error-out in
Instantiate
if the target actually mismatches the current host...), but I think I'd prefer to have the defer for now to avoid registering jit code on hosts where you don't actually do any instantiation.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm still inclined to believe that constructing a
Module
implies it's usable (i.e.instantiatable) for the current host. I think the constructors should error if 1) the module is AOT-compiled for a different target or 2) the associatedConfig
has a target other than host and thus JIT doesn't make sense.I'm also inclined to believe that it's not desirable to produce a
Module
for AOT compilation (even internally) as the point of AOT compilation is getting an artifact that can be loaded _at a later time_ and likely from a different host. I think that anyone making use ofModule::compile
would not be usingModule::new
on the artifact from the same host as doing AOT compilation and loading it on the same host isn't gaining anything over the code cache.It's not just code registration that we're skipping in
Module::compile
currently, but all the work done to get the module ready to be executed (i.e. "linking it" - copying the text sections into executable memory, applying relocations, etc.), none of which is required for AOT compilation.To me, at least, AOT compilation is a pretty explicit action that's independent of constructing a
Module
that can be executed. PerhapsModule::compile
isn't the best place for this, but I am strongly in favor of now allowing aModule
to construct if the target isn't for host.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
My main worry is that having
Module::{new, compile}
could possibly be confusing for users because they both look like constructors and most might even lean towards "compile" since wasm is generally understood to be compiled to native code. Perhaps though we can solve all this by tweaking things a bit API-wise.In a hypothetical future version of the
wasmtime
crate you'll be able to jettison the runtime entirely and instead only have the cranelift compiler (and vice versa, you could only have the runtime without the compiler). In this hypothetical world ifModule
is a "ready to instantiate module" then we wouldn't want to expose it from the version of the crate that is missing the runtime. That would putModule::compile
in a weird place since it needs to exist but the type it's attached to doesn't want to exist.Perhaps we could solve this by moving the method to
Engine::compile
? That way it's theoretically far away enough api-wise to not confuse users, we'll still have anEngine
even with all modes of the wasmtime crate (whether you're removing the runtime or not), and we can clearly document that it's only intended for ahead-of-time-compiled cases.I think though that I would still want to tweak things a bit ideally:
- I think the format of the data from
Module::compile
andModule::serialize
should be the same- We could probably remove
Module::deserialize
and just document more sources of binary data forModule::new
and/orfrom_binary
- Personally I would still like to remove the
impl Write
in favor of just returning aVec<u8>
. I think that working with aVec<u8>
is generally more convenient and if we really need it for performance we could always add something likecompile_into
orserialize_into
.How does that all sound?
peterhuene created PR Review Comment:
I'm on-board with everything you just proposed; I agree that moving
compile
out ofModule
into (perhaps)Engine::compile_module
would make it clear that this is not useful for constructing aModule
itself.I also am not opposed to using
Vec<u8>
instead ofimpl Write
; agree that it's easy enough to enhance if necessary in the future.
peterhuene submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh actually a nice name might be
precompile_module
or something like that to emphasize it a bit more?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Even better!
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Okay, so I kept the
--cranelift-set
and--cranelift-enable
flags, removed the various Cranelift flags fromwasmtime compile
, and added awasmtime settings
command that prints out the available settings for the target triple, e.g.:$ wasmtime settings Cranelift settings for target 'x86_64-apple-darwin': Boolean settings: has_sse3 Has support for SSE3. has_ssse3 Has support for SSSE3. has_sse41 Has support for SSE4.1. has_sse42 Has support for SSE4.2. has_avx Has support for AVX. has_avx2 Has support for AVX2. has_avx512dq Has support for AVX512DQ. has_avx512vl Has support for AVX512VL. has_avx512f Has support for AVX512F. has_popcnt Has support for POPCNT. has_bmi1 Has support for BMI1. has_bmi2 Has support for BMI2. has_lzcnt Has support for LZCNT. Presets: baseline A baseline preset with no extensions enabled. nehalem Nehalem microarchitecture. haswell Haswell microarchitecture. broadwell Broadwell microarchitecture. skylake Skylake microarchitecture. cannonlake Canon Lake microarchitecture. icelake Ice Lake microarchitecture. znver1 Zen (first generation) microarchitecture. Settings enabled for this host: - has_sse3 - has_ssse3 - has_sse41 - has_sse42 - has_avx - has_avx2 - has_popcnt - has_bmi1 - has_bmi2 - has_lzcnt
I've pushed up the commit, so let me know if this is inline with what you were thinking.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
The command intentionally doesn't present any of the "shared" settings as those really shouldn't be mucked around with by users, but we allow them for
--cranelift-set
/--cranelift-enable
for users that know what they're doing.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Technically I think we might want to check a number of things here in the limit, e.g. if you enable avx512 or something like that but you don't have it we should probably bail out here at some point. I'm fine with that being a FIXME though to basically check things other than just the target.
alexcrichton created PR Review Comment:
Reading this again, is there perhaps a method on a
target_lexicon
for checking this? For example if the arch/os match you could I suppose in theory have a glibc/musl mismatch which would prevent running things. That's never an issue for us in wasmtime but this seems like an abstract thingtarget_lexicon
might handle for us.
alexcrichton created PR Review Comment:
Are these groups still applicable? (I won't pretend to know what clap is doing here)
alexcrichton created PR Review Comment:
Could the documentation be expanded here to mention that the intent of this method is that you'd take these bytes to
Module::new
at some other time, perhaps on a different machine or in a different process?
alexcrichton created PR Review Comment:
That looks fantastic to me, thanks for doing that!
alexcrichton submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
There's nothing I can see in
target_lexicon
(re: a "compatibility check") that would be useful here. Did you happen to spot something that might be?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Good eye, these should be removed.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I've removed
flags_hash
in favor of storing all of the shared and ISA-specific settings in the compiled module.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I've done just that; we now store all the shared and ISA-specific settings in the compiled module and can now print out which of the shared flags is missing or different.
peterhuene updated PR #2791 from compile-command
to main
.
peterhuene updated PR #2791 from compile-command
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Nah this is more of just a random musing.
alexcrichton submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
One possible backwards-compatibility consideration: it looks like this will result in no encoding of default options in the saved compiled module, but if we later change a default in an incompatible way, would we want to catch that?
(It's entirely possible that this is made irrelevant by some other versioning -- do you encode e.g. a git hash or wasmtime version in the
.cwasm
file too?)
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed that there are way too many ways incompatibilities can be introduced, so to combat that we do indeed record the version in the cwasm and must be an identical match for now.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene merged PR #2791.
Last updated: Dec 23 2024 at 12:05 UTC