The abstract of https://arxiv.org/abs/2011.13127 seems interesting:
Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode
[...]
The generated code runs an order of magnitude faster than interpretation and 14% faster than LLVM -O0. Our WebAssembly compiler generates code 4.9X-6.5X faster than Liftoff, the WebAssembly baseline compiler in Google Chrome. The generated code also outperforms Liftoff's by 39%-63% on the Coremark and PolyBenchC WebAssembly benchmarks.
i came here to share that too! i think cranelift could be useful to generate a stencil library. could be a nice improvement for baseline compilation in wasmtime
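To make the stencil idea concrete, here's a toy byte-level sketch (not the paper's actual implementation, and the opcode bytes are made up for illustration): a "stencil" is a pre-generated code template containing holes, and compiling an operation is just copying the template and patching the holes with concrete values.

```python
# Toy model of copy-and-patch: stencils are precompiled byte templates
# with placeholder holes; "compilation" is a memcpy plus a few patches.
HOLE = 0xFF  # hypothetical marker byte standing in for a relocation hole

def make_stencil(template):
    """Record hole positions up front so patching is a few fixed writes."""
    return bytes(template), [i for i, b in enumerate(template) if b == HOLE]

def copy_and_patch(stencil, values):
    code, holes = stencil
    assert len(values) == len(holes)
    out = bytearray(code)            # the "copy" step
    for pos, val in zip(holes, values):
        out[pos] = val & 0xFF        # the "patch" step (byte-sized, for the toy)
    return bytes(out)

# A fake "load immediate" stencil with one hole for the immediate:
load_imm = make_stencil([0x48, 0xC7, 0xC0, HOLE])
```

The real technique works the same way at the machine-code level, with the stencil library produced ahead of time by a conventional compiler; generating that library is where something like cranelift could slot in.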
It's a really interesting technique, thanks!
is the full text available anywhere?
At the arxiv link above, I think! "PDF" in the right bar
oh, I missed it. thanks
in the same area, there's the classic "multi-level quickening" https://arxiv.org/abs/2109.02958
better link: http://fredrikbk.com/publications/copy-and-patch.pdf
@Alexa VanHattum @Chris Fallin continuing this thread, but not quite a paper: there's been some interesting verification work on the LLVM side of things for aarch64: https://blog.regehr.org/archives/2265
Super cool, thank you!
https://drops.dagstuhl.de/opus/volltexte/2022/15933/pdf/dagrep_v011_i010_p173_21481.pdf
Secure compilation is an emerging field that puts together advances in security, programming languages, compilers, verification, systems, and hardware architectures in order to devise more secure compilation chains that eliminate many of today’s security vulnerabilities and that allow sound reasoning about security properties in the source language. For a concrete example, all modern languages provide a notion of structured control flow and an invoked procedure is expected to return to the right place. However, today’s compilation chains (compilers, linkers, loaders, runtime systems, hardware) cannot efficiently enforce this abstraction against linked low-level code, which can call and return to arbitrary instructions or smash the stack, blatantly violating the high-level abstraction. Other problems arise because today’s languages fail to specify security policies, such as data confidentiality, and the compilation chains thus fail to enforce them, especially against powerful side-channel attacks. The emerging secure compilation community aims to address such problems by identifying precise security goals and attacker models, designing more secure languages, devising efficient enforcement and mitigation mechanisms, and developing effective verification techniques for secure compilation chains.
This seminar strived to take a broad and inclusive view of secure compilation and to provide a forum for discussion on the topic. The goal was to identify interesting research directions and open challenges by bringing together people working on building secure compilation chains, on designing security enforcement and attack-mitigation mechanisms in both software and hardware, and on developing formal verification techniques for secure compilation.
@Alexa VanHattum https://stefanheule.com/s/projects/strata/ seems relevant for your work on verifying ISLE lowering rules. It has formal semantics for a lot of x86_64 instructions that were semi-automatically discovered. In all cases where the discovered semantics didn't match hand-written semantics, the hand-written semantics were in fact incorrect.
Not exactly a paper and possibly not all that practical, but https://www.mattkeeter.com/blog/2022-10-04-ssra/ shows a register allocator that is supposedly very fast, but does have some limitations.
Interesting write-up for sure; have seen this in LuaJIT before (as they mention) and it's a very effective technique for trace compilers. I don't see any mention of control-flow in the post, and that's one of the confounders for any single-pass algorithm; curious if they have more thoughts on it
It explicitly mentions that it doesn't work with control-flow. Of course you can always spill everything on any kind of control-flow, but that is probably very slow at runtime.
It also produces sub-optimal results if you have fixed-reg constraints since you only have one pass through the code.
But I guess you could remedy that by adding a pre-processing pass that gathers register hints for each vreg.
It's somewhat like what I had in mind for https://github.com/bytecodealliance/regalloc2/issues/81.
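For anyone skimming the thread, the core of the blog post's allocator can be sketched in a few lines (my own toy Python rendering, not the post's Rust code): walk straight-line SSA code backwards, so a value's first sighting as a source is its last forward use (allocate there) and its def closes the range (free there). The limitations discussed above apply: no control flow and no spilling are modeled.

```python
# Toy reverse single-pass register allocator over straight-line SSA code.
# insts: list of (dest, srcs) tuples of value ids, in program order.
def allocate(insts, num_regs):
    live = {}                       # value id -> register, while live (backwards)
    free = list(range(num_regs))
    assignment = {}
    for dest, srcs in reversed(insts):
        if dest in live:
            # Walking backwards, the def closes the live range:
            # finalize the choice and return the register to the pool.
            reg = live.pop(dest)
            assignment[dest] = reg
            free.append(reg)
        else:
            # Value never used later (e.g. the final result):
            # place it in any free register without holding it.
            assignment[dest] = free[-1]
        for s in srcs:
            if s not in live:
                # First backwards sighting = last forward use: allocate here.
                if not free:
                    raise RuntimeError("would need to spill (not modeled)")
                live[s] = free.pop()
    return assignment
```

The fixed-reg problem mentioned above shows up here directly: by the time the single backwards pass reaches a def, the register was already chosen at the last use, so a constraint at the def can only be satisfied by a move (or, as suggested, by a pre-pass that seeds hints).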
https://www.cs.cmu.edu/~dkoes/research/CGO08-NOLTIS.pdf looks interesting. It is a near-optimal instruction selector running in linear time.
(linked from https://lobste.rs/s/p93jau/e_graph_extraction_problem_is_np_complete)
Hydra: Generalizing Peephole Optimizations with Program Synthesis
https://users.cs.utah.edu/~regehr/generalization-oopsla24.pdf
Pretty cool! Bolting this onto souper/souper-harvest would be neat.
Afonso Bordado said:
Hydra: Generalizing Peephole Optimizations with Program Synthesis
https://users.cs.utah.edu/~regehr/generalization-oopsla24.pdf
See also https://github.com/bytecodealliance/wasmtime/issues/5783 -- would be great to land those already-generalized-for-us rules as a first step!
Scaling Instruction-Selection Verification against Authoritative ISA Semantics
https://dl.acm.org/doi/pdf/10.1145/3764383
The follow-up to Crocus, from what I can tell.
Indeed, and @Michael McLoughlin's talk (freshly given, just yesterday Singapore time) is up on YouTube
so cool! Besides bandwidth, is anything currently blocking us from realizing the CI setup described in section 6.3, where full-coverage verification happens as a recurring scheduled job?
The main thing is "merge all of this to main" -- I know that both Michael and Alexa have this on their radars but many other things have also come up. @Michael McLoughlin what's your latest estimate about how the upstreaming work might happen, timeline-wise?
Thanks, that makes sense.
Separate question: are there specific plans to expand coverage? I'd specifically be interested in coverage of all operations relevant to guest isolation. IIUC that'd entail coverage of IR terms reachable post-optimization, stack operations, and post-wasm-1.0 operations such as those introduced by the Bulk Memory proposal.
as far as I know we should cover CLIF opcodes that opt rewrite rules generate as well; I don't know if Michael has specifically analyzed for that though.
Accesses to the stack spill area should be "verified" by the regalloc checker in the sense that the loaded slot number will be verified to have the value expected as stored by a prior store; so no uninitialized reads. The actual stack frame layout and the glue code to convert slot numbers to SP offsets is not part of what the checker sees, nor this isel verifier; that'd be interesting, too. To some degree one needs a whole-function context to validate that, which isn't a good fit for the rule-by-rule SMT verification approach we had here. I think a good solution actually looks something like proof-carrying code, which of course I haven't had the privilege of having time to push forward... someday and/or with the right collaborator!
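The slot-number check described above boils down to a small symbolic rule (a toy sketch of the idea, not the actual checker in regalloc2): every spill records which virtual value the slot now holds, and every reload must read back exactly the value the instruction expects.

```python
# Toy model of the spill-slot part of a regalloc checker:
# ops is a trace of ("store", slot, value) and ("load", slot, value)
# events, where value is the virtual register the code *intends*.
def check_spills(ops):
    slots = {}                          # slot number -> value last stored
    for kind, slot, value in ops:
        if kind == "store":
            slots[slot] = value
        elif kind == "load":
            # A reload must see the value a prior store put there;
            # a missing entry is an uninitialized read.
            if slots.get(slot) != value:
                return False
    return True
```

As noted above, this symbolic view never sees the actual frame layout or the slot-to-SP-offset glue; checking that would need whole-function (or proof-carrying) reasoning.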
I was basing the questions on the caveats listed on page 20 of the paper, in section 6.1. Those make it sound as though the opcodes not directly reachable from Wasm, but emitted by rewrites, aren't fully covered yet.
And yes, I've been wondering whether this leaves a gap to be closed by PCC, and that makes sense!
Ah, yes, I don't think we specifically know of opcodes that aren't covered that do appear, that was more of a caveat describing what has been evaluated / how the list was obtained.
I would love to get back to PCC and build it out to be robust and usable, too -- this gap bothers me as well :-) Maybe in, like, 2029 or something...
hey, right decade at least :grimacing:
Last updated: Dec 06 2025 at 06:05 UTC