Hi all -- just wanted to leave a note here that (as of yesterday) I made a Motion to Finalize the RFC regarding our instruction-selector DSL. Here is the link: https://github.com/bytecodealliance/rfcs/pull/15. @fitzgen (he/him) gave an excellent presentation on the latest progress yesterday as well (link).
Merging this RFC would mean that we've decided to move forward with defining our instruction-selection backends in Cranelift with the DSL we've been prototyping. The details of the bindings and definitions can of course be tweaked over time, just as for any other part of the codebase.
So far it has a few approvals, but I wanted to make sure folks saw this -- especially if you've participated in the conversations so far and are OK with this direction, please head over to the RFC and give your final approval! Or, if not, let us know what issues still remain.
cc @Benjamin Bouvier @Anton Kirilov @Sam Parker @Johnnie Birch @Ulrich Weigand especially, who have participated actively in the last several discussions in our meetings -- thanks for the time and patience on this :-)
Hi! I was absent from the meeting because of a public holiday here. I was a bit curious about the next steps, especially around the approach: does it mean we're going to have the new system checked in to the wasmtime codebase soon, and then start incrementally rewriting the lowering code to use it (as opposed to developing the new system on the side and switching the backends over to it entirely in one go)? In the latter case, do we have benchmarks running on a very frequent basis, to make sure that compile times don't regress?
Hey Ben -- sorry, didn't realize the date was a conflict for .fr! (FWIW this seems like a totally valid reason to ask for agenda items to move in the future, if important folks can't make it :-) )
The tl;dr of my answer is (i) gradual, not all-at-once switch, and (ii) yes, we'll benchmark, and avoid any perf regressions. In more detail:
The way the integration branch works now is that the generated code from the patterns is invoked first, and if it returns None, then the original handwritten code is invoked. The idea is that this will allow us to land a lot of little PRs that move over a few instruction lowerings at a time, all the while keeping the backend working; this is less risky than trying to match behavior exactly and switch over all at once, especially if the backend is a moving target.
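To make that shape concrete, here's a minimal, self-contained sketch of the "generated first, handwritten fallback" flow. All of the names and types here (Ctx, Inst, generated_lower, handwritten_lower) are stand-ins I made up for illustration, not the actual integration-branch API:

```rust
// Hypothetical stand-ins for Cranelift's lowering context and instruction
// handle; the real types in the integration branch differ.
struct Ctx;
struct Inst;

/// Stand-in for the ISLE-generated entry point: returns Some(()) when one
/// of the generated rules matched and emitted machine code, None otherwise.
fn generated_lower(_ctx: &mut Ctx, _inst: &Inst) -> Option<()> {
    None // pretend no generated rule matches this instruction yet
}

/// Stand-in for the existing handwritten lowering code.
fn handwritten_lower(_ctx: &mut Ctx, _inst: &Inst) {
    println!("lowered by the handwritten backend");
}

/// The generated rules get the first shot; if none match, we keep the old
/// behavior, so a partially migrated backend still lowers everything.
fn lower(ctx: &mut Ctx, inst: &Inst) {
    if generated_lower(ctx, inst).is_none() {
        handwritten_lower(ctx, inst);
    }
}

fn main() {
    let mut ctx = Ctx;
    lower(&mut ctx, &Inst);
}
```

The point is just that each migration step is additive: a new generated rule takes over one case, and everything else falls through to the existing code unchanged.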
Performance should be at parity or faster, by construction: we start by generating code that is the same as what we write by hand today, then improve from there. Actually, in the current code we're already doing a few tricks that the handwritten backends do not, like matching directly on the InstructionData rather than on the opcode first and then extracting the data separately (see the sketch below). In the future we can algorithmically optimize the order of matching and share matching effort across rules; @fitzgen had a great idea yesterday, for example, to do a pass that reorders matches to increase the amount of work shared between rules (isle#11). The cool thing about ideas like that is that it's now practical to make such changes across all backend code; it would've been very hard or impossible to "transpose" the whole backend or turn it inside out for performance with the handwritten methodology.
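For the InstructionData point, here's a rough illustration of the two matching styles using simplified stand-in types (not Cranelift's real Opcode/InstructionData definitions): opcode first with a separate operand extraction, versus a single match over the whole instruction-data variant:

```rust
#[derive(Clone, Copy)]
enum Opcode {
    Iadd,
    Isub,
}

// Very simplified stand-in for InstructionData.
enum InstData {
    Binary { opcode: Opcode, args: [u32; 2] },
}

impl InstData {
    fn opcode(&self) -> Opcode {
        match self {
            InstData::Binary { opcode, .. } => *opcode,
        }
    }
    fn args(&self) -> [u32; 2] {
        match self {
            InstData::Binary { args, .. } => *args,
        }
    }
}

// Handwritten style: classify by opcode first, then extract operands
// in a separate step.
fn lower_by_opcode(data: &InstData) {
    match data.opcode() {
        Opcode::Iadd => {
            let [a, b] = data.args();
            println!("add v{a}, v{b}");
        }
        Opcode::Isub => {
            let [a, b] = data.args();
            println!("sub v{a}, v{b}");
        }
    }
}

// Generated style: one match on the full variant binds the opcode and
// operands together, avoiding the second lookup.
fn lower_on_inst_data(data: &InstData) {
    match data {
        InstData::Binary { opcode: Opcode::Iadd, args: [a, b] } => {
            println!("add v{a}, v{b}");
        }
        InstData::Binary { opcode: Opcode::Isub, args: [a, b] } => {
            println!("sub v{a}, v{b}");
        }
    }
}

fn main() {
    for data in [
        InstData::Binary { opcode: Opcode::Iadd, args: [0, 1] },
        InstData::Binary { opcode: Opcode::Isub, args: [2, 3] },
    ] {
        lower_by_opcode(&data);
        lower_on_inst_data(&data);
    }
}
```

The generated code can go further still (e.g. reordering and sharing matches across rules, as in isle#11), but even the one-step match avoids the redundant opcode-then-operands lookup that the handwritten style tends to do.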
To be sure of this, we will benchmark compile time when we put up the initial PR that brings in the framework and the first few lowering rules, and we will not merge until any slowdown is fixed. As we move more code over, we expect compile time to remain the same or get faster, but it would be nice to have benchmarks for that too. I don't know if we want to go to the length of requiring contributors to benchmark manually in every PR -- that seems extreme when we don't have such a requirement for other changes, and hopefully an initial benchmark plus the argument that the generated code is the same or better is enough -- but I'm curious what you and others think about that.
To some degree, "continuous benchmarking" is the domain of RFCs #3 and #4, in which we agreed we wanted infrastructure like this; I haven't kept up with the progress on implementing these but I know we at least allocated a CI machine for it earlier in the year. @Johnnie Birch do you have any updates on the continuous benchmarking infrastructure?