Thought I'd create a stream for the proposed new backend, to continue the discussion from IRC.
cc @Chris Fallin (Julian is not here yet).
Hello! Looking forward to new-backend discussions here.
Does the slide deck from a month or so ago still reflect the current state of work on the new backend? I'm interested in staying abreast of its design, and in thinking about how we might fit in pattern-matching-and-replacement DSLs for peephole opts, legalization, etc.
@Nick Fitzgerald, you're referring to the 2020-01-06 presentation? At a high level, yes, we're still moving in that direction, though there's been a lot of implementation and refinement since then!
For raw data, you can see our in-progress side-branch at https://github.com/cfallin/cranelift (the interesting bits are in cranelift-codegen/src/{machinst,isa/arm64,isa/x64}), though we could (and will) do a better job of documenting the high-level structure once we've got our MVP (integer instruction set on ARM64, running Wasm) done.
I'd be happy to talk more about how this might integrate with peephole / superoptimizer-type work, though!
thanks for the link! I'll add a topic to the next cranelift meeting's agenda.
Context: I'm going to help Jubi Taneja finish her research project to create a peephole optimizer for Cranelift, seeded with optimizations from Souper, which she started as an intern a couple of summers ago. My hope is that once the new backend is in place, we can try to merge this in / rebase it on top.
@Chris Fallin thanks for this nice doc comment :)
https://github.com/cfallin/cranelift/blame/new-isa-def-2/cranelift-codegen/src/machinst/mod.rs#L1
For reference: @Nick Fitzgerald is probably talking about https://github.com/jubitaneja/codegen
yes :)
@Nick Fitzgerald is https://github.com/jubitaneja/codegen/blob/master/fn.rs auto-generated?
@Chris Fallin has 'clif-util compile' changed behaviour? It now seems to be running code from cranelift-codegen/meta/src/shared/legalize.rs.
@Joey Gouly yes, that's right, I've wired (some) legalization passes into the new backend's pipeline now. Is it interfering with one of the new instructions you're adding?
@Chris Fallin but there seems to be a difference between clif-util test and clif-util compile?
try cargo r test filetests/vcode/arm64/bitops.clif (works)
cargo r compile --target arm64 filetests/vcode/arm64/bitops.clif (fails)
oh, I see -- I'll take a look. The wiring up of the new pipeline is still very much a work-in-progress so I may have missed something :-)
you might have to implement the ihsr+imm stuff, or turn off the bitrev legalisation for now
@Joey Gouly I believe so, yes
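Aside on why shift-by-immediate support comes up here: a bit reversal is commonly legalized into a chain of mask, shift and or steps, most of them with immediate operands. Below is a minimal plain-Rust sketch of that classic mask-and-shift expansion for 32-bit values, assuming the bitrev legalization follows roughly this shape; bitrev32 is a hypothetical name and this is not Cranelift's code.

```rust
/// Hypothetical helper: reverse the bits of a 32-bit value with the classic
/// mask-and-shift expansion (swap single bits, then pairs, nibbles, bytes,
/// and finally the two 16-bit halves). Every shift amount is an immediate.
fn bitrev32(x: u32) -> u32 {
    // Swap adjacent single bits.
    let x = ((x & 0xaaaa_aaaa) >> 1) | ((x & 0x5555_5555) << 1);
    // Swap adjacent 2-bit pairs.
    let x = ((x & 0xcccc_cccc) >> 2) | ((x & 0x3333_3333) << 2);
    // Swap adjacent nibbles.
    let x = ((x & 0xf0f0_f0f0) >> 4) | ((x & 0x0f0f_0f0f) << 4);
    // Swap adjacent bytes.
    let x = ((x & 0xff00_ff00) >> 8) | ((x & 0x00ff_00ff) << 8);
    // Swap the two 16-bit halves.
    (x >> 16) | (x << 16)
}

fn main() {
    for &v in &[0u32, 1, 0x8000_0000, 0x1234_5678, 0xdead_beef] {
        // Cross-check the expansion against the standard library.
        assert_eq!(bitrev32(v), v.reverse_bits());
        println!("{:#010x} -> {:#010x}", v, bitrev32(v));
    }
}
```

Each step needs a shift by a constant plus an and/or with a constant mask, which is why a backend that lacks shift-with-immediate lowering would trip over this expansion.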
@Chris Fallin also, --set=opt_level=speed doesn't work. In cranelift-codegen/src/context.rs, opt_level is still None.
@Joey Gouly: it seems that opt_level comes through the ISA flags, which I haven't plumbed through in any real way yet... thanks for the heads-up!
@Chris Fallin It also seems like the legalisation isn't working as intended, I'm still getting GlobalValue and IcmpImm through to the lower() functions
Yes, I was debugging this yesterday afternoon as part of wasmtime bringup; I'll try to work this out today
Cool, I saw that you were working on that. Let me know when there is something testable on wasmtime; I can try it on my arm64 desktop.
Will do!
I've been trying to compile some wasm with clif-util wasm, and I'm just stubbing stuff out as I go to see what isn't implemented (usually stubbing it out with a mov rd, 0).
I'm using a hello world compiled with wasi-sdk, so I'm jumping into the deep end with interesting libc initialization bits. So far I know that the new backend is at least missing jump tables, and probably a few details related to global values, and I haven't written any code to interpret the arm64 relocations yet, so that will need to come in somewhere. But it seems it's not too far off.
@Chris Fallin I have a function where RA takes 460s!
@Joey Gouly wow, that's... impressive. @jseward and @Benjamin Bouvier are leading work on the regalloc crate -- we'd probably be interested in a test case (or at least a general description -- very long function, deeply nested control flow, too many overlapping live ranges, ...?)
Is Julian on holiday? Seems like he doesn't want to join Zulip :P I see that there's a hot function, SortedRangeFragIxs::check, which looks like it should (?) only be enabled in debug builds.
Just pinged both on Matrix (they're both online)!
I saw that Benjamin added an EpiloguePlaceholder; I guess that's the new backend's FallthroughReturn.
SortedRangeFragIxs::del and SortedRangeFragIxs::can_add seem to be other hot functions. I changed del to use with_capacity instead of new for res, but it didn't seem to have much of an effect.
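For intuition about why pre-sizing res helps so little: if del is a merge-style walk over two sorted index lists, each call is linear in the sizes of both inputs no matter how res is allocated. The sketch below is hypothetical. It uses plain u32 indices and a made-up del_sorted function rather than the regalloc crate's actual SortedRangeFragIxs types, and assumes the real del works roughly like this.

```rust
/// Hypothetical stand-in for deleting one sorted set of range-fragment
/// indices from another. NOT the regalloc crate's SortedRangeFragIxs::del;
/// it only illustrates the cost shape of a linear-time sorted delete.
/// Both inputs are assumed sorted and free of duplicates.
fn del_sorted(this: &[u32], to_del: &[u32]) -> Vec<u32> {
    // Pre-sizing avoids reallocation while pushing, but the loop below still
    // walks both inputs, so each call costs O(this.len() + to_del.len()).
    let mut res = Vec::with_capacity(this.len());
    let (mut i, mut j) = (0, 0);
    while i < this.len() {
        if j < to_del.len() && to_del[j] < this[i] {
            // Deletion candidate is smaller than the current element: skip it.
            j += 1;
        } else if j < to_del.len() && to_del[j] == this[i] {
            // Current element is in the deletion set: drop it.
            i += 1;
            j += 1;
        } else {
            // Current element survives.
            res.push(this[i]);
            i += 1;
        }
    }
    res
}

fn main() {
    let frags = vec![1, 3, 5, 7, 9, 11];
    let remove = vec![3, 4, 9];
    let kept = del_sorted(&frags, &remove);
    assert_eq!(kept, vec![1, 5, 7, 11]);
    println!("{:?}", kept);
}
```

Under this assumption the allocation is a small, constant part of each call; the total time is dominated by the linear walks multiplied by how often del is called, so reducing the number of calls (or the sizes of the sets) would matter far more than how res is created.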
Joey Gouly said:
I saw that Benjamin added an EpiloguePlaceholder; I guess that's the new backend's FallthroughReturn.
Yep, we decided this was a less confusing name.
In this test case, SortedRangeFragIxs::del is called 21,902,029 times.
Progress: new ARM64 backend has enough working to get through wasi libc init and print "Hello world"!
$ qemu-aarch64 target/aarch64-unknown-linux-gnu/release/wasmtime run ~/hello-world.wasm hello world
It takes quite a lot of code to print hello world with libc, so that really says something!
welcome @Julian Seward
Hi!
That's a big bit of CLIF. Can I ask a couple questions about it?
So did you manage to reproduce the long compile time?
sure!
Not yet. I spent all yesterday and this morning rewriting the allocator's core allocate-evict loop so as to remove a very stupid performance problem that exists in the current version (and which I'm sure is related to what you saw).
Now I'm trying to un-break it :-/
Q (mostly for my curiosity): (1) What is that CLIF, and where is it from? (2) Do the existing allocator sources produce a correct allocation for it, after 70 mins?
I don't know what function it is, but it's from a benchmark I wrote using the regex crate. I'm not sure if the allocation is correct, since I didn't run it (and some arm64 functionality was stubbed out).
Ok. Well, let me try to get this thing back on the road. Then I'll have a look at the test case.
Trying it now. So far, it's spending a lot of time in calculation of dominators
Were you using a debug or release build?
Release
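Background on the dominator computation mentioned above: one widely used way to compute immediate dominators is the iterative algorithm of Cooper, Harvey and Kennedy, which repeatedly walks blocks in reverse postorder and intersects predecessors' dominator chains until nothing changes. The sketch below shows that algorithm on a hypothetical successor-list CFG (block 0 is the entry). It is not necessarily what the regalloc crate does; compute_idoms and intersect are illustrative names.

```rust
/// Walk two blocks up the (partial) dominator tree until they meet,
/// comparing postorder numbers (Cooper-Harvey-Kennedy "intersect").
fn intersect(idom: &[Option<usize>], post_num: &[usize], mut a: usize, mut b: usize) -> usize {
    while a != b {
        while post_num[a] < post_num[b] {
            a = idom[a].unwrap();
        }
        while post_num[b] < post_num[a] {
            b = idom[b].unwrap();
        }
    }
    a
}

/// Hypothetical CFG: `succs[b]` lists successors of block b; block 0 is the
/// entry (at least one block assumed). Returns idom per block; None means
/// the block is unreachable from the entry.
fn compute_idoms(succs: &[Vec<usize>]) -> Vec<Option<usize>> {
    let n = succs.len();
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    // Iterative DFS from the entry to number blocks in postorder.
    let mut post_num = vec![usize::MAX; n];
    let mut postorder = Vec::with_capacity(n);
    let mut visited = vec![false; n];
    let mut stack = vec![(0usize, 0usize)]; // (block, next successor index)
    visited[0] = true;
    while let Some(top) = stack.last_mut() {
        let b = top.0;
        if top.1 < succs[b].len() {
            let s = succs[b][top.1];
            top.1 += 1;
            if !visited[s] {
                visited[s] = true;
                stack.push((s, 0));
            }
        } else {
            post_num[b] = postorder.len();
            postorder.push(b);
            stack.pop();
        }
    }
    let mut idom: Vec<Option<usize>> = vec![None; n];
    idom[0] = Some(0);
    let mut changed = true;
    while changed {
        changed = false;
        // Reverse postorder; the entry finishes last, so skip(1) drops it.
        for &b in postorder.iter().rev().skip(1) {
            let mut new_idom: Option<usize> = None;
            for &p in &preds[b] {
                if idom[p].is_none() {
                    continue; // predecessor not processed yet (or unreachable)
                }
                new_idom = Some(match new_idom {
                    None => p,
                    Some(cur) => intersect(&idom, &post_num, p, cur),
                });
            }
            if new_idom.is_some() && idom[b] != new_idom {
                idom[b] = new_idom;
                changed = true;
            }
        }
    }
    idom
}

fn main() {
    // Diamond: 0 -> {1, 2} -> 3.
    let diamond = vec![vec![1, 2], vec![3], vec![3], vec![]];
    // prints [Some(0), Some(0), Some(0), Some(0)]
    println!("{:?}", compute_idoms(&diamond));
}
```

The number of passes depends on the CFG's shape, and each intersect call walks dominator chains, so a function with tens of thousands of virtual registers and a large, deeply nested CFG can plausibly make this phase noticeable.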
Er, how do I get that? (cd cranelift && cargo build release) doesn't work
--release
you can also do cargo run --release
Running now. Roughly how big was the original function? I am seeing tens of thousands of virtual registers coming into regalloc.
I didn't find out what the original function was, just took the CLIF that was generated
Chris Fallin said:
Progress: new ARM64 backend has enough working to get through wasi libc init and print "Hello world"!
$ qemu-aarch64 target/aarch64-unknown-linux-gnu/release/wasmtime run ~/hello-world.wasm hello world
I reproduced this natively on my arm64 desktop!
Last updated: Jan 24 2025 at 00:11 UTC