Stream: cranelift

Topic: new-backend


view this post on Zulip Joey Gouly (Feb 21 2020 at 14:45):

Thought I'd create a stream for the new proposed backend, to continue on from irc

view this post on Zulip Benjamin Bouvier (Feb 21 2020 at 16:28):

cc @Chris Fallin (Julian is not here yet).

view this post on Zulip Chris Fallin (Feb 21 2020 at 17:02):

Hello! Looking forward to new-backend discussions here.

view this post on Zulip fitzgen (he/him) (Feb 21 2020 at 18:16):

does the slide deck from a month or so ago still reflect the current state of work for the new backend? interested in staying abreast of its design and thinking about how we might fit in pattern match and replace DSLs for peephole opts and legalization etc

view this post on Zulip Chris Fallin (Feb 21 2020 at 18:27):

@Nick Fitzgerald , you're referring to the 2020-01-06 presentation? At a high level, yes, we're still moving in that direction, though there's been a lot of implementation and refinement since then!

For raw data, you can see our in-progress side-branch at https://github.com/cfallin/cranelift (the interesting bits are in cranelift-codegen/src/{machinst,isa/arm64,isa/x64}), though we could (and will) do a better job of documenting the high-level structure once we've got our MVP (integer instruction set on ARM64, running Wasm) done.

I'd be happy to talk more about how this might integrate with peephole / superoptimizer-type work, though!

view this post on Zulip fitzgen (he/him) (Feb 21 2020 at 18:32):

thanks for the link! I'll add a topic to the next cranelift meeting's agenda.

context is I'm going to help Jubi Taneja finish her research project to create a peephole optimizer for cranelift that is seeded with optimizations from souper that she started as an intern a couple summers ago. my hope is that once the new backend gets in place we can try and merge this in / rebase it on top

view this post on Zulip fitzgen (he/him) (Feb 21 2020 at 18:59):

@Chris Fallin thanks for this nice doc comment :)

https://github.com/cfallin/cranelift/blame/new-isa-def-2/cranelift-codegen/src/machinst/mod.rs#L1

view this post on Zulip bjorn3 (Feb 21 2020 at 19:00):

For reference: @Nick Fitzgerald is probably talking about https://github.com/jubitaneja/codegen

Automatic peephole optimizer for Cranelift JIT compiler - jubitaneja/codegen

view this post on Zulip fitzgen (he/him) (Feb 21 2020 at 19:00):

yes :)

view this post on Zulip Joey Gouly (Feb 25 2020 at 10:16):

@Nick Fitzgerald https://github.com/jubitaneja/codegen/blob/master/fn.rs this is auto-generated?

view this post on Zulip Joey Gouly (Feb 25 2020 at 12:47):

@Chris Fallin has 'clif-util compile' changed behaviour? It seems to be running code from cranelift-codegen/meta/src/shared/legalize.rs now?

view this post on Zulip Chris Fallin (Feb 25 2020 at 16:42):

@Joey Gouly yes, that's right, I've wired (some) legalization passes into the new backend's pipeline now. Is it interfering with one of the new instructions you're adding?

view this post on Zulip Joey Gouly (Feb 25 2020 at 16:46):

@Chris Fallin but there seems to be a difference between clif-util test and clif-util compile?

view this post on Zulip Joey Gouly (Feb 25 2020 at 16:47):

try:

cargo r test filetests/vcode/arm64/bitops.clif (works)
cargo r compile --target arm64 filetests/vcode/arm64/bitops.clif (fails)

view this post on Zulip Chris Fallin (Feb 25 2020 at 16:50):

oh, I see -- I'll take a look. The wiring up of the new pipeline is still very much a work-in-progress so I may have missed something :-)

view this post on Zulip Joey Gouly (Feb 25 2020 at 16:54):

you might have to implement the ihsr+imm stuff, or turn off the bitrev legalisation for now

view this post on Zulip fitzgen (he/him) (Feb 25 2020 at 18:04):

@Joey Gouly I believe so, yes

view this post on Zulip Joey Gouly (Feb 26 2020 at 15:47):

@Chris Fallin also --set=opt_level=speed doesn't work. In cranelift-codegen/src/context.rs opt_level is still None.

view this post on Zulip Chris Fallin (Feb 26 2020 at 17:40):

@Joey Gouly : it seems that opt_level comes through the ISA flags, which I haven't plumbed through in any real way yet... thanks for the heads-up!

view this post on Zulip Joey Gouly (Feb 26 2020 at 17:57):

@Chris Fallin It also seems like the legalisation isn't working as intended, I'm still getting GlobalValue and IcmpImm through to the lower() functions

view this post on Zulip Chris Fallin (Feb 26 2020 at 17:58):

Yes, I was debugging this yesterday afternoon as part of wasmtime bringup; I'll try to work this out today

view this post on Zulip Joey Gouly (Feb 26 2020 at 17:59):

Cool, I saw that you were working on that. Let me know when there is something testable on wasmtime. I can try on my arm64 desktop

view this post on Zulip Chris Fallin (Feb 26 2020 at 18:02):

Will do!

view this post on Zulip Joey Gouly (Feb 26 2020 at 18:03):

I've been trying to compile some wasm with clif-util wasm, and I'm just stubbing stuff out as I go, to see what isn't implemented (stubbing them out usually with a mov rd, 0)

view this post on Zulip Chris Fallin (Feb 26 2020 at 18:07):

I'm using a hello world compiled with wasi-sdk, so I'm jumping into the deep end with interesting libc initialization bits; so far I know that the new backend is at least missing jump tables, and probably a few more details related to global values, and I haven't written any code to interpret the arm64 relocations yet so that should come in somewhere. But it seems it's not too far off

view this post on Zulip Joey Gouly (Feb 27 2020 at 14:57):

@Chris Fallin I have a function where RA takes 460s!

view this post on Zulip Chris Fallin (Feb 27 2020 at 17:18):

@Joey Gouly wow, that's... impressive. @jseward and @Benjamin Bouvier are leading work on the regalloc crate -- we'd probably be interested in a test case (or at least a general description -- very long function, deeply nested control flow, too many overlapping live ranges, ...?)

view this post on Zulip Joey Gouly (Feb 27 2020 at 17:19):

Is Julian on holiday? Seems like he doesn't want to join zulip :P I see that there's a hot function, SortedRangeFragIxs::check, which looks like it should (?) be enabled in debug only
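For readers following along, here is a minimal sketch of what "enabled in debug only" usually means in Rust. The type `SortedIxs` and its methods are hypothetical stand-ins, not the regalloc crate's actual `SortedRangeFragIxs` code. The idea is that an O(n) invariant check wrapped in `debug_assert!` is compiled out of release builds, so it costs nothing there.

```rust
/// Hypothetical stand-in for a sorted collection of indices
/// (not the regalloc crate's real type).
struct SortedIxs {
    ixs: Vec<u32>,
}

impl SortedIxs {
    /// O(n) invariant check: the indices must be strictly increasing.
    fn check(&self) -> bool {
        self.ixs.windows(2).all(|w| w[0] < w[1])
    }

    /// Insert `ix` at its sorted position, ignoring duplicates.
    fn add(&mut self, ix: u32) {
        if let Err(pos) = self.ixs.binary_search(&ix) {
            self.ixs.insert(pos, ix);
        }
        // Evaluated only when debug assertions are on (the default for
        // `cargo build`); removed entirely under `cargo build --release`.
        debug_assert!(self.check());
    }
}

fn main() {
    let mut s = SortedIxs { ixs: Vec::new() };
    for ix in [5, 1, 3, 3] {
        s.add(ix);
    }
    println!("{:?}", s.ixs);
}
```

If a check like this runs unconditionally on every insert or delete, each operation becomes linear in the collection size, which can dominate a release-build profile.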

view this post on Zulip Chris Fallin (Feb 27 2020 at 17:20):

Just pinged both on Matrix (they're both online)!

view this post on Zulip Joey Gouly (Feb 27 2020 at 17:22):

I saw that Benjamin added an EpiloguePlaceholder, I guess that's the new backend's FallthroughReturn

view this post on Zulip Joey Gouly (Feb 27 2020 at 17:28):

SortedRangeFragIxs::del and SortedRangeFragIxs::can_add seem to be other hot functions. I changed del to use Vec::with_capacity instead of Vec::new for res, but it didn't seem to have much of an effect
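As an illustration of the change described above, here is a hedged sketch of a `del`-style function that builds its result in a pre-sized vector. The function `del` below is hypothetical and only assumes that both inputs are sorted slices; it is not the regalloc crate's implementation.

```rust
/// Hypothetical sketch: return the elements of `from` that do not appear
/// in `to_del`. Both slices are assumed to be sorted in increasing order.
fn del(from: &[u32], to_del: &[u32]) -> Vec<u32> {
    // Reserve the worst-case size up front so pushes never reallocate.
    // With `Vec::new()` the vector would instead grow geometrically.
    let mut res = Vec::with_capacity(from.len());
    let mut j = 0;
    for &x in from {
        // Skip deletion candidates that are smaller than `x`.
        while j < to_del.len() && to_del[j] < x {
            j += 1;
        }
        // Drop `x` if it matches the current deletion candidate.
        if j < to_del.len() && to_del[j] == x {
            continue;
        }
        res.push(x);
    }
    res
}

fn main() {
    let kept = del(&[1, 3, 5, 7], &[3, 7, 9]);
    println!("{:?}", kept);
}
```

Pre-sizing only removes the handful of reallocations a growing vector performs. If the cost comes from the number of calls or from rebuilding the whole vector on each call, the effect is expected to be small, which would be consistent with the observation above.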

view this post on Zulip Chris Fallin (Feb 27 2020 at 17:29):

Joey Gouly said:

I saw that Benjamin added an EpiloguePlaceholder, I guess that's the new backend's FallthroughReturn

Yep, we decided this was a less confusing name.

view this post on Zulip Joey Gouly (Feb 27 2020 at 17:41):

In this test case SortedRangeFragIxs::del is called 21,902,029 times

view this post on Zulip Chris Fallin (Feb 28 2020 at 22:27):

Progress: new ARM64 backend has enough working to get through wasi libc init and print "Hello world"!

$ qemu-aarch64 target/aarch64-unknown-linux-gnu/release/wasmtime run ~/hello-world.wasm
hello world

view this post on Zulip Dan Gohman (Feb 28 2020 at 22:41):

It takes quite a lot of code to print hello world with libc, so that really says something!

view this post on Zulip Joey Gouly (Mar 03 2020 at 12:09):

welcome @Julian Seward

view this post on Zulip Julian Seward (Mar 03 2020 at 12:24):

Hi!

view this post on Zulip Julian Seward (Mar 03 2020 at 12:25):

That's a big bit of CLIF. Can I ask a couple questions about it?

view this post on Zulip Joey Gouly (Mar 03 2020 at 12:25):

So did you manage to reproduce the long compile time?

view this post on Zulip Joey Gouly (Mar 03 2020 at 12:25):

sure!

view this post on Zulip Julian Seward (Mar 03 2020 at 12:26):

Not yet. I spent all yesterday and this morning rewriting the allocator's core allocate-evict loop so as to remove a very stupid performance problem that exists in the current version (and which I'm sure is related to what you saw).

view this post on Zulip Julian Seward (Mar 03 2020 at 12:27):

Now I'm trying to un-break it :-/

view this post on Zulip Julian Seward (Mar 03 2020 at 12:28):

Q (mostly for my curiosity): (1) what is that CLIF? Where is it from? and (2) do the existing allocator sources create a correct allocation for it, after 70 mins?

view this post on Zulip Joey Gouly (Mar 03 2020 at 12:30):

I don't know what function it is, but it's from a benchmark I wrote using the regex crate. I'm not sure if it's correct, since I didn't run it (+ some arm64 functionality was stubbed out)

view this post on Zulip Julian Seward (Mar 03 2020 at 12:32):

Ok. Well, let me try to get this thing back on the road. Then I'll have a look at the test case.

view this post on Zulip Julian Seward (Mar 03 2020 at 15:19):

Trying it now. So far, it's spending a lot of time in calculation of dominators

view this post on Zulip Julian Seward (Mar 03 2020 at 15:19):

Were you using a debug or release build?

view this post on Zulip Joey Gouly (Mar 03 2020 at 15:20):

Release

view this post on Zulip Julian Seward (Mar 03 2020 at 15:21):

Er, how do I get that? (cd cranelift && cargo build release) doesn't work

view this post on Zulip Joey Gouly (Mar 03 2020 at 15:21):

--release

view this post on Zulip Joey Gouly (Mar 03 2020 at 15:24):

you can also do cargo run --release

view this post on Zulip Julian Seward (Mar 03 2020 at 15:51):

Running now. Roughly how big was the original function? I am seeing tens of thousands of virtual registers coming into regalloc.

view this post on Zulip Joey Gouly (Mar 03 2020 at 15:52):

I didn't find out what the original function was, just took the CLIF that was generated

view this post on Zulip Joey Gouly (Mar 03 2020 at 16:35):

Chris Fallin said:

Progress: new ARM64 backend has enough working to get through wasi libc init and print "Hello world"!

$ qemu-aarch64 target/aarch64-unknown-linux-gnu/release/wasmtime run ~/hello-world.wasm
hello world

I reproduced this natively on my arm64 desktop!


Last updated: Jan 24 2025 at 00:11 UTC