Stream: cranelift

Topic: emitting CLIF from Rust?


marin (Dec 04 2024 at 11:02):

I am building an embedded SQL database and experimenting with Cranelift to generate efficient query operators. One of the main issues with JIT-compiled query executors is that they quickly become rather complex. You often want to inline calls made in loops, which means you have to emit code for many primitives and data structures. The task becomes daunting once you add specialization for every data type. I was wondering whether it would be possible to write a library in Rust and store it in the program's static assets. When it comes time to build the query, I could assemble fragments from that library, inlining where necessary. Now I have a couple of questions:

- Could something like that be hacked together with rustc_codegen_cranelift?
- Is there any way those functions could be inlined? Does Cranelift even support that?
- Does this approach even make any sense?

Sorry if any of those questions seem obvious, I'm not very experienced working with JITs :)

marin (Dec 04 2024 at 11:11):

one more question:

- Would it be possible to attach debug information to those fragments to pass compiled queries through a debugger?

fitzgen (he/him) (Dec 04 2024 at 18:25):

database queries are definitely outside of my expertise, but I'll try to take a crack at answering your questions

> Could something like that be hacked together with rustc_codegen_cranelift?

Is your input language Rust? If not, then I don't think it makes sense to use rustc_codegen_cranelift. you would want to write your own input-language-to-CLIF frontend instead.

> Is there any way those functions could be inlined? Does Cranelift even support that?

Cranelift does not have an inlining pass today. It is something we want to add as an option eventually, but probably not something we would run automatically. In the meantime, it is expected that the CLIF producer (e.g. Wasmtime's Wasm-to-CLIF frontend or rustc_codegen_cranelift) has already done inlining where it was beneficial or desired.

> does this approach even make any sense?

without knowing more, I suspect that you probably want to write your own query-language-to-clif frontend. other than that, I think it does make sense.

I remember seeing a paper not too too long ago that benchmarked a bunch of different JIT compilers/frameworks in the context of JITing database queries. It may make sense for you to track that paper down and read it for more context and inspiration. If I can dredge it up, I'll link it here.

> Would it be possible to attach debug information to those fragments to pass compiled queries through a debugger?

yes, primarily through cranelift_frontend::FunctionBuilder::set_val_label. although you may find gaps around certain corners of DWARF or incompleteness. certainly we don't do the best job of preserving debug info across all our optimizations; would certainly appreciate issues filed for any specific cases you run into so we can track, diagnose, and fix them. for the best debugging fidelity, disable optimizations.

Chris Fallin (Dec 04 2024 at 18:32):

fitzgen (he/him) said:

> > Could something like that be hacked together with rustc_codegen_cranelift
>
> Is your input language Rust? If not, then I don't think it makes sense to use rustc_codegen_cranelift. you would want to write your own input-language-to-CLIF frontend instead.

if I understood the proposal above correctly, the idea was to write a runtime library of sorts in Rust, and then take its functions and inline them directly into Cranelift-compiled JIT'd query code. Is that right, @marin ?

That seems plausible to me -- the technical bit that makes it work is that rustc_codegen_cranelift is ABI-compatible with LLVM Rust (the default/main toolchain), so if you access internal data structures, etc., everything should work just as if you had called a separately compiled function. If it doesn't, we would consider that a bug (either in cg_clif or Cranelift, depending on the issue).

I also suspect you'd want to build a purpose-built inliner for this, at least at first, rather than try to build a general framework in Cranelift: you have more information about intent and performance tradeoffs than a generic inliner's heuristics would, so you could know that, for example, you always want to inline simple field accessor functions in your runtime, and that they don't call back into the query code. The "code transcription" part of an inliner is the relatively simple part; the heuristics and the plumbing to operate over connected components of the callgraph are much harder, and that's the part you wouldn't need if you do "manual inlining" at runtime-library calls.
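To make the "code transcription" step concrete, here is a minimal sketch on a toy IR. Everything in it (`Inst`, `Func`, `inline_leaf_calls`, `eval`) is hypothetical and invented for illustration; none of it is Cranelift's API. It assumes the runtime-library functions are straight-line leaf functions (no control flow, no calls back into query code), which is the easy case described above. Values are numbered densely: parameters come first, then one value per instruction.

```rust
use std::collections::HashMap;

// Toy IR (hypothetical, not CLIF). Operands are value numbers:
// values 0..params are the parameters, and body[i] defines value params + i.
#[derive(Clone, Debug, PartialEq)]
enum Inst {
    Const(i64),
    Add(usize, usize),
    Mul(usize, usize),
    Call(&'static str, Vec<usize>),
}

#[derive(Clone, Debug)]
struct Func {
    params: usize,
    body: Vec<Inst>,
    ret: usize, // value number that the function returns
}

// Copy one instruction, renaming its operands through `map`.
fn remap(inst: &Inst, map: &[usize]) -> Inst {
    match inst {
        Inst::Const(c) => Inst::Const(*c),
        Inst::Add(a, b) => Inst::Add(map[*a], map[*b]),
        Inst::Mul(a, b) => Inst::Mul(map[*a], map[*b]),
        Inst::Call(f, args) => Inst::Call(*f, args.iter().map(|&a| map[a]).collect()),
    }
}

// Replace every call to a known library function with a renamed copy of its
// body. Assumes library functions are leaves; nested calls are copied as-is.
fn inline_leaf_calls(caller: &Func, lib: &HashMap<&'static str, Func>) -> Func {
    let mut out: Vec<Inst> = Vec::new();
    // map[old caller value] = value number in the rewritten function
    let mut map: Vec<usize> = (0..caller.params).collect();
    for inst in &caller.body {
        let callee = match inst {
            Inst::Call(name, _) => lib.get(name),
            _ => None,
        };
        let new_val = match (inst, callee) {
            (Inst::Call(_, args), Some(callee)) => {
                assert_eq!(args.len(), callee.params);
                // The callee's parameters become the caller's argument values.
                let mut cmap: Vec<usize> = args.iter().map(|&a| map[a]).collect();
                for ci in &callee.body {
                    out.push(remap(ci, &cmap));
                    cmap.push(caller.params + out.len() - 1);
                }
                // The call's result is whatever the callee would have returned.
                cmap[callee.ret]
            }
            _ => {
                out.push(remap(inst, &map));
                caller.params + out.len() - 1
            }
        };
        map.push(new_val);
    }
    Func { params: caller.params, body: out, ret: map[caller.ret] }
}

// Tiny interpreter, used only to check that inlining preserves behaviour.
fn eval(f: &Func, args: &[i64], lib: &HashMap<&'static str, Func>) -> i64 {
    let mut vals = args.to_vec();
    for inst in &f.body {
        let v = match inst {
            Inst::Const(c) => *c,
            Inst::Add(a, b) => vals[*a] + vals[*b],
            Inst::Mul(a, b) => vals[*a] * vals[*b],
            Inst::Call(name, cargs) => {
                let cvals: Vec<i64> = cargs.iter().map(|&a| vals[a]).collect();
                eval(&lib[name], &cvals, lib)
            }
        };
        vals.push(v);
    }
    vals[f.ret]
}
```

The whole transformation is a renaming copy, with no heuristics and no callgraph analysis, because the decision of what to inline is fixed ahead of time by the set of names in `lib`. A real version over CLIF would also have to handle multiple blocks, multiple returns, and stack slots, which this sketch deliberately leaves out.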

marin (Dec 04 2024 at 18:42):

@Chris Fallin yes that's correct! I wasn't even expecting to be able to work with Rust datatypes, but rather just to be able to write the operators in Rust and leverage monomorphisation to generate a specialized implementation for all my database types and reduce the maintenance burden. At runtime, I would assemble those building blocks into pipelines.
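As a rough illustration of the monomorphisation idea, here is a small sketch in plain Rust. The names (`select_gt`, `Column`, `Scalar`, `select_gt_dyn`) are hypothetical and only stand in for whatever the database's real types would be. The operator is written once as a generic function; the compiler emits one specialized copy per column type, and a runtime dispatch picks the matching copy.

```rust
// Written once: return the row indices whose value exceeds `threshold`.
// rustc generates a separate specialized instance for each T it is used with.
fn select_gt<T: Copy + PartialOrd>(col: &[T], threshold: T) -> Vec<usize> {
    col.iter()
        .enumerate()
        .filter(|&(_, &v)| v > threshold)
        .map(|(i, _)| i)
        .collect()
}

// Hypothetical runtime representation of a typed column and a typed constant.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Scalar {
    Int32(i32),
    Int64(i64),
    Float64(f64),
}

// Runtime dispatch: choose the monomorphized instance matching the column type.
// Returns None when the column and the constant have different types.
fn select_gt_dyn(col: &Column, threshold: Scalar) -> Option<Vec<usize>> {
    match (col, threshold) {
        (Column::Int32(c), Scalar::Int32(t)) => Some(select_gt(c, t)),
        (Column::Int64(c), Scalar::Int64(t)) => Some(select_gt(c, t)),
        (Column::Float64(c), Scalar::Float64(t)) => Some(select_gt(c, t)),
        _ => None,
    }
}
```

In a JIT setting the `match` would presumably happen once at query-build time rather than per call, with the chosen instance then called from, or inlined into, the generated code; that part is an assumption about the design and is not shown here.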

@fitzgen (he/him) are you thinking about this paper? https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

fitzgen (he/him) (Dec 04 2024 at 18:47):

don't think it was that paper; the paper I'm thinking of included cranelift in its benchmark evaluation. I remember having some nitpicks about the way they used or presented cranelift but otherwise thinking it was good. it compared building the "same" query JIT with multiple approaches/frameworks, iirc

Chris Fallin (Dec 04 2024 at 18:47):

ah, yeah, that paper is right on the tip of my neurons too, unfortunately not recalling the author or title... I'll dig a bit

marin (Dec 04 2024 at 18:48):

As a first step, I will probably stick to compiling expressions in WHERE clauses and maybe an aggregation operator. But even then, take the GROUP BY aggregation clause, for example. You could implement that as a hashmap, with a specialization for every possible key type. Being able to leverage the Rust ecosystem would be really neat.
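For instance, a GROUP BY with a SUM aggregate could be sketched as a single generic function over the key type, using the standard library's `HashMap`. The function name `group_sum` and the choice of `i64` values are assumptions made for the example, not part of any existing design.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Generic GROUP BY key, SUM(value). Each key type K used by the program
// gets its own specialized copy of this function and of the hash map code.
fn group_sum<K: Eq + Hash + Clone>(keys: &[K], values: &[i64]) -> HashMap<K, i64> {
    assert_eq!(keys.len(), values.len(), "key and value columns must align");
    let mut groups = HashMap::new();
    for (k, v) in keys.iter().zip(values) {
        // Insert a zero accumulator the first time a key is seen, then add.
        *groups.entry(k.clone()).or_insert(0) += *v;
    }
    groups
}
```

Calling `group_sum` with an `i32` key column and again with a `String` key column yields two independent specializations without any extra code on the author's side, which is the maintenance saving described above.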

marin (Dec 04 2024 at 18:48):

I'll run a search on Google Scholar; if it mentions Cranelift it should pop up :)


Last updated: Dec 23 2024 at 13:07 UTC