Stream: git-wasmtime

Topic: wasmtime / issue #12238 Compiling SQL queries with Cranel...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2026 at 13:51):

espoal opened issue #12238:

I'm building a next-generation distributed datastore and I would like to add the possibility of querying it with a relational model. A few weeks ago I had the crazy idea of compiling SQL prepared statements instead of running them through a traditional query planner, before finding out I'm not the first one to take this approach.

Right now I _transliterate_ the query to Rust before compiling it, but maybe there is a smarter approach. My questions are:

More details:
Let's say I have a query in the form:

SELECT
    ot.customer_ID,
    ct.customer_name,
    SUM(ot.order_total) AS total_amount,
    UDF.heavy_function(ot.order_total) AS processed_total
FROM orders_table AS ot
JOIN
    customers_table AS ct ON ct.customer_ID = ot.customer_ID
WHERE
    ot.customer_ID IN $customers_array
AND
    ot.order_date BETWEEN $start_date AND $end_date
GROUP BY
    ot.customer_ID,
    ct.customer_name;
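For a concrete (if hedged) sense of what such a transliteration can look like, here is a minimal stdlib-only Rust sketch of this query; the row types, the UDF body, and the choice to accumulate processed_total per group are all invented for the example:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical row types standing in for the real storage layout.
struct Order { customer_id: u64, order_total: f64, order_date: u32 }
struct Customer { customer_id: u64, customer_name: String }

// Stand-in for UDF.heavy_function.
fn heavy_function(total: f64) -> f64 { total * 2.0 }

// One possible transliteration: filter, hash join, then group-by aggregation.
// Result: customer_id -> (customer_name, total_amount, processed_total).
fn run_query(
    orders: &[Order],
    customers: &[Customer],
    customers_array: &HashSet<u64>,
    start_date: u32,
    end_date: u32,
) -> HashMap<u64, (String, f64, f64)> {
    // Build side of the join: customer_id -> customer_name.
    let by_id: HashMap<u64, &str> = customers
        .iter()
        .map(|c| (c.customer_id, c.customer_name.as_str()))
        .collect();

    let mut groups: HashMap<u64, (String, f64, f64)> = HashMap::new();
    for o in orders {
        // WHERE clause.
        if !customers_array.contains(&o.customer_id) { continue; }
        if o.order_date < start_date || o.order_date > end_date { continue; }
        // JOIN.
        if let Some(name) = by_id.get(&o.customer_id) {
            // GROUP BY + SUM + UDF.
            let entry = groups
                .entry(o.customer_id)
                .or_insert_with(|| (name.to_string(), 0.0, 0.0));
            entry.1 += o.order_total;
            entry.2 += heavy_function(o.order_total);
        }
    }
    groups
}
```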

Right now the compilation looks like:

To run the query:

Ideally I would like to run the query inside a WASM runtime.

I apologize in advance if this question is off topic for this repo and for my noobishness.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2026 at 16:31):

cfallin commented on issue #12238:

@espoal yes, it should be possible to use Cranelift as a JIT-compiler backend for SQL queries -- we are a fully general-purpose compiler and there's nothing too special about SQL once it's query-planned down to loops and loads and compares and the like, IIUC. (We have SIMD operators you'd likely want to use for scans as well.)

That said, one point of clarification: you ask about Cranelift (which is an SSA IR to machine code compiler) but then say

Ideally I would like to run the query inside a WASM runtime.

which is a different thing -- to run Wasm (in our universe) one would typically use Wasmtime, which translates Wasm bytecode to Cranelift's IR then invokes Cranelift to compile that to machine code; and the resulting machine code calls into Wasmtime's runtime, so it's not a separable piece.

Either generating machine code and invoking it directly (a raw Cranelift JIT backend) or generating Wasm modules on the fly and loading/invoking them in Wasmtime is possible; the latter is harder mainly because of the memory-access question (does the Wasm module get the whole database in memory as its heap? compile-time builtins? something else?). Using just Cranelift and generating loads (and stores) to your native data structures would be the conventional and recommended approach.

One other point: you ask if you could "skip the transliteration step", and I don't think you can no matter what you do, though you may be transliterating ("compiling"!) to a different target. Right now you're basically compiling SQL to Rust; instead you could compile SQL to Cranelift IR (CLIF). It's lower-level, so you'll have to implement e.g. match semantics yourself with raw if-else branches or branch tables, but very possible.
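For instance, where generated Rust gets match lowering for free, in CLIF you would emit either a compare-and-branch (brif) chain or a branch table (br_table) yourself. A stdlib-only sketch of the equivalent shapes, using a hypothetical type-tag dispatch:

```rust
// What a frontend gets "for free" from Rust's match...
fn dispatch_match(tag: u8) -> i64 {
    match tag {
        0 => 10,
        1 => 20,
        2 => 30,
        _ => -1,
    }
}

// ...versus the two shapes you would build by hand in CLIF:
// a compare-and-branch chain (brif after brif)...
fn dispatch_if_chain(tag: u8) -> i64 {
    if tag == 0 { 10 } else if tag == 1 { 20 } else if tag == 2 { 30 } else { -1 }
}

// ...or a branch table (br_table): bounds-check, then an indexed jump.
fn dispatch_table(tag: u8) -> i64 {
    const TABLE: [i64; 3] = [10, 20, 30];
    if (tag as usize) < TABLE.len() { TABLE[tag as usize] } else { -1 }
}
```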

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2026 at 18:43):

espoal edited a comment on issue #12238:

Dear @cfallin, thank you for your polite reply and for engaging with my idea! I appreciate it a lot!

So to repeat what I understood:

  1. I would parse the SQL into an AST
  2. I would perform query planning / some optimizations during this first lowering
  3. I would need to write a custom frontend that takes the AST and feeds it to Cranelift for lowering into CLIF
  4. Linking external code via cranelift-module would happen here?
  5. I could feed the generated CLIF to Wasmtime, letting the pipeline deal with any further lowering
  6. Or I could lower the CLIF to machine code for native execution

The situation gets a bit more complicated because I would like to let users write UDFs in any language that can target Cranelift. This linking would happen at step 4, I guess? If the user writes a Rust/C++ UDF, I would use the respective frontend to lower the code to CLIF, and then link the two representations using cranelift-module? You can also see why Wasmtime is beneficial for execution: UDFs are user-generated, untrusted code, so the sandboxing comes in very useful.

I'm also a bit confused about AOT vs. JIT in the Wasmtime pipeline; let me see if I got this right:

With obvious tradeoffs in execution/compilation speed. Is this right?

To elaborate further on my idea: I took inspiration from this presentation, where a JIT or AOT path is picked based on some heuristics (slow vs. fast queries, one-off vs. reusable queries). I would also like to improve on current compiling-database approaches by putting the emphasis on parameterized queries, which should enable better caching. Cranelift seems IDEAL for replicating the current body of academic research without having to maintain custom LLVM patchsets the way Umbra and HyPer do.
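The caching idea is independent of the backend and can be sketched with the stdlib alone: key the cache on the prepared-statement text (with $params left unbound), compile once, and bind parameters at execution time. Everything here (PlanCache, the stand-in "plan") is invented for illustration; in the real system the plan would be a Cranelift-compiled function pointer plus metadata:

```rust
use std::collections::HashMap;

// A "compiled plan": here just a boxed closure over the bound parameters.
type Plan = Box<dyn Fn(&[i64]) -> i64>;

struct PlanCache {
    plans: HashMap<String, Plan>,
    compilations: usize, // how often we actually paid the compile cost
}

impl PlanCache {
    fn new() -> Self {
        Self { plans: HashMap::new(), compilations: 0 }
    }

    // Compile on miss, reuse on hit: different parameter bindings of the
    // same prepared statement share one compiled plan.
    fn execute(&mut self, sql: &str, params: &[i64]) -> i64 {
        if !self.plans.contains_key(sql) {
            self.compilations += 1;
            // Stand-in "compilation": a plan that sums its parameters.
            self.plans
                .insert(sql.to_string(), Box::new(|p: &[i64]| p.iter().sum::<i64>()));
        }
        let plan = self.plans.get(sql).expect("just inserted");
        plan(params)
    }
}
```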

the latter is harder mainly because of the memory-access question (does the Wasm module get the whole database in memory as its heap? compile-time builtins? something else?)

My datastore is architected as a distributed state machine with swizzled pointers, like:

project@account/collectionName/tableName/columnName[:special_qualifier]

At compile time the data requirements are analyzed and cached alongside the output. At runtime I would fetch the data before execution, so I could load it into the heap. I was thinking of exposing this via a load_swz_ptr syscall, possibly through the WASI specification if that's not overkill, much as normal load operations are used today to bring arbitrary records into the heap, or read is used to read from an arbitrary open file.
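A reference of that shape splits mechanically into its components; here is a stdlib sketch of the parsing half of such a hypothetical load_swz_ptr, with field names following the path grammar above:

```rust
#[derive(Debug, PartialEq)]
struct SwzPtr {
    project: String,
    account: String,
    collection: String,
    table: String,
    column: String,
    qualifier: Option<String>,
}

// Parses `project@account/collectionName/tableName/columnName[:qualifier]`.
// Returns None if any required component is missing.
fn parse_swz_ptr(s: &str) -> Option<SwzPtr> {
    let (project, rest) = s.split_once('@')?;
    let mut parts = rest.splitn(4, '/');
    let account = parts.next()?;
    let collection = parts.next()?;
    let table = parts.next()?;
    let column_part = parts.next()?;
    let (column, qualifier) = match column_part.split_once(':') {
        Some((c, q)) => (c, Some(q.to_string())),
        None => (column_part, None),
    };
    Some(SwzPtr {
        project: project.to_string(),
        account: account.to_string(),
        collection: collection.to_string(),
        table: table.to_string(),
        column: column.to_string(),
        qualifier,
    })
}
```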

The execution runs in stages, so for the query I used as an example I would:

One other point: you ask if you could "skip the transliteration step", and I don't think you can no matter what you do, though you may be transliterating ("compiling"!) to a different target. Right now you're basically compiling SQL to Rust; instead you could compile SQL to Cranelift IR (CLIF). It's lower-level, so you'll have to implement e.g. match semantics yourself with raw if-else branches or branch tables, but very possible.

You are totally right! My background is math, not CS, and I suffer a bit from impostor syndrome. What I meant by skipping _transliteration_ is the idea of using a real frontend to target CLIF, like a true CS PhD would do, instead of the very hacky approach I'm using right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2026 at 01:10):

cfallin commented on issue #12238:

You're on the right track! A few thoughts:


Last updated: Jan 09 2026 at 13:15 UTC