Database connection pooling or pre-warmed connections · wasi

Yesterday we discussed with @Joel Dice, @Till Schneidereit and others that TCP+TLS handshakes typically incur significant network latency.

That latency is not a great fit for a hosting model where we want to create an application/component instance per incoming request/event.

In dotnet application servers this is solved by connection pooling. But that is not compatible with instance-per-request hosting.

So I wonder if we could have something like network session pooling as a WASI interface. It could be implemented by the host or by another component with a longer lifecycle.

We are designing the TLS stream as a transformer, without the TCP connection. But I wonder if a wrapper of that together with the TCP connection is the right direction? Let's call it "TLS session pooling".

The problems I can see are in the security and state management of such a session.
The session would typically be authenticated as a particular DB user, using a password or a private key on the TLS layer.
A generic session cache would have no way of knowing how to perform the application-level (SQL) handshake, such as login, selection of the database schema, etc.
So the creation of a new session would have to be done inside the application component?
Also, the application should not release the session back to the pool unless it's in some base state. For example, no open DB transaction.
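
A minimal Rust sketch of the "base state" rule above, assuming the driver can track open transactions itself. `Session`, `SessionPool`, and the `open_transactions` counter are hypothetical names for illustration, not part of any WASI proposal or of Microsoft.Data.SqlClient:

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an authenticated database session.
#[derive(Debug)]
pub struct Session {
    /// The DB user this session was authenticated as.
    pub user: String,
    /// Assumption: the driver keeps this counter accurate.
    pub open_transactions: u32,
}

/// A pool that only takes back sessions that are in a base state.
#[derive(Default)]
pub struct SessionPool {
    idle: VecDeque<Session>,
}

impl SessionPool {
    /// Returns the session to the pool, or hands it back to the caller
    /// (as `Err`) if it still has an open transaction.
    pub fn release(&mut self, session: Session) -> Result<(), Session> {
        if session.open_transactions > 0 {
            return Err(session);
        }
        self.idle.push_back(session);
        Ok(())
    }

    /// Takes an idle session that was authenticated as `user`, if any.
    pub fn acquire(&mut self, user: &str) -> Option<Session> {
        let pos = self.idle.iter().position(|s| s.user == user)?;
        self.idle.remove(pos)
    }

    /// Number of sessions currently waiting for reuse.
    pub fn idle_count(&self) -> usize {
        self.idle.len()
    }
}
```

The sketch keys reuse on the authenticated user only. A real pool would also need to match on the target server and database, and would need a way to detect state that a simple counter cannot see.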

Alternatively, I can see a different design, where Microsoft.Data.SqlClient would be a long-lived WASI component living side by side with the short-lived request handler WASI component.
That would make the whole affair very specific to SQL Server and dotnet. We could have bespoke WIT for that.
The benefit would be that the existing code already solves those security/state problems.
I would call that "SQL Server connection pooling component for dotnet".

In any case, it seems to me we could not make it work transparently without WASI-specific changes in Microsoft.Data.SqlClient, right?

I can see that other long lived protocols may have similar problems, if they are implemented in terms of wasi:sockets, rather than by the WASI host. Web Sockets and HTTP/3 come to mind.

So I think it would be good to establish at least some common best practice guidance.

Ralph (Sep 12 2024 at 10:42):

this is of course a great and difficult conversation. I'd chatted a bit with Till periodically about similar things

Ralph (Sep 12 2024 at 10:43):

Ralph (Sep 12 2024 at 10:45):

Ralph (Sep 12 2024 at 11:13):

in very short-lived functions, CDN functions for example, you really do not WANT threading because scheduling/orch of external work is not really what the function is for. You want execution and cleanup. In these cases:

Ralph (Sep 12 2024 at 11:13):

and for those kinds of functions, you're very likely to ship one component with everything in it, as there are only a few functions you're implementing/using.

Ralph (Sep 12 2024 at 11:15):

Ralph (Sep 12 2024 at 11:17):

what does is the real world, in which achieving high throughput requires the things we do not yet have: threads/streams and so on. it also requires components that hold state (caches or pools) for shorter-lived items that use them -- here, the connection pooling is a great example. So I immediately think of layers of components most of which do basic work we already do in native "servers" or "clients that handle lots of things on behalf of various functions"

Ralph (Sep 12 2024 at 11:18):

wasi:sockets would be the bottom layer, in this view, and then wasi:tcp and wasi:http. Somewhere at the appropriate layer, wasi:tls would be involved -- it handles that portion of the network handshake.

Ralph (Sep 12 2024 at 11:20):

so this kind of layering might require additions to wasi:http/tcp and so on to do the tls dancing by calling out to a wasi:tls implementation. in each case, we lean into the security boundary of a component to protect against memory attacks from outside the component (keeping always in mind the lack of readonly memory).

Till Schneidereit (Sep 12 2024 at 11:21):

one quick note before replying to other parts: wasi:http is very intentionally not specified in terms of wasi:sockets, so that layering picture doesn't reflect how things are actually set up. This is pretty key, because it means that wasi:http isn't restricted to functionality that can be expressed in terms of wasi:sockets, nor does being able to implement wasi:http need to imply also being able to implement wasi:sockets (see browsers as an example of the latter)

Ralph (Sep 12 2024 at 11:21):

in the usual case, these would all be shipped as a "stack" of components that do the right thing, used by a client that invokes the highest abstraction it's necessary to use.

Till Schneidereit (Sep 12 2024 at 11:22):

I also don't think that we need to change any of this to address the issues Pavel raised

Till Schneidereit (Sep 12 2024 at 11:22):

Pavel, Joel, and I had a chance to talk about this during a meeting yesterday, and my following thoughts are substantially based on that conversation:

I agree with Pavel that establishing a new TCP+TLS connection in terms of wasi:sockets each and every time will be prohibitively costly and inefficient. I don't think that'll change in any meaningful way with WASIp3+, nor do intra-component security considerations change the picture all that much—but I'd like to understand your argument about that better, @Ralph

Ralph (Sep 12 2024 at 11:23):

Till, that's a great point -- and one I love. I'm using a layering metaphor merely to ensure that we take into account the feature of the component boundary for memory and the higher level abstraction that most people should use that means -- like wasi:http -- that you can't just reach down and grab wasi:sockets from guest code.

Till Schneidereit (Sep 12 2024 at 11:24):

I also don't think threads really are involved all that much in this. In dotnet specifically, the connection pool is implemented in terms of threads, but that wouldn't have had to be the case. And to me the instance lifetime issues are the much more substantial concern

Ralph (Sep 12 2024 at 11:24):

Take the abstraction and figure out the path that relieves it. connection pooling is a cache to enable shorter-lived things to NOT do connection creation. and this happens at multiple layers with varying kinds of data. Caches are great things.

Till Schneidereit (Sep 12 2024 at 11:24):

Till Schneidereit (Sep 12 2024 at 11:26):

Ralph (Sep 12 2024 at 11:27):

so I'm thinking out loud about how you would establish a coherent http/tls story that doesn't just open calls up to everyone. Maybe we need to! But I'd like to think that whatever needs to establish secure connections AND pool them might be their own components that are typically configured together. Yes, the user might code to wasi:sql (for one example) and oracle:sql (for another) but we wouldn't be building either with full access to all the calls involved.

Ralph (Sep 12 2024 at 11:28):

I happen to love the component memory boundary as a feature, and I look for places to lean in.

Ralph (Sep 12 2024 at 11:30):

but when we're building the innards of the core protocols, it's possible we can't do it easily -- yet. And this is where the threading/streams comes in. Once you have threads, you can have async scale processing that takes advantage of cores. That means that prohibitively costly and inefficient will become less so. Once you have streams, you can have network filters that can actually approach native speeds (which can't happen with copying that fast).

Ralph (Sep 12 2024 at 11:31):

a real web server does several layers of caching different things and each one is managed using thread systems. They max out the OS functionality to the very best of their ability. There is no way we could hope to approach that in components until we have similar OS-like capabilities. Maybe even then we don't get close enough! But that's the point I'm trying to make about the difference between p2 and p3+.

Pavel Šavara (Sep 12 2024 at 11:32):

I'm thinking that read-only "caching" of the sessions would be ideal from layering and security perspective. We know how to do that for HTTP. Maybe the host uses keep-alive, but individual HTTP requests are well isolated and long-lived aspect is no business of the application code.

This is not the case with SQL session, you use SET NOCOUNT in your session and now the session is "dirty".

Ralph (Sep 12 2024 at 11:32):

F5's unitd absolutely screams using wasi:http and it's because it handles all the networking.

Ralph (Sep 12 2024 at 11:34):

I'll be very interested to hear Till's ideas here when he gets on the next train. But wrt This is not the case with SQL session, you use SET NOCOUNT in your session and now the session is "dirty"., how would you model that abstractly now?

Ralph (Sep 12 2024 at 11:34):

Ralph (Sep 12 2024 at 11:35):

what are the consequences of a dirty conn, here? That the conn is long lived and has state floating around but is reused anyway?

Pavel Šavara (Sep 12 2024 at 11:36):

And probably the implementation of the pooling in the Microsoft.Data.SqlClient is able to deal with that already. Modeling it abstractly .. we can't trust the application code to say "i made it dirty" with confidence.

Ralph (Sep 12 2024 at 11:36):

Ralph (Sep 12 2024 at 11:37):

Dave Bakker (badeend) (Sep 12 2024 at 11:37):

In the case of MSSQL specifically: when the shortlived client exits, the host can sp_reset_connection to "clean up" the session and get it ready for the next client session, right?

Ralph (Sep 12 2024 at 11:38):

Pavel Šavara (Sep 12 2024 at 11:39):

I mean I didn't know about it, and I love that it's there. Is it "clean enough"? IDK

Ralph (Sep 12 2024 at 11:39):

Ralph (Sep 12 2024 at 11:40):

question: does Microsoft.Data.SqlClient only connect to mssql? or can u use it against other dbs?

Dave Bakker (badeend) (Sep 12 2024 at 11:41):

Pavel Šavara (Sep 12 2024 at 11:41):

Dave Bakker (badeend) (Sep 12 2024 at 11:42):

But, AFAIK, every major database has its own equivalent of sp_reset_connection

Ralph (Sep 12 2024 at 11:42):

So that's a research point, because something like that will really help this conversation about sql

Dave Bakker (badeend) (Sep 12 2024 at 11:42):

Pavel Šavara (Sep 12 2024 at 11:43):

Ralph (Sep 12 2024 at 11:43):

Ralph (Sep 12 2024 at 11:44):

that one requires thought. I still think the focus on lifetimes of things doing dependent caching for layers above is the thing that pops the design free

Ralph (Sep 12 2024 at 11:45):

you can handle varying lifetimes in the same component, but without internal threading that's going to bog down

Ralph (Sep 12 2024 at 11:46):

but kept separate, you have more possibilities and are likely leaning into the component memory boundary feature

Ralph (Sep 12 2024 at 11:46):

Dave Bakker (badeend) (Sep 12 2024 at 11:46):

I don't see what threads have to do with this, though. The real issue to me seems to be: how to compose components with varying instance lifetimes.
E.g. in this case there should be a SQL "driver"/"service" instance with a longer lifetime than any of its short-lived consumers. _Without_ having to special case everything as special wasmtime/host behavior

Ralph (Sep 12 2024 at 11:47):

What I'm saying is that currently if you wanna do pooling, your pool is going to want to scale out and that's done using async/threads.

Ralph (Sep 12 2024 at 11:48):

if you wanted a wasi:connectionpooling impl, you're either going to have everything be a component inside or you're going to use async/threads inside because that's something you already know how to do

Dave Bakker (badeend) (Sep 12 2024 at 11:48):

Right, in the current world, you'd need to set up one long-lived component instance that handles _all_ requests.

Pavel Šavara (Sep 12 2024 at 11:49):

Ralph (Sep 12 2024 at 11:50):

now, @Dave Bakker (badeend) you're right in your focus on "compose components with varying lifetimes". Right now, using wasi:http, we don't use threads to go fast! in fact, threads get in the way. The question becomes more important once "people" want to build a connectionpooling component. They can do it using subcomponents as shorter-lived items that the outer component manages. it's essentially a very small "serverless" approach to avoid threads.

Till Schneidereit (Sep 12 2024 at 11:50):

well, I had written a long thing, which didn't go through because WIFI on trains, and now Zulip reloaded and (properly: correctly) decided that all of that was too poorly worded to retain

Ralph (Sep 12 2024 at 11:51):

in centralized services, you're always going to be handling the really large scale, long lived stuff outside the guest function

Ralph (Sep 12 2024 at 11:52):

Till Schneidereit (Sep 12 2024 at 11:52):

Dave Bakker (badeend) (Sep 12 2024 at 11:52):

It Depends™. There exists a conversation somewhere on this Zulip with much more background on that. But the TLDR is somewhere on the spectrum between: "No" and "Yes, but it will be a lot of work"

Ralph (Sep 12 2024 at 11:53):

but I'm thinking again of the hardware gateway that won't be updated for years and for which someone might want the entire webserver in a component.

Till Schneidereit (Sep 12 2024 at 11:53):

anyway, I propose we set threading aside, because I think we can fully assume that we want to have a way to do pooling without requiring (very) long-lived instances

Ralph (Sep 12 2024 at 11:53):

Dave Bakker (badeend) (Sep 12 2024 at 12:02):

While the "dirty/clean" aspect is an important prerequisite for connection sharing to work, it's not really of importance to the WASI/WIT/Components discussion. Either the underlying protocol supports it (HTTP, SQL, ..) and can be implemented by an implementation-specific "driver". Or: the protocol doesn't support it, in which case there's also no need to think about it any further here :P

Ralph (Sep 12 2024 at 12:02):

Dave Bakker (badeend) (Sep 12 2024 at 12:02):

Ralph (Sep 12 2024 at 12:03):

Till Schneidereit (Sep 12 2024 at 12:03):

exactly! (And I now have proof that Zulip was correct in eating my homework: that's much more concise than what I had)

Till Schneidereit (Sep 12 2024 at 12:05):

What I'm imagining as a minimum client connection pooling API for wasi:tls would be roughly this:

interface client-connection-pool {
    put: func(connection: client-connection, identities: option<list<borrow<private-identity>>>);
    get: func(identities: option<list<borrow<private-identity>>>) -> option<result<client-connection>>;
}

The idea being that for connections that make use of client certificates, you must prove that you'd be able to create a new connection with the same certificate, otherwise you shouldn't get to reuse an existing one.
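
As a rough illustration of that rule, here is a Rust sketch of a pool that hands a connection back only to a caller presenting the same set of identities it was stored with. `IdentityId`, `ClientConnection`, and `ClientConnectionPool` are hypothetical stand-ins, and the `option<result<...>>` return is simplified to a plain `Option`:

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical opaque handle standing in for `private-identity`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct IdentityId(pub u64);

/// Hypothetical stand-in for `client-connection`.
#[derive(Debug, PartialEq)]
pub struct ClientConnection(pub u32);

/// Connections are bucketed by the exact identity set used to create them.
/// `None` is the bucket for connections without client certificates.
type Key = Option<BTreeSet<IdentityId>>;

#[derive(Default)]
pub struct ClientConnectionPool {
    idle: HashMap<Key, Vec<ClientConnection>>,
}

fn key_of(identities: Option<&[IdentityId]>) -> Key {
    identities.map(|ids| ids.iter().cloned().collect())
}

impl ClientConnectionPool {
    /// Store a connection under the identities it was established with.
    pub fn put(&mut self, connection: ClientConnection, identities: Option<&[IdentityId]>) {
        self.idle.entry(key_of(identities)).or_default().push(connection);
    }

    /// Reuse a connection only if the caller presents the same identities.
    pub fn get(&mut self, identities: Option<&[IdentityId]>) -> Option<ClientConnection> {
        self.idle.get_mut(&key_of(identities))?.pop()
    }
}
```

Holding a matching `IdentityId` plays the role of the proof here. In the WIT sketch the proof is a borrowed `private-identity` resource handle, which the caller cannot forge.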

Till Schneidereit (Sep 12 2024 at 12:05):

I guess it'd make sense to add a few things such as optional TTL setting and such

Till Schneidereit (Sep 12 2024 at 12:06):

Till Schneidereit (Sep 12 2024 at 12:07):

@Dave Bakker (badeend) do you see any reason why we wouldn't be able to implement this kind of pool?

Till Schneidereit (Sep 12 2024 at 12:07):

oh also, I think it'd make sense to have the same kind of pool for non-TLS socket connections

Dave Bakker (badeend) (Sep 12 2024 at 12:10):

Hmm. I'd have to think about it more.
My initial reaction is that TCP/TLS sockets is the wrong abstraction level (too low) to provide a pool for, as resetting a connection requires higher-level knowledge on how to do that. (e.g. the sp_reset_connection example from above)

Till Schneidereit (Sep 12 2024 at 12:13):

my thinking is that, outside of a hypothetical wasi:mssql, it should be up to the component to ensure that the connection is ready for reuse. Yes, that does mean that there's a risk of improperly reusing a connection, but that seems pretty fundamental to me (again, outside of higher-level interfaces)

Ralph (Sep 12 2024 at 12:18):

Dave Bakker (badeend) (Sep 12 2024 at 12:21):

Maybe, that can only work if the components are cooperating and can 100% trust each other.

Till Schneidereit (Sep 12 2024 at 12:22):

I would imagine that the most common scenario is for this pool to be implemented by the host

Ralph (Sep 12 2024 at 12:23):

most commonly yes, but in the future a long lived client component could want to pool a large number of calls as well. But first things first.

Dave Bakker (badeend) (Sep 12 2024 at 12:26):

Till Schneidereit (Sep 12 2024 at 12:27):

true, yes. But in that case it seems like you're fundamentally trusting the pooling component to give you back a connection with the same state that you'd have put it in, and conversely the pooling component fundamentally trusts its clients to properly clean up connections before putting them into the pool

Ralph (Sep 12 2024 at 12:28):

yes, this must be the case. the pooling component is an inner component of the ultimate used connection manager. Does that make any sense?

Till Schneidereit (Sep 12 2024 at 12:28):

ah, that gets to another thing Pavel and I talked about yesterday: ideally the pooling mechanism would be client-isolated. I.e., you'd not share a pool with other client components, so you get to rely on the exact set of properties you ensure for pooled connections

Till Schneidereit (Sep 12 2024 at 12:29):

Ralph (Sep 12 2024 at 12:29):

Dave Bakker (badeend) (Sep 12 2024 at 12:29):

Till Schneidereit (Sep 12 2024 at 12:30):

I think for component composition scenarios that'd largely Just Happen, but we'd absolutely want to specify this as part of the semantics

Till Schneidereit (Sep 12 2024 at 12:31):

(then again, we don't even have the spec mechanisms for composing components with differing lifetimes, so who knows whether it'd still Just Happen once we have those)

Ralph (Sep 12 2024 at 12:33):

that's interesting: I was unaware we hadn't fleshed out what happens with different lifetimes. hmmmm. Or are you just saying we don't have the language to describe that?

Dave Bakker (badeend) (Sep 12 2024 at 12:34):

Right. And that's exactly why I'm doubting this solution path. Ideally, a component shouldn't have to worry about pooling at all and would be able to just say "give me a SQL[1] connection, I'll drop it when I'm done". And let the pooling component figure out how to reset & reuse the connection.

Ralph (Sep 12 2024 at 12:35):

Ralph (Sep 12 2024 at 12:36):

what we're discussing here is the underlying implementation components that actually do that work, right?

Ralph (Sep 12 2024 at 12:37):

Ralph the PM writing code to call a db should just say, "give me a connection, I'll drop it when I'm done"

Ralph (Sep 12 2024 at 12:37):

but something underneath that interface has to do the work of managing a pool, and underneath that actually implement the pool

Ralph (Sep 12 2024 at 12:37):

Ralph (Sep 12 2024 at 12:38):

Pavel Šavara (Sep 12 2024 at 12:38):

I realized when reading that we are possibly dealing with MSDTC because of "transaction context"

Ralph (Sep 12 2024 at 12:38):

Till Schneidereit (Sep 12 2024 at 12:38):

@Dave Bakker (badeend) absolutely. But that seems to fundamentally require specific interfaces such as wasi:mssql, no?

The only real alternative way to set up pooling without abstracting all the connection handling completely would involve an interface with setup and teardown/cleanup hooks, where you'd say "give me a connection, and if you need to set it up, call this function, and if you need to reset it, call this one". But I don't think that'd address your concerns at all, because you'd still have to trust that the reset is done correctly
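
The hook-based alternative could look roughly like the following Rust sketch. `HookedPool` and its `setup` and `reset` callbacks are hypothetical names chosen for illustration. The pool knows nothing about the protocol and simply trusts the hooks:

```rust
/// A pool whose owner supplies the setup and reset steps.
pub struct HookedPool<C> {
    idle: Vec<C>,
    /// Called when no idle connection is available.
    setup: Box<dyn FnMut() -> C>,
    /// Called before a connection is stored; returns false if it cannot be reused.
    reset: Box<dyn FnMut(&mut C) -> bool>,
}

impl<C: 'static> HookedPool<C> {
    pub fn new(
        setup: impl FnMut() -> C + 'static,
        reset: impl FnMut(&mut C) -> bool + 'static,
    ) -> Self {
        Self { idle: Vec::new(), setup: Box::new(setup), reset: Box::new(reset) }
    }

    /// Reuse an idle connection, or call the setup hook for a fresh one.
    pub fn get(&mut self) -> C {
        match self.idle.pop() {
            Some(conn) => conn,
            None => (self.setup)(),
        }
    }

    /// Run the reset hook; keep the connection only if the hook reports success.
    pub fn put(&mut self, mut conn: C) {
        if (self.reset)(&mut conn) {
            self.idle.push(conn);
        }
        // Otherwise `conn` is dropped here, which would close it.
    }

    /// Number of connections currently waiting for reuse.
    pub fn idle_len(&self) -> usize {
        self.idle.len()
    }
}
```

For MSSQL the reset hook is where something like sp_reset_connection would be issued. As noted above, the pool still cannot verify that the hook really left the connection clean.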

Till Schneidereit (Sep 12 2024 at 12:40):

I mean, just fundamentally something has to do the setup/reset/teardown. And I think we should provide an interface that lets that "something" be the client component. Which then allows us to implement things like a pooling wasi:mssql in user space

Till Schneidereit (Sep 12 2024 at 12:42):

one important aspect here is that we've learned the hard way that even if we wanted to (and could) provide all the high-level interfaces, it'd not be enough: we'd not just have to provide all these interfaces, we'd also have to convince the world to change All The Code to make use of these interfaces instead of the implementations they already have in terms of a lower-level thing

Ralph (Sep 12 2024 at 12:42):

Ralph (Sep 12 2024 at 12:43):

Till Schneidereit (Sep 12 2024 at 12:43):

that's not to say the high-level interfaces aren't a good thing: where possible and where people are asking for them, we should provide them. But we shouldn't force them on people

Ralph (Sep 12 2024 at 12:43):

the higher level and different interfaces will become popular if they hit the sweet spot for users. that is the only path they have. it may well take time for a lot of them.

Till Schneidereit (Sep 12 2024 at 12:44):

(yes, I'm one of the people who had to learn this the hard way. See also: wasi:grpc requiring substantially more work in the spec, host implementations, and all language ecosystems than extending wasi:http to support gRPC)

Ralph (Sep 12 2024 at 12:46):

it will be better having wasi:grpc! But man, that will take time for adoption.

Till Schneidereit (Sep 12 2024 at 12:46):

Ralph (Sep 12 2024 at 12:47):

at a minimum, the best higher level abstractions will take 3-5 years before they hit the sweet spot. It seems like a long time, but it really isn't.

Ralph (Sep 12 2024 at 12:47):

but, I guess the optimistic look at this is that we need time to make those all happen anyway

Ralph (Sep 12 2024 at 12:47):

Till Schneidereit (Sep 12 2024 at 12:48):

I think we got it right by-and-large with wasi:http, and I still believe in the fundamental approach to WASI (and WIT more generally) API design of "as high-level as feasible, but no higher". My thinking on the "feasible" bit has evolved a bit

Ralph (Sep 12 2024 at 12:48):

yes. I have a feeling that "new" interfaces will have more adoption than "why did you screw up tcp?"

Till Schneidereit (Sep 12 2024 at 12:49):

my hope is also that as we see things like component-based middleware, db connectors, etc, all of this will matter less and less, because we'll have much tighter pinch-points

Ralph (Sep 12 2024 at 12:50):

Till Schneidereit (Sep 12 2024 at 12:50):

@Dave Bakker (badeend) how do you feel about TLS connection pooling after all this discussion?

Pavel Šavara (Sep 12 2024 at 13:00):

I think that single-use/throw-away pre-opened anonymous TLS sessions would reduce the latency seen by the application code, and be generic and secure. It would not bring the scalability. But maybe that's good enough for an MVP?

Ralph (Sep 12 2024 at 13:01):

Till Schneidereit (Sep 12 2024 at 13:06):

so you're thinking of something that'd reduce this example to something like this?

// TCP setup:
let (tls_input, tls_output) = wasi_tls::connect("example.com")?.await?;

// Usage:
tls_output.blocking_write_and_flush("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n");
let http_response = tls_input.blocking_read();

println!("{}", http_response);

Lann Martin (Sep 12 2024 at 13:07):

Could we encourage pool-per-client by defining a standard resource but not a standard interface? Consumers would need to define how the pool is exposed but it would take deliberate effort to share between components.
edit: actually I'm not sure bindgen produces the same type for a resource in different interfaces, so maybe not all that useful

Till Schneidereit (Sep 12 2024 at 13:07):

(i.e., let the import handle the DNS lookup, socket connection, and TLS handshake, so that that can all happen concurrently and preemptively)

Pavel Šavara (Sep 12 2024 at 13:08):

Now I'm thinking how to make that transparent to existing C# Socket & SslStream APIs

Till Schneidereit (Sep 12 2024 at 13:09):

Till Schneidereit (Sep 12 2024 at 13:12):

@Lann Martin you'd ultimately still import an interface that would provide a function for acquiring the pool resource handle though, right? so it'd still be the most straightforward thing to always return the same handle

Pavel Šavara (Sep 12 2024 at 13:12):

the host could just hand out an unbound handle/resource on wasi:sockets connect, and if that is followed by a call to the TLS transform, it could take the connection from a different pool.

Lann Martin (Sep 12 2024 at 13:12):

The problem is that it is disastrous to share a TLS connection pool with an untrusted component

Till Schneidereit (Sep 12 2024 at 13:12):

but I just remember that there were sketches somewhere about shared and non-shared instance imports. We'd want a non-shared instance import here, I think

Till Schneidereit (Sep 12 2024 at 13:13):

Lann Martin (Sep 12 2024 at 13:14):

an "optimistic pre-fetching pool" (or whatever you want to call the pre-warmed connection approach) definitely seems like the best bang-for-buck

Till Schneidereit (Sep 12 2024 at 13:14):

i.e., to the degree we have this issue for connection pools, we also have it for anything that can establish outgoing connections

Lann Martin (Sep 12 2024 at 13:14):

Pavel Šavara (Sep 12 2024 at 13:15):

Those would be throw-away: after the end of life of that resource, the host would consider it dirty and actually close the real connection. Is that not enough?

Till Schneidereit (Sep 12 2024 at 13:16):

what I mean is that a specific outgoing-handler provides specific capabilities, at least as long as the exporter applies some kind of restrictions to where requests can be sent to

Lann Martin (Sep 12 2024 at 13:16):

@Pavel Šavara Yeah sorry; we should call the "pre-warmed connections" idea something other than "pooling". I like that idea.

Till Schneidereit (Sep 12 2024 at 13:16):

a naive userspace implementation of outgoing-handler in a persistent instance would share its allowlist with all importers

Till Schneidereit (Sep 12 2024 at 13:17):

same as a naive userspace implementation of a connection pool would share the pool with all importers

Lann Martin (Sep 12 2024 at 13:24):

I guess I'm just thinking of the obvious "allow all" case for both http and a tls pool. For HTTP, a reused connection has pretty well understood state(lessness), assuming the implementation doesn't allow returning e.g. websockets to the pool. A TLS socket pool has too much flexibility here; e.g. a malicious component could set up an HTTP proxy on a socket and then put it in the pool masquerading as a "normal" HTTPS connection to the proxy host.

Till Schneidereit (Sep 12 2024 at 13:29):

I agree that that is a very bad scenario. It seems to me like it's ultimately one example of the more fundamental issue that the moment you allow yourself to be imported by multiple components you'd better ensure that you retain the right level of isolation between the state you provide them with

Dave Bakker (badeend) (Sep 12 2024 at 13:29):

What are the actual real-world use cases we're thinking of here, other than HTTP & SQL?

Joel Dice (Sep 12 2024 at 13:29):

Lann Martin (Sep 12 2024 at 13:30):

Till Schneidereit (Sep 12 2024 at 13:30):

Till Schneidereit (Sep 12 2024 at 13:31):

where not all of them are actual real-world use cases, but there are enough that I think it makes sense for us to treat them as unbounded

Joel Dice (Sep 12 2024 at 13:35):

There's a continuum of "how much I trust the other component" from "I don't trust it at all" (in which case I probably shouldn't be using it) to "I wrote it myself and trust it completely, and I have other reasons besides security to make it a separate component" (e.g. different lifetimes, different languages, etc.).

Till Schneidereit (Sep 12 2024 at 13:36):

@Lann Martin the more I think about it, the more I really don't think connection pools are special. Fundamentally, a very reasonable approximation is Thou Shalt Not Mix Capabilities.

I do think this poses very interesting problems for composition between long- and short-lived things, which I think we've only gotten away with ignoring so far because existing systems manage capabilities in the host pretty exclusively

Till Schneidereit (Sep 12 2024 at 13:37):

as in, I don't even know if we'd have a mechanism by which a component that'd be imported by two other components would be able to tell apart which of those a call originated in

Lann Martin (Sep 12 2024 at 13:38):

Till Schneidereit (Sep 12 2024 at 13:38):

Till Schneidereit (Sep 12 2024 at 13:39):

better question: how do you ensure that a component importing you should get access to the same resource a previous instance of the same component definition did?

Till Schneidereit (Sep 12 2024 at 13:41):

Say I have a single, long-lived, component Pool, imported by an arbitrary series of instances of both A and B. How can I tell calls from instances of A apart from those of instances of B, so I can establish isolated caches for each of the component definitions?

Dave Bakker (badeend) (Sep 12 2024 at 13:42):

Everything discussed so far isn't POSIX (obviously), so "All The Code" will need to change anyway in some form or another. Right?

Till Schneidereit (Sep 12 2024 at 13:43):

One possible answer could be "you don't, because that's not a setup you get to have". Instead, there could be one long-lived instance Pool-A imported by all instances of A, and another Pool-B imported by all instances of B
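
A small Rust sketch of the "one pool per component definition" setup, assuming the host already knows which definition it is instantiating. How that identity would be established is exactly the open question in this thread, so the `definition` string key is a placeholder, and `Connection`, `Pool`, and `PoolRegistry` are hypothetical names:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a pooled connection handle.
#[derive(Debug, PartialEq)]
pub struct Connection(pub u32);

/// One isolated pool, shared only by instances of a single definition.
#[derive(Default)]
pub struct Pool {
    idle: Vec<Connection>,
}

impl Pool {
    pub fn put(&mut self, conn: Connection) {
        self.idle.push(conn);
    }

    pub fn get(&mut self) -> Option<Connection> {
        self.idle.pop()
    }
}

/// Host-side registry that wires each component definition to its own pool.
#[derive(Default)]
pub struct PoolRegistry {
    pools: HashMap<String, Pool>,
}

impl PoolRegistry {
    /// Returns the pool for `definition`, creating it on first use.
    pub fn pool_for(&mut self, definition: &str) -> &mut Pool {
        self.pools.entry(definition.to_string()).or_default()
    }

    /// Number of distinct pools created so far.
    pub fn pool_count(&self) -> usize {
        self.pools.len()
    }
}
```

Instances of A never see the pool for B, because the lookup happens when the host wires up the imports and not inside the pool itself.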

Till Schneidereit (Sep 12 2024 at 13:44):

That was my intuition as well, but I don't think it holds, no: there's a huge difference between having to change an ecosystem's HTTP abstraction(s) and let everything on top work without modification, and having to change all the things on top individually

Lann Martin (Sep 12 2024 at 13:46):

The only idea that comes to mind that would be compatible with existing code would be per-instance session management

Till Schneidereit (Sep 12 2024 at 13:47):

as in, the thing we have now for sockets, and will have for TLS with Dave's proposal?

Dave Bakker (badeend) (Sep 12 2024 at 13:47):

I agree that the required changes should be limited to the standard-library(-like) libraries, and should not impact each and every application

Lann Martin (Sep 12 2024 at 13:48):

Lann Martin (Sep 12 2024 at 13:49):

or ~equivalently slice up composed components to give them wrapped copies of shared imports

Till Schneidereit (Sep 12 2024 at 13:50):

Lann Martin (Sep 12 2024 at 13:50):

Joel Dice (Sep 12 2024 at 13:51):

Would it make sense to add a "hint flag" to the component model that tells the host "instances of this (sub)component should be kept alive and reused if possible", i.e. the app will work fine even if the hint is ignored, but it will work better if the instances are reused? I believe we discussed this with @Luke Wagner and others already, but I don't recall if we discussed the scenario where e.g. a component has three subcomponents, only one of which has the hint flag attached to it, meaning the other two are not expected to be reused (and in fact shouldn't be reused), but they may use the "long lived" one to cache state.

Till Schneidereit (Sep 12 2024 at 13:51):

with dynamic instantiation, you could imagine a component exporting an API that gives a fresh instance for an interface instance export, but provides that exported instance with imports that are shared internally. Then each of those short-lived instances could hold a session key

Pavel Šavara (Sep 12 2024 at 13:52):

Are you guys still discussing a session "pool"? Meaning that the session would not be throw-away? And Joel means "hint that I trust the pool"? I think I got lost.

Joel Dice (Sep 12 2024 at 13:53):

My comment was regarding the general problem of caching for otherwise short-lived instances. Could be data caching, connection caching, or whatever.

Lann Martin (Sep 12 2024 at 13:53):

You don't even need dynamic instantiation per-se, just preprocessing to split shared imports plus a convention for how the host maps those split imports to their components

Till Schneidereit (Sep 12 2024 at 13:54):

I guess I'm not convinced that a pool is inherently more dangerous. And OTOH, prewarmed connections would require a completely different approach to establishing connections from what's proposed right now—one which would be harder to integrate into content toolchains

Lann Martin (Sep 12 2024 at 13:55):

Some prewarming strategies wouldn't require new interfaces; as a simple example you could immediately prewarm a connection upon opening a cold connection, optimistically assuming that it is likely to be used soon

Pavel Šavara (Sep 12 2024 at 13:56):

And there could be a configuration saying that, for each IP, the pre-warmer should keep 10 open, unused connections ready.
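A minimal sketch of the two prewarming ideas above (warm up on the first cold open, and keep a configured number of idle connections per address). Everything here is hypothetical: `Prewarmer`, `Conn` and `dial` are illustrative names, the handshake is only simulated by a counter, and a real host would presumably refill in the background rather than inline.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for an established (TCP+TLS) connection.
#[derive(Debug)]
struct Conn {
    addr: String,
    id: u64,
}

/// Keeps up to `target` idle connections ready per address.
struct Prewarmer {
    target: usize,
    idle: HashMap<String, VecDeque<Conn>>,
    next_id: u64,
    handshakes: u64, // number of simulated handshakes so far
}

impl Prewarmer {
    fn new(target: usize) -> Self {
        Prewarmer { target, idle: HashMap::new(), next_id: 0, handshakes: 0 }
    }

    /// Simulates the expensive handshake; a real host would do TCP+TLS here.
    fn dial(&mut self, addr: &str) -> Conn {
        self.handshakes += 1;
        self.next_id += 1;
        Conn { addr: addr.to_string(), id: self.next_id }
    }

    /// Tops the idle set for `addr` back up to `target`.
    fn refill(&mut self, addr: &str) {
        let have = self.idle.get(addr).map_or(0, |q| q.len());
        for _ in have..self.target {
            let conn = self.dial(addr);
            self.idle.entry(addr.to_string()).or_default().push_back(conn);
        }
    }

    /// Hands out a warm connection if one exists, otherwise dials cold.
    /// Either way it refills afterwards, optimistically assuming more
    /// requests to the same address will follow. The bool reports
    /// whether the returned connection was already warm.
    fn open(&mut self, addr: &str) -> (Conn, bool) {
        let warm = self.idle.get_mut(addr).and_then(|q| q.pop_front());
        let was_warm = warm.is_some();
        let conn = match warm {
            Some(c) => c,
            None => self.dial(addr),
        };
        self.refill(addr);
        (conn, was_warm)
    }
}
```

With `Prewarmer::new(2)`, the first `open` for an address pays one cold handshake plus two speculative ones; later calls get a warm connection and trigger only a single top-up handshake. Note that this only covers the transport handshake: any application-level login would still have to happen in the guest.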

Till Schneidereit (Sep 12 2024 at 13:57):

given that Dave's proposal has multiple discrete steps, are you suggesting the host (or more generally, the exporter) would effectively record the sequence of these steps and then rerun them optimistically because they're likely to be repeated in exactly the same way?

Dave Bakker (badeend) (Sep 12 2024 at 13:58):

@Till Schneidereit Earlier you gave a client-connection-pool example. Does this need to be specialized for any kind of transport (TCP and/or TLS, ..) or can it even be a generic duplex stream pool, like:

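get-preopened-pool: func(name: string) -> option<io-pool>;

resource io-pool {
    // take a stream pair out of the pool
    open: func() -> tuple<input-stream, output-stream>;
    // hand a stream pair back to the pool
    close: func(input: input-stream, output: output-stream);
}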

Till Schneidereit (Sep 12 2024 at 13:59):

ooh, that's a great question! I can't immediately see any reason why you'd not be able to generalize it.

The one thing you'd lose is having to provide a proof that you'd be able to recreate the same configuration

Till Schneidereit (Sep 12 2024 at 13:59):

Pavel Šavara (Sep 12 2024 at 14:00):

I assumed that wasi:TCP & wasi:TLS are a fused implementation, and that they know how to do that handshake. There are probably no application-specific steps to replay, or are there?

Lann Martin (Sep 12 2024 at 14:00):

Pavel Šavara (Sep 12 2024 at 14:01):

Till Schneidereit (Sep 12 2024 at 14:01):

Lann Martin (Sep 12 2024 at 14:02):

Well "token" is maybe the wrong term. You could for example derive the pool key from a TLS private key

Pavel Šavara (Sep 12 2024 at 14:02):

Till Schneidereit (Sep 12 2024 at 14:03):

oh, and you'd be able to derive the same key/hash/token by an operation taking the same inputs

Till Schneidereit (Sep 12 2024 at 14:03):

Lann Martin (Sep 12 2024 at 14:04):

In case this isn't obvious: this requires very careful design; don't just hash the PK

Till Schneidereit (Sep 12 2024 at 14:04):

That's at least not how things are specified right now, and it gets us back to higher-level abstractions being harder to integrate into existing toolchains

Lann Martin (Sep 12 2024 at 14:06):

hmm actually this is a good question for TLS in particular; do we actually need socket pools or do we just need session resumption?

Lann Martin (Sep 12 2024 at 14:07):

TLS 1.3 has "0-RTT" session resumption, which just requires a secret resumption ticket iirc

Lann Martin (Sep 12 2024 at 14:08):

Till Schneidereit (Sep 12 2024 at 14:12):

if we wanted to support a proof system for a general-purpose cache, we could do something like this:

Pavel Šavara (Sep 12 2024 at 14:15):

How would the new instance ask for a connection from the pool? What would give it the token?

Till Schneidereit (Sep 12 2024 at 14:15):

actually, ignore all the "and token" parts of the first list: those aren't needed

Till Schneidereit (Sep 12 2024 at 14:15):

Lann Martin (Sep 12 2024 at 14:16):

for backward compatibility: trade the token to the host for a reserved ip/port that can be passed to existing client code

Dave Bakker (badeend) (Sep 12 2024 at 14:19):

I'm a bit overwhelmed by all the discussion above. Could someone explain why we need "tokens/private-keys/proof-system/etc.." ?

Lann Martin (Sep 12 2024 at 14:20):

It is essentially about "authenticating" that your instance has permission/capability to get a particular connection from the pool

Till Schneidereit (Sep 12 2024 at 14:21):

the concern I'm trying to address is that you shouldn't be able to reuse a connection if you'd not be able to create a new one with the same properties. For example if you lost access to a client certificate, you shouldn't be able to reuse a connection that used that certificate

Dave Bakker (badeend) (Sep 12 2024 at 14:24):

Continuing with my example; that could be the responsibility of the child component that is putting the streams into the pool. By putting them in, they're also giving access to use the streams. Capability-style.

Dave Bakker (badeend) (Sep 12 2024 at 14:25):

It then becomes a game of logically divvying up separate kinds of streams (with different permissions / authority) into separate pools.

Lann Martin (Sep 12 2024 at 14:26):

Yeah, "private imports" are a reasonable approach. I think the tooling just doesn't support them yet.

Till Schneidereit (Sep 12 2024 at 14:27):

Till Schneidereit (Sep 12 2024 at 14:28):

Till Schneidereit (Sep 12 2024 at 14:29):

i.e., being able to put a connection into the pool isn't sufficient to ensure that it's also okay to later retrieve it from the pool

Pavel Šavara (Sep 12 2024 at 14:30):

Till Schneidereit (Sep 12 2024 at 14:30):

// Pool lookup: derive a token for each setup step, then ask the pool for a matching connection
fn from_pool() -> Result<(io::incoming_stream, io::outgoing_stream)> {
    let ip_token = wasi_sockets::get_address_resolution_token("example.com")?;
    let connection_token = wasi_sockets::get_connection_token(ip_token, 443)?;
    let tls_token = wasi_tls::get_client_connection_token(connection_token)?;
    wasi::io_pool::get(tls_token)
}

let (tls_input, tls_output) = match from_pool() {
    Ok(connection) => connection,
    _ => {
        // Nothing usable in the pool: establish a fresh connection.
        // TCP setup:
        let ip = wasi_sockets::resolve_addresses("example.com").await?[0];
        let tcp_client = wasi_sockets::TcpSocket::new();
        let (tcp_input, tcp_output) = tcp_client.connect(ip, 443).await?;

        // TLS setup (the resulting stream pair is the value of this match arm):
        wasi_tls::ClientConnection::new(tcp_input, tcp_output)
            .connect("example.com")?
            .finish().await?
    }
};

// Usage:
tls_output.blocking_write_and_flush("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n");
let http_response = tls_input.blocking_read();

println!("{}", http_response);

// Reset and prepare for reuse
// [Do whatever is needed to reset the connection]
wasi::io_pool::put(tls_input, tls_output);

Till Schneidereit (Sep 12 2024 at 14:31):

the same issue applies to allowlists for outgoing connections, and to the ability to do DNS resolution

Dave Bakker (badeend) (Sep 12 2024 at 14:33):

Till Schneidereit (Sep 12 2024 at 14:35):

Till Schneidereit (Sep 12 2024 at 14:36):

I shouldn't have used "revoked": I mean only the access to the certificate, not the certificate itself

Dave Bakker (badeend) (Sep 12 2024 at 14:37):

Yeah, but the physical connection is still alive in the meantime. For the native TLS implementation, it would still be considered "in use"

Till Schneidereit (Sep 12 2024 at 14:38):

right. All I'm trying to ensure is that if you're losing access to the capability required to create a new connection, you should also lose the ability to reuse an existing connection established with those capabilities

Till Schneidereit (Sep 12 2024 at 14:39):

if we want to do a TLS specific pool, that's easy to tie to the certificate. But I really like your idea of a more general IO pool, and I think this setup would enable it
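One way to picture the rule "you can only reuse what you could also recreate" for a general IO pool is the sketch below. It is not a proposed WASI API: `Capability`, `IoPool` and `Stream` are hypothetical names, and the unforgeable tokens from the pseudocode above are replaced here by a plain set of capabilities that the host is assumed to track per instance.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical capabilities a host could grant to an instance.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Capability {
    Resolve(String),      // may resolve this host name
    Connect(String, u16), // may open a TCP connection to host:port
    ClientCert(String),   // may use this client certificate (by label)
}

/// Stand-in for a pooled input/output stream pair.
#[derive(Debug, PartialEq)]
struct Stream(u32);

/// Pool whose entries remember which capabilities were needed to create them.
#[derive(Default)]
struct IoPool {
    entries: HashMap<String, Vec<(HashSet<Capability>, Stream)>>,
}

impl IoPool {
    /// Stores a stream together with the capabilities used to establish it.
    fn put(&mut self, name: &str, required: HashSet<Capability>, stream: Stream) {
        self.entries
            .entry(name.to_string())
            .or_default()
            .push((required, stream));
    }

    /// Returns a stream only if `held` covers everything that was required
    /// to create it, i.e. the caller could still open an equivalent one.
    fn get(&mut self, name: &str, held: &HashSet<Capability>) -> Option<Stream> {
        let list = self.entries.get_mut(name)?;
        let idx = list.iter().position(|(req, _)| req.is_subset(held))?;
        Some(list.swap_remove(idx).1)
    }
}
```

An instance that has lost access to the client certificate gets `None` from `get`, even though the physical connection is still alive in the pool; an instance holding the full set gets the stream back.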

Pavel Šavara (Sep 12 2024 at 15:01):

Pavel Šavara (Sep 12 2024 at 16:08):

that's SQL client connection code, which ideally would not need any modifications, if we were able to hide those "caching tokens" inside dotnet base class library. I don't know if that's possible.

Luke Wagner (Sep 17 2024 at 22:50):

Chatting with Joel and Lann a bit more about this, my impression is that, while, in the abstract, I see the value of enabling TLS connections to be pooled by the host (just like HTTP already allows), given the pure semantics of TLS (with no baked-in knowledge that we're, e.g., talking to a database with a "reset" command), it doesn't seem like something we can safely do in general without putting too much trust in the guests. Thus, I like the idea of using a long-lived instance that handles multiple requests while maintaining its own guest-implemented pool of long-lived TLS connections.

The issue Joel mentioned above for the "reuse hint" is C-M/#307 and my impression is that the reuse hint proposed in that issue is a perfect match for this use case and a good short-term "Step 1".

As for a later "Step 2", it does seem like, as Joel suggested, we can recover the per-request isolation using the "runtime instantiation" feature (which I think should be the last significant feature to add after Preview 3 before a 1.0-rc). With runtime instantiation, a long-lived root component could create 1 long-lived connection-pooling child instance, and then export a "handle request" function that internally uses runtime-instantiation to create a fresh request-handler child instance per request, with these dynamic children all importing the same connection-pooling instance. What's nice is that this would all be under producer toolchain control, which I think is important because I expect there are many fine-grained policy choices to tweak how this works that we wouldn't want to bake once-and-for-all into the spec or host implementation.
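To make the "Step 2" shape concrete, here is a rough model in plain Rust, with ordinary structs standing in for component instances. All names (`Root`, `RequestHandler`, `ConnectionPool`, `Session`) are hypothetical, the handshake is only simulated, and nothing here uses real runtime instantiation. It also folds in the rule from the start of the thread that a session should only go back to the pool when it is in its base state (no open transaction).

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for an authenticated database session.
#[derive(Debug)]
struct Session {
    id: u32,
    in_transaction: bool,
}

/// Plays the role of the long-lived connection-pooling child instance.
#[derive(Default)]
struct ConnectionPool {
    idle: Vec<Session>,
    opened: u32, // number of simulated handshakes
}

impl ConnectionPool {
    fn acquire(&mut self) -> Session {
        match self.idle.pop() {
            Some(session) => session,
            None => {
                self.opened += 1; // simulated TCP + TLS + login handshake
                Session { id: self.opened, in_transaction: false }
            }
        }
    }

    /// Only sessions back in their base state are kept; others are dropped.
    fn release(&mut self, session: Session) {
        if !session.in_transaction {
            self.idle.push(session);
        }
    }
}

/// Plays the role of a fresh per-request child instance.
struct RequestHandler {
    pool: Rc<RefCell<ConnectionPool>>,
}

impl RequestHandler {
    /// Consumes the handler, so no per-request state survives the call.
    fn handle(self, leave_transaction_open: bool) -> u32 {
        let mut session = self.pool.borrow_mut().acquire();
        session.in_transaction = leave_transaction_open;
        let id = session.id;
        self.pool.borrow_mut().release(session);
        id
    }
}

/// Plays the role of the long-lived root component.
struct Root {
    pool: Rc<RefCell<ConnectionPool>>,
}

impl Root {
    fn new() -> Self {
        Root { pool: Rc::new(RefCell::new(ConnectionPool::default())) }
    }

    /// Creates a fresh handler per request; all handlers share one pool.
    fn handle_request(&self, leave_transaction_open: bool) -> u32 {
        let handler = RequestHandler { pool: Rc::clone(&self.pool) };
        handler.handle(leave_transaction_open)
    }
}
```

Two well-behaved requests in a row reuse session 1; a request that leaves a transaction open causes its session to be discarded, so the next request pays for a new handshake. Policy choices like that one are exactly what would stay under producer toolchain control in this design.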

Consider defining an "instance reuse hint" · Issue #307 · WebAssembly/component-model

There's an interesting question and discussion in wasi-http/#95 that, by the end, doesn't feel specific to "HTTP" at all and thus perhaps deserving of being addressed more generally in the Componen...
