alexcrichton commented on issue #7243:
Thanks for this! I've got a few comments on some API design decisions as well as their implications on the implementation:
Primarily I think it would be best to avoid the usage of
Mutex<T>
in the implementation. I feel like it's best to reflect the concurrency of the underlying OS which is that mutexes aren't required and concurrent calls to various bits and pieces are ok. I realize that mutexes aren't held during blocking I/O operations as a part of this PR, but that effectively means that the lock is overhead and I'd prefer to do away with it. I tested this out locally and the main issue that I ran into was that the inbound/outbound streams need access to theremote_address()
of the state of the socket. This is problematic because, theoretically, astart_connect
can race with asend
orreceive
call, meaning that aMutex
is basically required to arbitrate that race (or something atomic like that). Could this perhaps be altered at the API level where an inbound/outbound stream is returned fromfinish-connect
in addition tofinish-bind
? That way the streams frombind
are never connected and the streams fromconnect
are always connected. That may have weird implications if all the streams are used, though, so I'm not 100% sure.Bikeshed-wise the name
inbound-*
andoutbound-*
are inconsistent withwasi:io/streams
usage ofinput-*
andoutput-*
. Is it perhaps worth it to use the same terminology?Bikeshed-wise the
oubound-datagram-stream
type is similar tooutput-stream
but has a different mechanism for receiving data. Theoutput-stream
type for example has acheck-write
method which returns how much can be written and only then canwrite
be called. Additionallywrite
traps if too much is given to it. The main rationale for this design decision is that it avoids the need for callees to receive a possibly unbounded amount of data forcibly within their module (since callingsend
will copy over all the data into the second module if there is one). This may best be solved with acheck-write
lookalike foroutbound-datagram-stream
as well perhaps?
badeend commented on issue #7243:
I think it would be best to avoid the usage of Mutex<T> in the implementation.
I agree, yet I couldn't figure out how to make it work.
My initial version hadbind
andconnect
both return streams. (Also I allowed connect to be called multiple times, but I didn't want to pollute this PR too much).
In that setup, callingconnect
has to invalidate the previously returned streams somehow. At which point I ended up at Mutex again. Allowing the previous streams to remain somehow usable might be easier to implement, but is arguably a worse API design.
Bikeshed-wise the name inbound-* and outbound-* are inconsistent with wasi:io/streams usage of input-* and output-*. Is it perhaps worth it to use the same terminology?
I (tried to) use the wasi-http naming scheme, as they feel more natural to the domain at hand (networking) as opposed to input&output.
Sidenote, I just noticed that wasi-http actually uses slightly different terminology (incoming&outgoing) :no_mouth:For the stream types, I'm fine with either naming scheme. They're temporary anyway.
For the datagram types themselves I do prefer to keep calling them incoming&outgoing-datagram rather than input&output-datagram.
The output-stream type for example has a check-write method which returns how much can be written and only then can write be called
Good point. I'll look into it.
The main rationale for this design decision is that it avoids the need for callees to receive a possibly unbounded amount of data forcibly within their module
Even with a
check-write
method onoutbound-datagram-stream
that can still happen, right. I assumecheck-write
will return the amount of datagrams that can be sent. But each datagram can still contain an unbounded number of bytes in thedata
field.
alexcrichton commented on issue #7243:
One way to get invalidation working without
Mutex
is to have a generational counter stored in each incoming/outgoing stream which is compared to the "current generation", an atomic counter, stored within theArc
. That way invalidation would be "increment the generation" and that may work out?For naming ok sounds like we're already a bit inconsistent, so I think it's ok to keep the names here :+1:
For
check-write
you're correct that you can always still pass in more data, but implementations can trap when this happens vs being expected to work correctly. I also think that for datagrams the number returned from check-write should probably be "size of all datagrams summed up" since you're right that the size of an individual datagram can change. Unless we can place a hard cap on the size of the datagram though in which case implementations could trap if any individual datagram is too large.
badeend commented on issue #7243:
Potentially answering my own question:
calling connect has to invalidate the previously returned streams somehow
Rather than performing validation in the *-stream code, maybe we can require the consumer to have dropped all prior child streams before calling
connect
(again) and trap otherwise. That way it should be impossible to have multiple competing streams active at the same time.
alexcrichton commented on issue #7243:
Ah I think we raced there a bit, but I like your idea more of requiring the streams are dropped than my idea of a generation counter
badeend commented on issue #7243:
This is ready for review again.
I have:
- Updated the API to disallow multiple pairs of active streams.
- Updated the API to mostly match wasi-io's
input-stream
&output-stream
.- Removed the mutex
- Renamed the resources to align with wasi-http.
badeend commented on issue #7243:
Alright, looks like the MacOS issue is fixed now
Last updated: Jan 24 2025 at 00:11 UTC