Hello everybody. I'm new to the community, although I've been following WebAssembly for quite some time. I just started a new project that is going to be using WebAssembly. One of the requirements I have is to be able to use sockets. I know the support for sockets is not yet there so I thought I could help with that.
These are the initial WASI additions; they only cover connect for the moment. I just wanted to get a feeling for whether this is the right direction before I continue and do the implementation in wasi-common.
typenames.witx
;;; Socket type
(typename $socktype
(enum
;;; The file descriptor or file refers to a datagram socket.
$socket_dgram
;;; The file descriptor or file refers to a byte-stream socket.
$socket_stream
)
)
;;; IP port number
(typename $ipport u16)
;;; An IPv4 address is a 32-bit number that uniquely identifies a network interface on a machine.
(typename $ipaddr4
(struct
(field $n0 u8)
(field $n1 u8)
(field $h0 u8)
(field $h1 u8)
)
)
(typename $ipaddr4_array (array $ipaddr4))
;;; An IPv6 address is a 128-bit number that uniquely identifies a network interface on a machine.
(typename $ipaddr6
(struct
(field $n0 u16)
(field $n1 u16)
(field $n2 u16)
(field $n3 u16)
(field $h0 u16)
(field $h1 u16)
(field $h2 u16)
(field $h3 u16)
)
)
(typename $ipaddr6_array (array $ipaddr6))
wasi_snapshot_preview1.witx
;;; Resolves a hostname to one or more IPv4 addresses
;;; Note: This is similar to `getaddrinfo` in POSIX
(@interface func (export "addr_resolve_ip4")
;;; Host to resolve
(param $host string)
(result $ipaddr4 $ipaddr4_array)
(result $error $errno)
)
;;; Resolves a hostname to one or more IPv6 addresses
;;; Note: This is similar to `getaddrinfo` in POSIX
(@interface func (export "addr_resolve_ip6")
;;; Host to resolve
(param $host string)
(result $ipaddr6 $ipaddr6_array)
(result $error $errno)
)
;;; Open a local socket
;;; Note: This is similar to `socket` in POSIX using PF_UNIX
(@interface func (export "sock_open")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Open an IPv4 socket
;;; Note: This is similar to `socket` in POSIX using PF_INET
(@interface func (export "sock_open_ip4")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Open an IPv6 socket
;;; Note: This is similar to `socket` in POSIX using PF_INET6
(@interface func (export "sock_open_ip6")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Initiate a connection on a local socket
;;; Note: This is similar to `connect` in POSIX, though rather than receiving a sockaddr
;;; we split it into separate functions with different arguments.
(@interface func (export "sock_connect")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the local socket (ex '/dev/tmp')
(param $addr string)
(result $error $errno)
)
;;; Initiate a connection on a network socket using IPv4
;;; Note: This is similar to `connect` in POSIX when the sockaddr uses AF_INET
(@interface func (export "sock_connect_ip4")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the network socket
(param $addr $ipaddr4)
;;; Port number to connect to
(param $port $ipport)
(result $error $errno)
)
;;; Initiate a connection on a network socket using IPv6
;;; Note: This is similar to `connect` in POSIX when the sockaddr uses AF_INET6
(@interface func (export "sock_connect_ip6")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the network socket
(param $addr $ipaddr6)
;;; Port number to connect to
(param $port $ipport)
(result $error $errno)
)
I tried to keep the spirit of other parts of WASI such as file functions, not sure that I got it right.
I just realized that wiggle doesn't support array result elements, so I copied the approach taken for fd_readdir.
Here is the revised witx.
;;; Resolves a hostname to one or more IPv4 addresses.
;;; Note: This is similar to `getaddrinfo` in POSIX
;;;
;;; When successful, the contents of the output buffer consist of a sequence of
;;; IPv4 addresses. Each address entry consists of an ipaddr4_t object.
;;;
;;; This function fills the output buffer as much as possible, potentially
;;; truncating the last address entry. It is advisable that the buffer is
;;; always a multiple of ipaddr4_t size.
(@interface func (export "addr_resolve_ip4")
;;; Host to resolve
(param $host string)
;;; The buffer where IP addresses are stored
(param $buf (@witx pointer u8))
(param $buf_len $size)
(result $error $errno)
;;; The number of bytes stored in the buffer. If less than the size of the buffer, no more IP addresses are available.
(result $bufused $size)
)
;;; Resolves a hostname to one or more IPv6 addresses.
;;; Note: This is similar to `getaddrinfo` in POSIX
;;;
;;; When successful, the contents of the output buffer consist of a sequence of
;;; IPv6 addresses. Each address entry consists of an ipaddr6_t object.
;;;
;;; This function fills the output buffer as much as possible, potentially
;;; truncating the last address entry. It is advisable that the buffer is
;;; always a multiple of ipaddr6_t size.
(@interface func (export "addr_resolve_ip6")
;;; Host to resolve
(param $host string)
;;; The buffer where IP addresses are stored
(param $buf (@witx pointer u8))
(param $buf_len $size)
(result $error $errno)
;;; The number of bytes stored in the buffer. If less than the size of the buffer, no more IP addresses are available.
(result $bufused $size)
)
;;; Open a local socket
;;; Note: This is similar to `socket` in POSIX using PF_UNIX
(@interface func (export "sock_open")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Open an IPv4 socket
;;; Note: This is similar to `socket` in POSIX using PF_INET
(@interface func (export "sock_open_ip4")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Open an IPv6 socket
;;; Note: This is similar to `socket` in POSIX using PF_INET6
(@interface func (export "sock_open_ip6")
;;; Socket type, either datagram or stream
(param $type $socktype)
;;; The opened socket
(result $fd $fd)
(result $error $errno)
)
;;; Initiate a connection on a local socket
;;; Note: This is similar to `connect` in POSIX, though rather than receiving a sockaddr
;;; we split it into separate functions with different arguments.
(@interface func (export "sock_connect")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the local socket (ex '/dev/tmp')
(param $addr string)
(result $error $errno)
)
;;; Initiate a connection on a network socket using IPv4
;;; Note: This is similar to `connect` in POSIX when the sockaddr uses AF_INET
(@interface func (export "sock_connect_ip4")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the network socket
(param $addr $ipaddr4)
;;; Port number to connect to
(param $port $ipport)
(result $error $errno)
)
;;; Initiate a connection on a network socket using IPv6
;;; Note: This is similar to `connect` in POSIX when the sockaddr uses AF_INET6
(@interface func (export "sock_connect_ip6")
;;; Socket descriptor
(param $fd $fd)
;;; Address of the network socket
(param $addr $ipaddr6)
;;; Port number to connect to
(param $port $ipport)
(result $error $errno)
)
I would also like to add a timeout for connect somehow, although I am not sure what the best approach would be, other than making it mandatory to always provide a timeout. I guess the most natural approach would be to add a u16 parameter, with zero meaning wait forever. Another note: a setsockopt equivalent is still missing too.
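A minimal sketch of how a caller might consume the output buffer that the revised `addr_resolve_ip4` describes. It assumes each entry is four consecutive bytes in network order; the witx above does not pin down the in-memory layout, so that is an assumption. `decode_ip4_entries` is a hypothetical helper, not part of the proposal.

```rust
use std::net::Ipv4Addr;

/// Size in bytes of one IPv4 entry, assuming four packed `u8` fields.
const IP4_ENTRY_SIZE: usize = 4;

/// Decode the first `bufused` bytes of `buf` into IPv4 addresses.
/// A trailing partial entry (possible when the buffer length is not a
/// multiple of the entry size) is ignored rather than decoded.
fn decode_ip4_entries(buf: &[u8], bufused: usize) -> Vec<Ipv4Addr> {
    // Never read past the end of the buffer, even if `bufused` is too large.
    let used = &buf[..bufused.min(buf.len())];
    used.chunks_exact(IP4_ENTRY_SIZE)
        .map(|c| Ipv4Addr::new(c[0], c[1], c[2], c[3]))
        .collect()
}
```

Sizing the buffer as a multiple of the entry size, as the doc comment advises, means the truncation case never arises in practice.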
Hi! This is cool! I don't have a lot of time today or tomorrow to dig into this, but I wanted to quickly pop in here and say this is something we're interested in, and to feel free to ask questions and post ideas like this
at a quick glance, one idea: instead of having separate _ip4, _ip6, etc. variants of the functions, another option is to define a union type with fields for each address type, so that applications wanting to support multiple address families don't need to have as many code paths
Yeah, I wasn't sure about splitting them up, which is why I asked for feedback. I definitely didn't want to bring in the complexity of addrinfo, sockaddr, etc. I didn't know we had unions. I'll look into it.
Also as a heads up, I'll want to talk about how to fit this into WASI's capability-based security model, and I'll be happy to help figure that out when we're ready.
I have to head out now, but feel free to ask questions here, and I or others will answer them when we can!
@Dan Gohman Thank you. I'll be happy to chat some more. Of course we need security features. My current line of thought is a basic feature gate: either you can connect or you can't. After that we need some sort of whitelisting of "what" you can connect to. That one is trickier. It is easy to whitelist a bunch of IP addresses, or even ranges, but the problem lies in host address resolution; I am not sure how we can avoid DNS spoofing.
Ok. I've been working on this some more. Here is the revised WITX.
typenames.witx
;;; Socket type
(typename $sock_type
(enum u8
;;; The file descriptor or file refers to a datagram socket.
$socket_dgram
;;; The file descriptor or file refers to a byte-stream socket.
$socket_stream
)
)
;;; IP port number
(typename $ip_port u16)
;;; Address type
(typename $addr_type
(enum u8
;;; Unix local address
$local
;;; IPv4 address
$ip4
;;; IPv6 address
$ip6
)
)
;;; A local socket address such as /dev/sock
(typename $addr_local
(struct
;;; Pointer to the name of the socket
(field $path (@witx pointer u8))
;;; The length of the name of the socket
(field $path_len $size)
)
)
;;; An IPv4 address is a 32-bit number that uniquely identifies a network interface on a machine.
(typename $addr_ip4
(struct
(field $n0 u8)
(field $n1 u8)
(field $h0 u8)
(field $h1 u8)
)
)
;;; An IPv6 address is a 128-bit number that uniquely identifies a network interface on a machine.
(typename $addr_ip6
(struct
(field $n0 u16)
(field $n1 u16)
(field $n2 u16)
(field $n3 u16)
(field $h0 u16)
(field $h1 u16)
(field $h2 u16)
(field $h3 u16)
)
)
;;; Union of all possible addresses type
(typename $addr
(union $addr_type
(field $local $addr_local)
(field $ip4 $addr_ip4)
(field $ip6 $addr_ip6)
)
)
wasi_snapshot_preview1.witx
;;; Resolves a hostname to one or more IP addresses.
;;; Note: This is similar to `getaddrinfo` in POSIX
;;;
;;; When successful, the contents of the output buffer consist of a sequence of
;;; IPv4 and/or IPv6 addresses. Each address entry consists of an addr_t object.
;;;
;;; This function fills the output buffer as much as possible, potentially
;;; truncating the last address entry. It is advisable that the buffer is
;;; always a multiple of addr_t size.
(@interface func (export "addr_resolve")
;;; Host to resolve
(param $host string)
;;; The buffer where IP addresses are stored
(param $buf (@witx pointer u8))
(param $buf_len $size)
(result $error $errno)
;;; The number of bytes stored in the buffer. If less than the size of the buffer, no more IP addresses are available.
(result $bufused $size)
)
;;; Open a socket
;;; Note: This is similar to `socket` in POSIX
(@interface func (export "sock_open")
;;; Address type
(param $addrtype $addr_type)
;;; Socket type, either datagram or stream
(param $socktype $sock_type)
(result $error $errno)
;;; The opened socket
(result $fd $fd)
)
;;; Initiate a connection on a socket to the specified address
;;; Note: This is similar to `connect` in POSIX
(@interface func (export "sock_connect")
;;; Socket descriptor
(param $fd $fd)
;;; Address to connect to
(param $addr $addr)
(result $error $errno)
)
@Dan Gohman Given our discussion about passing unions, do you still think that the union is the better choice vs. having individual calls per protocol? I am not even sure that we should support local sockets vs. just TCP/UDP. I'm intrigued about what others think on that subject.
I'm curious, what would a just-TCP/UDP API look like?
Berkeley sockets API is so pervasive, I myself don't know what an alternative would look like.
If we do go with sockets, then yes, I still think pointer to union (or union, if the witx tooling adds support for it) is the way to go.
@Dan Gohman I meant higher level, akin to TcpStream in Rust. I agree with you about Sockets API, and I worry about C/C++ folks targeting WebAssembly that depend on that kind of API.
Even Rust's TcpStream is just a thin abstraction around Berkeley sockets.
E.g. SocketAddr is a Rust enum, which is just a more type-safe version of union :-)
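To illustrate the enum point, here is a small sketch using the standard library's `std::net::SocketAddr`. The helper name `family_name` and the labels it returns are made up for this example.

```rust
use std::net::SocketAddr;

/// Return a short label for the address family of `addr`.
/// The compiler rejects this `match` if a variant is left unhandled,
/// which is the type safety a plain C union does not give you.
fn family_name(addr: &SocketAddr) -> &'static str {
    match addr {
        SocketAddr::V4(_) => "ip4",
        SocketAddr::V6(_) => "ip6",
    }
}
```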
@Dan Gohman I know it is thanks to trait magic, but you can do TcpStream::connect("rust-lang.org"). That looks high level, in the sense that I don't have to go fetch addresses, open the socket, then connect, etc. That is what I mean by higher level. I'll keep on the current path, don't worry.
Rust's API may be valuable to look at, because native socket APIs have accumulated a lot of flags and options, and Rust has done a lot of work to figure out which things are important
@Dan Gohman Understood. Once I have something decent and tested. What would be my next step? Should I construct a PR? What is the process?
Ah, I see. I think what we can say there is, even Rust retains the ability to do each of the steps separately if you want to, so we should focus on enabling that, and then we can talk about how to provide convenience functions that combine multiple steps into one as a followup.
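A rough sketch of what "each of the steps separately" can look like with Rust's standard library: resolution through `ToSocketAddrs`, then an explicit connect per address. The function names `resolve` and `connect_any` are invented for this illustration.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};

/// Step 1: resolve a host and port to zero or more socket addresses.
fn resolve(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}

/// Step 2: try each address in turn and return the first stream that connects.
/// If none connects, the error from the last attempt is returned.
fn connect_any(addrs: &[SocketAddr]) -> io::Result<TcpStream> {
    let mut last_err =
        io::Error::new(io::ErrorKind::InvalidInput, "no addresses to connect to");
    for addr in addrs {
        match TcpStream::connect(addr) {
            Ok(stream) => return Ok(stream),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

A convenience call in the style of `TcpStream::connect("host:port")` is then just these two steps chained together.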
Yeah, I think a PR would be a good next step
I don't know how well witx supports this yet, but I feel like it would be possible to use an object that "remembers" the type of the socket.
As in, you can't mismatch the socket family and the address you use.
I think that makes more sense than a union.
The C API lets you do things like pass a sockaddr_un to a TCP socket or vice versa. That seems likely to provoke unsafe bugs in implementations, to the point that we will have to make sure callers can't do that via validation before calling the underlying socket library.
And if we have to remember that anyway, let's encode it in the type system.
I can very easily imagine ways that the interface allowing mismatched calls to connect vs socket could bypass capability security.
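One way to picture "encode it in the type system" is a separate handle type per address family, so that a mismatched address does not compile. This is only a sketch: `Ip4Socket` and `Ip6Socket` are hypothetical types, and `connect` just returns a description string as a placeholder instead of performing a real connection.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical handle for a socket opened with the IPv4 family.
struct Ip4Socket {
    fd: u32,
}

/// Hypothetical handle for a socket opened with the IPv6 family.
struct Ip6Socket {
    fd: u32,
}

impl Ip4Socket {
    /// Only accepts an IPv4 address; passing an `Ipv6Addr` is a compile error.
    fn connect(&self, addr: Ipv4Addr, port: u16) -> String {
        format!("fd {} -> {}:{}", self.fd, addr, port)
    }
}

impl Ip6Socket {
    /// Only accepts an IPv6 address; passing an `Ipv4Addr` is a compile error.
    fn connect(&self, addr: Ipv6Addr, port: u16) -> String {
        format!("fd {} -> [{}]:{}", self.fd, addr, port)
    }
}
```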
@Dan Gohman I finally managed to complete the implementation. I would appreciate if you can give it a quick review before I start assembling the PR.
The repo for the WASI spec changes is here:
https://github.com/Kong/WASI/tree/feat/wasi-sockets
And the reference implementation for it, is here:
https://github.com/Kong/wasmtime/tree/feat/wasi-sockets
sock_open and addr_resolve now receive an address pool as we discussed.
Sure. I'm in meeting atm, but I can take a look a little later today
Finally taking a look here; the wasmtime changes look like a good start!
Before submitting a PR, it's worth rebasing on main to avoid the testsuite diffs
Yeah, of course. All the changes are on snapshot. Should I move them to ephemeral? Or should I leave them in snapshot? Also, this would need to be split into two PRs. One for WASI spec, and the other for wasmtime.
Yeah, the WASI changes should be in ephemeral
@Dan Gohman So, the procedure would be:
@Emiliano Lesende Yes
One thing to be aware of is that ephemeral is currently blocked on some big modularization changes, so the timeline for the next snapshot is unclear.
However, I think we can go ahead and put experimental APIs in wasi-common without waiting for a snapshot. We can use different names for now, to avoid conflicts if there are changes during the standardization process.
I think we can put experimental APIs in the WASI bindings too, again without waiting for a snapshot. Possibly by putting them behind a feature flag.
@Emiliano Lesende What is the "address pool" fd? I don't think I understand that.
@Josh Triplett The address pool fd is a pool of IP addresses and ports. You need to configure it in WASI to be able to use the socket calls. When you create the WASI ctx you can pass in address pools, which the program can then use to issue socket calls. It is a way to have some security control over who you can connect to or listen to, without blowing the whole system up into a fully featured firewall.
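As a rough illustration of the idea, a pool can be modelled as a list of permitted address and port pairs that the host checks before allowing a socket call. The types `PoolEntry` and `AddressPool` and the method `allows` are hypothetical and are not taken from the linked branches, which may also support ranges.

```rust
use std::net::IpAddr;

/// Hypothetical pool entry: one permitted address and port.
struct PoolEntry {
    addr: IpAddr,
    port: u16,
}

/// Hypothetical address pool, configured by the host when it builds the WASI context.
struct AddressPool {
    entries: Vec<PoolEntry>,
}

impl AddressPool {
    /// True if the guest may connect to or listen on `addr:port`.
    fn allows(&self, addr: IpAddr, port: u16) -> bool {
        self.entries
            .iter()
            .any(|e| e.addr == addr && e.port == port)
    }
}
```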
Ah, that makes sense.
@Dan Gohman Ok. I'll see if I can make the change this week.
Following up on a question from earlier in this thread, what happens if you attempt to pass a mismatched address family and address?
Mismatched as in? What do you mean?
Wrong branch of the union.
What happens if you pass an IPv6 address and a UNIX socket family, or vice versa?
With the interface posted much earlier in this thread, it seems like that would not get caught, and would get as far as the underlying library/system call.
At the moment there is no UNIX socket support, but having said that if you pass an IPv6 pool and you try to use an IPv4 address you will get a not capable response.
Is that what you were asking about?
Or do you mean if I sock_open with an IPv6 address family and then sock_connect to an IPv4 address? In that case we will pass it along to the OS, which in turn will return EAFNOSUPPORT.
That's one possibility, and probably a more innocuous one.
I'm thinking of things like trying to create a UNIX socket but passing an IPv4 or IPv6 address.
A Unix socket address is a string-like buffer with some really strange semantics.
We likely need to do some extra validation there, but if someone can just pass an IPv6 address and have the octets interpreted as the bytes of a UNIX socket address...
I had been wondering if we could enforce a match between the address family used at sock_open and the address passed to things like connect or bind.
So that it never gets to the OS call if you try to play games like that, which means a much smaller attack surface.
Oh, I'm not sure that can happen, though. Unions in WASI are tagged, which means that sock_connect knows what that address is for, and when it converts it to sockaddr it will set the correct address family. Granted, it will set the correct address family at the sockaddr struct level but not at the sock_open level.
I missed that it was using a tagged union. I thought WITX had both tagged and untagged.
But yes, I also mean a mismatch between open and later calls like connect.
I don't particularly like the Berkeley socket interface either, and we could delay opening the socket until we know the address family. The reason I didn't go this route is because I was worried what it could mean when we try to implement the C calls in WASI.
I could easily imagine an OS socket implementation ignoring the mismatch between the two calls and just interpreting the buffer passed into the later call using the address family from the open call.
No, I don't think that we should delay opening the socket.
I'm more wondering if we can remember the address family that was given when opening the socket, and enforce a match ourselves, rather than relying on the operating system to do so.
I see. Good point. We could add that extra check. Is there an OS API call to retrieve the AF? If not, we could store it ourselves in the handle.
I think we should store it ourselves, perhaps by using different types of handle.
Agreed.
That would avoid making an extra OS call. We don't want to slow down the common case of programs behaving correctly.
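A minimal sketch of the "store it ourselves" option, where the handle records the family given at open time and the address is checked before any OS call is made. `Family`, `SockError`, `SocketEntry`, and `check_addr` are hypothetical names for this illustration only.

```rust
use std::net::IpAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Family {
    Ip4,
    Ip6,
}

#[derive(Debug, PartialEq, Eq)]
enum SockError {
    FamilyMismatch,
}

/// Hypothetical handle-table entry that records the family given at open time.
struct SocketEntry {
    family: Family,
}

impl SocketEntry {
    /// Compare the address against the stored family; no system call is involved.
    fn check_addr(&self, addr: &IpAddr) -> Result<(), SockError> {
        let addr_family = match addr {
            IpAddr::V4(_) => Family::Ip4,
            IpAddr::V6(_) => Family::Ip6,
        };
        if addr_family == self.family {
            Ok(())
        } else {
            Err(SockError::FamilyMismatch)
        }
    }
}
```

The check is a single comparison on data already held in the handle, so well-behaved programs pay essentially nothing for it.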
Makes sense
Thank you. I appreciate you hearing out the concern. :)
Of course. Always.
Because the Berkeley sockets interface is so very C-oriented, and the sockaddr structure in particular is one of the most horrible things that people need to deal with on a regular basis in common programs, I didn't want to just trust the underlying OS to do the right thing for security.
I would be happy to review the updated patch series as soon as you have it available. Send me a link?
You can find the links to the repos above. I'll repost them here.
The repo for the WASI spec changes is here:
https://github.com/Kong/WASI/tree/feat/wasi-sockets
And the reference implementation for it, is here:
https://github.com/Kong/wasmtime/tree/feat/wasi-sockets
I meant a link when you have a new patch that enforces the match, and I'd be happy to review that. :)
I already read the patches so far.
Oh sure.
@Dan Gohman @Josh Triplett I noticed that there is a main branch now, should I switch to main for rebasing or keep it on master?
for all Bytecode Alliance repos use main
ok
@Dan Gohman it looks like main points to 0.11.0 of wasi, but the most recent published version of it is still 0.10.0+wasi-snapshot-preview1
Ah, it looks like we should publish a new version then
@Josh Triplett the changes you requested are in :)
@Emiliano Lesende Awesome, thanks!
@Josh Triplett @Dan Gohman The PR for the socket API is in. I cannot send the PR for the implementation yet. I tried to put it behind a feature flag, but some of the work is complicated because I cannot put feature flags inside match expressions in Rust, so I cannot wall everything off properly. Any idea how long it will take to get the sock API into a snapshot?
https://github.com/WebAssembly/WASI/pull/312
That sounds like great progress! \o/
Would it make sense to open a PR for the implementation in its current state and then have people weigh in on how best to do the feature flag?
@Till Schneidereit I could do that. I guess you can also take a look here: https://github.com/Kong/wasmtime/tree/feat/wasi-sockets
My issue right now, is on this match:
https://github.com/Kong/wasmtime/blob/feat/wasi-sockets/crates/wasi-common/src/sys/unix/mod.rs#L117
I need to gate that match arm, since AddressPool isn't available until ephemeral gets merged.
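For what it's worth, Rust does accept `#[cfg(...)]` attributes on individual match arms and on enum variants, which may help with the gating. A minimal sketch, with a hypothetical `Descriptor` enum and an assumed feature name `wasi_sockets`; neither is taken from wasi-common, and whether this fits the linked code is not something the sketch can confirm.

```rust
/// Hypothetical descriptor kinds; the socket-related variant only exists
/// when the (assumed) `wasi_sockets` feature is enabled.
enum Descriptor {
    File,
    #[cfg(feature = "wasi_sockets")]
    AddressPool,
}

/// Describe a descriptor. The gated arm is compiled out together with the
/// gated variant, so the match stays exhaustive in both configurations.
fn describe(d: &Descriptor) -> &'static str {
    match d {
        Descriptor::File => "file",
        #[cfg(feature = "wasi_sockets")]
        Descriptor::AddressPool => "address pool",
    }
}
```

The variant and every arm that names it need the same `cfg` condition, otherwise one configuration fails to compile.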
Last updated: Nov 22 2024 at 16:03 UTC