Stream: wit-bindgen

Topic: preview1 -> preview2 shim


view this post on Zulip Alex Crichton (Sep 23 2022 at 19:29):

@Dan Gohman I think this is going to be a pretty meaty implementation so I figured I'd start a stream here and see where things go.

One wrinkle I just found (irrespective of *.wat vs *.rs or w/e) -- how is readlink going to work? For example the preview1 API takes a buffer to fill in, but in the canonical ABI it would look more like path_readlink: func(fd: fd) -> result<string, errno>. The difference here is that the cabi_realloc call to allocate space for the string will not necessarily fit within the buffer provided, meaning that if the buffer is too small I'm not sure what the shim module can do because it's too late to learn that and there's nowhere else to return the string.

view this post on Zulip Alex Crichton (Sep 23 2022 at 19:30):

one option would be to fail the allocation in the cabi_realloc implementation, but that would trap and destroy the module which sounds bad

view this post on Zulip Alex Crichton (Sep 23 2022 at 19:31):

the only other option I can think of though is to fit the return value on the stack page and then return E2BIG or w/e to the original caller, but there's no guarantee that the value can fit on the stack page, which further might lead to a trap

view this post on Zulip Alex Crichton (Sep 23 2022 at 19:31):

oh you know the best solution might be something like path_readlink_compat: func(fd: fd, max: u32) -> result<string, errno>

view this post on Zulip Alex Crichton (Sep 23 2022 at 19:31):

where the host simply returns E2BIG if the size is larger than max

view this post on Zulip Dan Gohman (Sep 23 2022 at 20:18):

For readlink, users will typically either use a buffer of size libc::PATH_MAX, which is 4096 in wasi-libc so it'll almost always be big enough, or they'll use a smaller buffer and be prepared for a ENAMETOOLONG error

view this post on Zulip Dan Gohman (Sep 23 2022 at 20:20):

Or oh, readlink's way of indicating that the name is truncated is to return a number of bytes which is equal to the provided buffer size.

view this post on Zulip Alex Crichton (Sep 23 2022 at 20:26):

this is part of the adapter though, where if the new function is readlink: func(fd) -> string then the return value is communicated by calling malloc which the embedder assumes has the right size

view this post on Zulip Alex Crichton (Sep 23 2022 at 20:26):

this means that we can't naively "polyfill" preview1 readlink with something that is -> string because the buffer provided to preview1's readlink originally may not be large enough

view this post on Zulip Dan Gohman (Sep 23 2022 at 20:55):

Ok, my first cut at generating a wat for this is here: https://github.com/sunfishcode/preview1.wasm/blob/main/pregenerated/preview1.wat

Experimental experiments. Contribute to sunfishcode/preview1.wasm development by creating an account on GitHub.

view this post on Zulip Dan Gohman (Sep 23 2022 at 20:55):

the code is here: https://github.com/sunfishcode/preview1.wasm/blob/main

view this post on Zulip Dan Gohman (Sep 23 2022 at 20:57):

As we discussed, it needs some fixing up, but it shows the idea here. I've implemented random_get, where the wit API returns list<u8>, so far.

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:06):

hm this is using dlmalloc though which I think we want to remove

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:07):

Yeah, though it's convenient for now, because it's a real malloc.

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:07):

I added an implementation of path_readlink to preview1.c, to show how it works

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:09):

this works with a full-blown malloc yeah but I think we want to avoid that

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:11):

__wasm_init_memory also looks problematic

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:11):

but that's probably just there b/c of dlmalloc

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:13):

weird, that __wasm_init_memory is just doing a memory.fill to zero out memory which should already be zeroed

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:18):

the fact that it's present though is worrisome, because it may mean .rodata/.data is in play which we can't have

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:18):

(which is one of the reasons I initially didn't want to use a high level language)

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:20):

Ah, it does use static memory, for things like RET_AREA at least

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:24):

oh, dlmalloc has some static state too

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:26):

this is what I meant by writing this would almost be an entirely different dialect of C

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:26):

and there's no static assurances that you're not "doing the wrong thing"

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:26):

and when the wrong thing is done it's not obvious always what did it

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:27):

If I stub out malloc and hack RET_AREA, there's no __wasm_init_memory

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:28):

agreed yeah

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:28):

but you will need access to custom wasm globals

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:28):

to get the pointers provided to preview1 functions into the cabi_realloc function

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:29):

yeah, there should be a way to do that, let me check

view this post on Zulip Alex Crichton (Sep 23 2022 at 21:29):

IIRC inline asm should give you a way to shove it in

view this post on Zulip Dan Gohman (Sep 23 2022 at 21:29):

To be sure, I agree this thing is in a special dialect of C and it's tricky. But it might still be better than writing all the bindings in raw wat

view this post on Zulip Dan Gohman (Sep 23 2022 at 22:29):

I've now added code to create wasm globals, and added a minimal malloc implementation that uses them.

view this post on Zulip Dan Gohman (Sep 23 2022 at 23:15):

And https://github.com/bytecodealliance/wit-bindgen/pull/326 is a PR to move RET_AREA out of static memory.

Move RET_AREA variables out of static memory and onto the stack. This ensures that the bindings are reentrant, which will be needed when the wasi-threads proposal makes threads available in wasm....

view this post on Zulip Dan Gohman (Sep 23 2022 at 23:17):

With those changes, we no longer have a __wasm_init_memory.

view this post on Zulip Alex Crichton (Sep 26 2022 at 17:15):

looking at the malloc changes here again, what I was originally imagining is that when you called fd_read, for example, the malloc return value would be the pointer passed to fd_read (similarly for path_readlink or things like that). That way we don't have to worry about free and things like that and we're basically leveraging the fact that we know, statically, the order that cabi_realloc will be called in and with what sizes

view this post on Zulip Alex Crichton (Sep 26 2022 at 17:16):

also, if you're up for it, I'd personally prefer if we could get this in Rust instead of C since that'll likely be much easier to maintain over time

view this post on Zulip Alex Crichton (Sep 26 2022 at 17:17):

the final thing I think is that we'll need to figure out the update to the $__stack_pointer global, probably via a post-processing pass

view this post on Zulip Dan Gohman (Sep 26 2022 at 22:34):

Rust doesn't have quite as many features exposed, like address_space(1) to create and access wasm globals, or -mexec-model=reactor. And Rust is more opinionated about how linking works. But I'll look into it.

view this post on Zulip Alex Crichton (Sep 27 2022 at 14:53):

That's true, but I would still much rather have this in Rust. Otherwise once we make this only one person will basically be able to modify it and this seems like it's going to be a generally useful ability to have. Ideally this would be a forcing function to motivate the addition of wasm globals or more flags to Rust, since the further C drifts from Rust's support of wasm the worse things get in general

view this post on Zulip Alex Crichton (Sep 28 2022 at 19:56):

ok I've just finished up the portion of this that will be in wit-component which takes as input a set of preview1 functions and the preview1.wasm module and then spits out a new wasm module which is the "minimal" module to satisfy the set of preview1 functions required. Basically it takes the input wasm, asserts the shape is as expected, deletes a few exports, then runs a "gc" pass, and then outputs the final module.

The module is only mutated by adding a start function which calls memory.grow (i32.const 1) and sets the return value to the preview1.wasm's stack pointer so it's got stack space


Last updated: Dec 23 2024 at 12:05 UTC