@Yehuda Katz is your use case primarily the wasm blob calling host functions or the host calling wasm functions?
Alex Crichton said:
Yehuda Katz is your use case primarily the wasm blob calling host functions or the host calling wasm functions?
ohai
I think both directions will end up being important
mk makes sense, so our general story here is that today it's not amazing
I think if there's one piece of "low-hanging fruit" I'd say it's making an ergonomic way to pass Strings and Vec<u8>
wasm-bindgen only works on the web (aka not wasmtime)
and otherwise there is no equivalent of wasm-bindgen today for host-side wasm
I was able to jury-rig something up, but it's very hard to convince myself about the safety properties, even if I put more work into it
stuff like "make sure you grow" etc. etc. is rough
right
I made a WasmSlice struct
yeah so our general answer to this is interface types
which is basically { ptr, len }
which works well enough
but the song and dance is rough to get right
that's the longer-term vision for how host-wasm communication will be nicer (and it'll abstract all the details of memory growth, string allocations, your slice structures, etc)
and yeah this is why we're pushing towards a longer-term vision b/c getting it all right today is not easy
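For readers following along, here is a minimal sketch of what a { ptr, len } structure like the WasmSlice described above might look like. The field names and the `copy_out` helper are assumptions for illustration (the linked repository may differ), and guest memory is modeled as a plain byte slice rather than a wasmtime `Memory`.

```rust
/// Hypothetical stand-in for the `WasmSlice` mentioned in the chat: an offset
/// and a length into guest linear memory, both 32-bit as on wasm32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WasmSlice {
    pub ptr: u32,
    pub len: u32,
}

impl WasmSlice {
    /// Copies the bytes out of `memory`. Returns `None` if the range
    /// overflows or falls outside the memory, instead of reading out of bounds.
    pub fn copy_out(&self, memory: &[u8]) -> Option<Vec<u8>> {
        let start = self.ptr as usize;
        let end = start.checked_add(self.len as usize)?;
        memory.get(start..end).map(|bytes| bytes.to_vec())
    }
}
```

Copying out, rather than handing back a borrowed slice, sidesteps most of the safety questions raised later in the conversation.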
Alex Crichton said:
yeah so our general answer to this is interface types
for my use-case it's really important that whatever I do can easily work both in wasmtime and on the web
so I'm willing to do more work to wire it up if necessary
long-term that'll all be in place yeah
interface types was actually originally created for web host apis
what are the challenges with higher-level APIs for moving blobs of bytes around?
yeah I know
I'm a little weirded out by the fact that they're conceptually coupled to wasm-gc, fwiw
oh it's mostly just standardization
Rust teaches us that interface types and GC have nothing to do with each other :P
like just finding the right way to express all this in wasm and standardize it
it's not quite just standardization
but yeah I get it :)
I noticed all of this because I was working on the interaction between the JS typed objects proposal and wasm
and it really was "integration with wasm-gc" which seemed weird to me as a Rust person
the problem with interface types right now is that it's a very long-term vision and we've not made a ton of progress in making it a closer reality
yep
that's something we're starting to make progress on, however
fitzgen/I are working on the next step towards interface types and such
and I just recently finished one half of it
personally, I wish interface types were just separated into their own proposal, but that's so far above my pay grade :)
excellent
what would you say is the correct pattern for moving around Vec<u8> today?
it's also worth pointing out that interface types is somewhat nebulous
the long-term vision is aligned with everyone I think
this is what I was trying to do: https://github.com/wycats/wand/blob/main/crates/wand-cli/src/slice.rs
but there's a lot of various mid-states that are very worthwhile and useful before we get to the long-term part
so for example a great "mvp" of interface types probably doesn't need wasm-gc
(e.g. what fitzgen/I are working on)
Alex Crichton said:
the long-term vision is aligned with everyone I think
as an aside, I think wasm has been most effective when the scope has been:
but that's my personal opinion :)
agreed yeah
Alex Crichton said:
so for example a great "mvp" of interface types probably doesn't need wasm-gc
it seems like, at minimum, the equivalent of Rust Copy types should be doable as an MVP
right yeah
when I bring up Copy to non-Rust people, it's just not a category they know how to wrap their heads around
for getting something done today what you've got there is a good start, the missing pieces would be safety/error handling as well as dealing with malloc/free
yeah
e.g. if you want to pass a string to wasm you need to malloc space for it
basically I don't know how to think about safety
or if wasm returns a string you may need to free it
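The malloc/free point above could be sketched on the guest side roughly as follows. The names `allocate` and `deallocate` are illustrative (the chat later uses an `allocate` export, but its actual implementation is not shown), and a real guest would additionally mark these functions for export.

```rust
// Guest-side sketch. In a real wasm module these two functions would also be
// exported (for example with a no-mangle attribute) so the host can call them.

/// Reserves `len` zeroed bytes and hands ownership of them to the caller.
pub extern "C" fn allocate(len: usize) -> *mut u8 {
    let buf: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    // Leak the box; the host is now responsible for calling `deallocate`.
    Box::into_raw(buf) as *mut u8
}

/// Frees a buffer previously returned by `allocate`.
///
/// # Safety
/// `ptr` must come from `allocate(len)` with the same `len`, and must not be
/// used again after this call.
pub unsafe extern "C" fn deallocate(ptr: *mut u8, len: usize) {
    let raw = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(raw) });
}
```

On a wasm32 guest `usize` and pointers are both 32 bits, so the host would see these as functions over `i32` values.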
but so you said you've heard of wiggle?
https://github.com/wycats/wand/blob/main/crates/wand-cli/src/main.rs#L31
this is the other half of it
I've heard of wiggle but haven't looked into it much
ah right yeah
lemme get some links for you
that link is where I'm confident I'm messing up :P
@Alex Crichton the good news about Rust is that most things have rustdoc. The bad news is that many things only have Rustdoc :P
it's true :(
we could always benefit from more examples
it's ok ;)
so the purpose of wiggle is to make integration with *.witx files easier
I'm always happy to help with docs on things I'm working on
and write decently fast
but I need to understand first
so for example WASI is specified with *.witx
which is a high-level description of an API
as you can see takes a string and returns a Result
what wiggle does is it allows you to hook that up automatically to a host implementation -- https://github.com/bytecodealliance/wasmtime/blob/aed6de32d4e5603a7f85619d84099a9a05cb7a7c/crates/wasi-common/src/snapshots/preview_1.rs#L687-L696
which as you can see is all safe
wiggle does all the "ABI goop" of bounds checks, errors, etc
so you only write in 100% safe rust
this is the theory behind wiggle
it's still kind of boiler-plate-y to get all this hooked up
and is something we're continuing to improve
@Alex Crichton I think one of the things I would benefit from the most is just a sense of what things are stable
but that's the general idea of where our end-state will be
@Alex Crichton oh, another random topic
not that random :P
I was wondering if folks are thinking that wasi should integrate with stuff like the JS FileSystem API
I asked Aaron about it and he didn't know
but I was a little surprised it didn't do that
(and also, how wasi would handle the need to get permissions in the first place)
wasi currently focuses mainly on non-web applications of wasm
it's always been expected, however, that a web port is possible
but there's not currently an official "here's the wasi api for the web" implementation
and afaik posix has more of an effect on the design than the web
(I'm totally unfamiliar with the filesystem api myself)
Alex Crichton said:
wasi currently focuses mainly on non-web applications of wasm
see my earlier thought about wasm scope
;)
https://rustwasm.github.io/wasm-bindgen/api/web_sys/struct.FileSystem.html
in any case today everything about this, especially crossing js/host embeddings, is going to be pretty manual
e.g. you're going to be writing very similar glue code in both Rust and in JS
that's what's currently available, and in the future we hope to provide tools to make this much easier, basically auto-generating both the JS and the Rust code based on the desired interface of the module
like you export a function that takes/returns a string and we'd auto-generate the JS glue and the Rust glue for doing that
no need to manage WasmSlice yourself
I don't mind writing a lot of the glue code myself
I am expecting to need to manually write two hosts
wasmtime and web
but I don't want to be forced into assuming a JS client
I'm literally using wasm-pack right now because it's the best way to run the steps
and then I literally copy the .wasm
another aspect which may also be possible is to have two wasm blobs, one using wasm-bindgen for the web and one using handwritten stuff for wasmtime
that would at least relinquish you from having to write js glue code
but they can't have a single set of externs
what's slightly frustrating is that there's not a good story for "how to interact manually with the .wasm file created by wasm-bindgen"
it kind of feels like there should be a Rust API for calling a wasm_bindgen function that takes a String that doesn't have much to do with JS
true, the wasm-bindgen ABI is unstable and not really documented
is that intentional, or just path dependence?
I would be happy to help :)
that's totally possible, but it breaks down very quickly
there's just way too much in wasm-bindgen that assumes JS
presumably there's some need for a somewhat stable answer in order for the JS side to work?
right
once you start only talking about structs and strings what you're actually talking about is interface types
would it be possible to break out wasm_bindgen_core and wasm_bindgen_js?
I've toyed with the idea of this in the past
the tl;dr; is basically "no"
or rather, it's possible, but you're just doing interface types
I plan to, one day, rebuild wasm-bindgen on interface types
but interface types needs to make more progress in the meantime naturally
why wouldn't the rough interning strategy for JS work more generally?
half the battle of wasm_bindgen is inside of Rust, right?
why isn't that stuff fully applicable to non-JS clients?
oh it is
if you touch just the right set of things you could write a native side of wasm-bindgen
like you could just translate the JS glue code to Rust
there's no fundamental reason that can't be done
there's practical reasons for why I haven't done that though
yeah that makes sense
Alex Crichton said:
another aspect which may also be possible is to have two wasm blobs, one using wasm-bindgen for the web and one using handwritten stuff for wasmtime
I'm trying to work out what exactly it would mean to do this
I could create two totally different sets of extern functions
but it feels like it'll just be the same thing twice with slightly different low-level protocols
yeah that's what I'm imagining
basically
ok, next question
how do I learn wiggle?
all the "meat" would be a shared rust dep
wiggle is only intended for wasm-to-host communication at this time
hm
so while it's the right shape of what you want I don't think it's entirely what you want
b/c you also want "enriched" host-to-wasm communication
I'm basically used to the JS story of just transferring very basic bytes
and I don't mind having to do that
but even that felt like a song and dance
I could build WasmString on top of WasmSlice very easily, obviously
I guess what I'm saying is I want WasmSlice in wasmtime :P
in addition to Memory
at the limit, I can just transmute the hell out of things
The new Memory::read API is close I think to what you might want
perhaps coupled with another Memory::read_vec, Memory::read_string, and Memory::read_str
right
which would require a tiny ABI
basically len/ptr
it also kinda sucks that Rust can't do multi-value easily
like you want to write a wasm function that returns two i32
instead of a packed i64
but Rust won't let you do that
can't you just use a tuple?
I guess I don't know the wasm abi for multi-value
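The "packed i64" workaround mentioned above can be sketched like this. Which half holds the pointer is an arbitrary convention chosen for this example; both sides only need to agree on it.

```rust
/// Packs a guest pointer and a length into one 64-bit value so a function
/// can return both without multi-value support.
/// Convention assumed here: pointer in the high 32 bits, length in the low 32.
pub fn pack(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | (len as u64)
}

/// Reverses `pack`, returning `(ptr, len)`.
pub fn unpack(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}
```

The guest would return `pack(ptr, len)` and the host would call `unpack` on the result before touching memory.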
I guess you can just do read() and carefully make sure to make a Vec with the right capacity?
it seems like read() is a bit of a rube goldberg machine for reading a certain amount of bytes
first make a Vec with the right capacity
well
first read a u32
then make a Vec::with_capacity(that size)
then advance the ptr
(by 4)
then read()
yeah that's what I mean with read_vec
in addition to the read that we have today
yeah
like you should be able to do memory.read(ptr..ptr+len)
yeah
and it allocates the Vec for you and all that
it does?
no I mean we should have an API that does that
oh you're saying read_vec
yeah
oh sorry yeah
like I said it would require an ABI
which is a bit "new" to this API
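The length-prefixed steps discussed above (read a u32, advance by 4, then read that many bytes) could be sketched as a single helper. This is not an existing wasmtime API: the function name, the little-endian 4-byte prefix, and the use of a plain byte slice for guest memory are all assumptions for illustration.

```rust
/// Reads a length-prefixed byte buffer from `memory` at guest offset `ptr`.
/// Layout assumed: a 4-byte little-endian length, followed by that many bytes.
/// Returns `None` on any out-of-bounds or overflowing range.
pub fn read_vec(memory: &[u8], ptr: u32) -> Option<Vec<u8>> {
    let start = ptr as usize;
    let header_end = start.checked_add(4)?;

    // Step 1: read the u32 length prefix.
    let mut header = [0u8; 4];
    header.copy_from_slice(memory.get(start..header_end)?);
    let len = u32::from_le_bytes(header) as usize;

    // Steps 2-4: advance past the prefix and copy out exactly `len` bytes.
    let data_end = header_end.checked_add(len)?;
    memory.get(header_end..data_end).map(|bytes| bytes.to_vec())
}
```

Because the helper allocates the Vec itself, the caller never has to size a buffer by hand.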
how does wasm_bindgen do it?
hm you may need to be more precise
do you mean like how does wasm-bindgen return a string from wasm to JS?
yeah
what's the protocol?
I've seen people try to reverse engineer the ABI in various threads
but it seemed crazy :P
it's funky -- first 8 bytes of the shadow stack is reserved, and that return pointer is passed as the first parameter
then when the call finishes the 8 bytes are interpreted as ptr/len
the ptr/len are then copied out and decoded as utf-8
then the ptr/len are freed
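The host side of the return-pointer steps just described might look roughly like the sketch below. It is only a paraphrase of those steps over a plain byte slice: the function names are hypothetical, the little-endian layout is an assumption, and it is not wasm-bindgen's actual (unstable, undocumented) ABI. Freeing the guest buffer afterwards is left to the caller.

```rust
/// Reads a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32(memory: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(memory.get(offset..end)?);
    Some(u32::from_le_bytes(raw))
}

/// After the call returns, the 8 bytes at `retptr` are interpreted as
/// (ptr, len); the bytes they describe are copied out and decoded as UTF-8.
pub fn read_returned_string(memory: &[u8], retptr: u32) -> Option<String> {
    let base = retptr as usize;
    let ptr = read_u32(memory, base)? as usize;
    let len = read_u32(memory, base.checked_add(4)?)? as usize;
    let bytes = memory.get(ptr..ptr.checked_add(len)?)?;
    // Copy first, then validate, so the result owns its data.
    String::from_utf8(bytes.to_vec()).ok()
}
```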
Alex Crichton said:
it's funky -- first 8 bytes of the shadow stack is reserved, and that return pointer is passed as the first parameter
is it really that crazy to make an API for the low-level details like the shadow stack?
it must already be encapsulated for maintenance reasons
not sure what you mean by encapsulated
but this is indeed something we could expose an API for
it's in a shaky realm though b/c we're just guessing what global is actually the shadow stack pointer, if any
I just mean it must not be a bunch of spaghetti code strewn around
otherwise you'd keep breaking the codegen :P
most wasm modules don't even export the shadow stack pointer
you mean like in wasm-bindgen?
I think it's ok to say "your client must do XXX YYY ZZZ and your wasm module must expose AAA BBB CCC"
right
I literally used wasm-bindgen + wasm-objdump and it's not THAT crazy
but eventually I was like "this is too hard" and implemented WasmSlice
wasm-bindgen is a bit more powerful here in that it has complete control over the .wasm output itself
e.g. wasm-bindgen injects functions to manipulate the stack pointer
b/c they're not natively present in the wasm file
but those functions are totally decoupled from the details of a JS client, right?
wasm-bindgen also just makes blind guesses as to what global is the stack pointer
lol
I mean "memory" being the magical memory name is also a protocol ;)
oh I see, yes, what you mean about taking the exact output of wasm-bindgen and feeding it into rust
:)
I was looking at the code myself, but at some point I don't work on the codebase and I hit some walls
but at a conceptual level it seems like it ought to work
I can see why interface types takes up the mental oxygen, though
oh yeah there's nothing stopping you from doing that
like it's very plausible to add a "rust" output to wasm-bindgen
where instead of spitting out *.wasm + *.js it spits out *.wasm + *.rs
I think the use-case of "something that works with both a Rust client and a JS client" is good enough motivation, right?
right exactly
bingo
that's exactly what I mean
but this is what I mentioned earlier where it breaks down fairly quickly
the main downside is it occupies the same space as interface types
and wasm-bindgen is entirely Rust-specific
but interface types won't work on the web without wasm-gc
it also breaks down quickly once the *.rs needs to do something js-specific
like you have a Rust function taking &JsValue as an argument
that's what I was thinking re: wasm_bindgen_core
break out the non-JS parts of the protocol
this is what I meant about multiple phases or interim-periods of interface types
and you're limited to that if you want a truly portable .wasm
yeah
so like today the rust compiler spits out a *.wasm when you compile it and use wasm-bindgen
that *.wasm has a whole bunch of weird wasm-bindgen-specific stuff
imagine instead it spits out a *.wasm that's a standard wasm module using interface types
you can then natively consume that *.wasm in wasmtime (since it's standard)
yeah
yeah
and you can also run a tool to polyfill interface types for the web
basically exactly what wasm-bindgen does today, except based on the standard
Alex Crichton said:
you can then natively consume that *.wasm in wasmtime (since it's standard)
as soon as wasmtime actually supports interface types
you could imagine a polyfill even for wasmtime in the meantime
according to the issue tracker, the answer to that is ... some time in the future?
yeah
I don't know how to write it tho :P
I like the idea of using interface types as a meta-language for this
correct yeah, this is basically going to happen but takes a lot of planning
that can then be turned into the glue layer without needing Rust-specific logic
we're inching forward but there's a lot of moving pieces and we also have to prioritize with work stuff
I think that's what Luke pitched to me 3 years ago :P
I think for now, I don't mind going down the "two extern functions path"
as long as I feel confident about the safety properties
.read() is good
more APIs on Memory should be able to give you the safety guarantee
I feel you on the 3 years part though
this is the part where standards are... hard
like wasm-bindgen was easy b/c we could do whatever we wanted
yeah
but with interface types it's different b/c we're trying to design a system that lasts
I think it makes sense to think about the web as a locus of standards control, imo
the web is the forcing function
if the entire theory of an API is based on a place where people can theoretically compete with the standard, it makes everything take 10x longer
if not more :P
heh true
I don't have a very clear sense of how the people who are working on wasi think about wasi
I know how to think of wasm of course
ok, so it seems like my next steps ought to be:
yeah that sounds good
I'm making a PR for some more read_* methods soon
I think I need to understand the song-and-dance with relation to freeing memory communicated through Memory
I assume that to a first approximation, & borrows give you what you need
Yehuda Katz said:
I assume that to a first approximation, & borrows give you what you need
trying to figure out what the right place to put the wrapper that could borrow out of Memory
in general for safety you'll want to avoid borrowing Memory
"for a long time"
right
you'll typically want to copy out
I already have a wrapper around Module
I could put read_vec in there
the real question is:
let mem = module.memory();
let len = source.len() as u32;
let ptr: u32 = module.call1("allocate", len as u32).unwrap();
let slice = mem.data_ptr();
let zero = unsafe { slice.offset(ptr as isize) };
unsafe { zero.copy_from(source.as_ptr(), source.len()) };
// slice[ptr] = len;
module.call2("hello", ptr, source.len() as u32)?;
is this song-and-dance actually legit?
I would recommend using memory.write(ptr as usize, &source).unwrap(), but yes
I saw some comments about the stability of offsets, but I'm not actually casting anything into actual pointers
but is it possible for ptr to move?
also: should I be calling grow() somewhere?
do I need to drop() something somewhere?
so ptr cannot move because it is relative to the memory base
slice can move due to memory.grow
you do not need to memory.grow because allocate will do that internally if necessary
ah
is the allocate song-and-dance roughly correct?
TLDR make an allocate function in the WASM Rust
yes
call it
get a ptr
is it correct to cast to u32?
yes
I'm assuming wasm32, which seems like it should be fine
phew
well allocate will have an exact signature
and it'll probably return an i32
so if you say i64 then your rust code will just fail at runtime saying the signatures mismatch
Alex Crichton said:
slice can move due to memory.grow
so just make sure to use it "now"?
i.e. make it ~ &?
yep
ok
this is why i'd recommend memory.write
yeah that makes sense
b/c that's a safe API
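To illustrate the distinction drawn above (an offset stays valid across growth, a raw base pointer does not), here is a toy model. `LinearMemory` is a made-up stand-in built on a Vec, not wasmtime's `Memory`; it only mimics the idea of bounds-checked `write`/`read` by offset.

```rust
/// Toy stand-in for a wasm linear memory. This is NOT wasmtime's `Memory`;
/// it only models "a byte buffer whose base address may change on growth".
pub struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size] }
    }

    /// Growing may reallocate, so any raw pointer into `bytes` taken before
    /// this call must be treated as dangling afterwards.
    pub fn grow(&mut self, extra: usize) {
        let new_len = self.bytes.len() + extra;
        self.bytes.resize(new_len, 0);
    }

    /// Bounds-checked copy into memory at a guest offset.
    pub fn write(&mut self, offset: u32, data: &[u8]) -> Result<(), &'static str> {
        let start = offset as usize;
        let end = start.checked_add(data.len()).ok_or("range overflows")?;
        let dest = self.bytes.get_mut(start..end).ok_or("out of bounds")?;
        dest.copy_from_slice(data);
        Ok(())
    }

    /// Bounds-checked copy out of memory; the result owns its bytes.
    pub fn read(&self, offset: u32, len: u32) -> Result<Vec<u8>, &'static str> {
        let start = offset as usize;
        let end = start.checked_add(len as usize).ok_or("range overflows")?;
        self.bytes.get(start..end).map(|b| b.to_vec()).ok_or("out of bounds")
    }
}
```

Writing at offset 8, growing, and then reading offset 8 again returns the same bytes, because the offset is relative to whatever the current base happens to be.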
I was stuck on the 0.21 docs for a while
also had no internet or power
it was fun times :P
oh dear :(
@Alex Crichton I'm trying to make the changes you recommended, and I was wondering why the Memory APIs use usize
I didn't review them that closely
I guess it makes sense at a low-level, but you need to get the underlying pointers as u32
because wasm32
so then you end up with blind casts to usize
We may want to take u64 even since there's memory64 one day too, but I think I agree that usize is wrong
yeah I don't mind u64
but it's not actually talking about the host's memory sizes :P
we may want a type a la usize for the client's sizes
csize :P
I admit that I have given 0 thought to portability or evolution ;)
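One way to picture the "csize" idea is a small newtype for guest offsets with checked conversions to and from host indices. `GuestOffset` and its methods are hypothetical names for this sketch; nothing like it is confirmed to exist in wasmtime.

```rust
use std::convert::TryFrom;

/// Hypothetical guest-side offset type (the "csize" joked about in the chat):
/// always 32 bits for a wasm32 guest, regardless of the host's pointer width.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GuestOffset(pub u32);

impl GuestOffset {
    /// Converts to a host index, failing instead of truncating on hosts
    /// whose `usize` is narrower than 32 bits.
    pub fn to_host(self) -> Option<usize> {
        usize::try_from(self.0).ok()
    }

    /// Converts a host index back, failing if it does not fit in 32 bits.
    pub fn from_host(index: usize) -> Option<Self> {
        u32::try_from(index).ok().map(GuestOffset)
    }
}
```

Keeping the conversions in one place avoids scattering blind `as usize` casts through the glue code.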
the funny thing about the errors is that you get stern warnings like:
you can convert a `u32` to a `usize` and panic if the converted value doesn't fit rustc E0308
yeah I think rustc warnings are starting to go off the deep end...
I literally never read them
hm so one difference between .read() and what I was doing before is that read forces you to copy out of the memory
maybe I want that anyway?
I was doing this before:
&memory.data_unchecked()[(self.ptr as usize)..][..(self.len as usize)]
which lets me have this signature:
pub unsafe fn as_buf<'memory>(&self, memory: &'memory Memory) -> &'memory [u8] {
obviously the unsafe is not very nice
but it seems like as long as I have a &Memory, I ought to be able to borrow out of it safely?
yes and no, that's basically what this talks about -- https://docs.rs/wasmtime/0.23.0/wasmtime/struct.Memory.html#memory-and-safety
I guess I don't really know what unsafe is vis a vis wasm memory
use-after-free in Rust
segfaults in Rust
hm
etc, etc
basically it's literal UB
but shouldn't the RefCell be protecting us from that?
I guess there's no way to actually lock the memory to prevent wasm code from writing to it
no b/c this has to do with UB in the Rust type system
since it can be happily running in some other thread
I basically want to be able to have a GIL I think :P
right so you're not looking at &[Cell<u8>]
right
you get &mut [u8]
which has a lot of Rust guarantees
but also the pointer itself is not stable
Both Neon and Helix ended up introducing a GIL for this ~ reason
every memory.grow can invalidate it
the truth is that copying is not so bad
indeed
it's basically memcpy :shrug:
all copies in and out of memory are safe
(that's what interface types basically do)
hehe MemoryAccessError
we finally got our OOM results
yeah most of Memory is for interacting with a "possibly hostile" wasm module
but in your use case the wasm module is trusted
yeah
for example it seems pointless to do UTF-8 checks
@Alex Crichton I ended up creating a WasmUnwrap trait that uses my own custom error function to panic
is this crazy?
basically I couldn't use the one everyone uses because I'm not JS
uh... maybe?
not entirely sure the context in which this trait exists
and fwiw I'd still do utf-8 checks and unwrap them
we could change that crate to allow the user to customize the wasm function to call
it's always better to get panics instead of UB on bugs
yeah sure
:)
@Alex Crichton it's for cases where I'd have written unwrap() or expect()
in the wasm blob itself?
yah
ah I see, yeah that works
it assumes:
extern "C" {
    pub fn log_str(ptr: u64);
}
which is less coupled than the JS one
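A trait along the lines of the WasmUnwrap described above might be sketched as follows. The trait shape and the `report` helper are guesses for illustration; in a real guest `report` would forward to a host import such as the `log_str` extern shown above, whereas here it simply prints to stderr so the sketch runs natively.

```rust
use std::fmt::Debug;

/// Stand-in for the host-provided error hook. A real guest would call its
/// imported logging function here instead of printing.
fn report(message: &str) {
    eprintln!("wasm guest error: {}", message);
}

/// Hypothetical replacement for `unwrap()`/`expect()` inside the wasm blob:
/// reports through the custom hook first, then panics.
pub trait WasmUnwrap<T> {
    fn wasm_unwrap(self, context: &str) -> T;
}

impl<T> WasmUnwrap<T> for Option<T> {
    fn wasm_unwrap(self, context: &str) -> T {
        match self {
            Some(value) => value,
            None => {
                report(context);
                panic!("{}", context)
            }
        }
    }
}

impl<T, E: Debug> WasmUnwrap<T> for Result<T, E> {
    fn wasm_unwrap(self, context: &str) -> T {
        match self {
            Ok(value) => value,
            Err(error) => {
                let message = format!("{}: {:?}", context, error);
                report(&message);
                panic!("{}", message)
            }
        }
    }
}
```

Call sites then read much like `expect`: `parse_input(bytes).wasm_unwrap("parsing input")`, where `parse_input` is whatever fallible function the guest has.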
this is an unfortunate thing where wasm32-wasi "just works" but wasm32-unknown-unknown doesn't
bio break, then call
right
Last updated: Dec 23 2024 at 14:03 UTC