As someone trying to compile a Rust crate with C library dependencies (QuickJS via rquickjs) to wasm32-wasi, how can I go about finding out why different WASI preview 1 imports are being pulled in?
This risks being something that seems trivially obvious to some folks so to add extra context: I'm at the start of my rust journey and haven't played in compiled languages since a time when I had no idea what I was doing.
Unfortunately this generally isn't easy, AFAIK. You'd have to trace symbols backwards to some exported function or to an entry in a table, for example, and that's a pretty manual process.
If you want zero imports you can use wasm32-unknown-unknown, though not all crates will compile for that target.
I did try targeting wasm32-unknown-unknown but quickly discovered how far out of my depth I was. :sweat_smile:
Are you aware of any tools that might expose WASM call graphs in some ergonomic form?
twiggy is the closest approximation that I can think of, but that's a "theoretically this is possible" kind of thing rather than "here's the one liner to render a graphviz file"
twiggy paths looks super promising.
QuickJS's os module probably pulls in a lot of WASI stuff by way of wasi-libc: https://bellard.org/quickjs/quickjs.html#os-module
@Joel Dice AFAICT, rquickjs shouldn't be pulling that in unless certain crate features are being used. I had the same question w/ quickjs-wasm-rs from the Javy team and I _know_ that it isn't pulling in the libc library.
Yeah, I don't think it's pulling in the libc os module: https://github.com/DelSkayn/rquickjs/blob/7029b70b75cf9220c16a1ce2768968f3b8bef6fc/sys/build.rs#L123-L134
Well I've gone down this rabbit-hole once again and seem to have found a couple plausible explanations, as described in these issues:
As @Surma found in the first, it seems that some nuance of the way C macros are defined in the wasi-sdk is forcing NDEBUG to be unset (https://github.com/WebAssembly/wasi-sdk/issues/190#issuecomment-914682467), and assert() then results in this 'strong' reference to stdout that LTO isn't able to get rid of.
I was able to avoid the imports by defining some stubs like the following:
#[no_mangle]
unsafe extern "C" fn __stdio_write(ptr: *const u8, len: usize) -> usize {
0
}
(also for __stdout_write, __stdio_seek and __stdio_close)
But these all end up becoming unnecessary and unwanted exports of the WASM binary. AFAICT, it's because of this: https://github.com/rust-lang/rust/issues/73958
I'm a bit stumped for the time being on how I can put into place some of the findings from the first wasi-sdk issue.
It seems like @Sam Clegg was pointing to __stderr_used as being some sort of strong reference preventing the LTO. I can see __stderr_used listed in wasi-sdk/share/wasi-sysroot/share/wasm32-wasi/defined-symbols.txt but no other mentions in the pre-built wasi-sdk package.
Maybe that's what was meant by needing to rebuild the SDK from source. I'm out of my depth but will give that a spin next.
I've found a sneaky call to printf in the subset of QuickJS I'm consuming but something sneaky is still pulling in stdout. I see the following in the generated function table:
$__stdio_write $__stdio_close $__stdout_write $__stdio_seek
I can't figure out how to trace calls to those, and perhaps neither can the linker.
Have you tried using e.g. wasm-tools print to inspect the generated Wasm file and find out which functions are calling those imports?
That's how I've been doing most of my manual analysis. Those functions are never called directly, but I can't figure out if they might be called indirectly through br_table (if I'm even understanding Wasm instructions correctly).
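For what it's worth, br_table is a branch instruction that jumps between labels inside one function; indirect calls go through call_indirect and the function table, which is populated by element segments. A rough sketch, assuming the WAT comes from wasm-tools print and that each element segment is printed on a single line starting with `(elem` (an assumption about the text layout, not something confirmed in this thread), could classify how a symbol is used. The names `SymbolUses` and `symbol_uses` are made up for illustration:

```rust
/// How a function symbol shows up in WAT text.
#[derive(Debug, Default, PartialEq)]
struct SymbolUses {
    /// Lines containing a direct `call $symbol`.
    direct_calls: usize,
    /// Element-segment lines (`(elem ...`) that list `$symbol`,
    /// i.e. the function is reachable via `call_indirect`.
    table_entries: usize,
}

/// True if `needle` occurs in `line` as a whole token, so that
/// `$__stdio` does not match inside `$__stdio_write`.
fn contains_token(line: &str, needle: &str) -> bool {
    let mut start = 0;
    while let Some(pos) = line[start..].find(needle) {
        let end = start + pos + needle.len();
        // A token ends at whitespace, a closing paren, or end of line.
        let terminated = line[end..]
            .chars()
            .next()
            .map_or(true, |c| c.is_whitespace() || c == ')');
        if terminated {
            return true;
        }
        start = end;
    }
    false
}

/// Counts direct calls and table entries for `symbol` (given without `$`).
fn symbol_uses(wat: &str, symbol: &str) -> SymbolUses {
    let call = format!("call ${}", symbol);
    let name = format!("${}", symbol);
    let mut uses = SymbolUses::default();
    for line in wat.lines() {
        let line = line.trim_start();
        if line.starts_with("(elem") {
            if contains_token(line, &name) {
                uses.table_entries += 1;
            }
        } else if contains_token(line, &call) {
            uses.direct_calls += 1;
        }
    }
    uses
}
```

A symbol with zero direct calls but a non-zero table entry count is only reachable through the table, which matches what the function table listing above suggests for the __stdio_* functions.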
You could try adding --trace-symbol=__stdio_write to the linker flags; that can sometimes identify where the reference is coming from.
Hey @Dan Gohman, sorry for the noob question but where would I put this flag? I have a gut feeling that this might be in the linking of the wasi sdk against one of quickjs's c files but not sure.
It depends on how the quickjs build works. If there's a Makefile involved, you might be able to add -Wl,--trace-symbol=__stdio_write to something like LDFLAGS.
If Rust is doing the linking, try setting the env var RUSTFLAGS='-C link-args=--trace-symbol=__stdio_write'
I'll try both! :goofy:
Oh, and I always forget how to tell cargo to not swallow linker diagnostic output.
I'm running with -vv but I'm not seeing anything that looks like obvious traces
LDFLAGS="-Wl,--trace-symbol=__stdio_write" \
RUSTFLAGS="-Zlocation-detail=none -C link-args=--trace-symbol=__stdio_write" \
WASI_SDK="$(pwd)/../wasi-sdk/build/wasi-sdk-21.0.0ga50a641f4b5a+m" \
cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target wasm32-wasi --release -vv
I'll try with a symbol that is legitimately being pulled
Ok, the full magic is RUSTC_LOG=rustc_codegen_ssa::back::link=info RUSTFLAGS='-C link-args=-Wl,--trace-symbol=__stdio_write' cargo build
Also seems I had the wrong delimiter in RUSTFLAGS
Hrm, it's complaining with this:
- = note: rust-lld: error: unknown argument: -Wl,--trace-symbol=__stdio_write
Ah, in that case, remove just the -Wl, prefix (that's the flag needed when using clang as the "linker", which some things do)
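To make the distinction concrete, here is a small sketch of how the flag differs depending on what receives it. The `LinkDriver` enum and both function names are hypothetical helpers for illustration; the only real flags used are `--trace-symbol`, the `-Wl,` prefix, and rustc's `-C link-arg=`, which passes a single argument to the linker:

```rust
/// Which program receives the link arguments.
#[derive(Clone, Copy)]
enum LinkDriver {
    /// The linker is invoked directly (e.g. rust-lld), so flags go in as-is.
    Lld,
    /// clang acts as the "linker" and forwards `-Wl,`-prefixed flags.
    Clang,
}

/// Formats the trace flag for the given driver.
fn trace_symbol_arg(symbol: &str, driver: LinkDriver) -> String {
    match driver {
        LinkDriver::Lld => format!("--trace-symbol={}", symbol),
        LinkDriver::Clang => format!("-Wl,--trace-symbol={}", symbol),
    }
}

/// Builds a RUSTFLAGS value that passes the trace flag through rustc.
fn rustflags_for_trace(symbol: &str, driver: LinkDriver) -> String {
    format!("-C link-arg={}", trace_symbol_arg(symbol, driver))
}
```

With rust-lld as the linker, `rustflags_for_trace("__stdio_write", LinkDriver::Lld)` yields the form without `-Wl,`, which is the one that avoids the "unknown argument" error above.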
:boom: we have liftoff. Here's what I'm seeing:
INFO rustc_codegen_ssa::back::link linker stderr:
INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): lazy definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdout_write.o): reference to __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): reference to __stdio_write
So that must be indicative of it being unable to optimize it away from the wasi-libc libc.a?
The reference is coming from stderr.o, so maybe something is pulling in stderr. Try changing __stdio_write to stderr
Ahh... meeting time. Will have to dive back in a bit later but big thanks @Dan Gohman and @Joel Dice.
Now that you mention stderr, maybe it's back to the conclusions drawn in this comment: https://github.com/WebAssembly/wasi-sdk/issues/190#issuecomment-916783936
That particular issue is specific to C++; is there C++ code in QuickJS?
Ah, gotcha. AFAICT it's 100% C.
INFO rustc_codegen_ssa::back::link linker stderr:
INFO rustc_codegen_ssa::back::link linker stdout:
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of stderr
Hmm, maybe also try __stderr_FILE or __stderr_used
Both stderr.c and stdout.c in wasi-libc look very similar.
__stderr_used gives:
INFO rustc_codegen_ssa::back::link linker stderr:
INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_exit.o): lazy definition of __stderr_used
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_exit.o): definition of __stderr_used
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of __stderr_used
And __stderr_FILE:
INFO rustc_codegen_ssa::back::link linker stderr:
INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(vfprintf.o): reference to __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(strtod.o): reference to __stderr_FILE
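Since this trace output gets read by eye a lot in this thread, here is a minimal sketch of pulling out just the archive members that reference a symbol. It assumes every relevant line has the `archive(member): kind symbol` shape seen above; the type and function names are invented for illustration:

```rust
/// The three kinds of trace lines seen in the linker output above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    LazyDefinition,
    Definition,
    Reference,
}

/// One parsed `--trace-symbol` line.
#[derive(Debug, PartialEq)]
struct TraceLine<'a> {
    /// Archive member, e.g. `stderr.o`.
    member: &'a str,
    kind: Kind,
    symbol: &'a str,
}

/// Parses `path/libc.a(member.o): reference to symbol` style lines.
/// Returns `None` for anything else (e.g. the `INFO ...` log headers).
fn parse_trace_line(line: &str) -> Option<TraceLine<'_>> {
    let (location, rest) = line.trim().split_once("): ")?;
    let (_archive, member) = location.rsplit_once('(')?;
    let (kind, symbol) = if let Some(s) = rest.strip_prefix("lazy definition of ") {
        (Kind::LazyDefinition, s)
    } else if let Some(s) = rest.strip_prefix("definition of ") {
        (Kind::Definition, s)
    } else if let Some(s) = rest.strip_prefix("reference to ") {
        (Kind::Reference, s)
    } else {
        return None;
    };
    Some(TraceLine { member, kind, symbol })
}

/// Lists the archive members that reference `symbol`, in output order.
fn referencing_members<'a>(output: &'a str, symbol: &str) -> Vec<&'a str> {
    output
        .lines()
        .filter_map(parse_trace_line)
        .filter(|t| t.kind == Kind::Reference && t.symbol == symbol)
        .map(|t| t.member)
        .collect()
}
```

Run over the __stderr_FILE trace above, this would report vfprintf.o and strtod.o as the referencing members.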
Quick Q: are there any asserts in the codebase? That can pull in stderr. Compiling with -DNDEBUG can disable that.
Hrm, why does strtod.o reference __stderr_FILE?
I'll be in a meeting for the next hour or so, but I'll do some investigation when I get back.
There are definitely asserts in QuickJS. I did something terrible to try to work around that here: https://github.com/ggoodman/rquickjs/commit/a95736a505d92219c2bad46664cd18ab88e81c13
I think this might be what is pulling it in: https://github.com/WebAssembly/wasi-libc/blob/6593687e25f07526c4b92a20fe5ddf507599d5b3/libc-top-half/headers/private/printscan.h#L46-L52
__attribute__((__cold__, __noreturn__))
static void long_double_not_supported(void) {
    void abort(void) __attribute__((__noreturn__));
    fputs("Support for formatting long double values is currently disabled.\n"
          "To enable it, " __wasilibc_printscan_full_support_option ".\n", &__stderr_FILE);
    abort();
}
AFAICT, the similar construct for floats is not being pulled in.
I tried ripping out the fputs calls in there in my local wasi-sdk build. Maybe holding it wrong but didn't see the reference disappear.
I think I'm either rebuilding the sdk incorrectly or I'm not correctly propagating its path to the rquickjs crate's build process.
I'm going to try a full rebuild of wasi-libc.
Almost there:
INFO rustc_codegen_ssa::back::link linker stderr:
INFO rustc_codegen_ssa::back::link linker stdout:
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of __stderr_FILE
So AFAICT, that got rid of the two unexpected references to __stderr_FILE in vfprintf and strtod. They were both because of that pseudo-assertion around long long / long double support.
And yet, the wasi imports persist! They are stubborn.
I notice that in stdout.c in wasi-libc, we reference __stdout_write while in stderr.c, we reference __stdio_write. Could there be some circular reference here preventing LTO?
Compare:
Rebuilding wasi-libc and heating the room a bit.
Got rid of the strong reference but still have the imports sneaking in.
Pretty confused by this asymmetry:
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
./wasi-sysroot/lib/wasm32-wasi/libc.a(stdout.o): definition of stdout
stderr is never buffered, while stdout is buffered by default. __stdout_write has extra code for handling line-buffering if it's attached to a terminal.
Here I am coming through with an axe when a scalpel is needed. Here's my hack-job so far: https://github.com/WebAssembly/wasi-libc/compare/main...ggoodman:wasi-libc:prune_stdio
@Dan Gohman do you have a hypothesis as to why one would be lazy and the other not?
Not offhand. "lazy" here is about wasm-ld's handling of archive files, where .o files are only pulled in if a symbol in them is defined, but I don't know the specifics.
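As a toy model of that archive behaviour (not wasm-ld's actual implementation, whose specifics aren't covered here), a member can be thought of as staying "lazy" until it provides a symbol that something still needs, at which point its own references join the list of needed symbols. The member data below is taken from the traces in this thread where available; the starting symbol and all type and function names are assumptions for illustration:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Symbols one archive member defines and references.
struct Member {
    defines: &'static [&'static str],
    references: &'static [&'static str],
}

/// A tiny stand-in for libc.a, limited to members mentioned in the traces.
fn example_archive() -> BTreeMap<&'static str, Member> {
    let mut archive = BTreeMap::new();
    archive.insert(
        "stdout.o",
        Member {
            defines: &["stdout"],
            references: &["__stdio_close", "__stdio_write", "__stdio_seek"],
        },
    );
    archive.insert("__stdio_close.o", Member { defines: &["__stdio_close"], references: &[] });
    archive.insert("__stdio_write.o", Member { defines: &["__stdio_write"], references: &[] });
    archive.insert("__stdio_seek.o", Member { defines: &["__stdio_seek"], references: &[] });
    archive.insert("stderr.o", Member { defines: &["stderr"], references: &[] });
    archive
}

/// Returns the members that get linked when `roots` are the initially
/// needed symbols. Members that never satisfy a need stay lazy.
fn linked_members(
    archive: &BTreeMap<&'static str, Member>,
    roots: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut linked = BTreeSet::new();
    let mut undefined: Vec<&'static str> = roots.to_vec();
    while let Some(symbol) = undefined.pop() {
        // Find a member that defines the needed symbol, if any.
        let provider = archive
            .iter()
            .find(|(_, member)| member.defines.contains(&symbol));
        if let Some((&name, member)) = provider {
            // Linking a member for the first time exposes its own references.
            if linked.insert(name) {
                undefined.extend_from_slice(member.references);
            }
        }
    }
    linked
}
```

In this model a single need for stdout is enough to link stdout.o and the three __stdio_* members, while stderr.o stays lazy because nothing asks for it.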
Out of curiosity, is it within reason to compile against the wasi-libc source instead of built objects?
There aren't Makefiles set up to work that way, unfortunately.
If there were, then it'd be reasonable :-)
Gotcha. I've spent my novelty tokens and taken a line of credit already.
Do you have code you could publish somewhere, so I could take a look?
Good idea. I'm not sure how to get my wasi-libc fork linked up to my rust project in a consumable way though.
I'm working from the bridge branch in https://github.com/ggoodman/quicky/tree/bridge. That should already be linked to my fork of rquickjs but the WASI_SDK reference in .cargo/config.toml is relative to my local check-out.
I've been using this script to help w/ the cycle time between shots in the dark :joy: https://github.com/ggoodman/quicky/blob/4d2d859c765bb337a9791a4e1b2dc367925d5916/scripts/build.sh#L6-L8
My fork of wasi-sdk: https://github.com/ggoodman/wasi-sdk/tree/strip-wasi-stdio
Points to wasi-libc that has these changes: https://github.com/WebAssembly/wasi-libc/compare/main...ggoodman:wasi-libc:prune_stdio
That would be a lot of work to repro. I might be able to get a setup where only quicky needs to be pulled.
The workflow is now described and should be reproducible here: https://github.com/ggoodman/quicky/tree/bridge
A new potential lead is __stdio_exit that I'm seeing getting linked. It seems like __toread and __towrite might both cause those weak references to become strong references:
/* atexit.c and __stdio_exit.c override these. the latter is linked
* as a consequence of linking either __toread.c or __towrite.c. */
weak_alias(dummy, __funcs_on_exit);
weak_alias(dummy, __stdio_exit);
I'm seeing this:
.../wasm32-wasi/libc.a(__stdio_close.o): lazy definition of __stdio_close
.../wasm32-wasi/libc.a(__stdio_seek.o): lazy definition of __stdio_seek
.../wasm32-wasi/libc.a(__stdio_write.o): lazy definition of __stdio_write
.../wasm32-wasi/libc.a(__toread.o): lazy definition of __toread
.../wasm32-wasi/libc.a(__towrite.o): lazy definition of __towrite
.../wasm32-wasi/libc.a(__overflow.o): reference to __towrite
.../wasm32-wasi/libc.a(__towrite.o): definition of __towrite
.../wasm32-wasi/libc.a(fwrite.o): reference to __towrite
.../wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_close
.../wasm32-wasi/libc.a(__stdio_close.o): definition of __stdio_close
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_write
.../wasm32-wasi/libc.a(__stdio_write.o): definition of __stdio_write
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_seek
.../wasm32-wasi/libc.a(__stdio_seek.o): definition of __stdio_seek
.../wasm32-wasi/libc.a(stdout.o): definition of stdout
.../wasm32-wasi/libc.a(vfprintf.o): reference to __towrite
.../wasm32-wasi/libc.a(__uflow.o): reference to __toread
.../wasm32-wasi/libc.a(__toread.o): definition of __toread
One nuance is that this project is designed to be run through wizer. I only see an obvious reference to __stdio_exit here:
(func $__wasm_call_dtors (;630;) (type 72)
  call $#func629<dummy>
  call $__stdio_exit
)
(func $init.command_export (;631;) (type 31) (result i32)
  call $init
  call $__wasm_call_dtors
)
Is there a way to elect out of calling destructors?
Lots of learning for me going down this rabbit hole but now I'm hoping to rein it back in. I think it's probably worth trying to come up with a minimal repro here. Any hypotheses on the shape of a minimal c library I can wrap in rust bindings to produce what we're seeing here?
OK @Dan Gohman I think I have a pretty sweet, mostly minimal repro here: https://github.com/ggoodman/wasi-import-repro
The key finding is that the very presence of a printf call in a c library results in those fd_* imports getting pulled along even if we _know_ that the code calling printf is being eliminated by the linker.
This is showing that a rust function that references a c binding referencing a call to printf is enough, even if the rust function is omitted by a feature flag.
Filed an issue here: https://github.com/WebAssembly/wasi-sdk/issues/401
Thinking more about this, I have the hypothesis that it's the finalizers that could be holding onto strong references.
That made me wonder (at least for my use-case) if I actually _cared_ about finalizers and clean-up. The main use-case I have at work uses a WASM instance exactly once and throws it away. All work after producing a 'result' is essentially wasted cycles as far as I'm concerned.
So I wonder if tooling could be parametrized for this sort of use-case... Could we tell the wasi-sdk / wasi-libc that we don't actually care about clean-up? In other words, the host and runtime can handle that for us. I understand that this wouldn't be a general-purpose strategy which is why I'm wondering about some opt-in behaviour. Does it sound feasible?
AFAIK, wasi-libc generally won't clean anything up unless you tell it to (e.g. by closing a file descriptor). So I think the answer in your case is: "don't call close" (which at the Rust level means e.g. wrapping a File in ManuallyDrop and never dropping it).
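A minimal sketch of that ManuallyDrop idea, using only the standard library. The function name is made up for illustration, and whether skipping the close actually removes any WASI imports from a given build is not something this sketch shows:

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::path::Path;

/// Writes `data` to `path` and deliberately never closes the file.
fn write_without_close(path: &Path, data: &[u8]) -> io::Result<()> {
    // ManuallyDrop suppresses File's destructor, so the descriptor is
    // never closed by this code; the host reclaims it when the
    // instance (or process) goes away.
    let mut file = ManuallyDrop::new(File::create(path)?);
    // File does no user-space buffering, so the bytes are handed to the
    // OS here even though the file is never dropped.
    file.write_all(data)?;
    Ok(())
}
```

This only makes sense for the use-once-and-discard pattern described above; in a long-lived process it would leak file descriptors.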
But maybe you're asking for an optional build of wasi-libc where close is a no-op?
I think the solution when using Rust is a bit easier, but the use-case that confounds me is building a C library against wasi-libc, especially one whose source code I don't control.
Last updated: Nov 22 2024 at 17:03 UTC