Stream: rust-toolchain

Topic: Finding and eliminating wasm32-wasi imports


view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:33):

As someone who is trying to compile a rust crate to wasm32-wasi that has dependencies on C libs (QuickJS via rquickjs), how can I go about finding the reason that different WASI preview 1 imports are being pulled in?

This risks being something that seems trivially obvious to some folks so to add extra context: I'm at the start of my rust journey and haven't played in compiled languages since a time when I had no idea what I was doing.

view this post on Zulip Alex Crichton (Feb 26 2024 at 20:35):

This unfortunately generally isn't easy AFAIK, you'd have to trace symbols backwards to some exported function or something in a table for example, and that's a pretty manual process.

If you want zero imports you can use wasm32-unknown-unknown, but not all crates will compile for that target.

view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:36):

I did try targeting wasm32-unknown-unknown but quickly discovered how far out of my depth I was. :sweat_smile:

view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:36):

Are you aware of any tools that might expose WASM call graphs in some ergonomic form?

view this post on Zulip Alex Crichton (Feb 26 2024 at 20:37):

twiggy is the closest approximation that I can think of, but that's a "theoretically this is possible" kind of thing rather than "here's the one liner to render a graphviz file"

view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:39):

twiggy paths looks super promising.

view this post on Zulip Joel Dice (Feb 26 2024 at 20:40):

QuickJS's os module probably pulls in a lot of WASI stuff by way of wasi-libc: https://bellard.org/quickjs/quickjs.html#os-module

view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:50):

@Joel Dice AFAICT, rquickjs shouldn't be pulling that in unless certain crate features are being used. I had the same question w/ quickjs-wasm-rs from the Javy team and I _know_ that it isn't pulling in the libc library.

view this post on Zulip Geoff Goodman (Feb 26 2024 at 20:58):

Yeah, I don't think it's pulling in the libc os module: https://github.com/DelSkayn/rquickjs/blob/7029b70b75cf9220c16a1ce2768968f3b8bef6fc/sys/build.rs#L123-L134

view this post on Zulip Geoff Goodman (Mar 20 2024 at 14:13):

Well I've gone down this rabbit-hole once again and seem to have found a couple plausible explanations, as described in these issues:

  1. https://github.com/WebAssembly/wasi-sdk/issues/190
  2. https://github.com/WebAssembly/wasi-sdk/issues/220

As @Surma found in the first, it seems that some nuance of the way C macros are defined in the wasi-sdk is forcing NDEBUG to be unset (https://github.com/WebAssembly/wasi-sdk/issues/190#issuecomment-914682467), so assert() ends up creating a 'strong' reference to stdout that LTO isn't able to get rid of.

I was able to avoid the imports by defining some stubs like the following:

#[no_mangle]
unsafe extern "C" fn __stdio_write(ptr: *const u8, len: usize) -> usize {
    0
}

(also for __stdout_write, __stdio_seek and __stdio_close)
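For reference, here is a sketch of what the full set of four stubs might look like. The parameter lists are an assumption based on the declarations in musl's internal stdio_impl.h, which wasi-libc inherits (FILE pointer first, off_t as a 64-bit integer); check them against the wasi-libc version in use. The cfg_attr gate is an addition for illustration, so the same file also builds on a non-wasm host without exporting libc-internal names. On edition 2024 the attribute would be spelled unsafe(no_mangle).

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

/// Stand-in for musl's `FILE *`; the stubs never dereference it.
type FilePtr = *mut c_void;

// Assumed C signature: size_t __stdio_write(FILE *, const unsigned char *, size_t)
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn __stdio_write(_f: FilePtr, _buf: *const u8, _len: usize) -> usize {
    0 // report that nothing was written
}

// Assumed C signature: size_t __stdout_write(FILE *, const unsigned char *, size_t)
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn __stdout_write(_f: FilePtr, _buf: *const u8, _len: usize) -> usize {
    0 // report that nothing was written
}

// Assumed C signature: off_t __stdio_seek(FILE *, off_t, int), with a 64-bit off_t
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn __stdio_seek(_f: FilePtr, _off: i64, _whence: c_int) -> i64 {
    -1 // seeking is not supported by the stub
}

// Assumed C signature: int __stdio_close(FILE *)
#[cfg_attr(target_arch = "wasm32", no_mangle)]
pub unsafe extern "C" fn __stdio_close(_f: FilePtr) -> c_int {
    0 // nothing to close
}

fn main() {
    // On a host build the stubs are ordinary functions and can be called directly.
    let f: FilePtr = std::ptr::null_mut();
    let written = unsafe { __stdio_write(f, b"hi".as_ptr(), 2) };
    println!("stub wrote {written} bytes");
}
```

These stubs are only meant to satisfy the linker; nothing here makes stdio actually work if the C side ends up calling them.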

But these all end up becoming unnecessary and unwanted exports of the WASM binary. AFAICT, it's because of this: https://github.com/rust-lang/rust/issues/73958

I'm a bit stumped for the time being on how I can put into place some of the findings from the first wasi-sdk issue.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 00:47):

It seems like @Sam Clegg was pointing to __stderr_used as being some sort of strong reference preventing the LTO. I can see __stderr_used listed in wasi-sdk/share/wasi-sysroot/share/wasm32-wasi/defined-symbols.txt but no other mentions in the pre-built wasi-sdk package.

Maybe that's what was meant by needing to rebuild the SDK from source. I'm out of my depth but will give that a spin next.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:31):

I've found a sneaky call to printf in the subset of QuickJS I'm consuming but something sneaky is still pulling in stdout. I see the following in the generated function table:

$__stdio_write $__stdio_close $__stdout_write $__stdio_seek

I can't figure out how to trace calls to those and perhaps nor can the linker.

view this post on Zulip Joel Dice (Mar 21 2024 at 13:33):

Have you tried using e.g. wasm-tools print to inspect the generated Wasm file and find out which functions are calling those imports?

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:38):

That's how I've been doing most of my manual analysis. Those functions are never called directly, but I can't figure out if they might be called indirectly through br_table (if I'm even understanding Wasm instructions correctly).
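The manual search described here can be sketched as a small scanner over the text output of wasm-tools print. It reports which functions contain a direct call to a given symbol, and whether that symbol is listed in an elem segment (which would make it reachable through the function table). The one-line-per-item layout and the sample elem line are assumptions about the printer's output; find_refs, Refs and SAMPLE_WAT are hypothetical names for illustration.

```rust
use std::collections::BTreeSet;

/// Where a function symbol is mentioned in WAT text.
#[derive(Debug, Default)]
struct Refs {
    /// Functions whose bodies contain `call $target`.
    direct_callers: BTreeSet<String>,
    /// True if `$target` is listed in an `(elem ...)` segment.
    in_elem_segment: bool,
}

/// Line-based scan; assumes each `(func` header, `(elem` segment and
/// instruction sits on its own line.
fn find_refs(wat: &str, target: &str) -> Refs {
    let needle = format!("${}", target);
    let mut refs = Refs::default();
    let mut current_func: Option<String> = None;

    for line in wat.lines() {
        // Split on whitespace and parentheses so `$name` tokens compare exactly.
        let tokens: Vec<&str> = line
            .split(|c: char| c.is_whitespace() || c == '(' || c == ')')
            .filter(|t| !t.is_empty())
            .collect();

        match tokens.first().copied() {
            Some("func") => {
                // Remember which function body the following lines belong to.
                current_func = tokens
                    .get(1)
                    .filter(|t| t.starts_with('$'))
                    .map(|t| t.to_string());
            }
            Some("elem") => {
                if tokens.iter().any(|t| *t == needle) {
                    refs.in_elem_segment = true;
                }
                continue;
            }
            _ => {}
        }

        for pair in tokens.windows(2) {
            if pair[0] == "call" && pair[1] == needle {
                if let Some(caller) = &current_func {
                    refs.direct_callers.insert(caller.clone());
                }
            }
        }
    }
    refs
}

// Illustrative input only: the elem line is an assumed rendering of a function table.
const SAMPLE_WAT: &str = r#"(module
  (func $__wasm_call_dtors (;630;) (type 72)
    call $#func629<dummy>
    call $__stdio_exit
  )
  (func $init.command_export (;631;) (type 31) (result i32)
    call $init
    call $__wasm_call_dtors
  )
  (elem (;0;) (i32.const 1) func $__stdio_write $__stdio_close $__stdout_write $__stdio_seek)
)"#;

fn main() {
    for symbol in ["__stdio_exit", "__stdio_write"] {
        println!("{symbol}: {:?}", find_refs(SAMPLE_WAT, symbol));
    }
}
```

A symbol that shows up only in an elem segment has no direct caller to trace, which matches the situation described above: something took its address, and the linker has to keep it.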

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:39):

You could try adding --trace-symbol=__stdio_write to the linker flags; that can sometimes identify where the reference is coming from.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:41):

Hey @Dan Gohman, sorry for the noob question but where would I put this flag? I have a gut feeling that this might be in the linking of the wasi sdk against one of quickjs's c files but not sure.

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:42):

It depends on how the quickjs build works. If there's a Makefile involved, you might be able to add -Wl,--trace-symbol=__stdio_write to something like LDFLAGS.

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:44):

If Rust is doing the linking, set the env var RUSTFLAGS='-C link-args=--trace-symbol=__stdio_write'

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:45):

I'll try both! :goofy:

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:46):

Oh, and I always forget how to tell cargo to not swallow linker diagnostic output.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:47):

I'm running with -vv but I'm not seeing anything that looks like obvious traces

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:48):

LDFLAGS="-Wl,--trace-symbol=__stdio_write" \
RUSTFLAGS="-Zlocation-detail=none -C link-args=--trace-symbol=__stdio_write" \
WASI_SDK="$(pwd)/../wasi-sdk/build/wasi-sdk-21.0.0ga50a641f4b5a+m" \
  cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target wasm32-wasi --release -vv

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:49):

I'll try with a symbol that is legitimately being pulled

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:52):

Ok, the full magic is RUSTC_LOG=rustc_codegen_ssa::back::link=info RUSTFLAGS='-C link-args=-Wl,--trace-symbol=__stdio_write' cargo build

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:53):

Also seems I had the wrong delimiter in RUSTFLAGS

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:55):

Hrm, it's complaining with this:

-  = note: rust-lld: error: unknown argument: -Wl,--trace-symbol=__stdio_write

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:56):

Ah, in that case, remove just the -Wl,

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:57):

(that's the flag needed when using clang as the "linker", which some things do)

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:57):

:boom: we have liftoff. Here's what I'm seeing:

INFO rustc_codegen_ssa::back::link linker stderr:

INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): lazy definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdout_write.o): reference to __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): reference to __stdio_write

view this post on Zulip Geoff Goodman (Mar 21 2024 at 13:58):

So that must be indicative of it being unable to optimize it away from the wasi-libc libc.a?

view this post on Zulip Dan Gohman (Mar 21 2024 at 13:59):

The reference is coming from stderr.o, so maybe something is pulling in stderr. Try changing __stdio_write to stderr
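When the trace gets longer, it can help to filter it down to just the "reference to" lines, since those name the objects that keep a symbol alive. Below is a minimal sketch that parses the three line shapes seen in this thread; the parsing rules are inferred from that output only, and parse_trace_line, referencing_objects and SAMPLE_TRACE are hypothetical names.

```rust
/// The three kinds of line that appear in the traces in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    LazyDefinition,
    Definition,
    Reference,
}

#[derive(Debug, PartialEq, Eq)]
struct TraceLine {
    object: String,
    kind: Kind,
    symbol: String,
}

/// Parses lines shaped like `path/libc.a(member.o): reference to symbol`.
/// Returns None for anything else (log headers, blank lines).
fn parse_trace_line(line: &str) -> Option<TraceLine> {
    let (location, rest) = line.trim().split_once(": ")?;
    // For archive members, keep only the `member.o` inside `libc.a(member.o)`.
    let object = match location.strip_suffix(')').and_then(|l| l.rsplit_once('(')) {
        Some((_archive, member)) => member,
        None => location,
    };
    let (kind, symbol) = if let Some(s) = rest.strip_prefix("lazy definition of ") {
        (Kind::LazyDefinition, s)
    } else if let Some(s) = rest.strip_prefix("definition of ") {
        (Kind::Definition, s)
    } else if let Some(s) = rest.strip_prefix("reference to ") {
        (Kind::Reference, s)
    } else {
        return None;
    };
    Some(TraceLine {
        object: object.to_string(),
        kind,
        symbol: symbol.to_string(),
    })
}

/// Objects that reference `symbol`, in the order the linker reported them.
fn referencing_objects(trace: &str, symbol: &str) -> Vec<String> {
    trace
        .lines()
        .filter_map(parse_trace_line)
        .filter(|t| t.kind == Kind::Reference && t.symbol == symbol)
        .map(|t| t.object)
        .collect()
}

// Trace lines copied from the linker output quoted in this thread.
const SAMPLE_TRACE: &str = "\
INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): lazy definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdout_write.o): reference to __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_write.o): definition of __stdio_write
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): reference to __stdio_write
";

fn main() {
    for object in referencing_objects(SAMPLE_TRACE, "__stdio_write") {
        println!("{object} references __stdio_write");
    }
}
```

Each object this reports is a candidate for the next --trace-symbol run: pick a symbol that object defines and repeat until the chain reaches your own code.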

view this post on Zulip Geoff Goodman (Mar 21 2024 at 14:01):

Ahh... meeting time. Will have to dive back in a bit later but big thanks @Dan Gohman and @Joel Dice.

Now that you mention stderr, maybe it's back to the conclusions drawn in this comment: https://github.com/WebAssembly/wasi-sdk/issues/190#issuecomment-916783936

view this post on Zulip Dan Gohman (Mar 21 2024 at 14:05):

That particular issue is specific to C++; is there C++ code in QuickJS?

view this post on Zulip Geoff Goodman (Mar 21 2024 at 15:21):

Ah, gotcha. AFAICT it's 100% C.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 15:22):

INFO rustc_codegen_ssa::back::link linker stderr:

INFO rustc_codegen_ssa::back::link linker stdout:
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of stderr

view this post on Zulip Dan Gohman (Mar 21 2024 at 15:23):

Hmm, maybe also try __stderr_FILE or __stderr_used

view this post on Zulip Geoff Goodman (Mar 21 2024 at 15:24):

Both stderr.c and stdout.c in wasi-libc look very similar.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 15:53):

__stderr_used gives:

INFO rustc_codegen_ssa::back::link linker stderr:

INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_exit.o): lazy definition of __stderr_used
./wasi-sysroot/lib/wasm32-wasi/libc.a(__stdio_exit.o): definition of __stderr_used
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of __stderr_used

view this post on Zulip Geoff Goodman (Mar 21 2024 at 15:54):

And __stderr_FILE:

INFO rustc_codegen_ssa::back::link linker stderr:

INFO rustc_codegen_ssa::back::link linker stdout:
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(vfprintf.o): reference to __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): definition of __stderr_FILE
./wasi-sysroot/lib/wasm32-wasi/libc.a(strtod.o): reference to __stderr_FILE

view this post on Zulip Dan Gohman (Mar 21 2024 at 15:55):

Quick Q: are there any asserts in the codebase? That can pull in stderr. Compiling with -DNDEBUG can disable that.

view this post on Zulip Dan Gohman (Mar 21 2024 at 15:56):

Hrm, why does strtod.o reference __stderr_FILE?

view this post on Zulip Dan Gohman (Mar 21 2024 at 15:57):

I'll be in a meeting for the next hour or so, but I'll do some investigation when I get back.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 16:04):

There are definitely asserts in QuickJS. I did something terrible to try to work around that here: https://github.com/ggoodman/rquickjs/commit/a95736a505d92219c2bad46664cd18ab88e81c13

view this post on Zulip Geoff Goodman (Mar 21 2024 at 16:13):

I think this might be what is pulling it in: https://github.com/WebAssembly/wasi-libc/blob/6593687e25f07526c4b92a20fe5ddf507599d5b3/libc-top-half/headers/private/printscan.h#L46-L52

__attribute__((__cold__, __noreturn__))
static void long_double_not_supported(void) {
    void abort(void) __attribute__((__noreturn__));
    fputs("Support for formatting long double values is currently disabled.\n"
          "To enable it, " __wasilibc_printscan_full_support_option ".\n", &__stderr_FILE);
    abort();
}

AFAICT, the similar construct for floats is not being pulled in.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 16:27):

I tried ripping out the fputs calls in there in my local wasi-sdk build. Maybe holding it wrong but didn't see the reference disappear.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 16:35):

I think I'm either rebuilding the sdk incorrectly or I'm not correctly propagating its path to the rquickjs crate's build process.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 16:53):

I'm going to try a full rebuild of wasi-libc.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 17:28):

Almost there:

INFO rustc_codegen_ssa::back::link linker stderr:

INFO rustc_codegen_ssa::back::link linker stdout:
.wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of __stderr_FILE

view this post on Zulip Geoff Goodman (Mar 21 2024 at 17:48):

So AFAICT, that got rid of the two unexpected references to __stderr_FILE in vfprintf and strtod. They were both because of that pseudo-assertion around long long / long double support.

And yet, the wasi imports persist! They are stubborn.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 19:04):

I notice that in stdout.c in wasi-libc, we reference __stdout_write while in stderr.c, we reference __stdio_write. Could there be some circular reference here preventing LTO?

Compare:

view this post on Zulip Geoff Goodman (Mar 21 2024 at 19:11):

Rebuilding wasi-libc and heating the room a bit.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 19:24):

Got rid of the strong reference but still have the imports sneaking in.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 19:49):

Pretty confused by this asymmetry:

./wasi-sysroot/lib/wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
./wasi-sysroot/lib/wasm32-wasi/libc.a(stdout.o): definition of stdout

view this post on Zulip Dan Gohman (Mar 21 2024 at 19:55):

stderr is never buffered, while stdout is buffered by default. __stdout_write has extra code for handling line-buffering if it's attached to a terminal.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 19:58):

Here I am coming through with an axe when a scalpel is needed. Here's my hack-job so far: https://github.com/WebAssembly/wasi-libc/compare/main...ggoodman:wasi-libc:prune_stdio

view this post on Zulip Geoff Goodman (Mar 21 2024 at 20:24):

@Dan Gohman do you have a hypothesis as to why one would be lazy and the other not?

view this post on Zulip Dan Gohman (Mar 21 2024 at 20:29):

Not offhand. "lazy" here is about wasm-ld's handling of archive files, where .o files are only pulled in if a symbol they define is referenced, but I don't know the specifics.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:33):

Out of curiosity, is it within reason to compile against the wasi-libc source instead of built objects?

view this post on Zulip Dan Gohman (Mar 21 2024 at 21:33):

There aren't Makefiles set up to work that way, unfortunately.

view this post on Zulip Dan Gohman (Mar 21 2024 at 21:33):

If there were, then it'd be reasonable :-)

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:35):

Gotcha. I've spent my novelty tokens and taken a line of credit already.

view this post on Zulip Dan Gohman (Mar 21 2024 at 21:36):

Do you have code you could publish somewhere, so I could take a look?

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:37):

Good idea. I'm not sure how to get my wasi-libc fork linked up to my rust project in a consumable way though.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:43):

I'm working from the bridge branch in https://github.com/ggoodman/quicky/tree/bridge. That should already be linked to my fork of rquickjs but the WASI_SDK reference in .cargo/config.toml is relative to my local check-out.

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:44):

I've been using this script to help w/ the cycle time between shots in the dark :joy: https://github.com/ggoodman/quicky/blob/4d2d859c765bb337a9791a4e1b2dc367925d5916/scripts/build.sh#L6-L8

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:47):

My fork of wasi-sdk: https://github.com/ggoodman/wasi-sdk/tree/strip-wasi-stdio

Points to wasi-libc that has these changes: https://github.com/WebAssembly/wasi-libc/compare/main...ggoodman:wasi-libc:prune_stdio

view this post on Zulip Geoff Goodman (Mar 21 2024 at 21:47):

That would be a lot of work to repro. I might be able to get a setup where only quicky needs to be pulled.

view this post on Zulip Geoff Goodman (Mar 22 2024 at 01:21):

The workflow is now described and should be reproducible here: https://github.com/ggoodman/quicky/tree/bridge

view this post on Zulip Geoff Goodman (Mar 22 2024 at 02:14):

A new potential lead is __stdio_exit that I'm seeing getting linked. It seems like __toread and __towrite might both cause those weak references to become strong references:

/* atexit.c and __stdio_exit.c override these. the latter is linked
 * as a consequence of linking either __toread.c or __towrite.c. */
weak_alias(dummy, __funcs_on_exit);
weak_alias(dummy, __stdio_exit);

I'm seeing this:

.../wasm32-wasi/libc.a(__stdio_close.o): lazy definition of __stdio_close
.../wasm32-wasi/libc.a(__stdio_seek.o): lazy definition of __stdio_seek
.../wasm32-wasi/libc.a(__stdio_write.o): lazy definition of __stdio_write
.../wasm32-wasi/libc.a(__toread.o): lazy definition of __toread
.../wasm32-wasi/libc.a(__towrite.o): lazy definition of __towrite
.../wasm32-wasi/libc.a(__overflow.o): reference to __towrite
.../wasm32-wasi/libc.a(__towrite.o): definition of __towrite
.../wasm32-wasi/libc.a(fwrite.o): reference to __towrite
.../wasm32-wasi/libc.a(stderr.o): lazy definition of stderr
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_close
.../wasm32-wasi/libc.a(__stdio_close.o): definition of __stdio_close
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_write
.../wasm32-wasi/libc.a(__stdio_write.o): definition of __stdio_write
.../wasm32-wasi/libc.a(stdout.o): reference to __stdio_seek
.../wasm32-wasi/libc.a(__stdio_seek.o): definition of __stdio_seek
.../wasm32-wasi/libc.a(stdout.o): definition of stdout
.../wasm32-wasi/libc.a(vfprintf.o): reference to __towrite
.../wasm32-wasi/libc.a(__uflow.o): reference to __toread
.../wasm32-wasi/libc.a(__toread.o): definition of __toread

view this post on Zulip Geoff Goodman (Mar 22 2024 at 02:18):

One nuance is that this project is designed to be run through wizer. I only see an obvious reference to __stdio_exit here:

  (func $__wasm_call_dtors (;630;) (type 72)
    call $#func629<dummy>
    call $__stdio_exit
  )
  (func $init.command_export (;631;) (type 31) (result i32)
    call $init
    call $__wasm_call_dtors
  )

view this post on Zulip Geoff Goodman (Mar 22 2024 at 02:27):

Is there a way to elect out of calling destructors?

view this post on Zulip Geoff Goodman (Mar 22 2024 at 13:42):

Lots of learning for me going down this rabbit hole but now I'm hoping to rein it back in. I think it's probably worth trying to come up with a minimal repro here. Any hypotheses on the shape of a minimal c library I can wrap in rust bindings to produce what we're seeing here?

view this post on Zulip Geoff Goodman (Mar 22 2024 at 17:24):

OK @Dan Gohman I think I have a pretty sweet, mostly minimal repro here: https://github.com/ggoodman/wasi-import-repro

The key finding is that the very presence of a printf call in a c library results in those fd_* imports getting pulled along even if we _know_ that the code calling printf is being eliminated by the linker.

view this post on Zulip Geoff Goodman (Mar 22 2024 at 17:33):

This is showing that a rust function that references a c binding referencing a call to printf is enough, even if the rust function is omitted by a feature flag.

view this post on Zulip Geoff Goodman (Mar 22 2024 at 17:52):

Filed an issue here: https://github.com/WebAssembly/wasi-sdk/issues/401

In this repo, I've shown two scenarios where code that is functionally identical produces very different WASM binaries. Setup The two scenarios I'm testing are for a rust cdylib whose code looks li...

view this post on Zulip Geoff Goodman (May 09 2024 at 15:46):

Thinking more about this, I have the hypothesis that it's the finalizers that could be holding onto strong references.

That made me wonder (at least for my use-case) if I actually _cared_ about finalizers and clean-up. The main use-case I have at work uses a WASM instance exactly once and throws it away. All work after producing a 'result' is essentially wasted cycles as far as I'm concerned.

So I wonder if tooling could be parametrized for this sort of use-case... Could we tell the wasi-sdk / wasi-libc that we don't actually care about clean-up? In other words, the host and runtime can handle that for us. I understand that this wouldn't be a general-purpose strategy which is why I'm wondering about some opt-in behaviour. Does it sound feasible?

view this post on Zulip Joel Dice (May 09 2024 at 15:53):

AFAIK, wasi-libc generally won't clean anything up unless you tell it to (e.g. by closing a file descriptor). So I think the answer in your case is: "don't call close" (which at the Rust level means e.g. wrapping a File in ManuallyDrop and never dropping it).
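The ManuallyDrop suggestion can be sketched like this. It is a minimal illustration, assuming a throwaway instance where leaking the descriptor is acceptable; write_and_leak and the temp-file name are hypothetical. It only prevents the close on drop and does not by itself remove any imports.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::path::Path;

/// Writes `data` to `path` and deliberately never closes the file.
/// Wrapping the File in ManuallyDrop means its destructor does not run,
/// so no close happens when `file` goes out of scope.
fn write_and_leak(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = ManuallyDrop::new(File::create(path)?);
    // File is unbuffered at the Rust level, so the bytes reach the OS here.
    file.write_all(data)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("manually_drop_demo.txt");
    write_and_leak(&path, b"result")?;
    println!("{}", std::fs::read_to_string(&path)?);
    Ok(())
}
```

The descriptor stays open until the process or instance goes away, which is the trade-off being proposed for single-use instances.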

view this post on Zulip Joel Dice (May 09 2024 at 15:55):

But maybe you're asking for an optional build of wasi-libc where close is a no-op?

view this post on Zulip Geoff Goodman (May 10 2024 at 19:09):

I think the solution when using rust is a bit easier but the use-case that confounds me is when building a c library against wasi-libc--and especially one whose source code I don't control.


Last updated: Nov 22 2024 at 17:03 UTC