Stream: git-wasmtime

Topic: wasmtime / PR #4623 Implement strings in adapter modules


view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 15:17):

alexcrichton opened PR #4623 from adapt-string to main:

This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.

The CanonicalABI.md in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:

This means that the Module::translate_string in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
realloc-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.

Additionally in this PR is a full implementation in the host for the
latin1+utf16 encoding which means that both lifting and lowering host
strings now works with this encoding.

Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the encoding_rs crate. I
initially tried to implement everything with encoding_rs but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If an when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.

Testing this commit has been somewhat difficult and my general plan,
like with the (list T) type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests have also been added to ensure
that validity of encoding is actually checked.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

Closes #4309

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 15:17):

alexcrichton requested fitzgen for a review on PR #4623.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 15:33):

alexcrichton updated PR #4623 from adapt-string to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:26):

alexcrichton updated PR #4623 from adapt-string to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 18:09):

fitzgen created PR review comment:

This isn't us being optimistic tho, right? This optimism is prescribed by the spec, correct?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 18:09):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 18:09):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 18:09):

fitzgen created PR review comment:

I feel like this method is already super large, so adding a bunch of large closures into it is not helping. How much stuff is this really closing over? Can it and the other closures be a separate method?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 18:09):

fitzgen created PR review comment:

I thought that we settled on bounds checks regardless if the string is empty or not?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:29):

alexcrichton updated PR #4623 from adapt-string to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:29):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:29):

alexcrichton created PR review comment:

Sure, I've split out everything now.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:29):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:29):

alexcrichton created PR review comment:

Indeed!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:30):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:30):

alexcrichton created PR review comment:

This was written before the spec changed, and I was originally going to land this first and then fixup all our handling of zero-length arrays and things but I went ahead and updated this.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 15:30):

alexcrichton has enabled auto merge for PR #4623.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2022 at 16:01):

alexcrichton merged PR #4623.


Last updated: Dec 23 2024 at 12:05 UTC