wasmtime / PR #8627 Use WASM function names in compiled o... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #8627 Use WASM function names in compiled o...

Wasmtime GitHub notifications bot (May 15 2024 at 17:16):

lpereira opened PR #8627 from lpereira:wasm-func-names-in-objs to bytecodealliance:main:

Instead of generating symbol names in the format
"wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible) something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME". This helps when debugging or profiling the generated code.

Wasmtime GitHub notifications bot (May 15 2024 at 17:16):

lpereira requested alexcrichton for a review on PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 17:16):

lpereira requested wasmtime-core-reviewers for a review on PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 17:21):

bjorn3 commented on PR #8627:

Is it guaranteed that the names section only contains unique names?

Wasmtime GitHub notifications bot (May 15 2024 at 17:23):

alexcrichton commented on PR #8627:

Thanks! This makes sense to me and I've often thought this would be nice as well. A hesitation I've had in the past though is that I'm a bit wary about exposing user-provided data to native tools like perf and objdump and such. The name section has completely arbitrary strings and I while I can't say for certainty that native tooling behaves well in the face of arbitrary strings I suspect that there might be a security issue waiting to happen along the lines of "if you run ldd over a binary it can run arbitrary code" or something like that.

(a good example being @bjorn3's suggestion where the name section has no guarantees of uniqueness and I'm not sure what tooling would do with same-named-symbols)

Wasmtime GitHub notifications bot (May 15 2024 at 17:37):

cfallin commented on PR #8627:

I wonder if we could define suitable "cleanup" for names: (i) rewrite duplicates with numeric suffixes, (ii) replace or remove any characters not in the usual set ([A-Za-z0-9-_] -- maybe spaces too? can tools handle those?), (iii) cap the length at something reasonable like 32 or 64 characters?

Wasmtime GitHub notifications bot (May 15 2024 at 17:46):

lpereira commented on PR #8627:

Could also add the function_index for the cases where uniqueness isn't guaranteed. Alright, I'll cook up a patch with the suggestions.

Wasmtime GitHub notifications bot (May 15 2024 at 17:47):

alexcrichton commented on PR #8627:

Yeah I think that'd work well (albeit bit a bit more involved in the implementation). While I can't say for certainty that spaces are not supported my guess is that tools would likely eventually choke on non-C-looking symbols so I'd be tempted to leave them out.

Wasmtime GitHub notifications bot (May 15 2024 at 18:02):

fitzgen commented on PR #8627:

How about

wasm[N]::func[M] if no name section entry

wasm[N]::func[m]: <name> where <name> is sanitized and has a capped length if there is an entry

?

Wasmtime GitHub notifications bot (May 15 2024 at 18:03):

fitzgen edited a comment on PR #8627:

How about

wasm[N]::func[M] if no name section entry

wasm[N]::func[M]: <name> where <name> is sanitized and has a capped length if there is an entry

?

This avoids needing to either add numbers into the sanitized name and then worrying about removing those numbers when capping the length, or needing to account for the length of the numbers that might be added after sanitizing and capping the name.

Wasmtime GitHub notifications bot (May 15 2024 at 18:41):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 18:46):

lpereira commented on PR #8627:

Alright, pushed some changes that:

Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'

Always include the function index in the symbol name

Truncate the symbol name to 96 characters

I think this should be sufficient, although I'm not really sold on how good the string filtering code is. I wish there was a String::translate() that would work similarly to String::retain(), but replace the character with what the callback function returns. I'm a newbie in Rust, though, so maybe there's a way I couldn't figure out?

Wasmtime GitHub notifications bot (May 15 2024 at 18:50):

lpereira edited a comment on PR #8627:

Alright, pushed some changes that:

Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'

Always include the function index in the symbol name

Truncate the symbol name to 96 characters

I think this should be sufficient, although I'm not really sold on how good the string filtering code is (in fact, it's pretty bad, as it always goes back to the beginning of the string to find the next match; I'll force-push a new thing soon). I wish there was a String::translate() that would work similarly to String::retain(), but replace the character with what the callback function returns. I'm a newbie in Rust, though, so maybe there's a way I couldn't figure out?

Wasmtime GitHub notifications bot (May 15 2024 at 19:05):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 19:06):

lpereira edited a comment on PR #8627:

Alright, pushed some changes that:

Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'

Always include the function index in the symbol name

Truncate the symbol name to 96 characters

I think this should be sufficient. Can anyone please take a look again?

Wasmtime GitHub notifications bot (May 15 2024 at 19:07):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 19:08):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 19:08):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 19:08):

cfallin created PR review comment:

I think you can do something better with iterators, and also we'll want to avoid allocating a string in the common case -- perhaps something like:

Define a predicate fn bad_char(c: char) -> bool { ... } inside this function

Guard on no bad chars, to allow the common case without allocations; if any bad chars, do an iterator stream filter
let name = if name.chars().any(|c| bad_char(c)) {
  Cow::Owned(name.chars().map(|c| if bad_char(c) { '?' } else { c }).collect::<String>())
} else {
  Cow::Borrowed(&name[..])
}
Thoughts?

Wasmtime GitHub notifications bot (May 15 2024 at 19:09):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 19:16):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 19:18):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 19:18):

lpereira created PR review comment:

Yup, right after I pushed this I found about the map() method. Let me use Cow too, as this should reduce quite a bit of writes in the happy path.

Wasmtime GitHub notifications bot (May 15 2024 at 20:28):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 20:29):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 20:29):

lpereira created PR review comment:

Done, thanks!

Wasmtime GitHub notifications bot (May 15 2024 at 20:30):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 20:37):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 20:37):

cfallin created PR review comment:

.take(N) in the iterator chain should allow the truncation to happen inline (and stop iteration early if so, since the whole chain is lazy)

Wasmtime GitHub notifications bot (May 15 2024 at 20:47):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 20:47):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 20:47):

lpereira created PR review comment:

Done.

Wasmtime GitHub notifications bot (May 15 2024 at 20:59):

cfallin submitted PR review:

Looks reasonable to me, thanks for the iterations!

Wasmtime GitHub notifications bot (May 15 2024 at 20:59):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 20:59):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 15 2024 at 20:59):

lpereira created PR review comment:

Actually, no, I'm not sure how to do this; I want to truncate the string after trim_matches('?') removed repeated occurrences of '?' in the cleaned string, not before.

Wasmtime GitHub notifications bot (May 15 2024 at 20:59):

cfallin has enabled auto merge for PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 21:00):

cfallin has disabled auto merge for PR #8627.

Wasmtime GitHub notifications bot (May 15 2024 at 21:00):

cfallin commented on PR #8627:

Ah, force-pushed after my review; could you do subsequent updates as additional commits? Let me know when stable and I can re-review.

Wasmtime GitHub notifications bot (May 15 2024 at 23:17):

lpereira commented on PR #8627:

It's stable now. (I'm not using the take() method for the reason I stated above, so I reverted to the previous version.)

Wasmtime GitHub notifications bot (May 16 2024 at 05:11):

jameysharp submitted PR review:

This is going to be such a big usability improvement for people using tools like perf! I'm excited.

Wasmtime GitHub notifications bot (May 16 2024 at 05:11):

jameysharp submitted PR review:

This is going to be such a big usability improvement for people using tools like perf! I'm excited.

Wasmtime GitHub notifications bot (May 16 2024 at 05:11):

jameysharp created PR review comment:

Maybe you wanted is_ascii_alphanumeric here instead, rather than allowing all Unicode alpha/numeric codepoints?
        let bad_char = |c: char| !c.is_ascii_alphanumeric() && !r"<>[]_-:@$".contains(c);

Wasmtime GitHub notifications bot (May 16 2024 at 05:11):

jameysharp created PR review comment:

I guess the name should still be truncated even if there are no bad characters, right?
            Cow::Borrowed(&name[..Self::MAX_SYMBOL_LEN])

Wasmtime GitHub notifications bot (May 16 2024 at 05:11):

jameysharp created PR review comment:

You mention that you intended to turn any run of multiple ? into a single ?. Note that trim_matches doesn't do that. Instead, it deletes all copies of the given pattern from both ends of the string. That's why it can return a borrowed &str instead of a heap-allocated String: it doesn't change the middle of the string, so it can just adjust the start pointer and the length.

I think this should do more or less what you want (though I'm typing it in GitHub's web editor without testing it so who knows). It deletes all ? at the beginning of the string, then compresses any run of multiple ? into a single ?. It can leave a single ? at the end of the string but I couldn't think of a simple way to fix that and I don't think it matters. Also it uses take like Chris suggested.
            let mut last_char = '?';
            let name = name
                .chars()
                .map(|c| if bad_char(c) { '?' } else { c })
                .filter(|c| { let keep = last_char != c; last_char = c; keep })
                .take(Self::MAX_SYMBOL_LEN)
                .collect::<String>();

Wasmtime GitHub notifications bot (May 16 2024 at 05:33):

jameysharp commented on PR #8627:

Ah, the CI test failures are because tests/disas.rs is specifically looking for symbol names which contain both wasm and function in their names, and you've shortened the latter to func. This is near the beginning of disas_elf.

I don't like that this test is relying on this implementation detail of Wasmtime, but I suppose it's fine. So you can either change this PR to continue to use function, or change disas.rs to search for func.

Either way, you'll also need to run WASMTIME_TEST_BLESS=1 cargo test --test disas to update the tests to reflect this change. Some of them do have function names, and those now show up in the expected test output, which is a fantastic bonus.

Wasmtime GitHub notifications bot (May 16 2024 at 16:25):

lpereira created PR review comment:

Done!

Wasmtime GitHub notifications bot (May 16 2024 at 16:25):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 16 2024 at 16:26):

lpereira submitted PR review.

Wasmtime GitHub notifications bot (May 16 2024 at 16:26):

lpereira created PR review comment:

Yup, had this change locally and forgot to push. Good catch!

Wasmtime GitHub notifications bot (May 16 2024 at 16:29):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 16 2024 at 16:39):

lpereira updated PR #8627.

Wasmtime GitHub notifications bot (May 16 2024 at 16:40):

lpereira commented on PR #8627:

Alright, thanks for the review! This should be the last round.

(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)

Wasmtime GitHub notifications bot (May 16 2024 at 17:42):

jameysharp commented on PR #8627:

(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)

Oh, you're absolutely right. :sweat_smile:

Looks good to me, let's ship it!

Wasmtime GitHub notifications bot (May 16 2024 at 17:58):

jameysharp merged PR #8627.

Last updated: Apr 16 2025 at 22:03 UTC