lpereira opened PR #8627 from lpereira:wasm-func-names-in-objs
to bytecodealliance:main
:
Instead of generating symbol names in the format
"wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible) something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME". This helps when debugging or profiling the generated code.
lpereira requested alexcrichton for a review on PR #8627.
lpereira requested wasmtime-core-reviewers for a review on PR #8627.
Is it guaranteed that the names section only contains unique names?
alexcrichton commented on PR #8627:
Thanks! This makes sense to me and I've often thought this would be nice as well. A hesitation I've had in the past though is that I'm a bit wary about exposing user-provided data to native tools like
perf
andobjdump
and such. The name section has completely arbitrary strings and I while I can't say for certainty that native tooling behaves well in the face of arbitrary strings I suspect that there might be a security issue waiting to happen along the lines of "if you runldd
over a binary it can run arbitrary code" or something like that.(a good example being @bjorn3's suggestion where the
name
section has no guarantees of uniqueness and I'm not sure what tooling would do with same-named-symbols)
cfallin commented on PR #8627:
I wonder if we could define suitable "cleanup" for names: (i) rewrite duplicates with numeric suffixes, (ii) replace or remove any characters not in the usual set (
[A-Za-z0-9-_]
-- maybe spaces too? can tools handle those?), (iii) cap the length at something reasonable like 32 or 64 characters?
lpereira commented on PR #8627:
Could also add the function_index for the cases where uniqueness isn't guaranteed. Alright, I'll cook up a patch with the suggestions.
alexcrichton commented on PR #8627:
Yeah I think that'd work well (albeit bit a bit more involved in the implementation). While I can't say for certainty that spaces are not supported my guess is that tools would likely eventually choke on non-C-looking symbols so I'd be tempted to leave them out.
fitzgen commented on PR #8627:
How about
wasm[N]::func[M]
if no name section entrywasm[N]::func[m]: <name>
where<name>
is sanitized and has a capped length if there is an entry?
fitzgen edited a comment on PR #8627:
How about
wasm[N]::func[M]
if no name section entrywasm[N]::func[M]: <name>
where<name>
is sanitized and has a capped length if there is an entry?
This avoids needing to either add numbers into the sanitized name and then worrying about removing those numbers when capping the length, or needing to account for the length of the numbers that might be added after sanitizing and capping the name.
lpereira updated PR #8627.
lpereira commented on PR #8627:
Alright, pushed some changes that:
- Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'
- Always include the function index in the symbol name
- Truncate the symbol name to 96 characters
I think this should be sufficient, although I'm not really sold on how good the string filtering code is. I wish there was a
String::translate()
that would work similarly toString::retain()
, but replace the character with what the callback function returns. I'm a newbie in Rust, though, so maybe there's a way I couldn't figure out?
lpereira edited a comment on PR #8627:
Alright, pushed some changes that:
- Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'
- Always include the function index in the symbol name
- Truncate the symbol name to 96 characters
I think this should be sufficient, although I'm not really sold on how good the string filtering code is (in fact, it's pretty bad, as it always goes back to the beginning of the string to find the next match; I'll force-push a new thing soon). I wish there was a
String::translate()
that would work similarly toString::retain()
, but replace the character with what the callback function returns. I'm a newbie in Rust, though, so maybe there's a way I couldn't figure out?
lpereira updated PR #8627.
lpereira edited a comment on PR #8627:
Alright, pushed some changes that:
- Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'
- Always include the function index in the symbol name
- Truncate the symbol name to 96 characters
I think this should be sufficient. Can anyone please take a look again?
lpereira updated PR #8627.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I think you can do something better with iterators, and also we'll want to avoid allocating a string in the common case -- perhaps something like:
- Define a predicate
fn bad_char(c: char) -> bool { ... }
inside this function- Guard on no bad chars, to allow the common case without allocations; if any bad chars, do an iterator stream filter
let name = if name.chars().any(|c| bad_char(c)) { Cow::Owned(name.chars().map(|c| if bad_char(c) { '?' } else { c }).collect::<String>()) } else { Cow::Borrowed(&name[..]) }
Thoughts?
lpereira updated PR #8627.
lpereira updated PR #8627.
lpereira submitted PR review.
lpereira created PR review comment:
Yup, right after I pushed this I found about the
map()
method. Let me useCow
too, as this should reduce quite a bit of writes in the happy path.
lpereira updated PR #8627.
lpereira submitted PR review.
lpereira created PR review comment:
Done, thanks!
lpereira updated PR #8627.
cfallin submitted PR review.
cfallin created PR review comment:
.take(N) in the iterator chain should allow the truncation to happen inline (and stop iteration early if so, since the whole chain is lazy)
lpereira updated PR #8627.
lpereira submitted PR review.
lpereira created PR review comment:
Done.
cfallin submitted PR review:
Looks reasonable to me, thanks for the iterations!
lpereira updated PR #8627.
lpereira submitted PR review.
lpereira created PR review comment:
Actually, no, I'm not sure how to do this; I want to truncate the string after
trim_matches('?')
removed repeated occurrences of'?'
in the cleaned string, not before.
cfallin has enabled auto merge for PR #8627.
cfallin has disabled auto merge for PR #8627.
cfallin commented on PR #8627:
Ah, force-pushed after my review; could you do subsequent updates as additional commits? Let me know when stable and I can re-review.
lpereira commented on PR #8627:
It's stable now. (I'm not using the
take()
method for the reason I stated above, so I reverted to the previous version.)
jameysharp submitted PR review:
This is going to be such a big usability improvement for people using tools like perf! I'm excited.
jameysharp submitted PR review:
This is going to be such a big usability improvement for people using tools like perf! I'm excited.
jameysharp created PR review comment:
Maybe you wanted is_ascii_alphanumeric here instead, rather than allowing all Unicode alpha/numeric codepoints?
let bad_char = |c: char| !c.is_ascii_alphanumeric() && !r"<>[]_-:@$".contains(c);
jameysharp created PR review comment:
I guess the name should still be truncated even if there are no bad characters, right?
Cow::Borrowed(&name[..Self::MAX_SYMBOL_LEN])
jameysharp created PR review comment:
You mention that you intended to turn any run of multiple
?
into a single?
. Note thattrim_matches
doesn't do that. Instead, it deletes all copies of the given pattern from both ends of the string. That's why it can return a borrowed&str
instead of a heap-allocatedString
: it doesn't change the middle of the string, so it can just adjust the start pointer and the length.I think this should do more or less what you want (though I'm typing it in GitHub's web editor without testing it so who knows). It deletes all
?
at the beginning of the string, then compresses any run of multiple?
into a single?
. It can leave a single?
at the end of the string but I couldn't think of a simple way to fix that and I don't think it matters. Also it usestake
like Chris suggested.let mut last_char = '?'; let name = name .chars() .map(|c| if bad_char(c) { '?' } else { c }) .filter(|c| { let keep = last_char != c; last_char = c; keep }) .take(Self::MAX_SYMBOL_LEN) .collect::<String>();
jameysharp commented on PR #8627:
Ah, the CI test failures are because
tests/disas.rs
is specifically looking for symbol names which contain bothwasm
andfunction
in their names, and you've shortened the latter tofunc
. This is near the beginning ofdisas_elf
.I don't like that this test is relying on this implementation detail of Wasmtime, but I suppose it's fine. So you can either change this PR to continue to use
function
, or changedisas.rs
to search forfunc
.Either way, you'll also need to run
WASMTIME_TEST_BLESS=1 cargo test --test disas
to update the tests to reflect this change. Some of them do have function names, and those now show up in the expected test output, which is a fantastic bonus.
lpereira created PR review comment:
Done!
lpereira submitted PR review.
lpereira submitted PR review.
lpereira created PR review comment:
Yup, had this change locally and forgot to push. Good catch!
lpereira updated PR #8627.
lpereira updated PR #8627.
lpereira commented on PR #8627:
Alright, thanks for the review! This should be the last round.
(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)
jameysharp commented on PR #8627:
(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)
Oh, you're absolutely right. :sweat_smile:
Looks good to me, let's ship it!
jameysharp merged PR #8627.
Last updated: Nov 22 2024 at 16:03 UTC