Stream: git-wasmtime

Topic: wasmtime / PR #7746 unionfind: robustly avoid changing `I...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 11:01):

meithecatte requested wasmtime-compiler-reviewers for a review on PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 11:01):

meithecatte opened PR #7746 from meithecatte:gvn-reasoned-merge to bytecodealliance:main:

Stop hoping that keeping the smallest Idx as canonical will yield good results, and instead explicitly inform the UnionFind of what we expect not to move.

Fixes #6126

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 11:01):

meithecatte requested fitzgen for a review on PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 11:06):

meithecatte updated PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 11:09):

meithecatte commented on PR #7746:

cc @cfallin

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 20:58):

fitzgen submitted PR review:

I don't feel super confident reviewing this on my own. I think it needs a second pair of eyes from either @cfallin (on parental leave till end of February IIRC) or @jameysharp (too busy with other things to review stuff at the moment). So I think this PR might need to wait for a bit before it can be merged.

That said, the pinning feels a bit funky to me, in that it breaks down when we union two pinned ids. Would it be possible to turn that warning into a panic? Or is it expected that this will happen and we just have to deal with it as best as we can?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 20:59):

fitzgen commented on PR #7746:

I attempted running the sightglass benchmark on this, but I think my setup was too noisy – any differences in performance disappeared when I re-ran the specific benchmark on which the difference was reported (though I only did it manually for the few first reported differences).

Are you on linux? If so you can try measuring instructions retired (pass -m insts-retired), which is much much less noisy. I can usually get good results with just a single process and ten iterations.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 22:03):

jameysharp commented on PR #7746:

Although I haven't carefully reviewed this PR, I'm cautiously optimistic about it. The filetest changes look promising and the general approach feels plausible. Making use of a magic value for "rank" is a clever way to get this behavior almost for free out of an otherwise-standard union-find implementation.

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the _status quo_ where we get the same result without any indication that something went wrong.

At worst, this PR would be a good starting point for evaluating how often we have this problem in practice. My guess is we don't see #6126 come up very often in normal use of Cranelift, in which case it's good that this PR is a very small change, algorithmically speaking. But maybe my guess is wrong and it'd be nice to discover that.

I would also be interested to know what fraction of nodes in the union-find do _not_ get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

@meithecatte, if you're using Linux and want lower-variance performance measurements, I've documented the rather extreme measures I use to isolate a CPU core for benchmarks. You don't need performance measurements to justify merging this PR, but if there's a measurable improvement, that's definitely a nice extra argument in favor.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2024 at 22:09):

elliottt requested elliottt for a review on PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 00:02):

meithecatte commented on PR #7746:

I agree with @fitzgen that it feels unsatisfying to have the case of merging two pinned IDs just print a warning. But it's better than the _status quo_ where we get the same result without any indication that something went wrong.

If it's any consolation, this happening is in the "missed optimization" category, and won't cause a miscompilation or ICE. See https://github.com/bytecodealliance/wasmtime/issues/6126#issuecomment-1490776800

I would also be interested to know what fraction of nodes in the union-find do _not_ get pinned with this PR. If we pinned everything before merging, this approach would just warn on every merge. If we merge things and then pin them afterward though, that's fine. I can't really focus right now to picture how this works out but it'd be nice to develop a better understanding.

The test suite passes if I replace the log::warn with a panic. I've convinced myself that the only situation in which we union two pinned eclasses, is if there's a rewrite rule (or a chain of them) that lets us derive A -> B, and separately A -> C, and yet there's no way to go B -> C nor C -> B in isolation. Intuitively, it seems unlikely that we would have rules where this could arise, but the problem of analyzing a ruleset to reliably decide if it's possible seems like a thesis.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 00:21):

meithecatte commented on PR #7746:

So, it looks like the warn! isn't shown during cargo test, and in normal runs needs an explicit RUST_LOG=warn or higher. Sounds like the likelyhood of the warning being seen is quite low. I suppose that one option would be to have this be a debug_assert! – scream if we're in debug mode (i.e. someone's probably observing), carry on if we're in release mode (i.e. probably deployed somewhere).

I also tried running the benchmarks as -m insts-retired, as you suggested. This ended as follows:

~/src/sightglass$ time cargo run -- benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine ~/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite
warning: virtual workspace defaulting to `resolver = "1"` despite one or more workspace members being on edition 2021 which implies `resolver = "2"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
note: for more details see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
     Running `target/debug/sightglass-cli benchmark -m insts-retired --small-workloads --processes 1 --engine /tmp/wasmtime_main.so --engine /home/maya/dev/wasmtime/target/release/libwasmtime_bench_api.so -- benchmarks/all.suite`
Error: The variance of one of the samples is zero

real    8m15.667s
user    13m56.404s
sys 4m19.762s

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:01):

elliottt submitted PR review:

I like this approach! The trick of using the rank to pick the parent is great. I agree with the comments about using a warning when merging two pinned eclasses, and really wish that we could confidently make that a panic instead.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:01):

elliottt submitted PR review:

I like this approach! The trick of using the rank to pick the parent is great. I agree with the comments about using a warning when merging two pinned eclasses, and really wish that we could confidently make that a panic instead.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:01):

elliottt created PR review comment:

It would be nice if we could change this name to something like pin_index or pin_eclass, as pin already has a somewhat special meaning in rust, and skimming this code could give the wrong idea about what's going on.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:01):

elliottt created PR review comment:

Given the discussion on this PR, should this comment change? Maybe pointing out that
1) two pinned eclasses aren't likely to be merged, 2) a rough outline of when that might happen, and 3) that the effect of that would be at worst the behavior prior to pinning.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:01):

elliottt edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:04):

cfallin commented on PR #7746:

Thanks for tackling this @meithecatte ! As @fitzgen noted, I'm still on parental leave until Feb 12 so I can't review anything properly until then, but I'm watching email and will try not to block things when my input is useful!

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in this comment that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before). My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:17):

meithecatte commented on PR #7746:

So, since it can happen in normal operation, and is not incorrect, we must not panic (even in a debug build), and we probably shouldn't warn -- that has the potential to create a log-storm issue at the most inopportune time (e.g. a DoS vector in production where the compiler was happy before).

Good point. Perhaps something like the Linux kernel's WARN_ONCE?

My vote would be to choose the lower ID to break the tie, as we do today, but either choice should be valid I think.

Perhaps it indeed makes sense to use this as a heuristic.

I'd agree that this is generally a good approach, with the caveat that we must not warn, and especially not debug-assert, if two pinned values are union'd. As @meithecatte noted, analyzing the rules to prove this can't occur is a hard problem and above the abstraction level we likely want to work at. This case shouldn't happen in normal use (values numbered sequentially as built, rules generally rewriting toward better expressions), but compilation is still correct if it does happen (as noted in https://github.com/bytecodealliance/wasmtime/issues/6126#issuecomment-1490776800 that @meithecatte linked), and moreover the entire point of aegraphs is to approximate egraphs with lower cost -- this means that we can expect suboptimal outcomes in some cases (this is the price we pay for no fixpoint loop).

Indeed, though I'd argue that this case triggering would probably mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2024 at 01:29):

cfallin commented on PR #7746:

Indeed, though I'd argue that this case triggering would _probably_ mean that there's a reasonable rewrite rule that we haven't implemented. I don't know about you, but I'd like to know if it happened.

Sure, agreed it would be nice to get this information! In my mind, it looks more like a "compiler stat", like a regalloc spill, or a worst-case fallback isel rule firing: regrettable, but not rising to the level of warn!. Remember that we have a lot of embedders of wasmtime in, e.g., server runtimes and game engines and such, and folks may, e.g., pipe warnings to devops folks (Slack channels and such); an "I should address this" missing optimization to the compiler engineer is a far cry from a "disk is almost full" or "frame dropped" message or something like that.

Or another way perhaps: it is an implementation quality signal rather than a runtime urgency signal. Warnings should occur when the person operating the software might want to take some action, not when the compiler engineer might want to add an optimization.

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 06:05):

meithecatte updated PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 07:09):

meithecatte updated PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 07:10):

meithecatte submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 07:10):

meithecatte created PR review comment:

I expanded upon the comments I've added. Let me know if they're up to your expectations now.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 07:12):

meithecatte commented on PR #7746:

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

Okay, you've convinced me, I made it a stat. It's a bit of a hack at the moment – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 07:12):

meithecatte edited a comment on PR #7746:

We have a nascent "stats framework" idea, which appears in the aegraphs framework and in regalloc2 as well; perhaps "two pinned values union'd" is another stat event and whenever we go fishing for optimization work to do, we notice if the value is nonzero?

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 17:41):

fitzgen submitted PR review:

I don't feel super confident reviewing this on my own. I think it needs a second pair of eyes from @cfallin (on parental leave till end of February IIRC). So I think this PR might need to wait for a bit before it can be merged.

That said, the pinning feels a bit funky to me, in that it breaks down when we union two pinned ids. Would it be possible to turn that warning into a panic? Or is it expected that this will happen and we just have to deal with it as best as we can?

view this post on Zulip Wasmtime GitHub notifications bot (Jan 09 2024 at 22:37):

jameysharp commented on PR #7746:

Okay, you've convinced me, I made it a stat. It's a bit of a hack – should I pass in a &mut Stats as a separate argument to union instead? Doing it this way seemed less ugly...

Yeah, I think your approach is perfectly reasonable, and better than trying to pass the whole stats struct in repeatedly.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 11 2024 at 16:22):

cfallin commented on PR #7746:

Looks good to me too! I'll allow someone else to do the detailed review since I'm still technically on leave but all my concerns are addressed :-)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 11 2024 at 19:13):

cfallin commented on PR #7746:

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2024 at 03:04):

meithecatte updated PR #7746.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2024 at 03:46):

meithecatte commented on PR #7746:

@meithecatte would you mind rebasing on the latest main, just to be sure everything is still good (since some other egraphs-related changes have been made in the meantime)?

Sure thing! No conflicts, and the tests passed on my end.

Sorry for the delay on this; I'm back from leave officially tomorrow and will review and merge if tests are green.

Don't worry, I don't mind.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2024 at 16:20):

cfallin submitted PR review:

LGTM, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2024 at 17:00):

cfallin merged PR #7746.


Last updated: Nov 22 2024 at 16:03 UTC