cfallin commented on issue #3645:
Thanks! I ran a quick sanity-check with
bz2
and found no perf regressions. A full sightglass run is usually an overnight ordeal if I give it enough iterations to be meaningful and last time I tried it there were still some weird bimodal-performance gremlins, so I'm hesitant to block anything on that. It's certainly the case that this change could regress FP-heavy code that spills registers, though; no benchmarks needed for that conclusion.The reason I feel pretty strongly about doing it anyway is that there are "unknown unknowns": once I realized that regalloc.rs merges all vregs linked by move-elision into one equivalence class and gives us the type of an arbitrary one, that means we need to carefully audit all of our uses of move and move-like instructions for type safety.
raw_bitcast
is one example, but there may be others. In short, it's an invariant we didn't stick to initially, so now we've lost it and we need to do a lot of work before we can rely on it again.This is a safety issue that @julian-seward1 actually pointed out back in Jan 2020 when we were designing the regalloc.rs API; I pushed for using the vreg type to use smaller spillslots where possible, but given that this has now caused one real CVE and would have caused another CVE here if we had released SIMD support, I think that was a mistake, or at least we didn't adequately consider how to maintain the type-safety in the IR-to-registers mapping.
One could reasonably ask: how do other compilers get this right, and use spillslots only as large as necessary? The answer is better discipline in all of the lowering patterns. I think one way of getting this would be to actually require an IR-level type to be passed down into the register allocator for each vreg; it would be opaque to the regalloc, but would be checked for every move (and would cause a panic if a cross-type move occurs). If we do that and fuzz all the panics out, we could probably move back to "spillslot only as large as actual type". I'll file an issue for this, and probably the right time to consider it is at the same time as we move over to regalloc2.
Hopefully that seems reasonable; thoughts?
alexcrichton commented on issue #3645:
A full sightglass run is usually an overnight ordeal [..] so I'm hesitant to block anything on that
cc @fitzgen and @jlb6740 while not super relevant to this PR per-se this seems relevant to sightglass.
Hopefully that seems reasonable; thoughts?
That all sound reasonable to me, yeah, thanks for writing that up! I like the idea of passing opaque types to regalloc and regalloc could also possibly automatically determine that a move between two vregs isn't an equivalence class of vregs if the vregs have different types, assigning them separate live ranges and such.
fitzgen commented on issue #3645:
It isn't an overnight thing for me (at least for the realistic benchmarks, I don't run the shootout ones) usually just half an hour or so.
cfallin commented on issue #3645:
It isn't an overnight thing for me (at least for the realistic benchmarks, I don't run the shootout ones) usually just half an hour or so.
Ah, OK, it's certainly possible I was holding it wrong; I remember cranking up the iterations and letting it run for a long time to try to get reasonable confidence intervals; and I remember you mentioning something similar (... on some searching, this comment) but maybe it's improved. Anyway, a separate topic; I'd love to be able to just ping a GitHub bot for this :-)
abrown commented on issue #3645:
I'm not a big fan of using extra stack space but I guess it makes sense as a temporary measure.
that means we need to carefully audit all of our uses of move and move-like instructions for type safety.
raw_bitcast
is one example, but there may be others.I think the existence of
raw_bitcast
is an indicator of a leaky abstraction in the IR and APIs. Ideally we could resolve that issue and get rid of it. Having to carefully audit moves reinforces this impression for me: whoever is lowering should not need to also worry about regalloc beyond reserving a temporary register once in a while.One could reasonably ask: how do other compilers get this right, and use spillslots only as large as necessary? The answer is better discipline in all of the lowering patterns.
I don't know if it is "programmer discipline" as much as "abstraction discipline." In V8, there are surely ways to make a mistake during lowering but my opinion is that this area is designed to be easier to use (no/less invariants to remember).
I think one way of getting this would be to actually require an IR-level type to be passed down into the register allocator for each vreg; it would be opaque to the regalloc, but would be checked for every move (and would cause a panic if a cross-type move occurs).
An alternate suggestion: perhaps regalloc should be smarter about
RegClass
and have a way to express that, e.g., "RegClass::F64
fits in the lower bits ofRegClass::V128
" and the like. That way the IR types can map 1-to-1 toRegClass
and the abstraction will not leak.
cfallin commented on issue #3645:
I just ran a few Sightglass benchmarks (
blake3
,pulldown-cmark
,bz2
,meshoptimizer
) and saw no difference in runtime. I suspect we'd see more if we had benchmarks with lots of live FP values, but we don't seem to have any like that at the moment.
cfallin commented on issue #3645:
I don't know if it is "programmer discipline" as much as "abstraction discipline."
Yes, absolutely, "discipline in lowering patterns" was ambiguous, sorry! We should build guardrails into the abstractions that prevent us from doing the wrong thing by mistake or unknowingly; we clearly didn't do enough here.
Last updated: Nov 22 2024 at 16:03 UTC