cfallin commented on issue #1453:
I'm doing some gardening on old issues and I think the above may not be as relevant anymore: we've removed some of the old-backend-isms from the emission and sink design, IIRC. Updated suggestions based on the current state of our implementation are certainly welcome! In the meantime I will go ahead and close this issue.
cfallin closed issue #1453:
Cranelift is used in Spidermonkey, as part of an integration called Baldrdash. In Baldrdash, we're currently barely using the
RelocSink
capabilities, because the information it provides us is incomplete:
- when tracking call relocations, we'd need to get the source loc information. This is useful for stack frame iteration, e.g. displaying the wasm bytecode offset in stack traces, when an error occurs.
- indirect calls don't have relocations, but Spidermonkey needs to track the call site during stack frame iteration, again (the return address is used as a key for a mapping from call sites to call metadata, including the wasm bytecode offset).
- stack maps in Spidermonkey are keyed by the address of the instruction following the instruction to which the stack map relates. This is because when we want to get a stack map for a call/software interrupt, we're usually in the middle of stack frame iteration, and we only have access to the return address for calls, or the address of the next value of PC for software interrupts. Currently, the StackMapSink doesn't offer the information of the next instruction's address, so we have to do something very manual for this (iterate over every instruction, if it's a safepoint remember it for the next instruction, if there was a remembered safepoint add the stackmap with the instruction+size offset to the list of stackmaps). An example may help understanding here:
; Cranelift IR safepoint call $0; for instance iadd ; Machine code ; safepoint generates nothing call {reloc} iadd ; effectively the return address of the above call, so the address we have access to when looking up a stack map for the call during stack frame iteration.
For the new backend integration, we won't be able to do this anymore, for several reasons:
- in general, the new backend will be much more likely to generate several machine instructions for a single Cranelift IR opcode, which means we can't use the
InstOffsetIter
correctly anymore, and rely on this to give us precise offset information.- the new backend doesn't implement
encinfo
, it doesn't have a notion ofDiversions
(regalloc does the right thing now!), etc.My proposal is the following:
- Pass the current SourceLoc to
MemoryCodeSink::reloc_external
, so it addresses item 1.- Add a method
RelocSink::track_call_site
(or create a new sink for this), that is called with: IR opcode, source location, address of next instruction (= return address). This will be sufficient to address item 2, and could be generally useful for all kinds of calls. At the limit, we don't even need to pass the SourceLoc toreloc_external
as proposed in the previous bullet point, since theRelocSink
impl could take care of realizing it's notified with both a direct call site here and an external reloc for this call site, but it feels less ergonomic an API to use.- Among wasmtime + cranelift crates and Spidermonkey, all users are NullStackmapSink, except for Spidermonkey. So I'd propose that we do something very ad-hoc here, by adding the address of the next instruction following the instruction to which the stack map applies. This would address item 3 in the above requirements.
Alternatives I've considered:
- for all the bullet points, add a generic
InstSink
which would be called for each instruction with the IR opcode, machine function-start relative offset, machine instruction size, source location. It feels overly generic for the problem we have at hand, but then we have a very general mechanism that might help for more use cases. I'm not sure it'd deal well with the 1 -> N mapping between IR opcode and machine instructions (in the new backend), unless the client had some knowledge of the machine instructions (which doesn't seem very ergonomic), or we passed the machine encoding (which we don't have in the new backend).- for 3, spidermonkey could remember the stackmap, and when inserting it in our own Spidermonkey relocations, pattern-match the machine instruction's encoding + size. It means one pattern-matching for each possible machine instruction after the stack map (currently, this means 3: software trap, call, call indirect), times the number of different architectures (currently 1, x86_64). That seems awful.
I will implement my proposal locally, but I'd be happy to read what people think of it!
Last updated: Dec 23 2024 at 12:05 UTC