cfallin commented on Issue #2614:
Sorry, in this case the doc-comment is inaccurate (my fault for the ambiguity!) -- the requirement imposed by the emission code, and what should be documented, is that
gen_nop()
returns a nonzero-sized NOP if the requested size is nonzero. An infinite loop could arise only if that is violated.I don't think it makes sense to otherwise artificially limit the interface with an assert -- a zero-sized NOP is a legitimate need in some cases (e.g. when one must return an instruction but no machine code is needed).
Would you mind changing the PR to update the doc-comment on the
MachInst
interface instead? Thanks!
cfallin commented on Issue #2614:
Ah, actually, there is one thing that this change fixes:
gen_nop(x)
withx == 16*n
would produce a zero-sized NOP. That won't arise with current alignment constants, but it's a potential future issue. I think what we actually want isstd::cmp::min(x, 16)
(i.e., clamp to an upper bound of 16 bytes). Thanks for the scrutiny here!
kaseyc commented on Issue #2614:
Thanks! Updated the condition and the machinst comment.
Given that the 16*x edge case is currently not possible to generate, I imagine that writing a test for it is not possible.
Both the aarch64 and arm32 backends have asserted minimum sizes. Should they also be updated to allow a zero-sized return (in a follow up commit)?
bnjbvr commented on Issue #2614:
Is it possible it's actually
min(x, 15)
, note the 15 and not 16, since we want something meaningful modulo 16? A 16-sized nop would be the same as 0 (mod 16), and there's no preferred encoding for such a nop.
cfallin commented on Issue #2614:
So I'm paging this back in more now: I note that we also had
gen_zero_len_nop()
. We could eliminate that and just do agen_nop(0)
; I'm not sure why we had the separation in the first place.Re: x86 max nop size -- yes, 15, not 16, agreed.
kaseyc commented on Issue #2614:
Re 15 vs 16, the nop constructor also seems to think 16 is a valid value:
https://github.com/bytecodealliance/wasmtime/blob/d1c1cb6a25c7256c82811d6b9ddc60c2a60da075/cranelift/codegen/src/isa/x64/inst/mod.rs#L569Although the actual emitter only handles up to 9 bytes in a single instruction src
Going above 9 seems to require more special casing. AMD provides a table for up to 15, and newer Intel processors may also be able to handle it:
AMD Manual
LLVM #1
LLVM #2This is getting slightly off topic . But it seems like 15 is a reasonable max size.
It also looks like there are no actual uses of gen_zero_len_nop, so if you prefer I could fold that into gen_nop everywhere.
kaseyc commented on Issue #2614:
I've updated the code to change the max size to 15.
gen_zero_len_nop can't be entirely removed, as it is part of the regalloc crate's Function trait. Though the MachInst implementations could dropped in favor of gen_nop(0).
Last updated: Jan 24 2025 at 00:11 UTC