Stream: git-wasmtime

Topic: wasmtime / PR #10209 Introduce verification of integer ad...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 18:53):

iwanders opened PR #10209 from iwanders:issue-10118-improve-verifier-for-load-store to bytecodealliance:main:

Resolves #10118, fyi @cfallin.

This should ensure that the verifier correctly detects situations where an incorrectly sized integer is passed as an address type. This ensures that issues like these are caught at the verifier level instead of by asserts in the machine instruction generation.

Filing as a draft PR and making some comments throughout the diff for discussion.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:11):

iwanders updated PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:14):

iwanders updated PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

Most of the changes from the unit tests are very mechanical. I kept as many test cases in the existing file, and split out 32 and 64 bit-specific unit tests to fdemote_32.clif and fdemote_64.clif respectively.

Most often the change is the number in the stack_addr.<size> function I think... this could definitely do with a bit of scrutiny as I'm pretty inexperienced with these very low level constructs.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

I was a bit surprised that there's an atomic_store in the instruction builder, but it seems to just go through the normal Store instruction.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

I initially assumed p was the first value of (undocumented) Store was the pointer, just like the other operations. Only after I checked store did I realise it uses the second argument as the pointer. I wonder if the capitalised flavours should be hidden from the documentation?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

This match statement is currently not ordered alphabetically, so I kept all the changes at the bottom.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

This was stated in this comment, I think it's good to put this in the automatically created documentation as well?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

The v1 input argument to this block seems to be unused, I did make that platform specific.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders created PR review comment:

I introduced this file, and its 64 bit counterpart to check the verifier is now able to detect these bad cases. Most are adapted from functions in the existing clif files.

I recommend opening this file and its counterpart in two tabs and switching between them, they're identical save for 64 and 32 swapped.

I hope this covers all the cases we'd like to check for.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 08 2025 at 19:15):

iwanders edited PR #10209:

Resolves #10118, fyi @cfallin.

This should ensure that the verifier correctly detects situations where an incorrectly sized integer is passed as an address type. This ensures that issues like these are caught at the verifier level instead of by asserts in the machine instruction generation.

Filing as a draft PR and making some comments throughout the diff for discussion.

Edit, I recommend starting review with the cranelift/filetests/filetests/verifier/pointer_width_32.clif file.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2025 at 22:34):

iwanders has marked PR #10209 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2025 at 22:34):

iwanders requested abrown for a review on PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2025 at 22:34):

iwanders requested wasmtime-compiler-reviewers for a review on PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 12 2025 at 22:35):

iwanders commented on PR #10209:

I've moved this out of draft to ensure it gets a reviewer, I couldn't figure out how to assign @cfallin to review it while it was a draft :)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 00:03):

cfallin commented on PR #10209:

@iwanders it's on my queue, thanks! I'm attending the Wasm standards group in-person meeting this week so I'll likely get to review this next week or end of this week.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 00:18):

iwanders commented on PR #10209:

Ah perfect @cfallin, thanks, I just wanted to make sure it wasn't in limbo. I have some travel coming up at the end of next week myself, so may go quiet for a bit myself as well.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:02):

abrown requested cfallin for a review on PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin submitted PR review:

Thanks -- I think this is good incremental progress and I think we should land it, given my minor nits below are addressed!

I have a few thoughts after seeing the diff (which you diligently worked out in the shape we requested, so these are just thoughts on the shape of the problem, not your work) that could lead to some followup cleanups:

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin created PR review comment:

Let's keep the original nested match here -- it's cleaner than ad-hoc ifs inside the match arm. Something like

LoadNoOffset { opcode: Opcode::Bitcast, flags, arg } => { /* original */ }
LoadNoOffset { opcode, flags, arg } if opcode.can_load() => { /* verify address */ }

and likewise below: pull the ifs that guard with can_load()/can_store() into match guards.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin created PR review comment:

That's somewhat surprising, but I think this is probably outside the scope of this PR -- ideally we'd unify the arg order, not try to selectively hide documentation.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin created PR review comment:

Atomic stores are separate opcodes so I think this is fine?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin created PR review comment:

No need to borrow here -- it's a Type which wraps a u16 internally (ie, it's tiny and Copy).

view this post on Zulip Wasmtime GitHub notifications bot (Feb 13 2025 at 18:17):

cfallin created PR review comment:

Yes, I agree; let's say "when an address is specified to a load or store is specified" to (i) avoid using an otherwise-undefined/ambiguous concept "integer address type" (do we have float address types? is it the address of an integer in memory?); and (ii) make it clear that this has to do with memory accesses.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:09):

iwanders updated PR #10209.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:09):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:09):

iwanders created PR review comment:

I dropped one is specified from your suggestion and made it into:

        When an address to a load or store is specified, its integer
        size is required to be equal to the platform's pointer width.

the 'its integer size' still feels a bit off, but hopefully it conveys that the address is represented as an integer?

Happy to adjust further to ensure we get it right!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:09):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:09):

iwanders created PR review comment:

Yes, I think so :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:10):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:10):

iwanders created PR review comment:

Hmm, so my confusion was that store and Store exist, but only store has useful docs and parameter names. Clicking Source for store shows:

    /// Store ``x`` to memory at ``p + Offset``.
    ///
    /// This is a polymorphic instruction that can store any value type with a
    /// memory representation.
    ///
    /// Inputs:
    ///
    /// - MemFlags: Memory operation flags
    /// - x: Value to be stored
    /// - p: An integer address type
    /// - Offset: Byte offset from base address
    #[allow(non_snake_case)]
    fn store<T1: Into<ir::MemFlags>, T2: Into<ir::immediates::Offset32>>(self, MemFlags: T1, x: ir::Value, p: ir::Value, Offset: T2) -> Inst {
        let MemFlags = MemFlags.into();
        let Offset = Offset.into();
        let ctrl_typevar = self.data_flow_graph().value_type(x);
        self.Store(Opcode::Store, ctrl_typevar, MemFlags, Offset, x, p).0
    }

Which calls into the uppercase Store, I needed to look at this generated code to confirm that the second argument (arg1) of Store is the address, it's not the end of the world, because the unit tests just failed the other way around, and most people that need to know the order are likely spelunking in the code anyway?

I see now that the lowercase store does determine the ctrl_typevar, so store and Store do different things, I glossed over that initially, maybe Store should just get more extensive documentation at some point, or link to store for it. Anyways, definitely beyond the scope of this PR, but hopefully this clarifies my train of thought as an outsider :)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:10):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:10):

iwanders created PR review comment:

I really like this feedback, this is something I'll adopt in my own code!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:12):

iwanders submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 00:12):

iwanders created PR review comment:

:+1: removed, definitely not necessary if it's Copy and fits in a register.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 14 2025 at 01:44):

iwanders commented on PR #10209:

that could lead to some followup cleanups:

I currently don't have time to pursue these but here's my two cents :)

I wonder if there's a better way to handle stackslot addresses to avoid the test duplication. You had mentioned a separate pointer type before, which we've resisted so far in order to avoid a lot of complexity and duplication. Perhaps we can have a surface-level syntax sugar in the textual CLIF parser, though: stack_addr.ptr rather than stack_addr.{i32,i64}. What do you think?

I guess one could just instantiate both flavours for each function, and store them in separate fields, or in an enum type in the functions field that's currently in TestFile? But then it may also make sense to split the isa_spec field into the two flavours and group the respective isa's with the functions that were instantiated for them? It does make the struct more complex though.

Another option, which is not great because it splits files, but it's worth considering because it may allow more reuse. If one clif file can refer to another file, it's easy to do the syntactic sugar approach, for something like fdemote_32.clif:

test interpret
test run
target pulley32
target pulley32be
include fdemote.clif

For; fdemote_64.clif:

test interpret
test run
target pulley64
include fdemote.clif

The included file fdemote.cliff contains entries that use stack_addr.ptr.

When the reader reads files, it keeps track of which isa's it has seen, as long as it has only seen 32 bit isa's stack_addr.ptr would be i32, if it has only seen 64 bit isa's the value of stack_addr.ptr would be i64, if both isa's are seen and stack_addr.ptr is used, an error is raised. This also allows for reuse of functions defined in one file by other files, if that would be convenient (I don't know if any of the current tests would benefit from that).

The current situation of splitting out some files is not that problematic though, it is simple and straightforward. I wouldn't be surprised if I'm in the first person to run into this thing as I used that example from the documentation AND used a debug build to hit that assert. Personally I'd lean towards not doing anything on this front until either the split files become a burden or other reasons arise to pursue a change, at least, I don't see a solution that reduces complexity.

Perhaps we should have a helper on InstructionData that is something like fn addr(&self) -> Option<Value>, returning the address operand, if any, of the instruction; so we can centralize that and don't have to enumerate the cases in the verifier.

The Option<Value> is a bit of a limiting interface though, because it implies that each function has zero or one address operand, the stack_switch has three, so then you'd end up with having to make a decision which of the arguments is the address, for most it may work though, or change it to a Vec? Thinking about the generalization here, this would allow InstructionData to 'opt in' to an address verifier check, I wonder if there would be a way to generalise this further and allow the instructions to specify which checks should be ran on its arguments... Not sure if this would just 'move code' instead of 'reduce code' though, I don't know enough about cranelift to answer that :)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2025 at 02:22):

cfallin created PR review comment:

Yeah, the distinction is that Store names an instruction format, and store-the-opcode is one of the instructions that uses Store-the-format. Perhaps we could clean this up later, thanks.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2025 at 02:22):

cfallin submitted PR review:

Updates look good, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2025 at 02:24):

cfallin commented on PR #10209:

[suggestions about CLIF and textual inclusion]

Hmm, interesting; I think all things considered we probably don't want to build that much infrastructure. I'll keep kicking around the idea of a pointer-specific type that resolves early and maybe we can iterate on it later if the test duplication increases significantly.

it implies that each function has zero or one address operand, the stack_switch has three

Ah, right! That's a relatively recent addition; I had forgotten about that. Anyway, not a critical refactor either way, so fine to leave it how it is for now I think.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2025 at 02:42):

cfallin merged PR #10209.


Last updated: Feb 28 2025 at 03:10 UTC