Stream: git-wasmtime

Topic: wasmtime / issue #7237 Partially out-of-bounds writes on ARM


view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2023 at 18:29):

fitzgen opened issue #7237:

Apparently on ARM when

then there is no guarantee that the part of the store to the page that didn't fault will not succeed, despite the other part raising a signal.

This means that partially out-of-bounds Wasm stores that trigger a trap can potentially mutate memory for the in-bounds portion of the write, which is not spec compliant.

Apparently it is implementation-defined behavior, so it may or may not be an issue on any given ARM machine.

Thus far, @cfallin tested on the ARM machines he has access to and none of the following have failed the attached test case:

Test Case

(module
  (memory 1)
  (func (export "i64.store") (param i32 i64)
    local.get 0
    local.get 1
    i64.store)
  (func (export "i32.load8_u") (param i32) (result i32)
    local.get 0
    i32.load8_u))

(assert_trap (invoke "i64.store"
                     (i32.const 65529)
                     (i64.const 0xffffffffffffffff))
             "out of bounds memory access")

;; Partial bytes were not written.
(assert_return (invoke "i32.load8_u" (i32.const 65529))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65530))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65531))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65532))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65533))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65534))
               (i32.const 0))
(assert_return (invoke "i32.load8_u" (i32.const 65535))
               (i32.const 0))

See Also

view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2023 at 18:31):

fitzgen commented on issue #7237:

Modulo relaxing the spec, the only 100% sure way to fix this that I am aware of would be to use explicit bounds checks for all stores on ARM.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2023 at 18:35):

fitzgen commented on issue #7237:

Oh also, we should double check whether risc-v and s390x give us the stronger guarantees we need to match wasm semantics.

cc @uweigand: happen to know the answer for s390x?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2023 at 18:42):

cfallin commented on issue #7237:

I've been trying to figure out the state of RISC-V's specs on this matter. The privileged-mode spec includes details on page translation, but the translation algorithm and surrounding text (section 4.3.2) talk about a single address being translated, and don't seem to mention what happens when a single usermode instruction requires multiple translations. The user-mode spec is pretty high-level and only talks about "invisible traps", including page-faults, that can happen during execution of an instruction but are unobservable. So it seems to be a bit underspecified.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2023 at 18:54):

cfallin commented on issue #7237:

Two other possible fixes:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 15 2023 at 22:19):

uweigand commented on issue #7237:

Oh also, we should double check whether risc-v and s390x give us the stronger guarantees we need to match wasm semantics.

cc @uweigand: happen to know the answer for s390x?

For most instructions on s390x, the architecture guarantees that they either complete successfully, or else are "nullified" or "suppressed", which means none of the output operands (either in memory or in registers) are changed at all. This also applies to the case of an unaligned store crossing a page boundary. [ The one side effect allowed by the architecture is that a nullified or suppressed instruction may cause the hardware change bit of a page containing (part of) a memory output operand to be set, even if the output actually is not modified. This doesn't have any observable effect on a wasm program, though. ]

The exception to this rule are so-called "interruptible" instructions; these are long-running operations (e.g. copying a large block of memory). Those are allowed to "partially complete", e.g. after copying only part of the total memory block. They will indicate this outcome by modifying the output registers and condition code accordingly. If such an instruction incurs a page fault halfway through, partial completion will usually result. But those instructions are not generated by the wasmtime s390x backend (certainly not to implement a single wasm store).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 15 2023 at 22:36):

afonso360 commented on issue #7237:

For RISC-V it looks like this was recently clarified in the ISA (see https://github.com/riscv/riscv-isa-manual/pull/1119). The ISA now states:

On some implementations, misaligned loads, stores, and instruction
fetches may also be decomposed into multiple accesses, some of which may
succeed before a page-fault exception occurs. In particular, a
portion of a misaligned store that passes the exception check may become
visible, even if another portion fails the exception check. The same behavior
may manifest for stores wider than XLEN bits (e.g., the FSD instruction
in RV32D), even when the store address is naturally aligned.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 16 2023 at 09:38):

sparker-arm commented on issue #7237:

Somehow detect if the underlying platform actually tears partially-faulting writes, and only enable bounds-checks if so.

Just FYI, this is what V8 does, which is on arm64 MacOS.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 16 2023 at 10:06):

afonso360 edited a comment on issue #7237:

For RISC-V it looks like this was recently clarified in the ISA (see https://github.com/riscv/riscv-isa-manual/pull/1119). The ISA now states:

On some implementations, misaligned loads, stores, and instruction
fetches may also be decomposed into multiple accesses, some of which may
succeed before a page-fault exception occurs. In particular, a
portion of a misaligned store that passes the exception check may become
visible, even if another portion fails the exception check. The same behavior
may manifest for stores wider than XLEN bits (e.g., the FSD instruction
in RV32D), even when the store address is naturally aligned.

Edit: By the way, I ran the above test on a JH7110 (SiFive u74 core) and it failed with:

Error: failed to run script file './test.wast'

Caused by:
    0: failed directive on ./test.wast:17:1
    1: result 0 didn't match
    2: expected                  0 / 0x0000000000000000
       actual                  255 / 0x00000000000000ff

So it looks like this actually happens for RISC-V cores.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 16 2023 at 16:03):

cfallin commented on issue #7237:

Somehow detect if the underlying platform actually tears partially-faulting writes, and only enable bounds-checks if so.

Just FYI, this is what V8 does, which is on arm64 MacOS.

It seems like given @akirilov-arm's reply here, V8's behavior is also technically non-conforming then: if the platform doesn't guarantee consistency in its behavior, then one can't test and choose a strategy at startup and rely on it to remain working.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 16 2023 at 16:12):

fitzgen commented on issue #7237:

So it looks like this actually happens for RISC-V cores.

Thanks for testing @afonso360!

Will update the issue title to reflect that this is both an ARM and riscv issue.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 16 2023 at 16:12):

fitzgen commented on issue #7237:

For most instructions on s390x, the architecture guarantees that they either complete successfully, or else are "nullified" or "suppressed",

Thanks @uweigand!


Last updated: Oct 23 2024 at 20:03 UTC