Stream: git-wasmtime

Topic: wasmtime / issue #4000 Cranelift: JIT relocations depend ...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2022 at 13:44):

Mrmaxmeier opened issue #4000:

Hey,

I'm seeing crashes during finalize_definitions calls related to x86_64 call relocations:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TryFromIntError(())', cranelift/jit/src/compiled_blob.rs:55:80

Cranelift emits 32-bit PC-relative relocations for calls on x86_64, so a call can "only" reach targets within ±2GB of the call site. Code memory is allocated with the normal system allocator, which may place different allocations in distant parts of the address space.
I'm seeing sporadic crashes in a heavily multithreaded program, but the problem can be reproduced with this abridged version of the jit-minimal.rs example:

use cranelift::prelude::*;
use cranelift_codegen::settings;
use cranelift_jit::{JITBuilder, JITModule};
use cranelift_module::{default_libcall_names, Linkage, Module};

fn main() {
    let isa_builder = cranelift_native::builder().unwrap();
    let isa = isa_builder
        .finish(settings::Flags::new(settings::builder()))
        .unwrap();

    let mut m = JITModule::new(JITBuilder::with_isa(isa, default_libcall_names()));

    let mut ctx = m.make_context();
    let mut func_ctx = FunctionBuilderContext::new();

    let func_a = m
        .declare_function("a", Linkage::Local, &m.make_signature())
        .unwrap();
    let func_b = m
        .declare_function("b", Linkage::Local, &m.make_signature())
        .unwrap();

    // Define a dummy function `func_a`
    ctx.func.name = ExternalName::user(0, func_a.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();

        bcx.switch_to_block(block);
        bcx.ins().return_(&[]);
        bcx.seal_all_blocks();
        bcx.finalize();
    }
    m.define_function(func_a, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Allocate a bunch (~4GB) to stretch address space
    let mut allocations: Vec<Vec<u8>> = Vec::new();
    for _ in 0..999999 {
        allocations.push(Vec::with_capacity(4096));
    }

    // Define `func_b` in a new allocation and reference `func_a`
    ctx.func.name = ExternalName::user(0, func_b.as_u32());
    {
        let mut bcx: FunctionBuilder = FunctionBuilder::new(&mut ctx.func, &mut func_ctx);
        let block = bcx.create_block();

        bcx.switch_to_block(block);
        let local_func = m.declare_func_in_func(func_a, &mut bcx.func);
        // Emit a call with a relocation for func_a
        bcx.ins().call(local_func, &[]);
        // Make sure that this function's body is larger than page_size and will require a new allocation.
        for _ in 0..1024 {
            bcx.ins().call(local_func, &[]);
        }
        bcx.ins().return_(&[]);

        bcx.seal_all_blocks();
        bcx.finalize();
    }

    m.define_function(func_b, &mut ctx).unwrap();
    m.clear_context(&mut ctx);

    // Perform linking
    m.finalize_definitions();
}
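The panic at compiled_blob.rs:55 is an integer conversion that fails while a call relocation is being patched. The sketch below is a rough model of that check, not the actual cranelift-jit code, and the helper name rel32_displacement is hypothetical. It computes the PC-relative displacement from the end of the 4-byte displacement field to the target and tries to convert it to i32.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: the value to store in the 4-byte displacement field
/// of an x86_64 `call rel32` whose field starts at `field_addr`.
/// The CPU adds the displacement to the address of the next instruction,
/// which is the end of the 4-byte field.
fn rel32_displacement(field_addr: u64, target: u64) -> Result<i32, TryFromIntError> {
    // Use i128 so the subtraction itself can never overflow.
    let diff = target as i128 - (field_addr as i128 + 4);
    // Fails exactly when the target is outside the ±2GB rel32 range.
    i32::try_from(diff)
}

fn main() {
    let call_site = 0x5555_0000_0000u64;
    // A target 1 MiB away fits into the 32-bit displacement.
    println!("{:?}", rel32_displacement(call_site, call_site + (1 << 20)));
    // A target 4 GiB away (like func_a vs. func_b in the reproducer) does not.
    println!("{:?}", rel32_displacement(call_site, call_site + (4 << 30)));
}
```

In the reproducer, roughly 4GB of unrelated allocations sit between func_a and func_b, so the displacement presumably falls outside the i32 range and the unwrap on the conversion panics.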

It might be possible to trigger this from small-ish WebAssembly modules, because glibc's malloc serves allocations above its mmap threshold (128KiB by default) with mmap, outside of the heap. I haven't managed to reproduce that, though, because glibc's dynamic threshold scaling raises this limit before the code is emitted.

Possible approaches:

AArch64 runs into a related issue with its 26-bit relative jumps: https://github.com/bytecodealliance/wasmtime/issues/3277
I'm not sure veneers are applicable to x86_64, but they seem like an interesting and more general approach to the range limits of relative jumps.
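The thread does not specify how veneers would look on x86_64. What follows is a hypothetical sketch of the general idea. A call whose target is out of rel32 range is redirected to a small stub placed near the caller, and that stub loads the full 64-bit target address into a scratch register and jumps to it. The function name x86_64_veneer and the choice of r11 are assumptions. The call still uses a rel32 to reach the veneer, so the veneer itself must sit within ±2GB of the call site, for example in the same allocation.

```rust
/// Hypothetical 13-byte x86_64 veneer that jumps to an arbitrary 64-bit address:
///   movabs r11, imm64   ; 49 BB <imm64, little-endian>
///   jmp    r11          ; 41 FF E3
/// r11 is assumed as the scratch register because neither the System V nor
/// the Windows x64 calling convention uses it to pass arguments.
fn x86_64_veneer(target: u64) -> [u8; 13] {
    let mut stub = [0u8; 13];
    stub[0] = 0x49; // REX.W (64-bit operand) + REX.B (destination is r8..r15)
    stub[1] = 0xBB; // MOV r64, imm64 with register number 3 (r11 = 8 + 3)
    stub[2..10].copy_from_slice(&target.to_le_bytes());
    stub[10] = 0x41; // REX.B so the ModRM rm field selects r11
    stub[11] = 0xFF; // opcode group 5
    stub[12] = 0xE3; // ModRM: mod=11, reg=/4 (JMP), rm=011 (r11 with REX.B)
    stub
}

fn main() {
    let stub = x86_64_veneer(0x7f12_3456_7890);
    let hex: Vec<String> = stub.iter().map(|b| format!("{:02x}", b)).collect();
    println!("{}", hex.join(" "));
}
```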

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2022 at 13:44):

Mrmaxmeier labeled issue #4000.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2022 at 13:50):

bjorn3 commented on issue #4000:

Don't allocate on the heap. cranelift-jit's selinux-fix feature uses mmap allocations instead. The underlying issue still exists, but because mmap allocations are separate from the heap, they are mostly sequential, so you would need more than 2GB of generated machine code to run into problems.

I think this is the best fix, possibly combined with reserving the full 2GB up front as PROT_NONE. Splitting the GOT so that each such 2GB chunk has its own part should also allow more code to be used when PIC is enabled.
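The sketch below is a hypothetical model of the reservation idea, and it covers only the address bookkeeping. One 2GB range is reserved up front, and code allocations are handed out from it sequentially, so any call site and target inside the range stay close enough for a 32-bit displacement. CodeRegion, REGION_SIZE and alloc are made-up names. A real implementation would obtain the range with mmap(PROT_NONE) and change page permissions as pages are handed out. That part is left out here.

```rust
/// Size of the hypothetical up-front reservation: 2 GiB.
const REGION_SIZE: u64 = 1 << 31;

/// Hypothetical bump allocator over one reserved address range.
/// No memory is actually mapped; only addresses are tracked.
struct CodeRegion {
    base: u64,
    next: u64,
}

impl CodeRegion {
    fn new(base: u64) -> Self {
        CodeRegion { base, next: base }
    }

    /// Returns the start of `size` bytes aligned to `align` (a power of two),
    /// or `None` once the reservation is exhausted. A real JIT would also
    /// mprotect the returned pages away from PROT_NONE here.
    fn alloc(&mut self, size: u64, align: u64) -> Option<u64> {
        assert!(align.is_power_of_two());
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.base + REGION_SIZE {
            // Refuse rather than place code somewhere far away.
            return None;
        }
        self.next = end;
        Some(start)
    }
}

fn main() {
    let mut region = CodeRegion::new(0x7f00_0000_0000);
    let a = region.alloc(4096, 16).unwrap();
    let b = region.alloc(64 * 1024, 16).unwrap();
    // Both allocations lie in the same 2 GiB range.
    println!("a = {:#x}, b = {:#x}, distance = {}", a, b, b - a);
    // A request that would overflow the reservation is refused.
    println!("{:?}", region.alloc(REGION_SIZE, 16));
}
```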

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2022 at 17:01):

cfallin commented on issue #4000:

I think this is covered by the colocated flag on external function definitions: it denotes whether a function is in the same module (and can therefore use near calls) or elsewhere (and therefore needs an absolute 64-bit relocation). This flag on ExtFuncData controls which kind of call is generated. It looks like it may not be exposed through the JITModule API; we'd be happy to take a PR to fix that if so!
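The two call shapes can be made concrete with a hypothetical sketch. This is not Cranelift's emission code, and it assumes x86_64 without PIC. A colocated callee gets a 5-byte `call rel32` with a 4-byte PC-relative relocation. A non-colocated callee has its absolute address loaded into a register through an 8-byte relocation and is then called indirectly. The emit_call helper and the use of r11 as the scratch register are assumptions.

```rust
/// Hypothetical: returns the call instruction bytes with a zeroed relocation
/// field, plus the offset and width of that field (patched at link time).
fn emit_call(colocated: bool) -> (Vec<u8>, usize, usize) {
    if colocated {
        // call rel32: E8 <disp32>; the 4-byte field is PC-relative.
        (vec![0xE8, 0, 0, 0, 0], 1, 4)
    } else {
        // movabs r11, imm64: 49 BB <imm64>; the 8-byte field is absolute.
        let mut bytes = vec![0x49, 0xBB];
        bytes.extend_from_slice(&[0u8; 8]);
        // call r11: 41 FF D3 (FF /2, ModRM mod=11 reg=010 rm=011, REX.B).
        bytes.extend_from_slice(&[0x41, 0xFF, 0xD3]);
        (bytes, 2, 8)
    }
}

fn main() {
    for &colocated in &[true, false] {
        let (bytes, offset, width) = emit_call(colocated);
        println!(
            "colocated={}: {} bytes, {}-byte relocation at offset {}",
            colocated,
            bytes.len(),
            width,
            offset
        );
    }
}
```

The near form is smaller and needs no scratch register, but it only works if the callee ends up within ±2GB.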

view this post on Zulip Wasmtime GitHub notifications bot (Apr 06 2022 at 17:44):

bjorn3 commented on issue #4000:

That is not the problem here. The problem is that a function and the GOT or PLT it accesses may end up more than 2GB apart due to memory fragmentation. All calls already go through the GOT and PLT anyway, so as long as those are within 2GB of the calling function, it doesn't matter where the called function is, regardless of the colocated flag.
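The indirection described above can be sketched with a PLT-style stub. The helper plt_stub is hypothetical and is not cranelift-jit's actual code. The call reaches the stub through a 32-bit displacement, and the stub jumps through a GOT slot that holds the callee's full 64-bit address. Only the call-to-stub and stub-to-GOT distances have to fit in an i32. So fragmentation that separates a function from its GOT/PLT breaks linking, even though the callee itself could be anywhere.

```rust
use std::convert::TryFrom;

/// Hypothetical PLT stub: `jmp qword ptr [rip + disp32]` (FF 25 <disp32>).
/// Returns `None` if the GOT slot is out of rel32 range of the stub.
fn plt_stub(stub_addr: u64, got_slot_addr: u64) -> Option<[u8; 6]> {
    // RIP-relative operands are relative to the end of this 6-byte instruction.
    let disp = got_slot_addr as i128 - (stub_addr as i128 + 6);
    let disp = i32::try_from(disp).ok()?;
    let mut stub = [0u8; 6];
    stub[0] = 0xFF; // opcode group 5
    stub[1] = 0x25; // ModRM: mod=00, reg=/4 (JMP), rm=101 (RIP-relative)
    stub[2..6].copy_from_slice(&disp.to_le_bytes());
    Some(stub)
}

fn main() {
    let stub_addr = 0x5555_0000_1000u64;
    // GOT slot close to the stub: encodable.
    println!("{:?}", plt_stub(stub_addr, stub_addr + 0x2000));
    // GOT slot 4 GiB away (e.g. after fragmentation): not encodable.
    println!("{:?}", plt_stub(stub_addr, stub_addr + (4 << 30)));
}
```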


Last updated: Nov 22 2024 at 17:03 UTC