Stream: cranelift

Topic: miscompilation with opt_level=speed_and_size


view this post on Zulip bjorn3 (Dec 10 2020 at 11:50):

Because LICM has finally been fixed with jump tables, I tried to use it for cg_clif when enabling optimizations. It compiled fine and basic programs work, but simple-raytracer gave a SIGSEGV once and panicks the rest of the time. According to valgrind there are several reads of memory at different locations that is all freed at the same place. I will investigate further today or tomorrow.

==21057== Invalid read of size 1
==21057==    at 0x8BF885: core::str::validations::run_utf8_validation (validations.rs:143)
==21057==    by 0x8BF368: core::str::converts::from_utf8 (converts.rs:86)
==21057==    by 0x6FCCCF: std::sys_common::os_str_bytes::Slice::to_str (os_str_bytes.rs:165)
==21057==    by 0x6FB6BC: std::ffi::os_str::OsStr::to_str (os_str.rs:543)
==21057==    by 0x28351F: image::dynimage::save_buffer_impl::{{closure}} (dynimage.rs:787)
==21057==    by 0x33F3AC: core::option::Option<T>::and_then (option.rs:692)
==21057==    by 0x2828B7: image::dynimage::save_buffer_impl (dynimage.rs:786)
==21057==    by 0x1497E6: image::dynimage::save_buffer (dynimage.rs:775)
==21057==    by 0x148382: image::buffer::ImageBuffer<P,Container>::save (buffer.rs:441)
==21057==    by 0x147C51: raytracer::scene::Scene::render (scene.rs:42)
==21057==    by 0x143FBF: main::main (main.rs:138)
==21057==    by 0x14220B: core::ops::function::FnOnce::call_once (function.rs:227)
==21057==  Address 0x4c10067 is 39 bytes inside a block of size 552 free'd
==21057==    at 0x48369AB: free (vg_replace_malloc.c:530)
==21057==    by 0x4ABB880: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==21057==    by 0x48AAE1C: pthread_getattr_np (pthread_getattr_np.c:161)
==21057==    by 0x75D4D1: std::sys::unix::thread::guard::get_stack_start (thread.rs:288)
==21057==    by 0x75D8A0: std::sys::unix::thread::guard::get_stack_start_aligned (thread.rs:305)
==21057==    by 0x75DA33: std::sys::unix::thread::guard::init (thread.rs:336)
==21057==    by 0x7465D0: std::rt::lang_start_internal (rt.rs:37)
==21057==    by 0x145143: std::rt::lang_start (rt.rs:65)
==21057==    by 0x14429C: main (in /home/bjorn/Documenten/cg_clif3/simple-raytracer/raytracer_cg_clif)
==21057==  Block was alloc'd at
==21057==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==21057==    by 0x4ABC0FA: __fopen_internal (iofopen.c:65)
==21057==    by 0x48AACCA: pthread_getattr_np (pthread_getattr_np.c:81)
==21057==    by 0x75D4D1: std::sys::unix::thread::guard::get_stack_start (thread.rs:288)
==21057==    by 0x75D8A0: std::sys::unix::thread::guard::get_stack_start_aligned (thread.rs:305)
==21057==    by 0x75DA33: std::sys::unix::thread::guard::init (thread.rs:336)
==21057==    by 0x7465D0: std::rt::lang_start_internal (rt.rs:37)
==21057==    by 0x145143: std::rt::lang_start (rt.rs:65)
==21057==    by 0x14429C: main (in /home/bjorn/Documenten/cg_clif3/simple-raytracer/raytracer_cg_clif)

view this post on Zulip Julian Seward (Dec 10 2020 at 12:32):

@bjorn3 One thing you could do is back out the LICM fix, but leave LICM enabled. Then there are several possible scenarios:

I too would like to know whether this fix is really OK, since I'm trying to stabilise a branch of CL to ship in Firefox.

view this post on Zulip bjorn3 (Dec 10 2020 at 13:20):

Before the LICM fix opt_level=speed and opt_level=speed_and_size would simply panic when compiling. The LICM fix is what makes it possible to enable optimizations at all. I am pretty sure that the LICM fix didn't cause the issue.

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:09):

@bjorn3 thanks for this -- eek. In addition to the questions from @Julian Seward above, it should be possible to instrument the licm pass with println's to work out where it's running; the CLIF for all functions on which LICM did any code motion would be super-helpful for us to debug

view this post on Zulip Julian Seward (Dec 10 2020 at 16:10):

@Chris Fallin moin! But note also, this doesn't actually imply that LICM is buggy. It's just "some part of the optimising pipeline".

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:10):

yep, indeed

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:11):

@bjorn3 another thing to try would be to enable optimizations but explicitly remove LICM

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:11):

then we avoid the panic that hid this before but hopefully expose whatever is causing the issue

view this post on Zulip bjorn3 (Dec 10 2020 at 16:12):

I normally debug miscompilations by removing code and adding println!() to show where the corruption happens.

view this post on Zulip Julian Seward (Dec 10 2020 at 16:13):

I had wondered from the backtraces if this was some kind of threading problem. And in particular whether supposedly atomic loads/stores are not being labelled correctly (or some such) by @bjorn3's front end, and hence are getting moved around (GVNd, LICMd) when they shouldn't.

view this post on Zulip bjorn3 (Dec 10 2020 at 16:15):

I use plain loads/stores protected by a global mutex. I am using an empty MemFlags so GVN and LICM should leave it alone. In addition the mutex lock/unlock should in itself function as a fence as it is a call instruction.

view this post on Zulip bjorn3 (Dec 10 2020 at 16:17):

The pthread_getattr_np call is used by libstd as part of adding a stack guard page. It is done at the start of all threads before passing execution to the user. This includes the main thread.

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:18):

This looks to me more like a standard use-after-free (file read tries to read FILE* after fclose), maybe from a control-flow corruption somewhere?

view this post on Zulip Julian Seward (Dec 10 2020 at 16:20):

I'll build a full Fx with this fix in, so I can test it with really large inputs (eg Google Earth), but per @bjorn3's comments above, it seems like the fix itself isn't at fault.

view this post on Zulip Julian Seward (Dec 10 2020 at 16:21):

@bjorn3 I assume this is on x64; is this the new or old BE?

view this post on Zulip bjorn3 (Dec 10 2020 at 16:21):

A pointer to the block alloc'd inside libc shouldn't have been passed to user code even when it got corrupted. I think it is more likely that somewhere a pointer is created that just so happens to overlap with the malloc'd block.

view this post on Zulip bjorn3 (Dec 10 2020 at 16:21):

x86_64 with the old backend. The new backend doesn't support 128bit ints yet.

view this post on Zulip Chris Fallin (Dec 10 2020 at 16:23):

(tangent, that's on my roadmap and I hope to fix it soon :-) )

view this post on Zulip bjorn3 (Dec 11 2020 at 09:26):

Reduced to

fn main() {
    let ext = std::path::Path::new("result.png").extension().unwrap().to_str();
    assert_eq!(ext, Some("png"));
}

This doesn't SIGSEGV anymore but ext becomes None incorrectly.

view this post on Zulip bjorn3 (Dec 11 2020 at 09:27):

Using println!() the extension before the .to_str() call shows as varying garbage bytes.

view this post on Zulip bjorn3 (Dec 11 2020 at 09:27):

Valgrind doesn't trigger anymore though.

view this post on Zulip Julian Seward (Dec 11 2020 at 09:28):

Does V complain even with the println! in place?

view this post on Zulip bjorn3 (Dec 11 2020 at 09:30):

It doesn't complain with or without println().

view this post on Zulip Julian Seward (Dec 11 2020 at 09:32):

But you get different garbage bytes on different runs, or are they always the same?

view this post on Zulip bjorn3 (Dec 11 2020 at 09:32):

Different bytes

view this post on Zulip bjorn3 (Dec 11 2020 at 09:33):

Path::new("result.png").as_os_str().as_bytes() doesn't show any corruption. Path::new("result.png").extension().unwrap().as_bytes() does.

view this post on Zulip Julian Seward (Dec 11 2020 at 09:39):

@bjorn3 as a side thought, can you work around the lack of 128 bit int support in the newBE by just translating it all into 64 bit arithmetic at your level?

view this post on Zulip Julian Seward (Dec 11 2020 at 09:40):

Just on the basis that then you'd be using the pipeline under active development.

view this post on Zulip bjorn3 (Dec 11 2020 at 09:41):

That is hard. cg_clif expects that all primitive rust types correspond to cranelift types. I do translate some 128bit operations to 64bit operations, but I really need one rust scalar to correspond to one Value.

view this post on Zulip bjorn3 (Dec 11 2020 at 09:44):

I reduced it to

use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

fn os_str_as_u8_slice(s: &OsStr) -> &[u8] {
    unsafe { &*(s as *const OsStr as *const [u8]) }
}

unsafe fn u8_slice_as_os_str(s: &[u8]) -> &OsStr {
    // SAFETY: see the comment of `os_str_as_u8_slice`
    unsafe { &*(s as *const [u8] as *const OsStr) }
}

fn split_file_at_dot(file: &OsStr) -> (Option<&OsStr>, Option<&OsStr>) {
    if os_str_as_u8_slice(file) == b".." {
        return (Some(file), None);
    }

    // The unsafety here stems from converting between &OsStr and &[u8]
    // and back. This is safe to do because (1) we only look at ASCII
    // contents of the encoding and (2) new &OsStr values are produced
    // only from ASCII-bounded slices of existing &OsStr values.
    let mut iter = os_str_as_u8_slice(file).rsplitn(2, |b| *b == b'.');
    let after = iter.next();
    let before = iter.next();
    if before == Some(b"") {
        (Some(file), None)
    } else {
        unsafe { (before.map(|s| u8_slice_as_os_str(s)), after.map(|s| u8_slice_as_os_str(s))) }
    }
}

fn main() {
    let ext = Some(OsStr::new("result.png")).map(split_file_at_dot).and_then(|(_, after)| after).unwrap();
    assert_eq!(ext.as_bytes(), b"png");
}

It is very sensitive to changes now.

view this post on Zulip bjorn3 (Dec 11 2020 at 09:50):

Removed all unsafe code and the OsStr dependency:

fn main() {
    let ext = Some((Some(b"" as &[u8]), Some(b"png" as &[u8]))).and_then(|(_, after)| after).unwrap();
    assert_eq!(ext, b"png");
}

view this post on Zulip bjorn3 (Dec 11 2020 at 09:54):

No unwrap:

fn main() {
    let ext = Some((Some(b"" as &[u8]), Some(b"png" as &[u8]))).and_then(|(_, after)| after);
    assert_eq!(ext, Some(b"png" as &[u8]));
}

view this post on Zulip bjorn3 (Dec 11 2020 at 10:33):

I can't find any relevant changes in Map::and_then between unoptimized and optimized. Only icmp + brnz -> ifcmp + brif, folding some pointer arithmetic into memory operations, rex prefix removal and some regalloc changes.

view this post on Zulip bjorn3 (Dec 11 2020 at 10:54):

I noticed that on multiple occasions GVN resulted in worse regalloc as it merges multiple stack_addr instructions which need to be spilled. Still searching for a relevant change.

view this post on Zulip bjorn3 (Dec 11 2020 at 11:10):

I found the optimization pass that causes the problem: shrink_instructions.

view this post on Zulip bjorn3 (Dec 11 2020 at 11:48):

I was trying to diff a normalized disassembly of the correct and miscompiled binaries. I found the following:

-push %rdi
+(bad)
+stc

view this post on Zulip bjorn3 (Dec 11 2020 at 11:48):

And a little bit later:

-dec %ebx
+callq *%rsi

view this post on Zulip Julian Seward (Dec 11 2020 at 11:49):

Are you sure you're disassembling as if it was 64-bit, not 32-bit? I ask because IIRC, dec %ebx in 32 bit code can be interpreted as a REX prefix in 64-bit code.

view this post on Zulip bjorn3 (Dec 11 2020 at 11:50):

I use objdump -d.

view this post on Zulip bjorn3 (Dec 11 2020 at 11:51):

Excerpt from _ZN4core6option15Option$LT$T$GT$8and_then17hde19356b2b9e6feeE:

   182c0:       44 8b bc 24 3c 00 00    mov    0x3c(%rsp),%r15d
   182c7:       00
   182c8:       41 0f b6 c7             movzbl %r15b,%eax
   182cc:       85 c0                   test   %eax,%eax
   182ce:       74 ce                   je     1829e <_ZN4core6option15Option$LT$T$GT$8and_then17hde19356b2b9e6feeE+0xd8>
   182d0:       eb cc                   jmp    1829e <_ZN4core6option15Option$LT$T$GT$8and_then17hde19356b2b9e6feeE+0xd8>
   182d2:       39 ff                   cmp    %edi,%edi
   182d4:       ff                      (bad)
   182d5:       ff 5c ff ff             lcall  *-0x1(%rdi,%rdi,8)
   182d9:       ff                      .byte 0xff

view this post on Zulip bjorn3 (Dec 11 2020 at 11:53):

Err, that is just padding I think.

view this post on Zulip bjorn3 (Dec 11 2020 at 12:01):

@Julian Seward xor %dh,%dh and xor %sil,%sil zero different registers, right? (the x86 register overlapping is a bit confusing)

view this post on Zulip Julian Seward (Dec 11 2020 at 12:03):

Well, those are different both because they reference different underlying regs (rdx vs rsi) and because they reference different bit indices (15:8 vs 7:0)

view this post on Zulip Julian Seward (Dec 11 2020 at 12:04):

I think it's the case that if you add an 0x40 prefix to the first then you get the second.

view this post on Zulip Julian Seward (Dec 11 2020 at 12:04):

That is to say .. this is one of those few places where a REX prefix of 0x40 isn't redundant.

view this post on Zulip bjorn3 (Dec 11 2020 at 12:07):

Ok, thanks. It is [RexOp1u_id_z#30,%rsi] v53 = iconst.i8 0 before shrinking and [Op1u_id_z#30,%rsi] v53 = iconst.i8 0 after. Are there any other exceptions for this recipe where the rex prefix needs to be preserved?

view this post on Zulip Julian Seward (Dec 11 2020 at 12:09):

Pass. I know close to zero about the oldBE.

view this post on Zulip Julian Seward (Dec 11 2020 at 12:10):

If anyone can tell you, it might be @Andrew Brown.

view this post on Zulip bjorn3 (Dec 11 2020 at 12:10):

To phrase it differently. Are there any cases other cases where xor needs the rex prefix?

view this post on Zulip Julian Seward (Dec 11 2020 at 12:13):

This isn't specifically about xor. It relates to any basic integer insn that involves an 8-bit register.

view this post on Zulip bjorn3 (Dec 11 2020 at 12:16):

I guess I could try to not allow the rex-less version for all 8bit integer operations.

view this post on Zulip bjorn3 (Dec 11 2020 at 16:21):

Opened https://github.com/bytecodealliance/wasmtime/pull/2496 with a fix.

This fixes the cg_clif miscompilation I wrote about at https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/miscompilation.20with.20opt_level.3Dspeed_and_size/near/219461997...

view this post on Zulip Chris Fallin (Dec 11 2020 at 17:14):

Nice find -- thanks for debugging this!


Last updated: Nov 22 2024 at 17:03 UTC