miscompilation with opt_level=speed_and_size · cranelift

Because LICM has finally been fixed with jump tables, I tried to use it for cg_clif when enabling optimizations. It compiled fine and basic programs work, but simple-raytracer gave a SIGSEGV once and panicks the rest of the time. According to valgrind there are several reads of memory at different locations that is all freed at the same place. I will investigate further today or tomorrow.

==21057== Invalid read of size 1
==21057==    at 0x8BF885: core::str::validations::run_utf8_validation (validations.rs:143)
==21057==    by 0x8BF368: core::str::converts::from_utf8 (converts.rs:86)
==21057==    by 0x6FCCCF: std::sys_common::os_str_bytes::Slice::to_str (os_str_bytes.rs:165)
==21057==    by 0x6FB6BC: std::ffi::os_str::OsStr::to_str (os_str.rs:543)
==21057==    by 0x28351F: image::dynimage::save_buffer_impl::{{closure}} (dynimage.rs:787)
==21057==    by 0x33F3AC: core::option::Option<T>::and_then (option.rs:692)
==21057==    by 0x2828B7: image::dynimage::save_buffer_impl (dynimage.rs:786)
==21057==    by 0x1497E6: image::dynimage::save_buffer (dynimage.rs:775)
==21057==    by 0x148382: image::buffer::ImageBuffer<P,Container>::save (buffer.rs:441)
==21057==    by 0x147C51: raytracer::scene::Scene::render (scene.rs:42)
==21057==    by 0x143FBF: main::main (main.rs:138)
==21057==    by 0x14220B: core::ops::function::FnOnce::call_once (function.rs:227)
==21057==  Address 0x4c10067 is 39 bytes inside a block of size 552 free'd
==21057==    at 0x48369AB: free (vg_replace_malloc.c:530)
==21057==    by 0x4ABB880: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==21057==    by 0x48AAE1C: pthread_getattr_np (pthread_getattr_np.c:161)
==21057==    by 0x75D4D1: std::sys::unix::thread::guard::get_stack_start (thread.rs:288)
==21057==    by 0x75D8A0: std::sys::unix::thread::guard::get_stack_start_aligned (thread.rs:305)
==21057==    by 0x75DA33: std::sys::unix::thread::guard::init (thread.rs:336)
==21057==    by 0x7465D0: std::rt::lang_start_internal (rt.rs:37)
==21057==    by 0x145143: std::rt::lang_start (rt.rs:65)
==21057==    by 0x14429C: main (in /home/bjorn/Documenten/cg_clif3/simple-raytracer/raytracer_cg_clif)
==21057==  Block was alloc'd at
==21057==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==21057==    by 0x4ABC0FA: __fopen_internal (iofopen.c:65)
==21057==    by 0x48AACCA: pthread_getattr_np (pthread_getattr_np.c:81)
==21057==    by 0x75D4D1: std::sys::unix::thread::guard::get_stack_start (thread.rs:288)
==21057==    by 0x75D8A0: std::sys::unix::thread::guard::get_stack_start_aligned (thread.rs:305)
==21057==    by 0x75DA33: std::sys::unix::thread::guard::init (thread.rs:336)
==21057==    by 0x7465D0: std::rt::lang_start_internal (rt.rs:37)
==21057==    by 0x145143: std::rt::lang_start (rt.rs:65)
==21057==    by 0x14429C: main (in /home/bjorn/Documenten/cg_clif3/simple-raytracer/raytracer_cg_clif)

Julian Seward (Dec 10 2020 at 12:32):

@bjorn3 One thing you could do is back out the LICM fix, but leave LICM enabled. Then there are several possible scenarios:

I too would like to know whether this fix is really OK, since I'm trying to stabilise a branch of CL to ship in Firefox.

bjorn3 (Dec 10 2020 at 13:20):

Before the LICM fix opt_level=speed and opt_level=speed_and_size would simply panic when compiling. The LICM fix is what makes it possible to enable optimizations at all. I am pretty sure that the LICM fix didn't cause the issue.

Chris Fallin (Dec 10 2020 at 16:09):

@bjorn3 thanks for this -- eek. In addition to the questions from @Julian Seward above, it should be possible to instrument the licm pass with println's to work out where it's running; the CLIF for all functions on which LICM did any code motion would be super-helpful for us to debug

Julian Seward (Dec 10 2020 at 16:10):

@Chris Fallin moin! But note also, this doesn't actually imply that LICM is buggy. It's just "some part of the optimising pipeline".

Chris Fallin (Dec 10 2020 at 16:10):

Chris Fallin (Dec 10 2020 at 16:11):

@bjorn3 another thing to try would be to enable optimizations but explicitly remove LICM

Chris Fallin (Dec 10 2020 at 16:11):

then we avoid the panic that hid this before but hopefully expose whatever is causing the issue

bjorn3 (Dec 10 2020 at 16:12):

I normally debug miscompilations by removing code and adding println!() to show where the corruption happens.

Julian Seward (Dec 10 2020 at 16:13):

I had wondered from the backtraces if this was some kind of threading problem. And in particular whether supposedly atomic loads/stores are not being labelled correctly (or some such) by @bjorn3's front end, and hence are getting moved around (GVNd, LICMd) when they shouldn't.

bjorn3 (Dec 10 2020 at 16:15):

I use plain loads/stores protected by a global mutex. I am using an empty MemFlags so GVN and LICM should leave it alone. In addition the mutex lock/unlock should in itself function as a fence as it is a call instruction.

bjorn3 (Dec 10 2020 at 16:17):

The pthread_getattr_np call is used by libstd as part of adding a stack guard page. It is done at the start of all threads before passing execution to the user. This includes the main thread.

Chris Fallin (Dec 10 2020 at 16:18):

This looks to me more like a standard use-after-free (file read tries to read FILE* after fclose), maybe from a control-flow corruption somewhere?

Julian Seward (Dec 10 2020 at 16:20):

I'll build a full Fx with this fix in, so I can test it with really large inputs (eg Google Earth), but per @bjorn3's comments above, it seems like the fix itself isn't at fault.

Julian Seward (Dec 10 2020 at 16:21):

bjorn3 (Dec 10 2020 at 16:21):

A pointer to the block alloc'd inside libc shouldn't have been passed to user code even when it got corrupted. I think it is more likely that somewhere a pointer is created that just so happens to overlap with the malloc'd block.

bjorn3 (Dec 10 2020 at 16:21):

Chris Fallin (Dec 10 2020 at 16:23):

bjorn3 (Dec 11 2020 at 09:26):

fn main() {
    let ext = std::path::Path::new("result.png").extension().unwrap().to_str();
    assert_eq!(ext, Some("png"));
}

bjorn3 (Dec 11 2020 at 09:27):

Using println!() the extension before the .to_str() call shows as varying garbage bytes.

bjorn3 (Dec 11 2020 at 09:27):

Julian Seward (Dec 11 2020 at 09:28):

bjorn3 (Dec 11 2020 at 09:30):

Julian Seward (Dec 11 2020 at 09:32):

But you get different garbage bytes on different runs, or are they always the same?

bjorn3 (Dec 11 2020 at 09:32):

bjorn3 (Dec 11 2020 at 09:33):

Path::new("result.png").as_os_str().as_bytes() doesn't show any corruption. Path::new("result.png").extension().unwrap().as_bytes() does.

Julian Seward (Dec 11 2020 at 09:39):

@bjorn3 as a side thought, can you work around the lack of 128 bit int support in the newBE by just translating it all into 64 bit arithmetic at your level?

Julian Seward (Dec 11 2020 at 09:40):

Just on the basis that then you'd be using the pipeline under active development.

bjorn3 (Dec 11 2020 at 09:41):

That is hard. cg_clif expects that all primitive rust types correspond to cranelift types. I do translate some 128bit operations to 64bit operations, but I really need one rust scalar to correspond to one Value.

bjorn3 (Dec 11 2020 at 09:44):

use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

fn os_str_as_u8_slice(s: &OsStr) -> &[u8] {
    unsafe { &*(s as *const OsStr as *const [u8]) }
}

unsafe fn u8_slice_as_os_str(s: &[u8]) -> &OsStr {
    // SAFETY: see the comment of `os_str_as_u8_slice`
    unsafe { &*(s as *const [u8] as *const OsStr) }
}

fn split_file_at_dot(file: &OsStr) -> (Option<&OsStr>, Option<&OsStr>) {
    if os_str_as_u8_slice(file) == b".." {
        return (Some(file), None);
    }

    // The unsafety here stems from converting between &OsStr and &[u8]
    // and back. This is safe to do because (1) we only look at ASCII
    // contents of the encoding and (2) new &OsStr values are produced
    // only from ASCII-bounded slices of existing &OsStr values.
    let mut iter = os_str_as_u8_slice(file).rsplitn(2, |b| *b == b'.');
    let after = iter.next();
    let before = iter.next();
    if before == Some(b"") {
        (Some(file), None)
    } else {
        unsafe { (before.map(|s| u8_slice_as_os_str(s)), after.map(|s| u8_slice_as_os_str(s))) }
    }
}

fn main() {
    let ext = Some(OsStr::new("result.png")).map(split_file_at_dot).and_then(|(_, after)| after).unwrap();
    assert_eq!(ext.as_bytes(), b"png");
}

bjorn3 (Dec 11 2020 at 09:50):

fn main() {
    let ext = Some((Some(b"" as &[u8]), Some(b"png" as &[u8]))).and_then(|(_, after)| after).unwrap();
    assert_eq!(ext, b"png");
}

bjorn3 (Dec 11 2020 at 09:54):

fn main() {
    let ext = Some((Some(b"" as &[u8]), Some(b"png" as &[u8]))).and_then(|(_, after)| after);
    assert_eq!(ext, Some(b"png" as &[u8]));
}

bjorn3 (Dec 11 2020 at 10:33):

I can't find any relevant changes in Map::and_then between unoptimized and optimized. Only icmp + brnz -> ifcmp + brif, folding some pointer arithmetic into memory operations, rex prefix removal and some regalloc changes.

bjorn3 (Dec 11 2020 at 10:54):

I noticed that on multiple occasions GVN resulted in worse regalloc as it merges multiple stack_addr instructions which need to be spilled. Still searching for a relevant change.

bjorn3 (Dec 11 2020 at 11:10):

bjorn3 (Dec 11 2020 at 11:48):

I was trying to diff a normalized disassembly of the correct and miscompiled binaries. I found the following:

-push %rdi
+(bad)
+stc

bjorn3 (Dec 11 2020 at 11:48):

-dec %ebx
+callq *%rsi

Julian Seward (Dec 11 2020 at 11:49):

Are you sure you're disassembling as if it was 64-bit, not 32-bit? I ask because IIRC, dec %ebx in 32 bit code can be interpreted as a REX prefix in 64-bit code.

bjorn3 (Dec 11 2020 at 11:50):

bjorn3 (Dec 11 2020 at 11:51):

   182c0:       44 8b bc 24 3c 00 00    mov    0x3c(%rsp),%r15d
   182c7:       00
   182c8:       41 0f b6 c7             movzbl %r15b,%eax
   182cc:       85 c0                   test   %eax,%eax
   182ce:       74 ce                   je     1829e <_ZN4core6option15Option$LT$T$GT$8and_then17hde19356b2b9e6feeE+0xd8>
   182d0:       eb cc                   jmp    1829e <_ZN4core6option15Option$LT$T$GT$8and_then17hde19356b2b9e6feeE+0xd8>
   182d2:       39 ff                   cmp    %edi,%edi
   182d4:       ff                      (bad)
   182d5:       ff 5c ff ff             lcall  *-0x1(%rdi,%rdi,8)
   182d9:       ff                      .byte 0xff

bjorn3 (Dec 11 2020 at 11:53):

bjorn3 (Dec 11 2020 at 12:01):

@Julian Seward xor %dh,%dh and xor %sil,%sil zero different registers, right? (the x86 register overlapping is a bit confusing)

Julian Seward (Dec 11 2020 at 12:03):

Well, those are different both because they reference different underlying regs (rdx vs rsi) and because they reference different bit indices (15:8 vs 7:0)

Julian Seward (Dec 11 2020 at 12:04):

I think it's the case that if you add an 0x40 prefix to the first then you get the second.

Julian Seward (Dec 11 2020 at 12:04):

That is to say .. this is one of those few places where a REX prefix of 0x40 isn't redundant.

bjorn3 (Dec 11 2020 at 12:07):

Ok, thanks. It is [RexOp1u_id_z#30,%rsi] v53 = iconst.i8 0 before shrinking and [Op1u_id_z#30,%rsi] v53 = iconst.i8 0 after. Are there any other exceptions for this recipe where the rex prefix needs to be preserved?

Julian Seward (Dec 11 2020 at 12:09):

Julian Seward (Dec 11 2020 at 12:10):

bjorn3 (Dec 11 2020 at 12:10):

To phrase it differently. Are there any cases other cases where xor needs the rex prefix?

Julian Seward (Dec 11 2020 at 12:13):

This isn't specifically about xor. It relates to any basic integer insn that involves an 8-bit register.

bjorn3 (Dec 11 2020 at 12:16):

I guess I could try to not allow the rex-less version for all 8bit integer operations.

Stream: cranelift

Topic: miscompilation with opt_level=speed_and_size

bjorn3 (Dec 10 2020 at 11:50):

Julian Seward (Dec 10 2020 at 12:32):

bjorn3 (Dec 10 2020 at 13:20):

Chris Fallin (Dec 10 2020 at 16:09):

Julian Seward (Dec 10 2020 at 16:10):

Chris Fallin (Dec 10 2020 at 16:10):

Chris Fallin (Dec 10 2020 at 16:11):

Chris Fallin (Dec 10 2020 at 16:11):

bjorn3 (Dec 10 2020 at 16:12):

Julian Seward (Dec 10 2020 at 16:13):

bjorn3 (Dec 10 2020 at 16:15):

bjorn3 (Dec 10 2020 at 16:17):

Chris Fallin (Dec 10 2020 at 16:18):

Julian Seward (Dec 10 2020 at 16:20):

Julian Seward (Dec 10 2020 at 16:21):

bjorn3 (Dec 10 2020 at 16:21):

bjorn3 (Dec 10 2020 at 16:21):

Chris Fallin (Dec 10 2020 at 16:23):

bjorn3 (Dec 11 2020 at 09:26):

bjorn3 (Dec 11 2020 at 09:27):

bjorn3 (Dec 11 2020 at 09:27):

Julian Seward (Dec 11 2020 at 09:28):

bjorn3 (Dec 11 2020 at 09:30):

Julian Seward (Dec 11 2020 at 09:32):

bjorn3 (Dec 11 2020 at 09:32):

bjorn3 (Dec 11 2020 at 09:33):

Julian Seward (Dec 11 2020 at 09:39):

Julian Seward (Dec 11 2020 at 09:40):

bjorn3 (Dec 11 2020 at 09:41):

bjorn3 (Dec 11 2020 at 09:44):

bjorn3 (Dec 11 2020 at 09:50):

bjorn3 (Dec 11 2020 at 09:54):

bjorn3 (Dec 11 2020 at 10:33):

bjorn3 (Dec 11 2020 at 10:54):

bjorn3 (Dec 11 2020 at 11:10):

bjorn3 (Dec 11 2020 at 11:48):

bjorn3 (Dec 11 2020 at 11:48):

Julian Seward (Dec 11 2020 at 11:49):

bjorn3 (Dec 11 2020 at 11:50):

bjorn3 (Dec 11 2020 at 11:51):

bjorn3 (Dec 11 2020 at 11:53):

bjorn3 (Dec 11 2020 at 12:01):

Julian Seward (Dec 11 2020 at 12:03):

Julian Seward (Dec 11 2020 at 12:04):

Julian Seward (Dec 11 2020 at 12:04):

bjorn3 (Dec 11 2020 at 12:07):

Julian Seward (Dec 11 2020 at 12:09):

Julian Seward (Dec 11 2020 at 12:10):

bjorn3 (Dec 11 2020 at 12:10):

Julian Seward (Dec 11 2020 at 12:13):

bjorn3 (Dec 11 2020 at 12:16):

bjorn3 (Dec 11 2020 at 16:21):

Chris Fallin (Dec 11 2020 at 17:14):