Howdy! I'm trying to run a fairly large C/C++ test suite built on top of Catch2, compiled with wasi-sdk v10 and running with wasmtime. I'm getting a failure on exit:
===============================================================================
All tests passed (142033 assertions in 116 test cases)
Error: failed to run main module `/var/folders/zs/m4nj7dw15v54xf284j66fbth0000gp/T/Pram_IfXpoFVmPpbidgbO9vvp/g==/WasiRunner/16149EC0-pram-baselib-tests-debug.wasm/app/baselib-tests-debug.wasm`
Caused by:
0: failed to invoke `_start`
1: wasm trap: indirect call type mismatch, source location: @9e416a
wasm backtrace:
0: <unknown>!__funcs_on_exit
1: <unknown>!__prepare_for_exit
2: <unknown>!_start
There's no call to atexit in the code, so I'm guessing it's due to some generated __cxa_atexit with the wrong function signature. Is there any reasonable way for me to figure out which function has the wrong signature?
Is it possible to recompile the test suite with debug info and use a debugger with wasmtime?
mm. Good idea; I do have debug info, I didn't actually think to use a debugger here. Let me try that
/me is interested in assisting with troubleshooting any problems with running that setup
wasmtime transforms wasm DWARF if -g is provided (works by default with lldb on Linux, though there is a PR for gdb)
do I just lldb on wasmtime itself, or is there a gdbserver-like thing to attach to? Also, running with -g, I'm getting:
Error: failed to run main module `...`
Caused by:
0: Debug information error
1: Invalid opcode in DWARF expression
okay, looks like an old wasmtime
yeah, it's 0.15 -- I'm building from git right now
I recommend ;) https://github.com/bytecodealliance/wasmtime/pull/1572
do I want your branch, or master -- or merge of the two?
I rebased on top of master today
ah there's a merge conflict with master still
right, I'll resolve that shortly; the branch can be used as is
erm, right. sorry, operating on too little sleep :)
/me rebased PRs just to have that clean from conflicts
Arg, I need a newer lldb on mac
that might be a problem; only brew's version of lldb can do it, I think
and, on macOS only, ".lldbinit" needs: settings set plugin.jit-loader.gdb.enable on
/me has lldb version 9.0.0 on mac
ah it was the lldbinit piece I was missing! lldb 10.0 from brew works:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
* frame #0: 0x000000016ec53806 JIT(0x17d700000)`__funcs_on_exit at <gen-1>.wasm:10371434
frame #1: 0x000000016ec53695 JIT(0x17d700000)`__prepare_for_exit at <gen-1>.wasm:10371335
frame #2: 0x000000016da007be JIT(0x17d700000)`_start at <gen-1>.wasm:47534
frame #3: 0x000000016ec6ace3
frame #4: 0x000000010013b33a wasmtime`wasmtime::func::Func::call::_$u7b$$u7b$closure$u7d$$u7d$::h519bce4b59bf0f7c at func.rs:566:20
frame #5: 0x0000000100161ba8 wasmtime`wasmtime_runtime::traphandlers::catch_traps::call_closure::h991a454e37488ebb(payload="P\x95���") at traphandlers.rs:385:17
frame #6: 0x00000001005d0fd6 wasmtime`RegisterSetjmp(buf_storage=0x00007ffeefbf9370, body=(wasmtime`wasmtime_runtime::traphandlers::catch_traps::call_closure::h991a454e37488ebb at traphandlers.rs:381), payload=0x00007ffeefbf9230) at helpers.c:12:3
of course now I need a libc with debug info
because it's not clear how to figure out what it's actually trying to call -- I'm guessing with debug symbols, I'll be able to inspect the actual function pointer value, which should tell me the actual bad function?
yeah, that's a hope
you can also dis (disassemble) and check registers: maybe some value from the params will pop out
this is actually a little annoying. cranelift knows it's a bad signature. It would be handy if it could show me what the "bad" one was (and the expected one, though in this case I know what's expected)
under the hood it is just a (signature) ID
sure, but that id is described in the wasm module
not really, more like engine assigned id/handle
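The trap is easier to reason about with a toy model. The sketch below is a simplification under stated assumptions, not wasmtime's actual data structures: each table slot stores an engine-assigned signature ID beside the callee, and an indirect call compares it with the ID the call site expects. SigId, TableEntry, IndirectCallTypeMismatch and check_indirect are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical engine-assigned signature IDs. The module declares types;
// the engine interns each distinct signature into a small integer.
enum class SigId : std::uint32_t { VoidFromI32 = 0, I32FromI32 = 1 };

// One slot of the funcref table. A wasm32 "function pointer" is just the
// index of such a slot.
struct TableEntry {
    SigId sig;   // signature the callee was defined with
    void* code;  // opaque code pointer (unused in this model)
};

// Thrown where the real engine would trap. Both IDs are known at this
// point, but they are engine handles, not names from the module.
struct IndirectCallTypeMismatch : std::runtime_error {
    SigId expected;
    SigId actual;
    IndirectCallTypeMismatch(SigId e, SigId a)
        : std::runtime_error("indirect call type mismatch"), expected(e), actual(a) {}
};

// Models the check made before call_indirect: fetch the slot (at() throws
// std::out_of_range for a bad index) and compare the stored signature ID
// with the one the call site was compiled against.
const TableEntry& check_indirect(const std::vector<TableEntry>& table,
                                 std::uint32_t index, SigId expected) {
    const TableEntry& entry = table.at(index);
    if (entry.sig != expected) {
        throw IndirectCallTypeMismatch(expected, entry.sig);
    }
    return entry;
}
```

This also illustrates the point about IDs: the engine knows both values at trap time, but turning them back into readable wasm types would need an extra reverse mapping.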
I wonder if frame vars are inspectable
they don't seem to be
in wasmtime you can choose the cranelift optimization level, e.g. via --opt-level 0
function "pointer" indices are direct indices into a table, right? funcptr value "1" == index 1
I think I can use the wat output and figure out all the calls to cxa_atexit
right
fwiw, the 10371335 in <gen-1>.wasm:10371335 is the bytecode offset in the wasm module
/me is not sure https://wasdk.github.io/wasmcodeexplorer/ can handle that large file though
how do I get the wasm heap base address in lldb?
is there __vmctx defined?
if yes, __vmctx->memory should do it
mm I can try, I was just using wasm2wat
__vmctx wasn't available, trying with opt-level 0 though
(thanks for your help btw!)
heh, --opt-level 0 made things worse :) (turns out I had vmctx from a different frame). With it, I lose some local vars:
(int) var0 = <variable not available>
(int) var1 = 211424
(int) var2 = <variable not available>
without (default):
(int) var0 = 52080
(int) var1 = 211424
(int) var2 = 277217280
hmm... yeah... okay, I need to fix more stuff then
you can try setting a breakpoint (on symbols?) and see if stopping before the trap gives you more info
my strategy of "let me read ints from memory at various local var addresses (also double-dereferenced) in hopes of finding the function pointer index" did not succeed
right, having source code / DWARF for wasi-sdk might give you more clues for its location
I just realized I can also just... put a printf inside funcs_atexit to print the indices as they're called
printf: my favourite debugging technique when debugging miscompilations of cg_clif.
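A sketch of that printf approach, using a hypothetical exit list rather than patching wasi-libc itself. register_exit_handler and run_exit_handlers_verbose are illustrative names; the real registration function is __cxa_atexit and the real drain loop is __funcs_on_exit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

using ExitCallback = void (*)(void*);

struct ExitEntry {
    ExitCallback fn;
    void* arg;
};

// Hypothetical stand-in for libc's exit-handler list; the real one lives
// inside wasi-libc and is drained by __funcs_on_exit.
std::vector<ExitEntry> g_exit_list;

void register_exit_handler(ExitCallback fn, void* arg) {
    g_exit_list.push_back({fn, arg});
}

// Runs handlers in reverse registration order and prints each handler's
// pointer value (in decimal) before calling it. On wasm32 that value is the
// function's table index, so the last line printed before a trap names the
// culprit.
void run_exit_handlers_verbose() {
    while (!g_exit_list.empty()) {
        ExitEntry e = g_exit_list.back();
        g_exit_list.pop_back();
        std::fprintf(stderr, "exit handler: fn=%ju arg=%p\n",
                     static_cast<std::uintmax_t>(reinterpret_cast<std::uintptr_t>(e.fn)),
                     e.arg);
        std::fflush(stderr);
        e.fn(e.arg);
    }
}
```

The flush before each call matters: if the next call traps, the line naming it has already been written.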
It would be _really nice_ to have some kind of wasm intrinsics for working with function signatures. I'd love to be able to add an assert(__wasm_signature_type(funcptr) == __wasm_signature_type(KnownGoodFunction)) in __cxa_atexit and give the user a good warning
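As far as I know clang has no intrinsic like __wasm_signature_type; that name is the hypothetical one from the message above. At the C++ level the closest analogue is a compile-time check on the callback type before registration. This is only a sketch, and it would not have caught this particular bug, because the mismatching cast was inserted by clang itself rather than written in source. is_exit_callback_v, checked_atexit and g_exit_list are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// The only callback type __cxa_atexit is declared to accept.
using ExitCallback = void (*)(void*);

// Compile-time analogue of the wished-for signature check: true only when
// F is exactly a pointer to void(void*).
template <typename F>
constexpr bool is_exit_callback_v = std::is_same<F, ExitCallback>::value;

// Hypothetical registry standing in for libc's exit list.
std::vector<std::pair<ExitCallback, void*>> g_exit_list;

// Hypothetical checked registration helper: a callback with any other
// signature (for example a destructor-like function returning T*) is
// rejected at compile time instead of trapping at exit.
template <typename F>
void checked_atexit(F fn, void* arg) {
    static_assert(is_exit_callback_v<F>, "exit callbacks must have type void(void*)");
    g_exit_list.emplace_back(fn, arg);
}
```

Because the check is exact, a lambda has to be converted to ExitCallback (or decayed with unary +) before it is passed in.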
That was extremely painful, but it looks like the culprit is $std::__2::vector<Baselib_Memory_PageState__std::__2::allocator<Baselib_Memory_PageState>_>::~vector__, which returns an i32 instead of returning void
// Under some ABIs, destructors return this instead of void, and cannot be
// passed directly to __cxa_atexit if the target does not allow this
// mismatch.
er... this seems like an LLVM bug, because canCallMismatchedFunctionType should definitely be false for wasm
now I have 3hrs of meetings but I will fix this :)
@Vladimir Vukicevic it should be returning false already: https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/ItaniumCXXABI.cpp#L517
does wasm/wasi use the itanium ABI?
i'm trying to create a small repro
it uses a variant of the Itanium ABI
https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Basic/TargetCXXABI.h#L91 if anyone is curious :-)
@Dan Gohman you may know this (hi, btw!) -- in my broken case, the bad __cxa_atexit destructor call setup is happening from a __cxx_global_var_init.77 function. Does it seem sane for a global_var_init to do that?
Yeah. Wasm doesn't have an equivalent of .fini or .dtors in other platforms, so the global var inits have to register their associated destructor calls with __cxa_atexit
Also, hi! :grinning:
but I see global dtors also being called from __cxx_global_array_dtor functions
__cxx_global_array_dtor is the function that gets registered with __cxa_atexit, which calls the dtors
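Roughly what that registration looks like if written by hand, as a hedged sketch rather than clang's literal output: Widget, widget_exit_thunk, init_global_widget and the g_ globals are hypothetical, and a plain vector stands in for the list __cxa_atexit appends to so the example runs without exiting. The point is that the registered function has exactly the void(void*) type, whatever the destructor returns at the ABI level.

```cpp
#include <cassert>
#include <new>
#include <vector>

using ExitCallback = void (*)(void*);

struct ExitEntry {
    ExitCallback fn;
    void* arg;
};

// Hypothetical stand-in for the list __cxa_atexit appends to.
std::vector<ExitEntry> g_exit_list;

int g_destroyed = 0;

// Hypothetical type with a non-trivial destructor.
struct Widget {
    ~Widget() { ++g_destroyed; }
};

// Wrapper with exactly the void(void*) type the exit list expects. Whatever
// the destructor returns at the ABI level (on wasm, `this` as an i32) is
// never seen by the indirect call in the exit loop.
void widget_exit_thunk(void* obj) {
    static_cast<Widget*>(obj)->~Widget();
}

// Raw storage for the "global", so construction and registration are
// explicit, as in a __cxx_global_var_init function.
alignas(Widget) unsigned char g_widget_storage[sizeof(Widget)];

void init_global_widget() {
    Widget* w = ::new (static_cast<void*>(g_widget_storage)) Widget();
    g_exit_list.push_back({&widget_exit_thunk, w});
}
```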
well -- no, in my case I see a dtor being _directly_ registered with __cxa_atexit from a global_var_init
(func $__cxx_global_var_init.77 (type $t9)
(local $l0 i32) (local $l1 i32) (local $l2 i32) (local $l3 i32) (local $l4 i32)
i32.const 2171
local.set $l0
i32.const 0
local.set $l1
i32.const 1024
local.set $l2
i32.const 211420
local.set $l3
i32.const 151088
local.set $l4
local.get $l3
local.get $l4
call $std::__2::basic_string<char__std::__2::char_traits<char>__std::__2::allocator<char>_>::basic_string<std::nullptr_t>_char_const*_
drop
local.get $l0
local.get $l1
local.get $l2
call $__cxa_atexit
drop
return)
in my case $l0 = 2171 is the bad-signature vector dtor being passed to cxa_atexit
Hmm. That does seem odd. wasm destructors return i32, so I wouldn't think they could be registered with __cxa_atexit like that
yep, that's the bug :)
Is that a std::string? When I try a simple testcase with std::string, it registers __cxx_global_array_dtor with __cxa_atexit, and __cxx_global_array_dtor calls the dtor
I need to try -fregister-global-dtors-with-atexit
but I really want to get to a small repro, this massive unified test runner is not great to work with :)
it's also totally possible that the wrong function pointer is being used here somehow too
Oh, awkward. It looks like -fregister-global-dtors-with-atexit does the same thing as what the wasm backend does with global dtors
Ooh, ignore my WAT paste above; the actual number is 2170 (which is still the dtor)
ok, 2170 is the same problem, just in a different __cxx_global_var_init.5.37. The ctor right before it is call $std::__2::vector<Baselib_Memory_PageState__std::__2::allocator<Baselib_Memory_PageState>_>::vector_std::initializer_list<Baselib_Memory_PageState>_; let me see if I can create something like this
Baselib_Memory_PageState is just an enum, nothing crazy.
Yeah, I can't reproduce this in a small testcase. The thing that's wrong is the only vector dtor that's in the function ptr table (there are lots of other dtors, I assume due to vtables). In a clang link command that outputs wasm, how do I get it to output bitcode or LLVM IR?
I don't know of a way to get it to emit llvm ir when producing wasm. It doesn't produce a linked LLVM IR image unless you're using -flto
I'm now at a bit of a loss as to how to proceed. I _think_ the most useful next steps will be one of:
Can you determine which source file defines the object with the destructor in question?
Yeah I'm pretty sure I can
If you can compile that source file to .o, we could see if the problem is present there, or if it gets introduced at link time
define internal void @__cxx_global_var_init.27() #0 !dbg !13235 {
%1 = alloca %"class.std::initializer_list", align 4
%2 = alloca [6 x i32], align 4
%3 = getelementptr inbounds [6 x i32], [6 x i32]* %2, i32 0, i32 0, !dbg !13236
%4 = bitcast [6 x i32]* %2 to i8*, !dbg !13236
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %4, i8* align 4 bitcast ([6 x i32]* @constinit to i8*), i32 24, i1 false), !dbg !13236
%5 = getelementptr inbounds %"class.std::initializer_list", %"class.std::initializer_list"* %1, i32 0, i32 0, !dbg !13236
%6 = getelementptr inbounds [6 x i32], [6 x i32]* %2, i32 0, i32 0, !dbg !13236
store i32* %6, i32** %5, align 4, !dbg !13236
%7 = getelementptr inbounds %"class.std::initializer_list", %"class.std::initializer_list"* %1, i32 0, i32 1, !dbg !13236
store i32 6, i32* %7, align 4, !dbg !13236
%8 = call %"class.std::__2::vector.6"* @_ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEEC2ESt16initializer_listIS1_E(%"class.std::__2::vector.6"* @pageStatesAll, %"class.std::initializer_list"* byval(%"class.std::initializer_list") align 4 %1), !dbg !13236
%9 = call i32 @__cxa_atexit(void (i8*)* @__cxx_global_array_dtor.28, i8* null, i8* @__dso_handle) #4, !dbg !13236
ret void, !dbg !13236
}
pretty sure that's it, and it looks fine :|
Is this an optimized build?
And if so, do you have wasm-opt in your PATH?
debug build (-g -O0)
Can you do wasm2wat on the .o file?
tried, I get "0001e60: error: invalid section code: 12"
(brb, meeting, sigh)
or llvm-objdump -d (which works now \o/)
I can do llvm-objdump, but I'm not sure how to tell if the result is useful; I don't get any symbol/function names
wasm2wat may need --enable-bulk-memory
Or if your wasm2wat is new enough, --enable-all should do it
yep that worked. let's see
(func $__cxx_global_var_init.27 (type $t0)
(local $l0 i32) (local $l1 i32) (local $l2 i32) (local $l3 i32) (local $l4 i32) (local $l5 i32) (local $l6 i32) (local $l7 i32) (local $l8 i32) (local $l9 i32) (local $l10 i64) (local $l11 i32) (local $l12 i32) (local $l13 i64) (local $l14 i64) (local $l15 i64) (local $l16 i32) (local $l17 i32) (local $l18 i32) (local $l19 i32) (local $l20 i32) (local $l21 i32) (local $l22 i32) (local $l23 i32)
global.get $env.__stack_pointer
local.set $l0
i32.const 48
local.set $l1
local.get $l0
local.get $l1
i32.sub
local.set $l2
local.get $l2
global.set $env.__stack_pointer
i32.const 6
local.set $l3
i32.const 16
local.set $l4
local.get $l2
local.get $l4
i32.add
local.set $l5
local.get $l5
local.set $l6
i32.const 16
local.set $l7
local.get $l6
local.get $l7
i32.add
local.set $l8
i32.const 0
local.set $l9
local.get $l9
i64.load offset=648 align=4
local.set $l10
local.get $l8
local.get $l10
i64.store align=4
i32.const 8
local.set $l11
local.get $l6
local.get $l11
i32.add
local.set $l12
local.get $l9
i64.load offset=640 align=4
local.set $l13
local.get $l12
local.get $l13
i64.store align=4
local.get $l9
i64.load offset=632 align=4
local.set $l14
local.get $l6
local.get $l14
i64.store align=4
local.get $l2
local.get $l6
i32.store offset=40
local.get $l2
local.get $l3
i32.store offset=44
local.get $l2
i64.load offset=40
local.set $l15
local.get $l2
local.get $l15
i64.store offset=8
i32.const 620
local.set $l16
i32.const 8
local.set $l17
local.get $l2
local.get $l17
i32.add
local.set $l18
local.get $l16
local.get $l18
call $_ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEEC2ESt16initializer_listIS1_E
drop
i32.const 16
local.set $l19
i32.const 0
local.set $l20
i32.const 0
local.set $l21
i32.const 620
drop
local.get $l19
local.get $l20
local.get $l21
call $env.__cxa_atexit
drop
i32.const 48
local.set $l22
local.get $l2
local.get $l22
i32.add
local.set $l23
local.get $l23
global.set $env.__stack_pointer
return)
gah sorry, long paste
ok, so it has a 0, so there's presumably a relocation to fill it in
llvm-objdump -d -r on the .o file should show the relocations
(ok meetings all done) What am I looking for in the -d -r output? the tail end looks like
f598: 10 d4 81 80 80 00 call 212
0000f599: R_WASM_FUNCTION_INDEX_LEB _ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEEC2ESt16initializer_listIS1_E+0
f59e: 1a drop
f59f: 41 90 80 80 80 00 i32.const 16
0000f5a0: R_WASM_TABLE_INDEX_SLEB __cxx_global_array_dtor.28+0
f5a5: 21 13 local.set 19
f5a7: 41 00 i32.const 0
f5a9: 21 14 local.set 20
f5ab: 41 80 80 80 80 00 i32.const 0
0000f5ac: R_WASM_MEMORY_ADDR_SLEB __dso_handle+0
f5b1: 21 15 local.set 21
f5b3: 41 ec 84 80 80 00 i32.const 620
0000f5b4: R_WASM_MEMORY_ADDR_SLEB pageStatesAll+0
f5b9: 1a drop
f5ba: 20 13 local.get 19
f5bc: 20 14 local.get 20
f5be: 20 15 local.get 21
f5c0: 10 80 80 80 80 00 call 0
0000f5c1: R_WASM_FUNCTION_INDEX_LEB __cxa_atexit+0
f5c6: 1a drop
f5c7: 41 30 i32.const 48
f5c9: 21 16 local.set 22
f5cb: 20 02 local.get 2
f5cd: 20 16 local.get 22
f5cf: 6a i32.add
f5d0: 21 17 local.set 23
f5d2: 20 17 local.get 23
f5d4: 24 80 80 80 80 00 global.set 0
0000f5d5: R_WASM_GLOBAL_INDEX_LEB __stack_pointer+0
f5da: 0f return
f5db: 0b end
which looks correct unfortunately
I just audited all __cxa_atexit calls in this file and everything looks fine. This is the only file that the relevant type (the Baselib_Memory_PageState enum/vector) appears in
Here's what I've got. All the global_var_inits from this function set up a cxa_atexit with a global_array_dtor function, except for the bad one. The bad one is the "right" (consistent) index:
(func $_GLOBAL__sub_I_Baselib_Memory_Tests_Wasi.cpp (type $t9)
call $__cxx_global_var_init.76 -> cxa_atexit with 2167 __cxx_global_array_dtor.78
call $__cxx_global_var_init.1.46 -> cxa_atexit with 2168 __cxx_global_array_dtor.2.46
call $__cxx_global_var_init.3.46 -> cxa_atexit with 2169 __cxx_global_array_dtor.4.46
call $__cxx_global_var_init.5.37 -> cxa_atexit with ... 2170 $std::__2::vector<Baselib_Memory_PageState__std::__2::allocator<Baselib_Memory_PageState>_>::~vector__
return)
(with annotations of what funcptr index they call and what the actual thing is)
Ok, I had the wrong things earlier. Based on that, the issue is in var_init.5 in that file. The .o file indeed has the wrong thing for var_init.5:
...
705: 41 84 80 80 80 00 i32.const 4
00000706: R_WASM_TABLE_INDEX_SLEB _ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEED2Ev+0
70b: 21 0d local.set 13
70d: 20 0d local.get 13
70f: 21 0e local.set 14
711: 20 0c local.get 12
713: 21 0f local.set 15
715: 41 80 80 80 80 00 i32.const 0
00000716: R_WASM_MEMORY_ADDR_SLEB __dso_handle+0
71b: 21 10 local.set 16
71d: 20 0e local.get 14
71f: 20 0f local.get 15
721: 20 10 local.get 16
723: 10 80 80 80 80 00 call 0
00000724: R_WASM_FUNCTION_INDEX_LEB __cxa_atexit+0
And the LLVM IR looks wrong (extra indentation added to the cxa_atexit call for readability):
define internal void @__cxx_global_var_init.5() #0 !dbg !3799 {
%1 = alloca %"class.std::initializer_list", align 4
%2 = alloca [1 x i32], align 4
%3 = getelementptr inbounds [1 x i32], [1 x i32]* %2, i32 0, i32 0, !dbg !3800
store i32 4, i32* %3, align 4, !dbg !3800
%4 = getelementptr inbounds %"class.std::initializer_list", %"class.std::initializer_list"* %1, i32 0, i32 0, !dbg !3800
%5 = getelementptr inbounds [1 x i32], [1 x i32]* %2, i32 0, i32 0, !dbg !3800
store i32* %5, i32** %4, align 4, !dbg !3800
%6 = getelementptr inbounds %"class.std::initializer_list", %"class.std::initializer_list"* %1, i32 0, i32 1, !dbg !3800
store i32 1, i32* %6, align 4, !dbg !3800
%7 = call %"class.std::__2::vector.6"* @_ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEEC2ESt16initializer_listIS1_E(%"class.std::__2::vector.6"* @_ZGR39Baselib_Test_Memory_SupportedPageStates_, %"class.std::initializer_list"* byval(%"class.std::initializer_list") align 4 %1), !dbg !3800
%8 = call i32 @__cxa_atexit(
void (i8*)* bitcast (%"class.std::__2::vector.6"* (%"class.std::__2::vector.6"*)* @_ZNSt3__26vectorI24Baselib_Memory_PageStateNS_9allocatorIS1_EEED2Ev to void (i8*)*),
i8* bitcast (%"class.std::__2::vector.6"* @_ZGR39Baselib_Test_Memory_SupportedPageStates_ to i8*),
i8* @__dso_handle) #4, !dbg !3800
store %"class.std::__2::vector.6"* @_ZGR39Baselib_Test_Memory_SupportedPageStates_, %"class.std::__2::vector.6"** @Baselib_Test_Memory_SupportedPageStates, align 4, !dbg !3800
ret void, !dbg !3801
}
It looks like it might be related to global reference temporaries
struct B { B(); ~B(); };
namespace test {
const B b1 = B();
const B &b2 = B();
}
shows the problem
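A self-contained version of that testcase, with B's constructor and destructor defined so it links and runs. This is a sketch for checking a fix, not a regression test from the LLVM tree; the alive flag and the puts call are additions.

```cpp
#include <cassert>
#include <cstdio>

// The testcase above, with B's members defined so it links and runs.
struct B {
    bool alive;
    B() : alive(true) {}
    ~B() {
        alive = false;  // allowed: const does not apply during destruction
        std::puts("~B");
    }
};

namespace test {
// Ordinary global: expected to be destroyed through a
// __cxx_global_array_dtor-style wrapper.
const B b1 = B();
// Reference bound to a lifetime-extended temporary: the path on which the
// affected clang passed a bitcast of ~B straight to __cxa_atexit.
const B &b2 = B();
}  // namespace test
```

Built natively, it should print ~B twice at exit. Built for wasm32-wasi with the affected clang, the b2 registration is the one expected to trap in __funcs_on_exit, per the analysis above.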
@Dan Gohman yep. I literally just tracked it down to here: https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGExpr.cpp#L347
That call doesn't go through all the same code as what happens in https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDeclCXX.cpp#L143
@Dan Gohman can you fix, or want me to give it a go?
I'm not at a computer where I can debug clang easily, so if you want to have a go, go for it
I may poke you tomorrow for help in actually getting a patch submitted :put_litter_in_its_place:
@Dan Gohman I take it back. I don't have anywhere near enough context to do this :)
Yeah definitely can't do this, not without spending a large chunk of time on it that I wish I had. Let me know if I should file it somewhere.
And armed with all this info, I see this code in our tests:
const std::vector<Baselib_Memory_PageState>& Baselib_Test_Memory_SupportedPageStates = { Baselib_Memory_PageState_ReadWrite };
which, if I explicitly declare a vector with the right value and then assign that to the reference, the problem goes away.
Alternatively, could you just remove the & there and create a regular global instead of a global reference to a temporary?
not in this case, this is a thing that's an extern elsewhere and it's a per-platform vector where most other platforms are initialized from a global vector
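The workaround described above (an explicitly declared vector, with the reference bound to it) might look like this. It keeps the reference under its original name and type, so the extern declarations used on other platforms still match. The enum body and the _Storage name are hypothetical stand-ins; only the enumerator and reference names come from the quoted test code.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the real Baselib enum; only the enumerator name comes from
// the test code quoted above, everything else here is assumed.
enum Baselib_Memory_PageState {
    Baselib_Memory_PageState_ReadWrite,
};

// Named storage with static duration: an ordinary global, so its
// destructor is registered through the usual wrapper path rather than
// the reference-temporary path. The _Storage name is hypothetical.
static const std::vector<Baselib_Memory_PageState> Baselib_Test_Memory_SupportedPageStates_Storage = {
    Baselib_Memory_PageState_ReadWrite,
};

// The externally visible reference keeps its type and name, so code that
// declares it extern elsewhere is unaffected.
extern const std::vector<Baselib_Memory_PageState>& Baselib_Test_Memory_SupportedPageStates;
const std::vector<Baselib_Memory_PageState>& Baselib_Test_Memory_SupportedPageStates =
    Baselib_Test_Memory_SupportedPageStates_Storage;
```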
also, hooray for tracking down something that likely would have been a pain in the butt at some random point later on :)
Looking at the code in clang, I have an idea of what might be wrong, but I'll need to debug more to confirm. But I'll have to pick this up tomorrow.
Yep, no problem. Thanks for looking into it!
@Dan Gohman any luck on this? (I was out yesterday)
Not yet; I got busy with other things yesterday, but I'm going to take another look soon
np, thanks!
The dtor bitcast is coming from here:
https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2403
So one option is to make it call EmitDeclDestroy, or factor out code from EmitDeclDestroy for it to use. That would let it use a __cxx_global_array_dtor wrapper, which should fix the problem.
There's also a plausible workaround:
It turns out we already have code in the LLVM wasm backend which attempts to emulate function pointer casts by inserting wrapper functions.
The only reason that code isn't saving us here is that it's looking for casted functions which are immediately called, and here we have a function being casted and passed as an argument to __cxa_atexit.
So we could tweak this to recognize uses which are "the first operand of a __cxa_atexit call", which would then cause it to insert a wrapper, which ought to generate working code.
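In C++ terms, the wrapper such a pass would emit is roughly the thunk below. This illustrates the idea only and is not the backend's actual output; discard_result_thunk, Vec and vec_dtor are hypothetical, with vec_dtor returning Vec* to mimic a wasm destructor that returns an i32.

```cpp
#include <cassert>

// A C++ analogue of the kind of wrapper the wasm backend inserts for a
// casted function: it has the signature the caller expects, void(void*),
// forwards to the real callee, and discards the callee's result.
template <typename T, typename R, R (*Fn)(T*)>
void discard_result_thunk(void* p) {
    (void)Fn(static_cast<T*>(p));
}

// Hypothetical stand-ins: Vec plays the vector, and vec_dtor mimics a
// wasm destructor that returns `this` (an i32) instead of void.
struct Vec {
    int destroyed = 0;
};

Vec* vec_dtor(Vec* v) {
    ++v->destroyed;
    return v;
}
```

Registering &discard_result_thunk<Vec, Vec*, &vec_dtor> instead of a bitcast vec_dtor hands __cxa_atexit a callee whose type matches the indirect call made later in __funcs_on_exit.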
It seems like both paths to __cxa_atexit should go through something common.
But maybe not. Either one of your suggestions seems pretty reasonable, but fixing it in a non-wasm-specific way seems better, because this is technically a problem on any ABI that can't call mismatched functions. Which is probably just wasm, but still.
@Dan Gohman any luck on this? (Or, alternatively, is there a bug filed I can track -- or do you want me to file one somewhere? :)
No, I've not made any further progress yet.
I poked at it a bit, but I didn't see any super easy paths here. If you wanted to file a bug with the small testcase above and the findings about it, that'd be great!
Where's the best place to file the bug? Against wasi-sdk or llvm?
For wasi-sdk, that'd be here: https://github.com/webassembly/wasi-sdk/issues
Yeah, the issue is an LLVM bug for the wasm backend, though; I wasn't sure if filing it in wasi-sdk was the right place
For clang/llvm bugs, use the llvm tracker: https://bugs.llvm.org/
Filed https://bugs.llvm.org/show_bug.cgi?id=45876
Last updated: Nov 22 2024 at 16:03 UTC