Stream: general

Topic: CPP Wasmtime parsing big json string


cdvl (Aug 21 2023 at 15:26):

Hi,

I have a C++ program, run with Wasmtime, that parses a JSON string. My problem is that the C++ code runs fine natively on both small and large JSON inputs, but fails under wasm on large strings only. No error is printed. I suspect a memory limit, but I don't know how to change it. Could you help me, please?

Alex Crichton (Aug 21 2023 at 15:52):

Is it possible to share a wasm module that reproduces this behavior? For example are you running the wasmtime CLI? Are you using the Rust embedding API? Is the guest wasm module in C++? (e.g. details like this can help folks dig in, but a reproduction would be most useful)

cdvl (Aug 22 2023 at 07:26):

I'm running the .wasm file with the Rust API, but I tried the wasmtime CLI first. Both work on small JSON but fail on large inputs. It's due neither to my C++ code nor to the big JSON file, because when I build and run the C++ code natively, everything is fine. The problem appears when I switch to wasm.

I will ask whether it is possible to share a wasm module.

cdvl (Aug 22 2023 at 08:24):

Here is the example:
json-parsing.zip

You can run the cpp code on big json file via "make cpp".
I tried two different json lib in c++ (nlohmann and simdjson). You can test them with:

cdvl (Aug 22 2023 at 12:47):

It is indeed a memory limitation. I just wrote some C++ code with a vector that grows over iterations, and the same thing occurred. What are wasm's memory limits? How can I increase them?

Alex Crichton (Aug 22 2023 at 14:12):

Can you share the actual wasm module you're running? I don't have emscripten installed myself, so I compiled the example with wasi-sdk after some modifications, and it works fine for the small/big examples you provided. One thing you can try is passing --trap-on-grow-failure to the CLI, which will cause a trap to happen on OOM rather than returning -1; that can turn a failure that may be ignored into a loud failure.
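
For context on the -1 mentioned here: the WebAssembly memory.grow instruction returns the previous size in pages on success and -1 when the request cannot be satisfied, and it is up to the guest's allocator to check that result. A minimal native C++ model of that behavior might look like the sketch below. The LinearMemory struct and its field names are hypothetical, made up for illustration; they are not part of Wasmtime or any toolchain.

```cpp
#include <cstdint>

// Hypothetical model of a wasm linear memory; not a Wasmtime API.
struct LinearMemory {
    std::uint32_t pages;      // current size, in 64 KiB pages
    std::uint32_t max_pages;  // upper bound declared by the module

    // Mirrors memory.grow: returns the previous size in pages on success,
    // or -1 (leaving the size unchanged) if the new size would exceed the maximum.
    std::int32_t grow(std::uint32_t delta) {
        std::uint64_t wanted = static_cast<std::uint64_t>(pages) + delta;
        if (wanted > max_pages) {
            return -1;
        }
        std::int32_t previous = static_cast<std::int32_t>(pages);
        pages = static_cast<std::uint32_t>(wanted);
        return previous;
    }
};
```

A caller that ignores the -1 simply keeps running with the old memory size, which is why turning the failure into a trap can make the problem visible.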

cdvl (Aug 22 2023 at 14:33):

test.wasm

--trap-on-grow-failure does not change anything. I will try with wasi-sdk! What modifications did you make?

Alex Crichton (Aug 22 2023 at 15:07):

I changed a few things like removing usage of <filesystem> in json.hpp and removing some try/catch and otherwise just fixing a few compile errors, mostly differences in runtimes I think between emscripten/wasi-sdk

Alex Crichton (Aug 22 2023 at 15:10):

Hm running wasmtime run ./test.wasm < input doesn't seem to do anything with any input with the module you gave me. Could you perhaps open an issue on Wasmtime with detailed steps/reproduction/outputs/etc to help further debug this?

cdvl (Aug 22 2023 at 15:12):

I tried to use wasi-sdk but it fails to compile.

export WASI_SDK_PATH=/home/celine/Téléchargements/wasi-sdk-20.0
${WASI_SDK_PATH}/bin/clang++ --sysroot=wasi-sdk-20.0/share/wasi-sysroot -o test.wasm -Wl,--export-all -Wl,--no-entry -Iinclude/ -isystem /usr/include/ -Wl,--export=test_nlohmann src/main.cpp -v

I get the "fatal error: 'iostream' file not found". I don't understand because I include the system headers too.

Dan Gohman (Aug 22 2023 at 15:13):

iostream is a C++ header; it's in $sysroot/include/c++/v1

cdvl (Aug 22 2023 at 15:15):

You need to specify the function name to invoke
wasmtime test.wasm --invoke test_simdjson < input_small.json

Alex Crichton (Aug 22 2023 at 15:18):

ok I can reproduce the failure, I see that something is calling an exit explicitly with a 1 argument, so something is explicitly exiting

Alex Crichton (Aug 22 2023 at 15:19):

I didn't pass --sysroot to wasi-sdk myself, I let it find it itself

Alex Crichton (Aug 22 2023 at 15:21):

can you build the original wasm file with debug information with emscripten?

Alex Crichton (Aug 22 2023 at 15:21):

modifying Wasmtime I can get:

Caused by:
    0: failed to invoke `test_simdjson`
    1: error while executing at wasm backtrace:
           0: 0x5acee - <unknown>!<wasm function 2012>
           1: 0x5acbb - <unknown>!<wasm function 2005>
           2: 0x7af39 - <unknown>!<wasm function 3327>
           3: 0x7af4d - <unknown>!<wasm function 3329>
           4: 0x7af55 - <unknown>!<wasm function 3330>
           5: 0x1591e - <unknown>!<wasm function 295>
           6: 0x152d9 - <unknown>!<wasm function 284>
           7: 0x14f5b - <unknown>!<wasm function 283>
           8: 0x14e98 - <unknown>!<wasm function 281>
           9: 0x11722 - <unknown>!<wasm function 166>
          10: 0x11213 - <unknown>!<wasm function 154>
          11: 0x10f02 - <unknown>!<wasm function 151>
    2: Exited with i32 exit status 1

Alex Crichton (Aug 22 2023 at 15:21):

but that's not too helpful without debug information

cdvl (Aug 22 2023 at 15:27):

test.wasm Here is the wasm compiled with the -g option.

cdvl (Aug 22 2023 at 15:37):

I recompiled the example with wasi-sdk, keeping only the simdjson lib (removing the nlohmann one). Same problem: when I run it with wasmtime, it works on the small JSON file but fails on the big one.

Alex Crichton (Aug 22 2023 at 15:45):

ah yeah this looks like OOM?

stdin length: 3544203
Try to parse3544203
Error: failed to run main module `/Users/alex/Downloads/test(1).wasm`

Caused by:
    0: failed to invoke `test_simdjson`
    1: error while executing at wasm backtrace:
           0: 0x6dfb2 - _Exit
                           at /build/emscripten-buVz5q/emscripten-3.1.5~dfsg/system/lib/libc/musl/src/exit/_Exit.c:7:2
           1: 0x6df7f - abort
                           at /build/emscripten-buVz5q/emscripten-3.1.5~dfsg/system/lib/standalone/standalone.c:33:3
           2: 0x8e789 - operator new(unsigned long)
                           at /build/emscripten-buVz5q/emscripten-3.1.5~dfsg/system/lib/libcxx/src/new.cpp:84:13
           3: 0x8e79d - operator new[](unsigned long)
                           at /build/emscripten-buVz5q/emscripten-3.1.5~dfsg/system/lib/libcxx/src/new.cpp:116:12
           4: 0x8e7a5 - operator new[](unsigned long, std::nothrow_t const&)
                           at /build/emscripten-buVz5q/emscripten-3.1.5~dfsg/system/lib/libcxx/src/new.cpp:128:13
           5: 0x1a134 - simdjson::dom::document::allocate(unsigned long)
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:8430:21
           6: 0x19988 - simdjson::dom::parser::ensure_capacity(simdjson::dom::document&, unsigned long)
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:7627:87
           7: 0x19570 - simdjson::dom::parser::parse_into_document(simdjson::dom::document&, unsigned char const*, unsigned long, bool) &
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:7514:23
           8: 0x1941f - simdjson::dom::parser::parse(unsigned char const*, unsigned long, bool) &
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:7546:10
           9: 0x150df - simdjson::dom::parser::parse(char const*, unsigned long, bool) &
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:7550:10
          10: 0x14ad4 - simdjson::dom::parser::parse(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> > const&) &
                           at /home/celine/Téléchargements/json-parsing/include/simdjson.h:7553:10
          11: 0x1476f - test_simdjson
                           at /home/celine/Téléchargements/json-parsing/src/main.cpp:46:45
    2: Exited with i32 exit status 1

cdvl (Aug 22 2023 at 15:45):

I think it is a memory limitation from wasm. I added this C++ function:
#include <iostream>
#include <string>
#include <vector>

void stress() {
    std::string jsonString;
    std::getline(std::cin, jsonString);

    // Create a vector that grows on every iteration.
    std::vector<std::string> dynamicVector;
    for (size_t i = 0; i < 100; i++) {
        dynamicVector.push_back(jsonString);
        std::cout << "i " << i << " - length: " << jsonString.length() * (i + 1) << std::endl;
    }
}

The wasm module exits with status 1 after a few iterations, while the native C++ program continues.

cdvl (Aug 22 2023 at 15:47):

Alex Crichton said:

ah yeah this looks like OOM?

Yes! But I don't know how to allow more memory as explained above :)

Alex Crichton (Aug 22 2023 at 15:47):

the memory emscripten is generating is 256 pages large initially and additionally cannot grow beyond the 256 page limit, so my guess is that you're exceeding 256 wasm pages here and then it's hitting OOM
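
To put the 256-page figure in bytes: a WebAssembly page is 64 KiB, so 256 pages is 16 MiB. A small sketch of that arithmetic follows; the helper names are made up for illustration, and the estimate deliberately ignores allocator overhead, the stack, and static data.

```cpp
#include <cstdint>

// A WebAssembly page is 64 KiB (fixed by the core specification).
constexpr std::uint64_t kWasmPageSize = 64 * 1024;

// Convert a page count to bytes.
constexpr std::uint64_t pages_to_bytes(std::uint64_t pages) {
    return pages * kWasmPageSize;
}

// How many whole copies of a `len`-byte buffer fit in `pages` pages,
// ignoring allocator overhead, the stack, and static data
// (so a real module fits fewer than this).
constexpr std::uint64_t copies_that_fit(std::uint64_t len, std::uint64_t pages) {
    return len == 0 ? 0 : pages_to_bytes(pages) / len;
}

static_assert(pages_to_bytes(256) == 16777216, "256 pages is 16 MiB");
```

With the 3,544,203-byte input from the log above, copies_that_fit(3544203, 256) evaluates to 4, so a parser that needs the input plus its own working buffers could plausibly run out of a 16 MiB memory.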

Alex Crichton (Aug 22 2023 at 15:47):

you'll need to see how to remove emscripten's upper bound on the memory, and I don't know how to do that

Alex Crichton (Aug 22 2023 at 15:47):

(I also don't know why --trap-on-grow-failure didn't work)

cdvl (Aug 22 2023 at 15:49):

Are the memory settings fixed at compile time? Do they depend on the compiler?

Alex Crichton (Aug 22 2023 at 15:56):

yes

cdvl (Aug 23 2023 at 08:13):

Ok and do you know how to do that with wasi-sdk?

Joel Dice (Aug 23 2023 at 13:39):

wasm-ld has a --max-memory option, so maybe something like clang -Wl,--max-memory=1073741824 ...?
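
The value in that suggestion, 1073741824, is 1 GiB, or 16384 pages of 64 KiB. As far as I know, wasm-ld expects the --max-memory value to be a multiple of the 64 KiB page size, so a rough helper for building the flag could look like this. The function names are hypothetical, and the alignment requirement is an assumption worth checking against your wasm-ld version.

```cpp
#include <cstdint>
#include <string>

// A WebAssembly page is 64 KiB.
constexpr std::uint64_t kWasmPageSize = 64 * 1024;

// Round a byte count up to the next multiple of the wasm page size.
constexpr std::uint64_t round_up_to_page(std::uint64_t bytes) {
    return (bytes + kWasmPageSize - 1) / kWasmPageSize * kWasmPageSize;
}

// Build the clang driver argument that forwards --max-memory to wasm-ld.
// (Assumes the linker wants a page-aligned byte count.)
std::string max_memory_flag(std::uint64_t bytes) {
    return "-Wl,--max-memory=" + std::to_string(round_up_to_page(bytes));
}
```

For example, max_memory_flag(1073741824) gives back the same -Wl,--max-memory=1073741824 shown above, since 1 GiB is already page-aligned.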


Last updated: Nov 22 2024 at 16:03 UTC