Hi, i am new here and hope this is the correct stream for this. I am trying to understand, where call_indirect
's come from and i want to determine the call indirect index for a static analysis call graph generator.
Why are there so many in Rust? Even if you compile this simple code snippet:
fn main() { let a = 123; }
to WebAssembly you get 76 indirect calls. For a hello world you get over one hundred.
I analyzed the WasmBench dataset and looked for pattern ahead of a call indirect and it turns out, that there are basically two patterns determining over 87% the index.
The first one is the Add Case
and the second one the Load Offset Case
:
add.png
load-offset.png
The interesting part is that the Add Case
has a bitwise AND and only the last instruction differs, which is either a load/load offset
or a local.get
(they sum up to 99.78% of this pattern). And the Load Offset Case
is just a mixture of load/load offset
and the local.get
instruction.
So to predict the index, i would have the read from the memory and get a local variable(which could also be a function parameter). I build the memory from the data segments with Walrus assuming that the memory holding the value doesn't change - but encountered, that the assumption is wrong - since i've found that in a file we read from a different offset and call a different function type, but with the same "self read from memory index" 0, which is a violation to the language, as the table only holds one function under each index with a fixed type.
This leads me to the question, why does WebAssembly take the indirection over the memory?
Another idiosyncrasie i found, was when looking at the metadata file containing the source languages/compilers of each file. I did this, because i thought there must be correlation between the compiler and the occurrence of these two cases.
Case: C++ - Rust
Add: 540 - 46
Load: 305 - 1118
Files from C++: 4047
Files from Rust: 1261
Files with both cases: 0
The interesting thing is, that i never found a file, which contained both cases. It seems like once the compiler decides to use either the Add Case
or the Load Offset Case
it sticks to it.
Another obscure thing is, that there are often empty function declarations as the first local function in WebAssembly - since they do not have any function body, they can't do anything (func (;;1;;) (type 1: [], [])
.
So hopefully you can help me with my questions:
Not so relevant, but still interesting questions:
Why are there so many indirect calls in even the simplest Rust code?
This is more-or-less answered "the rust standard library". There's code that runs before main
, so your function isn't literally the only one in the program, and this supporting code does indirect calls.
Where do these indirect calls come from, when does the compiler decide to use one?
The answer to this is an indirect function call. Think calling through a function pointer in Rust or C/C++.
Why does WebAssembly use the indirection over the memory determining the index?
WebAssembly functions are stored in a "table" which cannot be modified by Rust/C/C++ at this time (or at least wasn't modifiable historically, Clang may have niche intrinsics nowadays). This means that function "pointers" are actually indices into this table (as represented in these languages). The function pointer is frequently stored in memory since that's how the program stored it.
Why are there never both cases in a file?
Unsure! Might just be idiomatic differences between languages.
Why is there often an empty function as the first local function?
Table index 0 is intended to be empty, or the equivalent of ref.null func
. This is so if you happen to call the "null pointer", which in this case is table index 0, then you'll get a trap which is similar to a segfault on native platforms.
Why is the a bitwise AND instruction in the Add Case, if we are working on an integer, which is the index of a table?
I'm not sure myself.
Hi Alex,
thank you so far, for your answers. I want to dig deeper and i hope you can help me.
Can you give me a link to the code, where the call_indirects are produced by the compiler?
Thank you in advance
Ah sorry I dont know where they're produced in LLVM
Why is there often an empty function as the first local function?
Table index 0 is intended to be empty, or the equivalent of
ref.null func
. This is so if you happen to call the "null pointer", which in this case is table index 0, then you'll get a trap which is similar to a segfault on native platforms.
As there is currently only one table at most allowed in WA, that can only hold funcref
s i thought, that the table is the function table, used for the call_indirect
s, to determine which function is called.
How is this table build? I thought it was simply first all the imported functions starting at index 0 and then the local functions? But this doesn't work out with the elem
statements - do these come before or after the imports? Are all functions of the module in this function table? Or is it rather, that the function table only contains call_indirect
related functions (so a limited set of all functions in the module)? For example only the functions set with an active elem declaration.
Why does WebAssembly use the indirection over the memory determining the index?
WebAssembly functions are stored in a "table" which cannot be modified by Rust/C/C++ at this time (or at least wasn't modifiable historically, Clang may have niche intrinsics nowadays). This means that function "pointers" are actually indices into this table (as represented in these languages). The function pointer is frequently stored in memory since that's how the program stored it.
Why do they use the memory to store pointer, rather than the stack? Isn't this causing a lot of overhead (reading and writing from and to the memory)?
Why are there so many indirect calls in even the simplest Rust code?
This is more-or-less answered "the rust standard library". There's code that runs before
main
, so your function isn't literally the only one in the program, and this supporting code does indirect calls.
Yes, nice - i compiled main() { let a=123; }
to WebAssembly and received 76 call_indirects. Compiling the same example with #[no_std] #[no_main]
gives us zero call_indirects :smile:
Presumably function pointers are sometimes stored in on the stack. But also sometimes in memory, e.g.
fn bar(x: u32) -> u32 { x }
struct Foo {
ptr: fn(u32) -> u32,
}
let foo = Box::new(Foo { ptr: bar });
Rust function pointers are lowered to indexes into the wasm table with function referenced.
You can't directly store a funcref in the linear memory.
Note that the emulated stack is stored in the linear memory next to the heap and global variables.
right, but you can store the table index in linear memory, which is what the code above illustrates.
Indeed
and as long as you don't take the address of that index, you can put it on the actual wasm stack
(I assume)
That is correct.
Ok, thank you. And how is the table build? Are all functions of the module in this function table? Or is it rather, that the function table only contains a subset of functions, which are loaded via elem
segments into the table?
The wasm file contains the list of things to put in the table. I believe wasm-ld by default only puts functions that actually get turned into a function pointer there, not every function in existence.
Last updated: Dec 23 2024 at 12:05 UTC