@fitzgen (he/him) hm I'm not entirely sure what you mean by RefCell and deferred destruction?
TableElements
has a RefCell<Vec<VMExternRef>>
and deferred destruction would be just saving all the ref_count == 0
references in a side table, and running their dtors at a known-safe point in time
I like your (more obvious) idea of just swapping it out of the table first. don't know why I didn't think of that >.<
although it is one more basic block. I don't think thats a big deal tho
ah ok
to be clear, this is what I'm thinking -- https://gist.github.com/alexcrichton/fe9a5aff25cdcf943234f433babbb55e
using your pseudocode
if that makes sense
so we shouldn't need deferred destruction or need to worry about ref cells and such
ah nice, yeah that doesn't need a new basic block either
also, out of curiosity, does cranelift still do the ebb thing?
where you can have multiple exits from a block?
not anymore
b/c that could simplify the blocks in a few places I think
ah ok, that makes more sense
fwiw, I was adding fallthrough
s manually, instead of jump
s, but
jump
s to the next block into a fallthrough
, and it works wellso I think we don't need to worry much about "lots of little blocks with many jumps between them" as long as we insert them into the layout in the right order
makes sense
I should learn more about what optimizations cranelift does
@fitzgen (he/him) so the frob-the-refcount, does that turn into one add
instruction on x86 with addressing modes?
is that an optimization that cranelift does?
it should be, but I'm not sure it does
ah ok, wasn't sure if we needed to represent that in IR or not
I should dig in at some point in the future, just not now lol
@fitzgen (he/him) so here's another question, I don't know how stackmaps work, so I'm unsure. So the gist I sent, what happens if the dtor calls Store::gc
?
we loaded the pointer to drop from the table
which presumably has the r64 type
but we're dropping it so it shouldn't be traced
even though it's "live" as a function argument
because it's a function argument does that mean it's not included in the stack map?
with the modified barrier pseudocode?
yeah
so like if cranelift sees function_call(some_r64_type)
does the stack map at that function call include some_r64_type
?
it would, yes
hm that may also be problematic
this may mean we have to do deffered destruction
and means we'll need to cast to i64 to call the drop intrinsic I think?
or could we load the pointer as i64 instead?
perhaps
this may also be an issue for table.get where we're manually frobbing the reference count
not sure I follow
I sort of forget how the reference counts work out during gc
er wait
nevermind
I'm worried about call activations_table_insert_with_gc(elem)
but that's guaranteed to not call user code
so we're ok there, it's just call drop_externref_in_place(current_elem)
I'm worried about in terms of stack maps
er, no, scratch that too, activations_table_insert_with_gc
will do a GC
it's at least something to consider, what happens when we gc during that call and we insert it from the stack plus manually
and same thing for the drop, what happens if we gc while dropping, making sure we don't accidentally try to keep it in the activations table
we should be fine with activations_table_insert_with_gc
, since both the reference will be in the stack maps, and the first thing we do is in the impl is VMExternRef::clone_from_raw
but yeah, asking these questions is great, and has already found a few bugs (thanks!)
the wonders of code review :P
Alex Crichton said:
fitzgen (he/him) so the frob-the-refcount, does that turn into one
add
instruction on x86 with addressing modes?
on x86 we would want to turn this into an xadd
right?
I don't see any mention of xadd
in the codebase
xadd
I thought was atomic_add
and I don't remember it from when I ported simple_preopt
to peepmatic
its atomic if it is prefixed with lock
I'm thinking of addl $1, (%rdi)
hmm....
I think you're right about activations_table_insert_with_gc
as well, that'll clone_from_raw
the pointer it reads from the stack and shove it in a set. The intrinsic also clone_from_raw
s, however, and since the ownership is in the original table anyway I think that all works out just fine
<removed>
wait I forgot to enable opts
$ cat ~/scratch/add.clif; cargo run --manifest-path cranelift/Cargo.toml -- compile --target x86_64 -dp ~/scratch/add.clif -D --set opt_level=speed
function %add(i64) {
block0(v0: i64):
v1 = load.i32 aligned notrap v0
v2 = iadd_imm v1, 1
store aligned notrap v2, v1
return
}
Finished dev [unoptimized + debuginfo] target(s) in 0.08s
Running `target/debug/clif-util compile --target x86_64 -dp /home/fitzgen/scratch/add.clif -D --set opt_level=speed`
function %add(i64 [%rdi], i64 fp [%rbp]) -> i64 fp [%rbp] fast {
ss0 = incoming_arg 16, offset -16
block0(v0: i64 [%rdi], v4: i64 [%rbp]):
[RexOp1pushq#50] x86_push v4
[RexOp1copysp#8089] copy_special %rsp -> %rbp
[RexOp1ld#8b,%rax] v1 = load.i32 notrap aligned v0
[DynRexOp1umr#89,%rcx] v3 = copy v1
[DynRexOp1r_ib#83,%rcx] v2 = iadd_imm v3, 1
[RexOp1st#89] store notrap aligned v2, v1
[RexOp1popq#58,%rbp] v5 = x86_pop.i64
[Op1ret#c3] return v5
}
.byte 64, 85, 72, 137, 229, 64, 139, 7, 137, 193, 131, 193, 1, 64, 137, 8, 64, 93, 195
Disassembly of 19 bytes:
0: 40 55 push rbp
2: 48 89 e5 mov rbp, rsp
5: 40 8b 07 mov eax, dword ptr [rdi]
8: 89 c1 mov ecx, eax
a: 83 c1 01 add ecx, 1
d: 40 89 08 mov dword ptr [rax], ecx
10: 40 5d pop rbp
12: c3 ret
trap: stk_ovf at 0
@Alex Crichton looks like cranelift can't do it
aww :(
it'd be neat if we could easily look at the asm for wasm modules too at some point
filed https://github.com/bytecodealliance/wasmtime/issues/1925
Alex Crichton said:
it'd be neat if we could easily look at the asm for wasm modules too at some point
Will wasmtime wasm2obj ..
+ objdump -d ...
work?
@Yury Delendik I haven't used that too too much but it probably works just fine, that's a good poitn!
I'm trying to move compiler to emit ELF image directly, that's will improve wasm2obj further. (I'm using objdump regularly)
oh nice
FWIW, RUST_LOG=debug
dumps a bunch of cranelift ir too, but not the final disassembly
@fitzgen (he/him) I think the fuzz corpus may need to be updated for the new tables ops fuzz test? -- https://oss-fuzz-build-logs.storage.googleapis.com/log-5a0b58fe-4a35-46d5-b4d2-221905e7d208.txt
we should fix the build to not require a corpus ahead of time, and just make an empty one if necessary; I'll make a PR
https://github.com/google/oss-fuzz/pull/4065
Last updated: Jan 24 2025 at 00:11 UTC