bjorn3 commented on Issue #1174:
Ping?
abrown commented on Issue #1174:
@bjorn3 I'm fine with this once the conflicts are resolved and tests are added but that is because I'm not familiar enough with other ways of implementing TLS to have much comment. There's a Cranelift meeting on Monday mornings and there might be someone there that has more insight into the future Cranelift's threading support; I mention that because the whole threading thing might require some back-and-forth discussion.
bjorn3 commented on Issue #1174:
I'm not familiar enough with other ways of implementing TLS to have much comment.
While there are other ways to implement TLS support, this matches the native TLS handling of ELF cq Mach-O. This way, it is possible to import/export tls variables from/to native objects. While this is not really necessary for cg_clif, as Rust makes a getter for the TLS var anyway and doesn't export the raw TLS var, for rcc it will likely be needed to interoperate with native code.
bjorn3 commented on Issue #1174:
I rebased this and added tests for legalization of
global_value
and encoding ofx86_{elf,macho}_tls_get_addr
. Should I also add a test thatcranelift-object
correctly writes the tls section?I mention that because the whole threading thing might require some back-and-forth discussion.
For cg_clif this is technically the only thing that prevents multithreading at the moment. I have made atomics atomic by using a global lock. While this is slow, at least it works. I would love to see atomic support in Cranelift, but for now TLS is more important for me.
bjorn3 commented on Issue #1174:
@abrown Has this been discussed?
abrown commented on Issue #1174:
I think the cranelift meeting yesterday was cancelled due to a US holiday (at least I didn't attend) but I posted in a zulip chat thread and I will bring it up next Monday.
bjorn3 commented on Issue #1174:
Ok, thanks!
bjorn3 commented on Issue #1174:
I decided to look a bit more at the
__tls_get_addr
ABI and noticed that I made a wrong assumption. In one place, it is said that the full code sequence needs to be 16 bytes (used to for example relax GD to other TLS models, https://github.com/llvm/llvm-project/blob/deb5819d6249cfe110da9377910f7e5c88e8ee09/lld/ELF/Arch/X86_64.cpp#L186), suggesting a custom ABI, but the glibc implementation seems to require the C abi: https://github.com/bminor/glibc/blob/5f72f9800b250410cad3abfeeb09469ef12b2438/elf/dl-tls.c#L821. When looking at LLVM, this seems to indeed be the case: https://github.com/llvm/llvm-project/blob/78be61871704a451a5d9462d7e96ed6c982746d4/llvm/lib/Target/X86/X86InstrCompiler.td#L453-L471. The current code works for cg_clif, but that may be caused by the fact that rust creates a very small function just to get the tls addr, so none of the clobbered registers are used after theglobal_value
instruction.Clobbering all necessary registers will mean that I have to add even more clobber registers. Another option would be to use the existing infrastructure for the
call
andcall_indirect
instruction.
bjorn3 edited a comment on Issue #1174:
I decided to look a bit more at the
__tls_get_addr
ABI and noticed that I made a wrong assumption. In one place, it is said that the full code sequence needs to be 16 bytes (used to for example relax GD to other TLS models, https://github.com/llvm/llvm-project/blob/deb5819d6249cfe110da9377910f7e5c88e8ee09/lld/ELF/Arch/X86_64.cpp#L186), suggesting a custom ABI, but the glibc implementation seems to require the C abi: https://github.com/bminor/glibc/blob/5f72f9800b250410cad3abfeeb09469ef12b2438/elf/dl-tls.c#L821. When looking at LLVM, this seems to indeed be the case: https://github.com/llvm/llvm-project/blob/78be61871704a451a5d9462d7e96ed6c982746d4/llvm/lib/Target/X86/X86InstrCompiler.td#L453-L471. The current code works for cg_clif, but that may be caused by the fact that rust creates a very small function just to get the tls addr, so none of the clobbered registers are used after theglobal_value
instruction.Clobbering all necessary registers will mean that I have to add even more clobber registers. <strike>Another option would be to use the existing infrastructure for the
call
andcall_indirect
instruction.</strike> (Edit: existing infrastructure only works for actualcall
andcall_indirect
.)
bjorn3 commented on Issue #1174:
I tried to add more output registers for the clobbering, but the maximum amount of outputs is 8, which is way less than the amount of registers than are clobbered.
I also tried to use the same spill logic as used by normal calls, but I can't find where it is determined which registers are caller-saved and should spilled to the stack.
iximeow commented on Issue #1174:
I also tried to use the same spill logic as used by normal calls, but I can't find where it is determined which registers are caller-saved and should spilled to the stack.
I think you might not have been able to find it because all register values that live through a call are spilled - callee-save/caller-save ABI guarantees are disregarded at the moment. There's a note in
spilling.rs
for this. Thisif
could be extended to spill around clobber-happy TLS operations.
bjorn3 commented on Issue #1174:
Thanks @iximeow! It now correctly spills all caller-saved registers.
bjorn3 commented on Issue #1174:
:tada:
stefson commented on Issue #1174:
@bjorn3
I'm getting a compile error with building for aarch64-unknown-linux-gnu on current master, and found this pullrequest as the culprit when bisecting the issue. Here's the error:
Compiling cranelift-codegen-meta v0.59.0 (/tmp/wasmtime/cranelift/codegen/meta) Compiling cranelift-bforest v0.59.0 (/tmp/wasmtime/cranelift/bforest) Compiling wasi-common v0.12.0 (/tmp/wasmtime/crates/wasi-common) Compiling cranelift-codegen v0.59.0 (/tmp/wasmtime/cranelift/codegen) warning: unused import: `self::call::expand_call` --> cranelift/codegen/src/legalizer/mod.rs:35:5 | 35 | use self::call::expand_call; | ^^^^^^^^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default warning: unused import: `self::globalvalue::expand_global_value` --> cranelift/codegen/src/legalizer/mod.rs:36:5 | 36 | use self::globalvalue::expand_global_value; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ warning: unused import: `self::heap::expand_heap_addr` --> cranelift/codegen/src/legalizer/mod.rs:37:5 | 37 | use self::heap::expand_heap_addr; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ warning: unused import: `self::table::expand_table_addr` --> cranelift/codegen/src/legalizer/mod.rs:39:5 | 39 | use self::table::expand_table_addr; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ error[E0599]: no variant or associated item named `X86ElfTlsGetAddr` found for type `ir::instructions::Opcode` in the current scope --> cranelift/codegen/src/regalloc/spilling.rs:272:45 | 272 | || opcode == crate::ir::Opcode::X86ElfTlsGetAddr | ^^^^^^^^^^^^^^^^ variant or associated item not found in `ir::instructions::Opcode` | ::: /tmp/wasmtime/target/aarch64-unknown-linux-gnu/debug/build/cranelift-codegen-dd375821e82eb469/out/opcodes.rs:1427:1 | 1427 | pub enum Opcode { | --------------- variant or associated item `X86ElfTlsGetAddr` not found here error[E0599]: no variant or associated item named `X86MachoTlsGetAddr` found for type `ir::instructions::Opcode` in the current scope --> cranelift/codegen/src/regalloc/spilling.rs:273:45 | 273 | || opcode == crate::ir::Opcode::X86MachoTlsGetAddr | ^^^^^^^^^^^^^^^^^^ variant or associated item not found in `ir::instructions::Opcode` | ::: /tmp/wasmtime/target/aarch64-unknown-linux-gnu/debug/build/cranelift-codegen-dd375821e82eb469/out/opcodes.rs:1427:1 | 1427 | pub enum Opcode { | --------------- variant or associated item `X86MachoTlsGetAddr` not found here error: aborting due to 2 previous errors For more information about this error, try `rustc --explain E0599`. error: could not compile `cranelift-codegen`.It's propably worth noting that this only happens when executing
cargo build
from the wasmtime dir, and it doesn't whencd wasmtime/cranelift & cargo build
. Still the error seems to be from cranelift.P.S: the regression is caused by https://github.com/bytecodealliance/wasmtime/commit/0a1bb3ba6ccadd4b716cd259a097bb31435c0c6d , I hope this thread is the correct one to note the author of that regression.
bjorn3 commented on Issue #1174:
The problem is probably that Cranelift got built without x86 support, but I added a special case to regalloc with an x86 only instruction without checking for x86 support. I think the solution is to add an instruction attribute, just like
is_call
andis_branch
, for this special case instead.
abrown commented on Issue #1174:
@bjorn3, can you create an issue in wasmtime so we can track this there?
abrown edited a comment on Issue #1174:
@bjorn3 (or @stefson), can you create an issue in wasmtime so we can track this there?
Last updated: Jan 24 2025 at 00:11 UTC