Stream: git-cranelift

Topic: cranelift / Issue #1174 Tls support for ELF and MachO


view this post on Zulip GitHub (Feb 12 2020 at 20:41):

bjorn3 commented on Issue #1174:

Ping?

view this post on Zulip GitHub (Feb 12 2020 at 21:18):

abrown commented on Issue #1174:

@bjorn3 I'm fine with this once the conflicts are resolved and tests are added but that is because I'm not familiar enough with other ways of implementing TLS to have much comment. There's a Cranelift meeting on Monday mornings and there might be someone there that has more insight into the future Cranelift's threading support; I mention that because the whole threading thing might require some back-and-forth discussion.

view this post on Zulip GitHub (Feb 13 2020 at 19:57):

bjorn3 commented on Issue #1174:

I'm not familiar enough with other ways of implementing TLS to have much comment.

While there are other ways to implement TLS support, this matches the native TLS handling of ELF cq Mach-O. This way, it is possible to import/export tls variables from/to native objects. While this is not really necessary for cg_clif, as Rust makes a getter for the TLS var anyway and doesn't export the raw TLS var, for rcc it will likely be needed to interoperate with native code.

view this post on Zulip GitHub (Feb 14 2020 at 14:23):

bjorn3 commented on Issue #1174:

I rebased this and added tests for legalization of global_value and encoding of x86_{elf,macho}_tls_get_addr. Should I also add a test that cranelift-object correctly writes the tls section?

I mention that because the whole threading thing might require some back-and-forth discussion.

For cg_clif this is technically the only thing that prevents multithreading at the moment. I have made atomics atomic by using a global lock. While this is slow, at least it works. I would love to see atomic support in Cranelift, but for now TLS is more important for me.

view this post on Zulip GitHub (Feb 18 2020 at 13:46):

bjorn3 commented on Issue #1174:

@abrown Has this been discussed?

view this post on Zulip GitHub (Feb 18 2020 at 16:14):

abrown commented on Issue #1174:

I think the cranelift meeting yesterday was cancelled due to a US holiday (at least I didn't attend) but I posted in a zulip chat thread and I will bring it up next Monday.

view this post on Zulip GitHub (Feb 18 2020 at 18:56):

bjorn3 commented on Issue #1174:

Ok, thanks!

view this post on Zulip GitHub (Feb 24 2020 at 19:56):

bjorn3 commented on Issue #1174:

I decided to look a bit more at the __tls_get_addr ABI and noticed that I made a wrong assumption. In one place, it is said that the full code sequence needs to be 16 bytes (used to for example relax GD to other TLS models, https://github.com/llvm/llvm-project/blob/deb5819d6249cfe110da9377910f7e5c88e8ee09/lld/ELF/Arch/X86_64.cpp#L186), suggesting a custom ABI, but the glibc implementation seems to require the C abi: https://github.com/bminor/glibc/blob/5f72f9800b250410cad3abfeeb09469ef12b2438/elf/dl-tls.c#L821. When looking at LLVM, this seems to indeed be the case: https://github.com/llvm/llvm-project/blob/78be61871704a451a5d9462d7e96ed6c982746d4/llvm/lib/Target/X86/X86InstrCompiler.td#L453-L471. The current code works for cg_clif, but that may be caused by the fact that rust creates a very small function just to get the tls addr, so none of the clobbered registers are used after the global_value instruction.

Clobbering all necessary registers will mean that I have to add even more clobber registers. Another option would be to use the existing infrastructure for the call and call_indirect instruction.

view this post on Zulip GitHub (Feb 24 2020 at 20:10):

bjorn3 edited a comment on Issue #1174:

I decided to look a bit more at the __tls_get_addr ABI and noticed that I made a wrong assumption. In one place, it is said that the full code sequence needs to be 16 bytes (used to for example relax GD to other TLS models, https://github.com/llvm/llvm-project/blob/deb5819d6249cfe110da9377910f7e5c88e8ee09/lld/ELF/Arch/X86_64.cpp#L186), suggesting a custom ABI, but the glibc implementation seems to require the C abi: https://github.com/bminor/glibc/blob/5f72f9800b250410cad3abfeeb09469ef12b2438/elf/dl-tls.c#L821. When looking at LLVM, this seems to indeed be the case: https://github.com/llvm/llvm-project/blob/78be61871704a451a5d9462d7e96ed6c982746d4/llvm/lib/Target/X86/X86InstrCompiler.td#L453-L471. The current code works for cg_clif, but that may be caused by the fact that rust creates a very small function just to get the tls addr, so none of the clobbered registers are used after the global_value instruction.

Clobbering all necessary registers will mean that I have to add even more clobber registers. <strike>Another option would be to use the existing infrastructure for the call and call_indirect instruction.</strike> (Edit: existing infrastructure only works for actual call and call_indirect.)

view this post on Zulip GitHub (Feb 25 2020 at 19:37):

bjorn3 commented on Issue #1174:

I tried to add more output registers for the clobbering, but the maximum amount of outputs is 8, which is way less than the amount of registers than are clobbered.

I also tried to use the same spill logic as used by normal calls, but I can't find where it is determined which registers are caller-saved and should spilled to the stack.

view this post on Zulip GitHub (Feb 25 2020 at 19:46):

iximeow commented on Issue #1174:

I also tried to use the same spill logic as used by normal calls, but I can't find where it is determined which registers are caller-saved and should spilled to the stack.

I think you might not have been able to find it because all register values that live through a call are spilled - callee-save/caller-save ABI guarantees are disregarded at the moment. There's a note in spilling.rs for this. This if could be extended to spill around clobber-happy TLS operations.

view this post on Zulip GitHub (Feb 25 2020 at 20:21):

bjorn3 commented on Issue #1174:

Thanks @iximeow! It now correctly spills all caller-saved registers.

view this post on Zulip GitHub (Feb 26 2020 at 07:19):

bjorn3 commented on Issue #1174:

:tada:

view this post on Zulip GitHub (Mar 05 2020 at 12:30):

stefson commented on Issue #1174:

@bjorn3

I'm getting a compile error with building for aarch64-unknown-linux-gnu on current master, and found this pullrequest as the culprit when bisecting the issue. Here's the error:

   Compiling cranelift-codegen-meta v0.59.0 (/tmp/wasmtime/cranelift/codegen/meta)
   Compiling cranelift-bforest v0.59.0 (/tmp/wasmtime/cranelift/bforest)
   Compiling wasi-common v0.12.0 (/tmp/wasmtime/crates/wasi-common)
   Compiling cranelift-codegen v0.59.0 (/tmp/wasmtime/cranelift/codegen)
warning: unused import: `self::call::expand_call`
  --> cranelift/codegen/src/legalizer/mod.rs:35:5
   |
35 | use self::call::expand_call;
   |     ^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

warning: unused import: `self::globalvalue::expand_global_value`
  --> cranelift/codegen/src/legalizer/mod.rs:36:5
   |
36 | use self::globalvalue::expand_global_value;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: `self::heap::expand_heap_addr`
  --> cranelift/codegen/src/legalizer/mod.rs:37:5
   |
37 | use self::heap::expand_heap_addr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: `self::table::expand_table_addr`
  --> cranelift/codegen/src/legalizer/mod.rs:39:5
   |
39 | use self::table::expand_table_addr;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0599]: no variant or associated item named `X86ElfTlsGetAddr` found for type `ir::instructions::Opcode` in the current scope
    --> cranelift/codegen/src/regalloc/spilling.rs:272:45
     |
272  |             || opcode == crate::ir::Opcode::X86ElfTlsGetAddr
     |                                             ^^^^^^^^^^^^^^^^ variant or associated item not found in `ir::instructions::Opcode`
     |
    ::: /tmp/wasmtime/target/aarch64-unknown-linux-gnu/debug/build/cranelift-codegen-dd375821e82eb469/out/opcodes.rs:1427:1
     |
1427 | pub enum Opcode {
     | --------------- variant or associated item `X86ElfTlsGetAddr` not found here

error[E0599]: no variant or associated item named `X86MachoTlsGetAddr` found for type `ir::instructions::Opcode` in the current scope
    --> cranelift/codegen/src/regalloc/spilling.rs:273:45
     |
273  |             || opcode == crate::ir::Opcode::X86MachoTlsGetAddr
     |                                             ^^^^^^^^^^^^^^^^^^ variant or associated item not found in `ir::instructions::Opcode`
     |
    ::: /tmp/wasmtime/target/aarch64-unknown-linux-gnu/debug/build/cranelift-codegen-dd375821e82eb469/out/opcodes.rs:1427:1
     |
1427 | pub enum Opcode {
     | --------------- variant or associated item `X86MachoTlsGetAddr` not found here

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0599`.
error: could not compile `cranelift-codegen`.

It's propably worth noting that this only happens when executing cargo build from the wasmtime dir, and it doesn't when cd wasmtime/cranelift & cargo build. Still the error seems to be from cranelift.

P.S: the regression is caused by https://github.com/bytecodealliance/wasmtime/commit/0a1bb3ba6ccadd4b716cd259a097bb31435c0c6d , I hope this thread is the correct one to note the author of that regression.

view this post on Zulip GitHub (Mar 05 2020 at 13:02):

bjorn3 commented on Issue #1174:

The problem is probably that Cranelift got built without x86 support, but I added a special case to regalloc with an x86 only instruction without checking for x86 support. I think the solution is to add an instruction attribute, just like is_call and is_branch, for this special case instead.

view this post on Zulip GitHub (Mar 05 2020 at 19:38):

abrown commented on Issue #1174:

@bjorn3, can you create an issue in wasmtime so we can track this there?

view this post on Zulip GitHub (Mar 05 2020 at 19:38):

abrown edited a comment on Issue #1174:

@bjorn3 (or @stefson), can you create an issue in wasmtime so we can track this there?


Last updated: Nov 22 2024 at 16:03 UTC