1. 23 Aug 2023, 1 commit
  2. 01 Jun 2023, 1 commit
  3. 22 May 2023, 1 commit
  4. 08 May 2023, 1 commit
  5. 23 Feb 2023, 1 commit
  6. 02 Nov 2022, 1 commit
  7. 27 Jul 2022, 1 commit
  8. 20 Jun 2022, 1 commit
  9. 17 Jun 2022, 1 commit
  10. 28 Apr 2022, 1 commit
  11. 06 Jan 2022, 2 commits
  12. 07 Jul 2021, 1 commit
  13. 11 Jun 2021, 1 commit
  14. 11 Mar 2021, 1 commit
  15. 20 Nov 2020, 1 commit
    • fix regression in pthread_exit · debbddf7
      Rich Felker committed
      commit d26e0774 moved the detach state
      transition at exit before the thread list lock was taken. this
      inadvertently allowed pthread_join to race to take the thread list
      lock first, and proceed with unmapping of the exiting thread's memory.
      
      we could fix this by just reverting the offending commit and instead
      performing __vm_wait unconditionally before taking the thread list
      lock, but that may be costly. instead, bring back the old DT_EXITING
      vs DT_EXITED state distinction that was removed in commit
      8f11e612, and don't transition to
      DT_EXITED (a value of 0, which is what pthread_join waits for) until
      after the lock has been taken.
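      the corrected ordering can be pictured with a short C sketch; the
      lock, state names, and helpers below follow the description above
      but are illustrative rather than verbatim musl source:

        /* sketch: exit path with the state transition after the lock */
        __tl_lock();                 /* thread list lock taken first */
        /* ... unlink self from the live-thread list ... */
        if (self->detach_state != DT_DETACHED) {
            /* only now can pthread_join's wait condition become true,
             * so it cannot unmap this thread's memory prematurely */
            self->detach_state = DT_EXITED;   /* DT_EXITED == 0 */
            __wake(&self->detach_state, 1, 1);
        }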
  16. 15 Oct 2020, 3 commits
  17. 29 Sep 2020, 1 commit
    • fix fork of processes with active async io contexts · 34904d83
      Rich Felker committed
      previously, if a file descriptor had aio operations pending in the
      parent before fork, attempting to close it in the child would attempt
      to cancel a thread belonging to the parent. this could deadlock, fail,
      or crash the whole process if the cancellation signal handler was not
      yet installed in the parent. in addition, further use of aio from the
      child could malfunction or deadlock.
      
      POSIX specifies that async io operations are not inherited by the
      child on fork, so clear the entire aio fd map in the child, and take
      the aio map lock (with signals blocked) across the fork so that the
      lock is kept in a consistent state.
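      the sequencing can be sketched as follows; the map and lock names
      are hypothetical stand-ins for musl's internal aio state:

        /* hypothetical names; shows the lock held across fork */
        sigset_t all, old;
        sigfillset(&all);
        pthread_sigmask(SIG_BLOCK, &all, &old);  /* block signals first */
        LOCK(aio_map_lock);                      /* then take the map lock */
        pid_t pid = fork();
        if (!pid) aio_fd_map = 0;  /* child: drop all aio state, per POSIX */
        UNLOCK(aio_map_lock);      /* lock consistent in parent and child */
        pthread_sigmask(SIG_SETMASK, &old, 0);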
  18. 09 Sep 2020, 1 commit
  19. 28 Aug 2020, 2 commits
    • remove redundant pthread struct members repeated for layout purposes · 57f6e85c
      Rich Felker committed
      dtv_copy, canary2, and canary_at_end existed solely to match multiple
      ABI and asm-accessed layouts simultaneously. now that pthread_arch.h
      can be included before struct __pthread is defined, the struct layout
      can depend on macros defined by pthread_arch.h.
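      roughly, the layout can now be conditionalized on arch macros; a
      hedged sketch (field placement here is illustrative only):

        /* pthread_arch.h is included first, so its macros can shape this */
        struct __pthread {
            struct __pthread *self;
        #ifndef TLS_ABOVE_TP
            uintptr_t *dtv;    /* kept at this offset only where the ABI
                                  requires it; no separate dtv_copy */
        #endif
            /* ... */
            uintptr_t canary;  /* one canary; no canary2/canary_at_end */
        };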
    • deduplicate __pthread_self thread pointer adjustment out of each arch · 3a5b9ae7
      Rich Felker committed
      the adjustment made is entirely a function of TLS_ABOVE_TP and
      TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
      arithmetic, this change makes pthread_arch.h independent of the
      definition of struct __pthread from pthread_impl.h. this in turn will
      allow inclusion of pthread_arch.h to be moved to the top of
      pthread_impl.h so that it can influence the definition of the
      structure.
      
      previously, arch files were very inconsistent about the type used for
      the thread pointer. this change unifies the new __get_tp interface to
      always use uintptr_t, which is the most correct when performing
      arithmetic that may involve addresses outside the actual pointed-to
      object (due to TP_OFFSET).
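      the unified form is small enough to sketch in full (close to what
      the message describes; treat the details as illustrative):

        /* each arch now only supplies __get_tp(), returning uintptr_t */
        static inline uintptr_t __get_tp(void);  /* arch-provided */

        #ifndef TP_OFFSET
        #define TP_OFFSET 0
        #endif

        #ifdef TLS_ABOVE_TP
        #define __pthread_self() \
            ((pthread_t)(__get_tp() - sizeof(struct __pthread) - TP_OFFSET))
        #else
        #define __pthread_self() ((pthread_t)__get_tp())
        #endif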
  20. 25 Aug 2020, 2 commits
    • deduplicate TP_ADJ logic out of each arch, replace with TP_OFFSET · ea71a900
      Rich Felker committed
      the only part of TP_ADJ that was not uniquely determined by
      TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc
      variants.
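      with that, the per-arch boilerplate collapses to one definition; a
      sketch of the shared logic:

        /* TP_ADJ derived solely from TLS_ABOVE_TP and TP_OFFSET */
        #ifndef TP_OFFSET
        #define TP_OFFSET 0
        #endif
        #ifdef TLS_ABOVE_TP
        #define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) + TP_OFFSET)
        #else
        #define TP_ADJ(p) (p)
        #endif
        /* mips and powerpc variants just define TP_OFFSET as 0x7000 */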
    • make h_errno thread-local · 9d0b8b92
      Rich Felker committed
      the framework to do this always existed but it was deemed unnecessary
      because the only [ex-]standard functions using h_errno were not
      thread-safe anyway. however, some of the nonstandard res_* functions
      are also supposed to set h_errno to indicate the cause of error, and
      were unable to do so because it was not thread-safe. this change is a
      prerequisite for fixing them.
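      the effect is observable with the standard netdb interfaces; each
      thread now sees its own h_errno:

        #include <netdb.h>
        #include <pthread.h>
        #include <stdio.h>

        static void *worker(void *a)
        {
            /* a failed lookup sets h_errno in this thread only */
            if (!gethostbyname("no-such-host.invalid"))
                printf("h_errno here: %d\n", h_errno);
            return 0;
        }

        int main(void)
        {
            pthread_t t;
            pthread_create(&t, 0, worker, 0);
            pthread_join(t, 0);
            return 0;
        }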
  21. 17 Aug 2020, 1 commit
  22. 30 Sep 2019, 1 commit
    • remove remaining traces of __tls_get_new · 33bc7f05
      Szabolcs Nagy committed
      Some declarations of __tls_get_new were left in the code, even
      though the definition got removed in
      
        commit 9d44b646
        install dynamic tls synchronously at dlopen, streamline access
      
      this can make the build fail with
      
        ld: lib/libc.so: hidden symbol `__tls_get_new' isn't defined
      
      when libc.so is linked without --gc-sections, because a .hidden
      declaration in asm code creates a reference even if the symbol
      is not actually used.
  23. 22 Feb 2019, 1 commit
    • add membarrier syscall wrapper, refactor dynamic tls install to use it · ba18c1ec
      Rich Felker committed
      the motivation for this change is twofold. first, it gets the fallback
      logic out of the dynamic linker, improving code readability and
      organization. second, it provides application code that wants to use
      the membarrier syscall, which depends on preregistration of intent
      before the process becomes multithreaded unless unbounded latency is
      acceptable, with a symbol that, when linked, ensures that this
      registration happens.
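      usage follows a register-then-use pattern; a minimal sketch via the
      raw syscall (constants from linux/membarrier.h):

        #include <linux/membarrier.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        int main(void)
        {
            /* register intent while still single-threaded, so later
               expedited barriers avoid unbounded latency */
            syscall(SYS_membarrier,
                    MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
            /* ... create threads; then, whenever a barrier is needed: */
            syscall(SYS_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
            return 0;
        }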
  24. 19 Feb 2019, 1 commit
    • install dynamic tls synchronously at dlopen, streamline access · 9d44b646
      Rich Felker committed
      previously, dynamic loading of new libraries with thread-local storage
      allocated the storage needed for all existing threads at load-time,
      precluding late failure that can't be handled, but left installation
      in existing threads to take place lazily on first access. this imposed
      an additional memory access and branch on every dynamic tls access,
      and imposed a requirement, which was not actually met, that the
      dynamic tlsdesc asm functions preserve all call-clobbered registers
      before calling C code to install new dynamic tls on first access.
      the x86[_64] versions of this code wrongly omitted saving and
      restoring of fpu/vector registers, assuming the compiler would not
      generate anything using them in the called C code. the arm and aarch64
      versions saved known existing registers, but failed to be future-proof
      against expansion of the register file.
      
      now that we track live threads in a list, it's possible to install the
      new dynamic tls for each thread at dlopen time. for the most part,
      synchronization is not needed, because if a thread has not
      synchronized with completion of the dlopen, there is no way it can
      meaningfully request access to a slot past the end of the old dtv,
      which remains valid for accessing slots which already existed.
      however, it is necessary to ensure that, if a thread sees its new dtv
      pointer, it sees correct pointers in each of the slots that existed
      prior to the dlopen. my understanding is that, on most real-world
      coherency architectures including all the ones we presently support, a
      built-in consume order guarantees this; however, don't rely on that.
      instead, the SYS_membarrier syscall is used to ensure that all threads
      see the stores to the slots of their new dtv prior to the installation
      of the new dtv. if it is not supported, the same is implemented in
      userspace via signals, using the same mechanism as __synccall.
      
      the __tls_get_addr function, variants, and dynamic tlsdesc asm
      functions are all updated to remove the fallback paths for claiming
      new dynamic tls, and are now all branch-free.
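      the required ordering at dlopen time can be sketched as pseudocode
      over musl-like names (the broadcast helper is hypothetical):

        /* for each live thread td, with the thread list lock held: */
        for (i = 0; i < old_cnt; i++)
            new_dtv[i] = td->dtv[i];      /* slots that already existed */
        new_dtv[new_id] = new_mod_tls;    /* slot for the new module */
        /* all threads must see the slot stores before the dtv pointer: */
        if (have_membarrier)
            membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
        else
            broadcast_via_signals();      /* __synccall-style fallback */
        td->dtv = new_dtv;                /* synchronous install; no lazy path */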
  25. 16 Feb 2019, 3 commits
    • rewrite __synccall in terms of global thread list · e4235d70
      Rich Felker committed
      the __synccall mechanism provides stop-the-world synchronous execution
      of a callback in all threads of the process. it is used to implement
      multi-threaded setuid/setgid operations, since Linux lacks them at the
      kernel level, and for some other less-critical purposes.
      
      this change eliminates dependency on /proc/self/task to determine the
      set of live threads, which in addition to being an unwanted dependency
      and a potential point of resource-exhaustion failure, turned out to be
      inaccurate. test cases provided by Alexey Izbyshev showed that it
      could fail to reflect newly created threads. due to how the
      presignaling phase worked, this usually yielded a deadlock if hit, but
      in the worst case it could also result in threads being silently
      missed (allowed to continue running without executing the callback).
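      the shape of the interface, as used for setuid (the context struct
      and callback here are hypothetical sketches):

        /* run once in every live thread, with the world stopped */
        static void do_setuid_cb(void *p)
        {
            struct ctx *c = p;
            c->err = -__syscall(SYS_setuid, c->uid);  /* per-thread call */
        }

        /* caller: walks the global thread list, no /proc dependency */
        __synccall(do_setuid_cb, &c);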
    • track all live threads in an AS-safe, fully-consistent linked list · 8f11e612
      Rich Felker committed
      the hard problem here is unlinking threads from a list when they exit
      without creating a window of inconsistency where the kernel task for a
      thread still exists and is still executing instructions in userspace,
      but is not reflected in the list. the magic solution here is getting
      rid of per-thread exit futex addresses (set_tid_address), and instead
      using the exit futex to unlock the global thread list.
      
      since pthread_join can no longer see the thread enter a detach_state
      of EXITED (which depended on the exit futex address pointing to the
      detach_state), it must now observe the unlocking of the thread list
      lock before it can unmap the joined thread and return. it doesn't
      actually have to take the lock. for this, a __tl_sync primitive is
      offered, with a signature that will allow it to be enhanced for quick
      return even under contention on the lock, if needed. for now, the
      exiting thread always performs a futex wake on its detach_state. a
      future change could optimize this out except when there is already a
      joiner waiting.
      
      initial/dynamic variants of detached state no longer need to be
      tracked separately, since the futex address is always set to the
      global list lock, not a thread-local address that could become invalid
      on detached thread exit. all detached threads, however, must perform a
      second sigprocmask syscall to block implementation-internal signals,
      since locking the thread list with them already blocked is not
      permissible.
      
      the arch-independent C version of __unmapself no longer needs to take
      a lock or setup its own futex address to release the lock, since it
      must necessarily be called with the thread list lock already held,
      guaranteeing exclusive access to the temporary stack.
      
      changes to libc.threads_minus_1 no longer need to be atomic, since
      they are guarded by the thread list lock. it is largely vestigial at
      this point, and can be replaced with a cheaper boolean indicating
      whether the process is multithreaded at some point in the future.
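      the joiner's side of this can be sketched as follows (close to the
      description above; not verbatim musl source):

        /* after observing detach_state == DT_EXITED, pthread_join must
         * also observe release of the thread list lock, which the kernel
         * performs via the exiting task's CLONE_CHILD_CLEARTID address */
        void __tl_sync(pthread_t td)
        {
            int tid = __thread_list_lock;
            if (!tid) return;                        /* already released */
            __wait(&__thread_list_lock, 0, tid, 0);  /* wait for unlock */
            __wake(&__thread_list_lock, 1, 0);       /* chain the wake */
        }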
    • always block signals for starting new threads, refactor start args · 04335d92
      Rich Felker committed
      whether signals need to be blocked at thread start, and whether
      unblocking is necessary in the entry point function, has historically
      depended on intricacies of the cancellation design and on whether
      there are scheduling operations to perform on the new thread before
      its successful creation can be committed. future changes to track an
      AS-safe list of live threads will require signals to be blocked
      whenever changes are made to the list, so ...
      
      prior to commits b8742f32 and
      40bae2d3, a signal mask for the entry
      function to restore was part of the pthread structure. it was removed
      to trim down the size of the structure, which both saved a small
      amount of stack space and improved code generation on archs where
      small immediate displacements are less costly than arbitrary ones, by
      limiting the range of offsets between the base of the thread
      structure, its members, and the thread pointer. these commits moved
      the saved mask to a special structure used only when special
      scheduling was needed, in which case the pthread_create caller and new
      thread had to synchronize with each other and could use this memory to
      pass a mask.
      
      this commit partially reverts the above two commits, but instead of
      putting the mask back in the pthread structure, it moves all "start
      argument" members out of the pthread structure, trimming it down
      further, and puts them in a separate structure passed on the new
      thread's stack. the code path for explicit scheduling of the new
      thread is also changed to synchronize with the calling thread in such
      a way as to avoid spurious futex wakes.
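      a sketch of the relocated start-argument block (fields follow the
      description; the exact layout is illustrative):

        /* placed on the new thread's stack, not in struct pthread */
        struct start_args {
            void *(*start_func)(void *);
            void *start_arg;
            volatile int control;    /* sync word for the explicit-
                                        scheduling path */
            unsigned long sig_mask[_NSIG/8/sizeof(long)];  /* restored
                                                              at entry */
        };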
  26. 19 Dec 2018, 1 commit
    • add __timedwait backend workaround for old kernels where futex EINTRs · a63c0104
      Rich Felker committed
      prior to linux 2.6.22, futex wait could fail with EINTR even for
      non-interrupting (SA_RESTART) signals. this was no problem provided
      the caller simply restarted the wait, but sem_[timed]wait is required
      by POSIX to return when interrupted by a signal. commit
      a113434c introduced this behavior, and
      commit c0ed5a20 reverted it based on a
      mistaken belief that it was not required. this belief stems from a bug
      in the specification: the description requires the function to return
      when interrupted, but the errors section marks EINTR as a "may fail"
      condition rather than a "shall fail" one.
      
      since there does seem to be significant value in the change made in
      commit c0ed5a20, making it so that
      programs that call sem_wait without checking for EINTR don't silently
      make forward progress without obtaining the semaphore or treat it as a
      fatal error and abort, add a behind-the-scenes mechanism in the
      __timedwait backend to suppress EINTR in programs that have never
      installed interrupting signal handlers, and have sigaction track and
      report this state. this way the semaphore code is not cluttered by
      workarounds and can be updated (to be done in next commit) to reflect
      the high-level logic for conforming behavior.
      
      these changes are based loosely on a patch by Markus Wichmann, with
      the main changes being atomic update to flag object and moving the
      workaround from sem_timedwait to the __timedwait futex backend.
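      the gist of the mechanism, sketched (the flag name and check below
      follow the description; placement details are illustrative):

        /* set by sigaction when an interrupting (non-SA_RESTART) handler
           is first installed; updated atomically */
        volatile int __eintr_valid_flag;

        /* in the __timedwait backend: */
        r = futex_wait(addr, val, ts);
        if (r == EINTR && !__eintr_valid_flag)
            r = 0;  /* old-kernel spurious EINTR: suppress, caller retries */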
  27. 12 Oct 2018, 1 commit
    • combine arch ABI's DTP_OFFSET into DTV pointers · b6d701a4
      Rich Felker committed
      as explained in commit 6ba5517a, some
      archs use an offset (typically -0x8000) with their DTPOFF relocations,
      which __tls_get_addr needs to invert. on affected archs, which lack
      direct support for large immediates, this can cost multiple extra
      instructions in the hot path. instead, incorporate the DTP_OFFSET into
      the DTV entries. this means they are no longer valid pointers, so
      store them as an array of uintptr_t rather than void *; this also
      makes it easier to access slot 0 as a valid slot count.
      
      commit e75b16cf left behind cruft in
      two places, __reset_tls and __tls_get_new, from back when it was
      possible to have uninitialized gap slots indicated by a null pointer
      in the DTV. since the concept of null pointer is no longer meaningful
      with an offset applied, remove this cruft.
      
      presently there are no archs with both TLSDESC and nonzero DTP_OFFSET,
      but the dynamic TLSDESC relocation code is also updated to apply an
      inverted offset to its offset field, so that the offset DTV would not
      impose a runtime cost in TLSDESC resolver functions.
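      with the offset folded into the entries, the hot path (once made
      fully branch-free by commit 9d44b646 above) reduces to roughly:

        /* dtv entries are uintptr_t with DTP_OFFSET pre-applied;
           slot 0 holds the slot count */
        void *__tls_get_addr(tls_mod_off_t *v)
        {
            pthread_t self = __pthread_self();
            return (void *)(self->dtv[v[0]] + v[1]);  /* no offset math */
        }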
  28. 19 Sep 2018, 2 commits
    • increase default thread stack/guard size · c0058ab4
      Rich Felker committed
      stack size default is increased from 80k to 128k. this coincides with
      Linux's hard-coded default stack for the main thread (128k is
      initially committed; growth beyond that up to ulimit is contingent on
      additional allocation succeeding) and GNU ld's default PT_GNU_STACK
      size for FDPIC, at least on sh.
      
      guard size default is increased from 4k to 8k to reduce the risk of
      guard page jumping on overflow, since use of just over 4k of stack is
      common (PATH_MAX buffers, etc.).
    • limit the configurable default stack/guard size for threads · 792f3277
      Rich Felker committed
      limit to 8MB/1MB, respectively. since the defaults cannot be reduced
      once increased, excessively large settings would lead to an
      unrecoverably broken state. this change is in preparation to allow
      defaults to be increased via program headers at the linker level.
      
      creation of threads that really need larger sizes needs to be done
      with an explicit attribute.
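      threads that need more than the capped defaults must request it
      explicitly, e.g.:

        #include <pthread.h>

        pthread_attr_t a;
        pthread_attr_init(&a);
        pthread_attr_setstacksize(&a, 32*1024*1024); /* above 8MB cap */
        pthread_attr_setguardsize(&a, 2*1024*1024);  /* above 1MB cap */
        pthread_create(&t, &a, fn, 0);  /* t and fn defined elsewhere */
        pthread_attr_destroy(&a);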
  29. 18 Sep 2018, 1 commit
    • fix deletion of pthread tsd keys that still have non-null values stored · 84d061d5
      Rich Felker committed
      per POSIX, deletion of a key for which some threads still have values
      stored is permitted, and newly created keys must initially hold the
      null value in all threads. these properties were not met by our
      implementation; if a key was deleted with values left and a new key
      was created in the same slot, the old values were still visible.
      
      moreover, due to lack of any synchronization in pthread_key_delete,
      there was a TOCTOU race whereby a concurrent pthread_exit could
      attempt to call a null destructor pointer for the newly orphaned
      value.
      
      this commit introduces a solution based on __synccall, stopping the
      world to zero out the values for deleted keys, but only does so lazily
      when all key slots have been exhausted. pthread_key_delete is split
      off into a separate translation unit so that static-linked programs
      which only create keys but never delete them will not pull in the
      __synccall machinery.
      
      a global rwlock is added to synchronize creation and deletion of keys
      with dtor execution. since the dtor execution loop now has to release
      and retake the lock around its call to each dtor, checks are made not
      to call the nodtor dummy function for keys which lack a dtor.
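      the POSIX property being restored is easy to state with the
      standard API:

        #include <assert.h>
        #include <pthread.h>

        int main(void)
        {
            pthread_key_t k;
            pthread_key_create(&k, 0);
            pthread_setspecific(k, (void *)1);
            pthread_key_delete(k);       /* deleting with values stored
                                            is permitted */
            pthread_key_create(&k, 0);   /* may reuse the same slot... */
            assert(!pthread_getspecific(k)); /* ...but must read null */
            return 0;
        }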
  30. 13 Sep 2018, 3 commits