We've recently observed a few cases where an xrootd server stops serving clients and enters a deadlock (or possibly livelock) state. This happens with xrootd 4.6 and 4.7 running a CASTOR-specific plugin for metadata processing (data access goes via pure xrootd on local disk).

A typical signature in gdb looks like:

(gdb) info threads
  Id   Target Id         Frame
  430  Thread 0x7fa16d85d740 (LWP 2282) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  429  Thread 0x7f9d4fcbc700 (LWP 12212) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  428  Thread 0x7f9d4fdbd700 (LWP 12211) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  427  Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
  426  Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  425  Thread 0x7f9d500c0700 (LWP 30135) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  424  Thread 0x7f9d501c1700 (LWP 30134) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  423  Thread 0x7f9d502c2700 (LWP 30131) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  422  Thread 0x7f9d504c4700 (LWP 30132) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  421  Thread 0x7f9d503c3700 (LWP 30133) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  420  Thread 0x7f9d505c5700 (LWP 29260) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  419  Thread 0x7f9d506c6700 (LWP 29037) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  418  Thread 0x7f9d507c7700 (LWP 29016) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  417  Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  416  Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  415  Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  414  Thread 0x7f9d50bcb700 (LWP 28961) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  413  Thread 0x7f9d50ccc700 (LWP 28908) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
...
  24   Thread 0x7fa15a6e6700 (LWP 21066) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  23   Thread 0x7fa15b8f8700 (LWP 21063) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  22   Thread 0x7fa15a8e8700 (LWP 12629) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  21   Thread 0x7fa15aaea700 (LWP 28154) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  20   Thread 0x7fa16bad5700 (LWP 26013) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  19   Thread 0x7fa15b5f5700 (LWP 8808) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  18   Thread 0x7fa15b6f6700 (LWP 8452) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  17   Thread 0x7fa15aeee700 (LWP 29477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7fa15a7e7700 (LWP 4804) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  15   Thread 0x7fa15acec700 (LWP 4728) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  14   Thread 0x7fa15b7f7700 (LWP 4477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  13   Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
  12   Thread 0x7fa1687b5700 (LWP 2553) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  11   Thread 0x7fa1688b6700 (LWP 2552) 0x00007fa16cce470d in read () from /lib64/libpthread.so.0
  10   Thread 0x7fa168eb3700 (LWP 2551) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
  9    Thread 0x7fa168fb4700 (LWP 2550) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
  8    Thread 0x7fa1690b5700 (LWP 2549) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
  7    Thread 0x7fa1691b6700 (LWP 2548) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
  6    Thread 0x7fa16b60b700 (LWP 2535) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  5    Thread 0x7fa16b74e700 (LWP 2534) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  4    Thread 0x7fa16b891700 (LWP 2533) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  3    Thread 0x7fa16bbd6700 (LWP 2530) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fa16bcd7700 (LWP 2529) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0
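
For a quick overview of a listing like the one above, the innermost frame of each thread can be tallied with a small script. This is only a minimal sketch and not part of xrootd or gdb: the names `THREAD_ROW` and `tally_frames` are made up for illustration, and the pattern assumes the exact `info threads` layout shown here (other gdb versions may format rows differently).

```python
import re
import sys
from collections import Counter

# Matches one row of gdb's `info threads` table in the layout shown above, e.g.
#   416  Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from ...
# The optional leading "*" marks the currently selected thread.
THREAD_ROW = re.compile(
    r"^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+"
    r"0x[0-9a-f]+\s+in\s+(?P<func>\S+)\s+\("
)


def tally_frames(gdb_output):
    """Count threads per innermost function name.

    Lines that do not look like a thread row (the header, "...", blanks)
    are skipped silently.
    """
    counts = Counter()
    for line in gdb_output.splitlines():
        match = THREAD_ROW.match(line)
        if match:
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python tally_frames.py < info_threads.txt
    for func, n in tally_frames(sys.stdin.read()).most_common():
        print(f"{n:5d}  {func}")
```

Threads parked in `__lll_lock_wait` are typically waiting on a contended pthread mutex, so a large count there is consistent with a lock that is never released. Full backtraces (`thread apply all bt` in gdb) would be needed to see which lock is involved and who holds it.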

@simonmichal is having a look at a core dump. More info to follow.



GitHub issue: Deadlock (?) in xrootd server observed in CASTOR/CERN (#615), https://github.com/xrootd/xrootd/issues/615
