Hi Giuseppe,

I am in contact with Michal trying to get additional information to see what is causing this particular issue.

Andy

From: Giuseppe
Sent: Thursday, November 02, 2017 11:35 AM
To: xrootd/xrootd
Cc: Subscribed
Subject: [xrootd/xrootd] Deadlock (?) in xrootd server observed in CASTOR/CERN (#615)

We've recently observed a few cases where an xrootd server stops serving clients and enters into a deadlock (or possibly livelock) state. This is with xrootd 4.6 and 4.7, and a specific CASTOR plugin for metadata processing (data access goes via pure xrootd on local disk).

A typical signature in gdb looks like:

(gdb) info threads
  Id Target Id Frame
  430 Thread 0x7fa16d85d740 (LWP 2282) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  429 Thread 0x7f9d4fcbc700 (LWP 12212) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  428 Thread 0x7f9d4fdbd700 (LWP 12211) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  427 Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
  426 Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  425 Thread 0x7f9d500c0700 (LWP 30135) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  424 Thread 0x7f9d501c1700 (LWP 30134) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  423 Thread 0x7f9d502c2700 (LWP 30131) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  422 Thread 0x7f9d504c4700 (LWP 30132) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  421 Thread 0x7f9d503c3700 (LWP 30133) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  420 Thread 0x7f9d505c5700 (LWP 29260) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  419 Thread 0x7f9d506c6700 (LWP 29037) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  418 Thread 0x7f9d507c7700 (LWP 29016) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  417 Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  414 Thread 0x7f9d50bcb700 (LWP 28961) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  413 Thread 0x7f9d50ccc700 (LWP 28908) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  ...
  24 Thread 0x7fa15a6e6700 (LWP 21066) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  23 Thread 0x7fa15b8f8700 (LWP 21063) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  22 Thread 0x7fa15a8e8700 (LWP 12629) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  21 Thread 0x7fa15aaea700 (LWP 28154) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  20 Thread 0x7fa16bad5700 (LWP 26013) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  19 Thread 0x7fa15b5f5700 (LWP 8808) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  18 Thread 0x7fa15b6f6700 (LWP 8452) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  17 Thread 0x7fa15aeee700 (LWP 29477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  16 Thread 0x7fa15a7e7700 (LWP 4804) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  15 Thread 0x7fa15acec700 (LWP 4728) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  14 Thread 0x7fa15b7f7700 (LWP 4477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
  13 Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
  12 Thread 0x7fa1687b5700 (LWP 2553) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
  11 Thread 0x7fa1688b6700 (LWP 2552) 0x00007fa16cce470d in read () from /lib64/libpthread.so.0
  10 Thread 0x7fa168eb3700 (LWP 2551) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
  9 Thread 0x7fa168fb4700 (LWP 2550) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
  8 Thread 0x7fa1690b5700 (LWP 2549) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
  7 Thread 0x7fa1691b6700 (LWP 2548) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
  6 Thread 0x7fa16b60b700 (LWP 2535) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7fa16b74e700 (LWP 2534) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7fa16b891700 (LWP 2533) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
  3 Thread 0x7fa16bbd6700 (LWP 2530) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7fa16bcd7700 (LWP 2529) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0

@simonmichal is having a look at a core dump. More info to follow.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

--
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/615#issuecomment-341588779

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1