We've recently observed a few cases where an xrootd server stops serving clients and enters a deadlock (or possibly livelock) state. This happens with xrootd 4.6 and 4.7, running a CASTOR-specific plugin for metadata processing (data access goes through plain xrootd on local disk).
A typical signature in gdb looks like:
(gdb) info threads
Id Target Id Frame
430 Thread 0x7fa16d85d740 (LWP 2282) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
429 Thread 0x7f9d4fcbc700 (LWP 12212) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
428 Thread 0x7f9d4fdbd700 (LWP 12211) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
427 Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
426 Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
425 Thread 0x7f9d500c0700 (LWP 30135) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
424 Thread 0x7f9d501c1700 (LWP 30134) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
423 Thread 0x7f9d502c2700 (LWP 30131) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
422 Thread 0x7f9d504c4700 (LWP 30132) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
421 Thread 0x7f9d503c3700 (LWP 30133) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
420 Thread 0x7f9d505c5700 (LWP 29260) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
419 Thread 0x7f9d506c6700 (LWP 29037) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
418 Thread 0x7f9d507c7700 (LWP 29016) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
417 Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
414 Thread 0x7f9d50bcb700 (LWP 28961) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
413 Thread 0x7f9d50ccc700 (LWP 28908) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
...
24 Thread 0x7fa15a6e6700 (LWP 21066) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
23 Thread 0x7fa15b8f8700 (LWP 21063) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
22 Thread 0x7fa15a8e8700 (LWP 12629) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
21 Thread 0x7fa15aaea700 (LWP 28154) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
20 Thread 0x7fa16bad5700 (LWP 26013) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
19 Thread 0x7fa15b5f5700 (LWP 8808) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
18 Thread 0x7fa15b6f6700 (LWP 8452) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
17 Thread 0x7fa15aeee700 (LWP 29477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
16 Thread 0x7fa15a7e7700 (LWP 4804) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
15 Thread 0x7fa15acec700 (LWP 4728) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
14 Thread 0x7fa15b7f7700 (LWP 4477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
13 Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
12 Thread 0x7fa1687b5700 (LWP 2553) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
11 Thread 0x7fa1688b6700 (LWP 2552) 0x00007fa16cce470d in read () from /lib64/libpthread.so.0
10 Thread 0x7fa168eb3700 (LWP 2551) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
9 Thread 0x7fa168fb4700 (LWP 2550) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
8 Thread 0x7fa1690b5700 (LWP 2549) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
7 Thread 0x7fa1691b6700 (LWP 2548) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
6 Thread 0x7fa16b60b700 (LWP 2535) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
5 Thread 0x7fa16b74e700 (LWP 2534) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fa16b891700 (LWP 2533) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fa16bbd6700 (LWP 2530) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fa16bcd7700 (LWP 2529) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0
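With several hundred threads in the listing, it can help to tally how many are stopped in each top frame before digging into individual backtraces. The following is only a minimal sketch for doing that on saved `info threads` output; the function name `tally_frames` and the `SAMPLE` excerpt (four lines copied from the listing above) are illustrative assumptions, not part of xrootd or gdb.

```python
import re
from collections import Counter

# Matches the top-frame function name in a gdb "info threads" line,
# e.g. "... in accept4 () from /lib64/libc.so.6".
FRAME_RE = re.compile(r"\bin (\S+) \(\)")


def tally_frames(info_threads_text):
    """Count how many threads are stopped in each top-frame function.

    Hypothetical helper: takes the raw text of gdb's "info threads"
    output and returns a Counter keyed by function name.
    """
    counts = Counter()
    for line in info_threads_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Short excerpt of the listing above, used only as example input.
SAMPLE = """\
427 Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
426 Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
"""

if __name__ == "__main__":
    # Print the most common top frames first.
    for name, n in tally_frames(SAMPLE).most_common():
        print(f"{n:4d}  {name}")
```

A large count in `__lll_lock_wait` would only indicate that many threads are blocked on a mutex; identifying which lock is involved, and who holds it, still requires full backtraces (for example from gdb's `thread apply all bt`).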
@simonmichal is having a look at a core dump. More info to follow.