Hi Giuseppe,

I am in contact with Michal trying to get additional information to see what is causing this particular issue.

Andy

From: Giuseppe
Sent: Thursday, November 02, 2017 11:35 AM
To: xrootd/xrootd
Cc: Subscribed
Subject: [xrootd/xrootd] Deadlock (?) in xrootd server observed in CASTOR/CERN (#615)

We've recently observed a few cases where an xrootd server stops serving clients and enters a deadlock (or possibly livelock) state. This is with xrootd 4.6 and 4.7, and a specific CASTOR plugin for metadata processing (data access goes via pure xrootd on local disk).

A typical signature in gdb looks like:

(gdb) info threads
Id Target Id Frame
430 Thread 0x7fa16d85d740 (LWP 2282) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
429 Thread 0x7f9d4fcbc700 (LWP 12212) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
428 Thread 0x7f9d4fdbd700 (LWP 12211) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
427 Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
426 Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
425 Thread 0x7f9d500c0700 (LWP 30135) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
424 Thread 0x7f9d501c1700 (LWP 30134) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
423 Thread 0x7f9d502c2700 (LWP 30131) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
422 Thread 0x7f9d504c4700 (LWP 30132) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
421 Thread 0x7f9d503c3700 (LWP 30133) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
420 Thread 0x7f9d505c5700 (LWP 29260) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
419 Thread 0x7f9d506c6700 (LWP 29037) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
418 Thread 0x7f9d507c7700 (LWP 29016) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
417 Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
414 Thread 0x7f9d50bcb700 (LWP 28961) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
413 Thread 0x7f9d50ccc700 (LWP 28908) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
...
24 Thread 0x7fa15a6e6700 (LWP 21066) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
23 Thread 0x7fa15b8f8700 (LWP 21063) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
22 Thread 0x7fa15a8e8700 (LWP 12629) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
21 Thread 0x7fa15aaea700 (LWP 28154) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
20 Thread 0x7fa16bad5700 (LWP 26013) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
19 Thread 0x7fa15b5f5700 (LWP 8808) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
18 Thread 0x7fa15b6f6700 (LWP 8452) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
17 Thread 0x7fa15aeee700 (LWP 29477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
16 Thread 0x7fa15a7e7700 (LWP 4804) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
15 Thread 0x7fa15acec700 (LWP 4728) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
14 Thread 0x7fa15b7f7700 (LWP 4477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
13 Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
12 Thread 0x7fa1687b5700 (LWP 2553) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
11 Thread 0x7fa1688b6700 (LWP 2552) 0x00007fa16cce470d in read () from /lib64/libpthread.so.0
10 Thread 0x7fa168eb3700 (LWP 2551) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
9 Thread 0x7fa168fb4700 (LWP 2550) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
8 Thread 0x7fa1690b5700 (LWP 2549) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
7 Thread 0x7fa1691b6700 (LWP 2548) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
6 Thread 0x7fa16b60b700 (LWP 2535) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
5 Thread 0x7fa16b74e700 (LWP 2534) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fa16b891700 (LWP 2533) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fa16bbd6700 (LWP 2530) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fa16bcd7700 (LWP 2529) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0

@simonmichal is having a look at a core dump. More info to follow.
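
For triage, a dump like the one above can be summarized by counting threads per innermost frame. The following is only a minimal sketch, assuming the `info threads` output has been saved as text; the helper name `tally_frames` and the regular expression are illustrative and not part of xrootd or gdb.

```python
import re
from collections import Counter

# Matches the innermost frame of a gdb "info threads" row, e.g.
# "... 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0"
FRAME_RE = re.compile(r"\bin\s+(\S+)\s+\(")


def tally_frames(dump: str) -> Counter:
    """Count threads per innermost frame function (hypothetical helper)."""
    counts = Counter()
    for line in dump.splitlines():
        match = FRAME_RE.search(line)
        if match:  # header rows and "..." lines carry no frame and are skipped
            counts[match.group(1)] += 1
    return counts


# A few rows copied from the dump above, used here as sample input.
sample = """\
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
13 Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0
"""

# Print the most common blocking functions first.
for func, n in tally_frames(sample).most_common():
    print(f"{n:4d}  {func}")
```

The tally only shows where threads are parked. Identifying which lock is contended still needs full backtraces, for example from gdb's `thread apply all bt`.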

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.




Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1