Hi Giuseppe,
I am in contact with Michal to get additional information on what is causing this particular issue.
Andy
From: Giuseppe
Sent: Thursday, November 02, 2017 11:35 AM
To: xrootd/xrootd
Cc: Subscribed
Subject: [xrootd/xrootd] Deadlock (?) in xrootd server observed in CASTOR/CERN (#615)
We've recently observed a few cases where an xrootd server stops serving clients and enters a deadlock (or possibly livelock) state. This happens with xrootd 4.6 and 4.7, together with a specific CASTOR plugin for metadata processing (data access goes via pure xrootd on local disk).
A typical signature in gdb looks like:
(gdb) info threads
Id Target Id Frame
430 Thread 0x7fa16d85d740 (LWP 2282) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
429 Thread 0x7f9d4fcbc700 (LWP 12212) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
428 Thread 0x7f9d4fdbd700 (LWP 12211) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
427 Thread 0x7f9d4febe700 (LWP 12210) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
426 Thread 0x7f9d4ffbf700 (LWP 12209) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
425 Thread 0x7f9d500c0700 (LWP 30135) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
424 Thread 0x7f9d501c1700 (LWP 30134) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
423 Thread 0x7f9d502c2700 (LWP 30131) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
422 Thread 0x7f9d504c4700 (LWP 30132) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
421 Thread 0x7f9d503c3700 (LWP 30133) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
420 Thread 0x7f9d505c5700 (LWP 29260) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
419 Thread 0x7f9d506c6700 (LWP 29037) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
418 Thread 0x7f9d507c7700 (LWP 29016) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
417 Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
414 Thread 0x7f9d50bcb700 (LWP 28961) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
413 Thread 0x7f9d50ccc700 (LWP 28908) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
...
24 Thread 0x7fa15a6e6700 (LWP 21066) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
23 Thread 0x7fa15b8f8700 (LWP 21063) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
22 Thread 0x7fa15a8e8700 (LWP 12629) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
21 Thread 0x7fa15aaea700 (LWP 28154) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
20 Thread 0x7fa16bad5700 (LWP 26013) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
19 Thread 0x7fa15b5f5700 (LWP 8808) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
18 Thread 0x7fa15b6f6700 (LWP 8452) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
17 Thread 0x7fa15aeee700 (LWP 29477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
16 Thread 0x7fa15a7e7700 (LWP 4804) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
15 Thread 0x7fa15acec700 (LWP 4728) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
14 Thread 0x7fa15b7f7700 (LWP 4477) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
13 Thread 0x7fa1686b4700 (LWP 2561) 0x00007fa16bfe47df in accept4 () from /lib64/libc.so.6
12 Thread 0x7fa1687b5700 (LWP 2553) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
11 Thread 0x7fa1688b6700 (LWP 2552) 0x00007fa16cce470d in read () from /lib64/libpthread.so.0
10 Thread 0x7fa168eb3700 (LWP 2551) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
9 Thread 0x7fa168fb4700 (LWP 2550) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
8 Thread 0x7fa1690b5700 (LWP 2549) 0x00007fa16bf213fa in sigwaitinfo () from /lib64/libc.so.6
7 Thread 0x7fa1691b6700 (LWP 2548) 0x00007fa16cce4e4d in nanosleep () from /lib64/libpthread.so.0
6 Thread 0x7fa16b60b700 (LWP 2535) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
5 Thread 0x7fa16b74e700 (LWP 2534) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
4 Thread 0x7fa16b891700 (LWP 2533) 0x00007fa16bfe3923 in epoll_wait () from /lib64/libc.so.6
3 Thread 0x7fa16bbd6700 (LWP 2530) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7fa16bcd7700 (LWP 2529) 0x00007fa16cce1cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0
@simonmichal is having a look at a core dump. More info to follow.
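With several hundred threads in the listing, it can help to tally the innermost frame of each thread before reading individual backtraces. The following is only a rough sketch, not part of xrootd or gdb: the helper name `summarize_threads` and the regular expression are assumptions made for illustration, and the sample rows are copied from the listing above. In glibc, `__lll_lock_wait` typically indicates a thread blocked on a mutex, while `do_futex_wait` usually shows up under a semaphore wait, so a large count of the former would be consistent with heavy contention on one or a few locks.

```python
import re
from collections import Counter

# Matches one row of gdb's `info threads` output in the form shown above, e.g.
# "416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from ..."
# The optional leading "*" marks the currently selected thread.
ROW = re.compile(
    r"^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+"
    r"0x[0-9a-f]+\s+in\s+(?P<func>\S+)\s+\("
)


def summarize_threads(text):
    """Count threads per innermost frame function (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # rows such as "..." or the column header are skipped
            counts[match.group("func")] += 1
    return counts


# A few rows copied from the listing above; in practice, feed in the full dump.
SAMPLE = """\
417 Thread 0x7f9d508c8700 (LWP 29003) 0x00007fa16cce3a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
416 Thread 0x7f9d509c9700 (LWP 28985) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
415 Thread 0x7f9d50aca700 (LWP 28962) 0x00007fa16cce442d in __lll_lock_wait () from /lib64/libpthread.so.0
...
* 1 Thread 0x7fa16d826700 (LWP 2299) 0x00007fa16cce4ead in open64 () from /lib64/libpthread.so.0
"""

if __name__ == "__main__":
    for func, n in summarize_threads(SAMPLE).most_common():
        print(f"{n:4d}  {func}")
```

A tally like this only narrows down where to look. Identifying which thread holds the contended lock still needs full backtraces from the live process or the core dump.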
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1