@abh3 - it looks like there's a potential deadlock whenever an I/O operation queued in the Scheduler holds a reference to the XrdLink object.

When there are no idle threads left, the queued jobs that hold a reference will keep the link's reference count non-zero indefinitely.

However, some of the running jobs cannot complete until the reference count drops to zero. In @lmorganti's stack trace there are ~1500 stuck threads with the following pattern:

#4  0x00007fae9f87c1fe in Wait (this=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.hh:419
#5  XrdLink::Serialize (this=0x7fac6401b978) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:1086
#6  0x00007fae9fb002a8 in XrdXrootdProtocol::do_Close (this=0x7fadbc0402e0) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdXeq.cc:532
#7  0x00007fae9f87ce49 in XrdLink::DoIt (this=0x7fac6401b978) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:441
#8  0x00007fae9f8801df in XrdScheduler::Run (this=0x610e58 <XrdMain::Config+440>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:357
#9  0x00007fae9f880329 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:87
#10 0x00007fae9f845be7 in XrdSysThread_Xeq (myargs=0x7fad100449f0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86

Assuming there's a queued I/O request for each of those links, those ~1500 threads will never finish.

The queued job can't run until some running job finishes, and the running job can't finish until the queued job runs. Classic deadlock.

Now, in what circumstances do queued jobs hold a reference to the link? I count two cases:

  1. Async I/O (both normal reads and pgreads)
  2. When non-default sockets are used for a read. I think this might occur if the client requests encrypted control and unencrypted data.

I'm not 100% sure, but perhaps (2) is rare (though is it also rare in the case of caches?). If so, there's some hope that the deadlock can be avoided in practice simply by turning off async IO, which eliminates case (1).

Is it possible to automatically turn off async IO when thread counts are limited?


Message ID: <xrootd/xrootd/issues/1799/1334481044@github.com>
