Thanks. There are several possibilities, none of which looks impossible, but none that I could point to as the smoking gun. I see what is happening but can't explain why. I do know that there has been significant work on keeping GPFS from slowing down when multiple threads hit it at once (which happens in this particular case). You can see all fixes to GPFS since your release here:

https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars_505x.html#APARs

You may want to consider upgrading GPFS. Also, there have been numerous improvements in how async I/O is handled by the server since 4.12.4. As this release is no longer supported, I would strongly urge you to upgrade to the latest release, 5.5.1.

In any case, it would strongly imply that turning off async on several servers keeps GPFS from getting overloaded, and that in turn does not trigger deficiencies in xrootd that wind up effectively deadlocking. That, because of client recovery, quickly uses up all available threads.

So, we can wait, and I suspect we won't see a problem, but let's not make that call until a couple more weeks go by. It could simply be that the load has dropped off due to the holidays.

