XRootD client fails (repeatedly) with the following back trace in eos/pps/storage:
#0 0x00007fa7ee07f250 in ?? ()
#1 0x000000335a65e477 in ~XRootDMsgHandler (this=0x7fa7c5038800, __in_chrg=) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClXRootDMsgHandler.hh:103
#2 XrdCl::XRootDMsgHandler::~XRootDMsgHandler (this=0x7fa7c5038800, __in_chrg=) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClXRootDMsgHandler.hh:108
#3 0x000000335a65b49a in XrdCl::XRootDMsgHandler::HandleResponse (this=0x7fa7c5038800) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClXRootDMsgHandler.cc:1075
#4 0x000000335a65bcd3 in XrdCl::XRootDMsgHandler::HandleError (this=0x7fa7c5038800, status=..., msg=) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClXRootDMsgHandler.cc:1799
#5 0x000000335a65c209 in XrdCl::XRootDMsgHandler::OnStatusReady (this=0x7fa7c5038800, message=, status=...) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClXRootDMsgHandler.cc:965
#6 0x000000335a64e079 in XrdCl::OutQueue::Report (this=0x7fa7f44fbc00, status=...) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClOutQueue.cc:79
#7 0x000000335a6433d4 in XrdCl::Stream::OnFatalError (this=0x7fa7f7a01c80, subStream=, status=..., lock=...) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClStream.cc:814
#8 0x000000335a6448a5 in XrdCl::Stream::OnConnectError (this=0x7fa7f7a01c80, subStream=0, status=) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClStream.cc:684
#9 0x000000335a6862d7 in XrdCl::AsyncSocketHandler::OnFaultWhileHandshaking (this=0x7fa7f7a18500, st=...) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClAsyncSocketHandler.cc:736
#10 0x000000335a63db03 in (anonymous namespace)::SocketCallBack::Event (this=0x7fa7f09d70a0, chP=, cbArg=, evFlags=) at /usr/src/debug/xrootd/xrootd/src/XrdCl/XrdClPollerBuiltIn.cc:82
#11 0x000000335a22aaa8 in XrdSys::IOEvents::Poller::CbkXeq (this=0x7fa7f79e48e0, cP=0x7fa7c1453060, events=8, eNum=0, eTxt=0x0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysIOEvents.cc:692
#12 0x000000335a22ad37 in XrdSys::IOEvents::Poller::CbkTMO (this=0x7fa7f79e48e0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysIOEvents.cc:597
#13 0x000000335a22ae42 in XrdSys::IOEvents::PollE::Begin (this=0x7fa7f79e48e0, syncsem=, retcode=, eTxt=) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysIOEventsPollE.icc:217
#14 0x000000335a2296fd in XrdSys::IOEvents::BootStrap::Start (parg=0x7ffe5495db60) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysIOEvents.cc:131
#15 0x000000335a2276ff in XrdSysThread_Xeq (myargs=0x7fa7f7a5efa0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#16 0x0000003a64207aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x0000003a63ee893d in clone () from /lib64/libc.so.6
After investigating the problem I came up with the following scenario (which I was able to reproduce with GDB):
This is very difficult to reproduce because various timeouts need to happen in a very specific order.
(Also, the theory has a weak point: a segfault could also happen earlier when the open request is being unmarshalled (XrdClXRootDMsgHandler.cc:1043), which for some reason never happens in eos/pps/storage.)
I can see two possible solutions:
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1