We started to move file stageout from GridFTP to XrdHttp and overnight began to see segfaults. The backtrace looks like this:

```
#6  <signal handler called>
#7  getErrText (ecode=@0x7f6d1cffec7c: 32621, this=0x8) at /usr/src/debug/xrootd/xrootd/src/XrdOuc/XrdOucErrInfo.hh:273
#8  XrdXrootdProtocol::fsError (this=0x7f6d4484d808, rc=-1, opC=0 '\000', myError=..., Path=0x0, Cgi=0x0) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdXeq.cc:3185
#9  0x00007f6d635304e0 in XrdXrootdTransit::Process (this=0x7f6d4484d800, lp=0x7f6d27082028) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdTransit.cc:370
#10 0x00007f6d632b2e19 in XrdLink::DoIt (this=0x7f6d27082028) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:441
#11 0x00007f6d632b61cf in XrdScheduler::Run (this=0x610e78 <XrdMain::Config+440>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:357
#12 0x00007f6d632b6319 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:87
#13 0x00007f6d63274947 in XrdSysThread_Xeq (myargs=0x7f6d1d00e040) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#14 0x00007f6d62e30e25 in start_thread (arg=0x7f6d1cfff700) at pthread_create.c:308
#15 0x00007f6d62133bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

So the crash is in the core xrootd code. Note that inlining is going on: the call in frame #7 doesn't literally exist in the source as written. The key insight is this frame:

```
#7  getErrText (ecode=@0x7f6d1cffec7c: 32621, this=0x8) at /usr/src/debug/xrootd/xrootd/src/XrdOuc/XrdOucErrInfo.hh:273
```

`this=0x8` indicates the `XrdOucErrInfo` object sits at an 8-byte offset from a null pointer. That is, something is evaluating `foo->myError` where `foo` is unexpectedly `NULL`.
Further guessing that this isn't in the read code paths, we arrive at the following code: https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L2846 or https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L547

Looking at the `myFile` object in the core dump, I can confirm that `XrdSfsp` is null:

```
(gdb) p myFile->XrdSfsp
$6 = (XrdSfsFile *) 0x0
```

Searching for the path corresponding to `myFile` in the log:

```
180910 08:24:20 19163 acc_Audit: http grant uscmsPool001@[::ffff:XXX.YYY.ZZZ.AAA] create /store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
Resulting PFN: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
Resulting PFN: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
File we will access: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1). There are 207 datanode(s) running and no node(s) are excluded in this operation.
RemoteException: File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1). There are 207 datanode(s) running and no node(s) are excluded in this operation.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1). There are 207 datanode(s) running and no node(s) are excluded in this operation.
180910 08:24:20 4341 hdfs_close: Unable to close /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root; Unknown error 255
```

(log snippet lightly edited to remove the user name and IP addresses)

The last error message is from `hdfs_close`, so I think the failure is in the close code. Indeed, here we delete the object: https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L541 That eventually calls `XrdXrootdFile`'s destructor, which zeros out `fp->XrdSfsp`; a later access through the now-null pointer then causes the segfault.

View it on GitHub: https://github.com/xrootd/xrootd/issues/818