We started to move file stageout from GridFTP to XrdHttp and overnight began to see segfaults. The traceback looks like this:

#6  <signal handler called>
#7  getErrText (ecode=@0x7f6d1cffec7c: 32621, this=0x8) at /usr/src/debug/xrootd/xrootd/src/XrdOuc/XrdOucErrInfo.hh:273
#8  XrdXrootdProtocol::fsError (this=0x7f6d4484d808, rc=-1, opC=0 '\000', myError=..., Path=0x0, Cgi=0x0) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdXeq.cc:3185
#9  0x00007f6d635304e0 in XrdXrootdTransit::Process (this=0x7f6d4484d800, lp=0x7f6d27082028) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdTransit.cc:370
#10 0x00007f6d632b2e19 in XrdLink::DoIt (this=0x7f6d27082028) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:441
#11 0x00007f6d632b61cf in XrdScheduler::Run (this=0x610e78 <XrdMain::Config+440>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:357
#12 0x00007f6d632b6319 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:87
#13 0x00007f6d63274947 in XrdSysThread_Xeq (myargs=0x7f6d1d00e040) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#14 0x00007f6d62e30e25 in start_thread (arg=0x7f6d1cfff700) at pthread_create.c:308
#15 0x00007f6d62133bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

So, it's in the core xrootd code. Note that some inlining is going on, since the exact call chain shown in the traceback doesn't appear in the source. The key frame is this one:

#7  getErrText (ecode=@0x7f6d1cffec7c: 32621, this=0x8) at /usr/src/debug/xrootd/xrootd/src/XrdOuc/XrdOucErrInfo.hh:273

That indicates the XrdOucErrInfo object sits at an 8-byte offset from a null pointer. That is, something is doing foo->myError where foo is unexpectedly NULL. Further, guessing this isn't in the read code paths, we get the following candidate code:

https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L2846

or

https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L547
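
To illustrate why this=0x8 points at a member access through a null pointer, here is a minimal sketch. The struct names and layout are hypothetical stand-ins, not the real XRootD classes; the only assumption is that the error object is the first data member after an 8-byte vtable pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; these are NOT the real XRootD class layouts.
struct ErrInfoSketch {
    int code;
};

struct FileSketch {
    void *vtable_slot;     // models the hidden vtable pointer of a polymorphic class
    ErrInfoSketch error;   // first data member, placed right after that pointer
};

// Address a debugger would show as `this` inside a method of the member,
// given the address of the owning object.
constexpr std::uintptr_t member_this(std::uintptr_t owner_address) {
    return owner_address + offsetof(FileSketch, error);
}
```

With a null owner, member_this(0) is just the member's offset, which is 8 on x86_64. That is the same shape as the this=0x8 seen in frame #7.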

Looking at the myFile object I can confirm that XrdSfsp is null:

(gdb) p myFile->XrdSfsp
$6 = (XrdSfsFile *) 0x0

and, searching for the path corresponding to myFile in the log:

180910 08:24:20 19163 acc_Audit: http grant  uscmsPool001@[::ffff:XXX.YYY.ZZZ.AAA] create /store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
Resulting PFN: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
Resulting PFN: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
File we will access: /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1).  There are 207 datanode(s) running and no node(s) are excluded in this operation.
RemoteException: File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1).  There are 207 datanode(s) running and no node(s) are excluded in this operation.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root could only be replicated to 0 nodes instead of minReplication (=1).  There are 207 datanode(s) running and no node(s) are excluded in this operation.
180910 08:24:20 4341 hdfs_close: Unable to close /user/uscms01/pnfs/unl.edu/data4/cms/store/.../BBbar_JpsiFilter_SoftQCD_GEN_SIM_2441.root; Unknown error 255

(log snippet lightly edited to remove user name and IP addresses)

So, the last error message is about hdfs_close. Hence, I think the failure is in the close code.

Indeed, here we delete the object:

https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdXeq.cc#L541

That eventually calls XrdXrootdFile's destructor, which indeed zeros out fp->XrdSfsp; the later dereference of that pointer causes the segfault.
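
The hazard described above can be sketched as follows. This is not the XRootD code or a proposed patch; BackendSketch, HandleSketch, CloseResult and close_and_release are hypothetical names, and the sketch only shows one possible ordering: copy the error state out before the handle is destroyed, so nothing reads through the handle afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical backend whose close() fails, loosely modelled on the hdfs_close log line.
struct BackendSketch {
    std::string error;
    int close() {
        error = "Unknown error 255";
        return -1;
    }
};

// Hypothetical handle that owns the backend and clears its pointer on destruction.
struct HandleSketch {
    BackendSketch *sfs;
    explicit HandleSketch(BackendSketch *b) : sfs(b) {}
    ~HandleSketch() {
        delete sfs;
        sfs = nullptr;  // any later read through the handle would see a null pointer
    }
};

struct CloseResult {
    int rc;
    std::string message;
};

// Close the backend, copy the error text while the objects are still alive,
// and only then destroy the handle.
CloseResult close_and_release(HandleSketch *fp) {
    assert(fp != nullptr && fp->sfs != nullptr);
    CloseResult result{0, ""};
    result.rc = fp->sfs->close();
    if (result.rc != 0) {
        result.message = fp->sfs->error;  // copied before the delete below
    }
    delete fp;  // fp and fp->sfs must not be touched after this point
    return result;
}
```

Calling close_and_release(new HandleSketch(new BackendSketch())) returns the failing return code together with the copied message, so the caller can report the error without going back through the deleted handle.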



Issue: Segfaults in the XrdHttp write code (#818), xrootd/xrootd
Author: Brian Bockelman (@bbockelm)
https://github.com/xrootd/xrootd/issues/818
