Syracuse is reporting intermittent segfaults in their stashcache server, with anywhere from minutes to hours between crashes. Reviewing a couple of core files, it looks like memory corruption is happening.

They're running xrootd-4.8.5-1.osg34.el7.x86_64

I'll update this with more info as we work on debugging. Below are notes from looking at two corefiles.

Regards,
John

Core 1:

The function pointer for the virtual Stat() call (pPlugIn->Stat() in XrdCl::File::Stat) appears to be corrupt.

(gdb) bt
#0  0x00000000ffffffff in ?? ()
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
#2  0x00007f68ad169f5a in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, response=@0x7f68ac42fca8: 0x0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:174
#3  0x00007f68ad407638 in XrdPosixFile::Stat (this=0x7f6830000108, Status=..., force=force@entry=false) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixFile.cc:467
#4  0x00007f68ad40afb2 in XrdPosixPrepIO::Init (this=this@entry=0x7f6833b47c80, iocbP=iocbP@entry=0x7f681df09110) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.cc:69
#5  0x00007f68ad41173c in XrdPosixPrepIO::Read (this=0x7f6833b47c80, iocb=..., buff=0x7f6754000030 "", offs=292552704, rlen=32505856) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.hh:59
#6  0x00007f68ac84a25c in XrdFileCache::File::ProcessBlockRequests (this=this@entry=0x7f66ee325b70, blks=std::list = {...}) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:374
#7  0x00007f68ac84e832 in XrdFileCache::File::Prefetch (this=0x7f66ee325b70) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:923
#8  0x00007f68ac844e4d in XrdFileCache::Cache::Prefetch (this=0xd19150) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:570
#9  0x00007f68ac844e69 in PrefetchThread (ptr=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:62
#10 0x00007f68b702c537 in XrdSysThread_Xeq (myargs=0xd1a620) at /usr/src/debug/xrootd-4.8.5/src/XrdSys/XrdSysPthread.cc:86
#11 0x00007f68b6be8e25 in start_thread (arg=0x7f68ac430700) at pthread_create.c:308
#12 0x00007f68b5eebbad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Execution appears to have jumped to an invalid address (frame 0 is 0xffffffff). Going up to the last good frame and looking at the pPlugIn pointer and its vtable:

(gdb) frame 1
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
161          return pPlugIn->Stat( force, handler, timeout );
(gdb) print *pPlugIn
$10 = {_vptr.FilePlugIn = 0x7f683343f9d0}
(gdb) info vtbl pPlugIn
vtable for 'XrdCl::FilePlugIn' @ 0x7f683343f9d0 (subobject @ 0x7f68300001b8):
[0]: 0x0
[1]: 0x141
[2]: 0x7f68300001a8
[3]: 0x7f68300001a8
[4]: 0xffffffff
[5]: 0x0
[6]: 0x0
[7]: 0x6f2e726f00000001
[8]: 0x0
[9]: 0x7f68b69d8338 <std::string::_Rep::_S_empty_rep_storage+24>
[10]: 0x7f6764001730
[11]: 0x7f681df0a9b0
[12]: 0x7f6764001760
[13]: 0x7f6764000b40
[14]: 0x0
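
As a quick sanity check on the dump above, an entry can be decoded as raw bytes to see whether it looks like a code address or like ordinary data. The sketch below is illustrative only (the helper name is made up here); it unpacks entry [7] as a little-endian 64-bit word, which is the byte order on x86_64.

```python
import struct

def entry_bytes(value):
    """Return the 8 bytes of a 64-bit vtable entry as stored in little-endian memory."""
    return struct.pack("<Q", value)

# Entry [7] from the `info vtbl` output above.
raw = entry_bytes(0x6F2E726F00000001)
print(raw)  # b'\x01\x00\x00\x00or.o'

# Keep only printable ASCII so any embedded text fragment becomes visible.
text = "".join(chr(b) for b in raw if 32 <= b < 127)
print(text)  # or.o
```

The fragment "or.o" also occurs in the host name redirector.osgstorage.org seen in the second core's URL. That would be consistent with the vptr pointing at heap data rather than at a real vtable, but it is only a hint and may be coincidence.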

The vtable looks mangled, and entry [4] suspiciously matches the segfault address. Compare with the type:

(gdb) ptype pPlugIn
type = class XrdCl::FilePlugIn {
  public:
    ~FilePlugIn(int);
    virtual XrdCl::XRootDStatus Open(const std::string &, XrdCl::OpenFlags::Flags, XrdCl::Access::Mode, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Close(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Stat(bool, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Read(uint64_t, uint32_t, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Write(uint64_t, uint32_t, const void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Sync(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Truncate(uint64_t, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus VectorRead(const XrdCl::ChunkList &, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Fcntl(const XrdCl::Buffer &, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Visa(XrdCl::ResponseHandler *, uint16_t);
    virtual bool IsOpen(void) const;
    virtual bool SetProperty(const std::string &, const std::string &);
    virtual bool GetProperty(const std::string &, std::string &) const;
} *

Stat() is the fourth member in the listing. Does that match up with "[4]: 0xffffffff"?
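
Whether the declaration order and the [4] index line up depends on how the compiler lays out the vtable. Below is a minimal sketch of the counting. It assumes the Itanium C++ ABI used by GCC on x86_64 Linux, where a virtual destructor occupies two consecutive slots, and it assumes the destructor is virtual and declared first, as the ptype listing suggests. The function and list names are made up for illustration.

```python
# Virtual members of XrdCl::FilePlugIn in the order shown by `ptype`.
DECLARED = ["~FilePlugIn", "Open", "Close", "Stat", "Read", "Write", "Sync",
            "Truncate", "VectorRead", "Fcntl", "Visa", "IsOpen",
            "SetProperty", "GetProperty"]

def vtable_slot(declared, target):
    """Return the 0-based vtable index of `target`.

    Assumption: Itanium C++ ABI, where a virtual destructor takes two
    slots (complete-object and deleting destructor) and every other
    virtual function takes one, in declaration order.
    """
    slot = 0
    for name in declared:
        if name == target:
            return slot
        slot += 2 if name.startswith("~") else 1
    raise KeyError(target)

print(vtable_slot(DECLARED, "Stat"))         # 4
print(vtable_slot(DECLARED, "GetProperty"))  # 14
```

Under those assumptions Stat() sits at index 4, the slot holding 0xffffffff, which is the address frame 0 jumped to. The last member landing at index 14 also agrees with gdb printing entries [0] through [14].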

Core 2:

The XrdCl::URL.pHostId variable appears to be corrupted (unexpectedly null):

(gdb) bt
#0  _M_grab (__alloc2=..., __alloc1=..., this=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:229
#1  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ff7f1fe2400, __str=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:173
#2  0x00007ff81e73fad4 in GetHostId (this=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClURL.hh:78
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
#4  0x00007ff81e7405ab in (anonymous namespace)::OpenHandler::HandleResponseWithHosts (this=0x7ff7dbe04ca0, status=0x7ff72c0013d0, response=0x7ff72c000db0, hostList=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:84
#5  0x00007ff81e72180b in XrdCl::XRootDMsgHandler::HandleResponse (this=this@entry=0x7ff7dbe03190) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:1128
#6  0x00007ff81e7259c3 in XrdCl::XRootDMsgHandler::Process (this=0x7ff7dbe03190, msg=0x7ff7c8000bd0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:375
#7  0x00007ff81e706a4e in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7ff7c8000a80, arg=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClStream.hh:284
#8  0x00007ff81e76cf8f in XrdCl::JobManager::RunJobs (this=0x7ff80c005480) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:148
#9  0x00007ff81e76d1e9 in RunRunnerThread (arg=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:33
#10 0x00007ff8281afe25 in start_thread (arg=0x7ff7f1fe3700) at pthread_create.c:308
#11 0x00007ff8274b2bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) frame 3
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
1187	    std::string lastServer = pFileUrl->GetHostId();
(gdb) print *pFileUrl
$23 = {pHostId = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>, pProtocol = "root", pUserName = "31", pPassword = "", pHostName = "redirector.osgstorage.org", pPort = 1094, pPath = "", pParams = std::map with 1 elements = {
    ["oss.lcl"] = "1"}, pURL = "root:[log in to unmask]:1094//gwdata/O1/strain.16k/frame.v1/H1/1130364928/H-H1_LOSC_16_V1-1130782720-4096.gwf?oss.lcl=1"}
(gdb) x &pFileUrl.pHostId
0x7ff7dbe05280:	0x0000000000000000
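
The faulting address 0xffffffffffffffe8 can be related to the zeroed pHostId by simple pointer arithmetic. The sketch below assumes the reference-counted std::string of the GCC 4.8 libstdc++ (the version shown in the backtrace paths). In that implementation the string object holds only a pointer to the character data, and on x86_64 a 24-byte header (length, capacity, reference count) sits immediately before that data. The constant and function names are illustrative.

```python
REP_HEADER_SIZE = 24          # assumed: size_t length + size_t capacity + int refcount, padded
MASK_64 = (1 << 64) - 1       # emulate 64-bit pointer wraparound

def rep_address(data_pointer):
    """Address of the string header for a given data pointer, with 64-bit wraparound."""
    return (data_pointer - REP_HEADER_SIZE) & MASK_64

# pHostId's data pointer as read by `x &pFileUrl.pHostId`.
print(hex(rep_address(0x0)))  # 0xffffffffffffffe8
```

A null data pointer therefore yields exactly the address gdb failed to read, which fits pHostId having been zeroed rather than holding a stale but still mapped pointer. The same 24-byte offset is visible in the first core as std::string::_Rep::_S_empty_rep_storage+24.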



GitHub issue: Intermittent stashcache segfaults in XrdCl (v4.8.5) (#853)
https://github.com/xrootd/xrootd/issues/853
