
Syracuse is reporting intermittent segfaults in their StashCache server, with anywhere from minutes to hours between crashes. From a couple of core files we've reviewed, it looks like there's memory corruption happening.

They're running xrootd-4.8.5-1.osg34.el7.x86_64

I'll update this with more info as we debug. Below are notes from two of the core files.

Regards,
John

Core 1:
-------

The vtable of the XrdCl::FilePlugIn object that XrdCl::File::Stat() calls through (pPlugIn) appears to be corrupt, so the virtual Stat() call jumps to a bogus address.

```
(gdb) bt
#0  0x00000000ffffffff in ?? ()
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
#2  0x00007f68ad169f5a in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, response=@0x7f68ac42fca8: 0x0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:174
#3  0x00007f68ad407638 in XrdPosixFile::Stat (this=0x7f6830000108, Status=..., force=force@entry=false) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixFile.cc:467
#4  0x00007f68ad40afb2 in XrdPosixPrepIO::Init (this=this@entry=0x7f6833b47c80, iocbP=iocbP@entry=0x7f681df09110) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.cc:69
#5  0x00007f68ad41173c in XrdPosixPrepIO::Read (this=0x7f6833b47c80, iocb=..., buff=0x7f6754000030 "", offs=292552704, rlen=32505856) at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.hh:59
#6  0x00007f68ac84a25c in XrdFileCache::File::ProcessBlockRequests (this=this@entry=0x7f66ee325b70, blks=std::list = {...}) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:374
#7  0x00007f68ac84e832 in XrdFileCache::File::Prefetch (this=0x7f66ee325b70) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:923
#8  0x00007f68ac844e4d in XrdFileCache::Cache::Prefetch (this=0xd19150) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:570
#9  0x00007f68ac844e69 in PrefetchThread (ptr=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:62
#10 0x00007f68b702c537 in XrdSysThread_Xeq (myargs=0xd1a620) at /usr/src/debug/xrootd-4.8.5/src/XrdSys/XrdSysPthread.cc:86
#11 0x00007f68b6be8e25 in start_thread (arg=0x7f68ac430700) at pthread_create.c:308
#12 0x00007f68b5eebbad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

Execution seems to have jumped to a garbage address (frame 0 is at 0xffffffff). Going up to the last good frame and inspecting the pPlugIn pointer and its vtable:

```
(gdb) frame 1
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
161          return pPlugIn->Stat( force, handler, timeout );
(gdb) print *pPlugIn
$10 = {_vptr.FilePlugIn = 0x7f683343f9d0}
(gdb) info vtbl pPlugIn
vtable for 'XrdCl::FilePlugIn' @ 0x7f683343f9d0 (subobject @ 0x7f68300001b8):
[0]: 0x0
[1]: 0x141
[2]: 0x7f68300001a8
[3]: 0x7f68300001a8
[4]: 0xffffffff
[5]: 0x0
[6]: 0x0
[7]: 0x6f2e726f00000001
[8]: 0x0
[9]: 0x7f68b69d8338 <std::string::_Rep::_S_empty_rep_storage+24>
[10]: 0x7f6764001730
[11]: 0x7f681df0a9b0
[12]: 0x7f6764001760
[13]: 0x7f6764000b40
[14]: 0x0
```

The vtable looks mangled, and entry [4] suspiciously matches the faulting address. Compare with the class declaration:

```
(gdb) ptype pPlugIn
type = class XrdCl::FilePlugIn {
  public:
    ~FilePlugIn(int);
    virtual XrdCl::XRootDStatus Open(const std::string &, XrdCl::OpenFlags::Flags, XrdCl::Access::Mode, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Close(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Stat(bool, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Read(uint64_t, uint32_t, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Write(uint64_t, uint32_t, const void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Sync(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Truncate(uint64_t, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus VectorRead(const XrdCl::ChunkList &, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Fcntl(const XrdCl::Buffer &, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Visa(XrdCl::ResponseHandler *, uint16_t);
    virtual bool IsOpen(void) const;
    virtual bool SetProperty(const std::string &, const std::string &);
    virtual bool GetProperty(const std::string &, std::string &) const;
} *
```

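Stat() is the fourth method declared. If ~FilePlugIn() is virtual, GCC's Itanium C++ ABI gives it two vtable slots ([0] complete-object destructor, [1] deleting destructor), which puts Open() at [2], Close() at [3] and Stat() at [4], matching the corrupt "[4]: 0xffffffff" entry.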
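To double-check the slot arithmetic without a core file, here is a minimal sketch that asks the compiler which vtable slot a virtual method lands in. It assumes x86_64 with GCC or Clang (Itanium C++ ABI), where a pointer to a virtual member function stores 1 plus the byte offset of its slot. PlugInLike and VtableSlot are hypothetical names, and PlugInLike mirrors only the declaration order of XrdCl::FilePlugIn, not its real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in: only the declaration order of XrdCl::FilePlugIn
// (virtual destructor first, then Open, Close, Stat, Read) is mirrored.
struct PlugInLike {
  virtual ~PlugInLike() {}
  virtual int Open() { return 0; }
  virtual int Close() { return 0; }
  virtual int Stat() { return 0; }
  virtual int Read() { return 0; }
};

// Returns the vtable slot index of a virtual member function, or -1 if the
// member is not virtual. Assumes the x86_64 Itanium C++ ABI (GCC, Clang):
// a member-function pointer is {ptr, adj}, and for a virtual function
// ptr == 1 + byte offset of its slot from the vtable address point.
template <typename PMF>
long VtableSlot(PMF pmf) {
  struct Rep {
    std::uintptr_t ptr;
    std::ptrdiff_t adj;
  };
  static_assert(sizeof(PMF) == sizeof(Rep), "unexpected member-pointer size");
  Rep rep;
  std::memcpy(&rep, &pmf, sizeof rep);
  if ((rep.ptr & 1u) == 0) {
    return -1;  // low bit clear: non-virtual, ptr holds a function address
  }
  return static_cast<long>((rep.ptr - 1) / sizeof(void*));
}
```

With GCC or Clang on x86_64, VtableSlot(&PlugInLike::Stat) should come out as 4, the same index as the bad entry in the info vtbl output. The sketch reads the member pointer's bytes through memcpy because C++ does not allow casting a member-function pointer directly to an integer.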

Core 2:
-------

The pHostId member of the XrdCl::URL object (pFileUrl) appears to be corrupt: its std::string data pointer is unexpectedly null.

```
(gdb) bt
#0  _M_grab (__alloc2=..., __alloc1=..., this=<optimized out>) at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:229
#1  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ff7f1fe2400, __str=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:173
#2  0x00007ff81e73fad4 in GetHostId (this=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClURL.hh:78
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
#4  0x00007ff81e7405ab in (anonymous namespace)::OpenHandler::HandleResponseWithHosts (this=0x7ff7dbe04ca0, status=0x7ff72c0013d0, response=0x7ff72c000db0, hostList=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:84
#5  0x00007ff81e72180b in XrdCl::XRootDMsgHandler::HandleResponse (this=this@entry=0x7ff7dbe03190) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:1128
#6  0x00007ff81e7259c3 in XrdCl::XRootDMsgHandler::Process (this=0x7ff7dbe03190, msg=0x7ff7c8000bd0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:375
#7  0x00007ff81e706a4e in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7ff7c8000a80, arg=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClStream.hh:284
#8  0x00007ff81e76cf8f in XrdCl::JobManager::RunJobs (this=0x7ff80c005480) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:148
#9  0x00007ff81e76d1e9 in RunRunnerThread (arg=<optimized out>) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:33
#10 0x00007ff8281afe25 in start_thread (arg=0x7ff7f1fe3700) at pthread_create.c:308
#11 0x00007ff8274b2bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

```
(gdb) frame 3
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0) at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
1187	    std::string lastServer = pFileUrl->GetHostId();
(gdb) print *pFileUrl
$23 = {pHostId = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>, pProtocol = "root", pUserName = "31", pPassword = "", pHostName = "redirector.osgstorage.org", pPort = 1094, pPath = "", pParams = std::map with 1 elements = {
    ["oss.lcl"] = "1"}, pURL = "root:[log in to unmask]:1094//gwdata/O1/strain.16k/frame.v1/H1/1130364928/H-H1_LOSC_16_V1-1130782720-4096.gwf?oss.lcl=1"}
(gdb) x &pFileUrl.pHostId
0x7ff7dbe05280:	0x0000000000000000
```
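The address in the error, 0xffffffffffffffe8, is exactly what a null data pointer predicts. GCC 4.8's libstdc++ still uses the reference-counted std::string, whose only member is a pointer (_M_p) to the character data. The length, capacity and refcount live in a 24-byte _Rep header immediately before that data. Anything that steps back from a null _M_p to reach the header (the copy constructor's _M_rep(), or gdb when printing the string) lands at 0 - 24. The sketch below models that arithmetic. RepHeaderLike and HeaderAddressFor are hypothetical names, and the sketch assumes a 64-bit (LP64) target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of std::basic_string<char>::_Rep_base from GCC 4.8's
// reference-counted (pre-C++11 ABI) string. It sits immediately before the
// character data that the string's _M_p member points to.
struct RepHeaderLike {
  std::size_t length;    // _M_length
  std::size_t capacity;  // _M_capacity
  int refcount;          // _M_refcount (_Atomic_word); padded to 8 bytes
};

// _M_rep() computes ((_Rep*)_M_p) - 1, i.e. _M_p minus sizeof(_Rep).
// Unsigned arithmetic wraps, so a null _M_p yields 2^64 - 24.
std::uintptr_t HeaderAddressFor(std::uintptr_t data_ptr) {
  return data_ptr - sizeof(RepHeaderLike);
}
```

The same offset explains entry [9] in the Core 1 vtable dump: an empty string's data pointer is _S_empty_rep_storage+24, so HeaderAddressFor(0x7f68b69d8338) gives 0x7f68b69d8320, the start of _S_empty_rep_storage.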

View it on GitHub:
https://github.com/xrootd/xrootd/issues/853