Syracuse is reporting intermittent segfaults in their stashcache server, with anywhere from minutes to hours between crashes. Reviewing a couple of core files, it looks like memory corruption is happening.

They're running xrootd-4.8.5-1.osg34.el7.x86_64.

I'll update this with more info as we work on debugging. Below are notes from looking at two core files.

Regards,
John

Core 1:
-------

The vtable entry (function pointer) for XrdCl::FilePlugIn::Stat() appears to be corrupt.

```
(gdb) bt
#0  0x00000000ffffffff in ?? ()
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
#2  0x00007f68ad169f5a in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, response=@0x7f68ac42fca8: 0x0, timeout=timeout@entry=0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:174
#3  0x00007f68ad407638 in XrdPosixFile::Stat (this=0x7f6830000108, Status=..., force=force@entry=false)
    at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixFile.cc:467
#4  0x00007f68ad40afb2 in XrdPosixPrepIO::Init (this=this@entry=0x7f6833b47c80, iocbP=iocbP@entry=0x7f681df09110)
    at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.cc:69
#5  0x00007f68ad41173c in XrdPosixPrepIO::Read (this=0x7f6833b47c80, iocb=..., buff=0x7f6754000030 "", offs=292552704, rlen=32505856)
    at /usr/src/debug/xrootd-4.8.5/src/XrdPosix/XrdPosixPrepIO.hh:59
#6  0x00007f68ac84a25c in XrdFileCache::File::ProcessBlockRequests (this=this@entry=0x7f66ee325b70, blks=std::list = {...})
    at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:374
#7  0x00007f68ac84e832 in XrdFileCache::File::Prefetch (this=0x7f66ee325b70)
    at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCacheFile.cc:923
#8  0x00007f68ac844e4d in XrdFileCache::Cache::Prefetch (this=0xd19150)
    at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:570
#9  0x00007f68ac844e69 in PrefetchThread (ptr=<optimized out>)
    at /usr/src/debug/xrootd-4.8.5/src/XrdFileCache/XrdFileCache.cc:62
#10 0x00007f68b702c537 in XrdSysThread_Xeq (myargs=0xd1a620)
    at /usr/src/debug/xrootd-4.8.5/src/XrdSys/XrdSysPthread.cc:86
#11 0x00007f68b6be8e25 in start_thread (arg=0x7f68ac430700) at pthread_create.c:308
#12 0x00007f68b5eebbad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

It jumped off into nowhere? (frame 0 is 0xffffffff). Going up to the last good frame and looking at the pPlugIn pointer and vtable:

```
(gdb) frame 1
#1  0x00007f68ad169e8e in XrdCl::File::Stat (this=this@entry=0x7f68300001b8, force=force@entry=false, handler=handler@entry=0x7f68ac42fbd0, timeout=timeout@entry=0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFile.cc:161
161       return pPlugIn->Stat( force, handler, timeout );
(gdb) print *pPlugIn
$10 = {_vptr.FilePlugIn = 0x7f683343f9d0}
(gdb) info vtbl pPlugIn
vtable for 'XrdCl::FilePlugIn' @ 0x7f683343f9d0 (subobject @ 0x7f68300001b8):
[0]: 0x0
[1]: 0x141
[2]: 0x7f68300001a8
[3]: 0x7f68300001a8
[4]: 0xffffffff
[5]: 0x0
[6]: 0x0
[7]: 0x6f2e726f00000001
[8]: 0x0
[9]: 0x7f68b69d8338 <std::string::_Rep::_S_empty_rep_storage+24>
[10]: 0x7f6764001730
[11]: 0x7f681df0a9b0
[12]: 0x7f6764001760
[13]: 0x7f6764000b40
[14]: 0x0
```

The vtable is mangled? And [4] in the vtable suspiciously matches the segfault address.

Compare with the type:

```
(gdb) ptype pPlugIn
type = class XrdCl::FilePlugIn {
  public:
    ~FilePlugIn(int);
    virtual XrdCl::XRootDStatus Open(const std::string &, XrdCl::OpenFlags::Flags, XrdCl::Access::Mode, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Close(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Stat(bool, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Read(uint64_t, uint32_t, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Write(uint64_t, uint32_t, const void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Sync(XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Truncate(uint64_t, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus VectorRead(const XrdCl::ChunkList &, void *, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Fcntl(const XrdCl::Buffer &, XrdCl::ResponseHandler *, uint16_t);
    virtual XrdCl::XRootDStatus Visa(XrdCl::ResponseHandler *, uint16_t);
    virtual bool IsOpen(void) const;
    virtual bool SetProperty(const std::string &, const std::string &);
    virtual bool GetProperty(const std::string &, std::string &) const;
} *
```

Stat() is the fourth entry down. Matches up with "[4]: 0xffffffff"?

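One way to check the "[4]" match is to count vtable slots. The sketch below is a small C++ model of how `info vtbl` numbers slots under the Itanium C++ ABI used by g++ on x86_64 Linux: a virtual destructor declared first occupies two slots, and each later virtual function takes one slot in declaration order. It assumes that `~FilePlugIn` is virtual and declared first (the ptype output lists it first but does not print `virtual` for it) and that ptype lists methods in declaration order. `vtable_slot` and the `k...` constants are hypothetical names for illustration, not part of XRootD or gdb.

```cpp
#include <cstddef>

// Hypothetical helper (not part of XRootD or gdb): models the zero-based
// slot index that `info vtbl` prints for a virtual member function under
// the Itanium C++ ABI. A virtual destructor declared first takes two slots
// (complete-object and deleting destructor); every other virtual function
// takes one slot, in declaration order.
constexpr std::size_t vtable_slot(std::size_t pos_among_methods,  // 0-based, destructor excluded
                                  bool virtual_dtor_first) {
    return (virtual_dtor_first ? 2 : 0) + pos_among_methods;
}

// Positions taken from the ptype listing above (destructor not counted).
constexpr std::size_t kOpen  = 0;
constexpr std::size_t kClose = 1;
constexpr std::size_t kStat  = 2;
constexpr std::size_t kRead  = 3;

// Assumption: ~FilePlugIn is virtual, so Stat() lands in slot [4].
static_assert(vtable_slot(kStat, true) == 4, "Stat() expected in slot [4]");
```

Under these assumptions, slot [4] is the entry that `pPlugIn->Stat(...)` calls through, which would be consistent with frame 0 landing at 0xffffffff. If the destructor were not virtual, Stat() would sit in slot [2] instead, so the match depends on that assumption.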
Core 2:
-------

The XrdCl::URL.pHostId variable appears to be corrupted (unexpectedly null):

```
(gdb) bt
#0  _M_grab (__alloc2=..., __alloc1=..., this=<optimized out>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.h:229
#1  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ff7f1fe2400, __str=<error reading variable: Cannot access memory at address 0xffffffffffffffe8>)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:173
#2  0x00007ff81e73fad4 in GetHostId (this=<optimized out>)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClURL.hh:78
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
#4  0x00007ff81e7405ab in (anonymous namespace)::OpenHandler::HandleResponseWithHosts (this=0x7ff7dbe04ca0, status=0x7ff72c0013d0, response=0x7ff72c000db0, hostList=0x7ff753e003c0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:84
#5  0x00007ff81e72180b in XrdCl::XRootDMsgHandler::HandleResponse (this=this@entry=0x7ff7dbe03190)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:1128
#6  0x00007ff81e7259c3 in XrdCl::XRootDMsgHandler::Process (this=0x7ff7dbe03190, msg=0x7ff7c8000bd0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClXRootDMsgHandler.cc:375
#7  0x00007ff81e706a4e in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7ff7c8000a80, arg=<optimized out>)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClStream.hh:284
#8  0x00007ff81e76cf8f in XrdCl::JobManager::RunJobs (this=0x7ff80c005480)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:148
#9  0x00007ff81e76d1e9 in RunRunnerThread (arg=<optimized out>)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClJobManager.cc:33
#10 0x00007ff8281afe25 in start_thread (arg=0x7ff7f1fe3700) at pthread_create.c:308
#11 0x00007ff8274b2bad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

```
(gdb) frame 3
#3  XrdCl::FileStateHandler::OnOpen (this=0x7ff7a460ac10, status=status@entry=0x7ff72c0013d0, openInfo=0x7ff72c000940, hostList=hostList@entry=0x7ff753e003c0)
    at /usr/src/debug/xrootd-4.8.5/src/XrdCl/XrdClFileStateHandler.cc:1187
1187      std::string lastServer = pFileUrl->GetHostId();
(gdb) print *pFileUrl
$23 = {pHostId = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>, pProtocol = "root", pUserName = "31", pPassword = "", pHostName = "redirector.osgstorage.org", pPort = 1094, pPath = "", pParams = std::map with 1 elements = { ["oss.lcl"] = "1"}, pURL = "root:[log in to unmask]:1094//gwdata/O1/strain.16k/frame.v1/H1/1130364928/H-H1_LOSC_16_V1-1130782720-4096.gwf?oss.lcl=1"}
(gdb) x &pFileUrl.pHostId
0x7ff7dbe05280: 0x0000000000000000
```

--
View it on GitHub: https://github.com/xrootd/xrootd/issues/853
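
A note on the 0xffffffffffffffe8 address in the Core 2 output: it is what a null data pointer would produce with the copy-on-write std::string of GCC 4.8's libstdc++, which keeps a small header (length, capacity, reference count) immediately before the character data and finds it by stepping back from the data pointer. The sketch below works through that arithmetic. `CowRepHeader` and `rep_address` are hypothetical names that mirror the layout for illustration only; they are not libstdc++ types, and the 24-byte size assumes an LP64 platform such as x86_64 Linux.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the header that the pre-C++11-ABI (copy-on-write)
// libstdc++ std::string stores right before its character data. This is an
// illustration of the layout, not the library's own type.
struct CowRepHeader {
    std::size_t length;
    std::size_t capacity;
    int         refcount;  // reference count is a plain int on x86_64
};

// The string object holds only a pointer to the characters; the header is
// located by stepping back one header from that pointer.
constexpr std::uint64_t rep_address(std::uint64_t data_ptr) {
    return data_ptr - sizeof(CowRepHeader);
}

// Assumption: LP64, so the header is 8 + 8 + 4 bytes plus 4 bytes of padding.
static_assert(sizeof(CowRepHeader) == 24, "expected a 24-byte header");

// A null data pointer wraps around to -24, the address gdb could not read.
static_assert(rep_address(0) == 0xffffffffffffffe8ULL, "0 - 24 wraps to ...ffe8");
```

This agrees with `x &pFileUrl.pHostId` showing a zero pointer, and with the `_S_empty_rep_storage+24` value in the Core 1 dump, which is where an empty string's data pointer normally points. It only explains the faulting address; it says nothing about what zeroed the pointer.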