Hello,

A segfault with the following stack trace was recently reported on an EOS disk server node at CERN (running a custom xrootd build with EOS plugins, approximately equivalent to xrootd 5.5.2):

```
Core was generated by `/opt/eos/xrootd/bin/xrootd -n fst -c /etc/xrd.cf.fst -l /var/log/eos/xrdlog.fst'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8a36b66a4b in XrdXrootdMonFile::GetSlot (slotSZ=<optimized out>) at /usr/include/bits/byteswap.h:47
[...]
(gdb) where
#0  0x00007f8a36b66a4b in XrdXrootdMonFile::GetSlot (slotSZ=<optimized out>) at /usr/include/bits/byteswap.h:47
#1  0x00007f8a36b66bd3 in XrdXrootdMonFile::Close (fsP=<optimized out>, isDisc=<optimized out>) at /usr/src/debug/xrootd-5.5.5/src/XrdXrootd/XrdXrootdMonFile.cc:169
#2  0x00007f8a36b6409b in XrdXrootdFileTable::Del (this=0x7f82a27db520, monP=<optimized out>, fnum=<optimized out>, fnum@entry=0, dodel=<optimized out>) at /usr/src/debug/xrootd-5.5.5/src/XrdXrootd/XrdXrootdFile.cc:309
#3  0x00007f8a36b79e88 in XrdXrootdProtocol::do_Close (this=this@entry=0x7f8792088500) at /usr/src/debug/xrootd-5.5.5/src/XrdXrootd/XrdXrootdMonitor.hh:192
#4  0x00007f8a36b6fffe in XrdXrootdProtocol::Process2 (this=0x7f8792088500) at /usr/src/debug/xrootd-5.5.5/src/XrdXrootd/XrdXrootdProtocol.cc:492
#5  0x00007f8a368af8c0 in XrdLinkXeq::DoIt (this=<optimized out>) at /usr/src/debug/xrootd-5.5.5/src/Xrd/XrdLinkXeq.cc:320
#6  XrdLinkXeq::DoIt (this=0x7f8a0f8d0c50) at /usr/src/debug/xrootd-5.5.5/src/Xrd/XrdLinkXeq.cc:308
#7  0x00007f8a368b2867 in XrdScheduler::Run (this=0x615e80 <XrdGlobal::Sched>) at /usr/src/debug/xrootd-5.5.5/src/Xrd/XrdScheduler.cc:406
#8  0x00007f8a368b2989 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd-5.5.5/src/Xrd/XrdScheduler.cc:89
#9  0x00007f8a36842e27 in XrdSysThread_Xeq (myargs=0x7f847b7897a0) at /usr/src/debug/xrootd-5.5.5/src/XrdSys/XrdSysPthread.cc:86
#10 0x00007f8a359a7ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f8a356d0b0d in clone () from /lib64/libc.so.6
```

It's suspected to be a race condition inside the xrootd server during a delayed file close(). I'm attaching a diff to build a slightly modified server (on top of the current master branch) that seems to reproduce the issue.

I've also prepared a possible fix, but it would be great to hear whether there are other opinions about the best approach. If wanted, I can open a pull request with the fix and have the discussion there; or if, e.g., Andy would like to look into it, I can leave it with only this ticket in the meantime.

Attaching the reproducer modification, some instructions, and a possible fix shortly.

--
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1898