We've been seeing segfaults in XrdFileCache on a StashCache server. They appear to be related to persistent HTTP connections and heavily used files. There's an unsigned short `use` counter for the active links:

https://github.com/xrootd/xrootd/blob/v4.7.1/src/XrdOfs/XrdOfsHandle.hh#L53

When the counter reaches its maximum value, it appears to trigger a segfault:

```
171113 04:11:36 3933800 unknown.343:[log in to unmask] ofs_open: attach use=65533 fn=/user/eharstad/public/blast_database/nt.fa.nsq
171113 04:11:39 3812035 unknown.463:[log in to unmask] ofs_open: attach use=65534 fn=/user/eharstad/public/blast_database/nt.fa.nsq
171113 04:11:40 3919073 unknown.343:[log in to unmask] ofs_open: attach use=65535 fn=/user/eharstad/public/blast_database/nt.fa.nsq
*segfault*
```

It seems StashCache clients are keeping long-lived persistent HTTP connections. For each GET request on a file, xrootd increments the `use` counter for the file, but the counter is not decremented until the HTTP connection is closed. If there's a popular file (we're seeing it happen with BLAST databases in particular) with enough clients requesting various ranges inside the file, the counter eventually passes 65535, the largest value an unsigned short holds on this platform, and xrootd crashes.

Backtrace of a crash on hcc-stash.unl.edu (xrootd-4.7.1-1.osg33.el7.x86_64):

```
Core was generated by `/usr/bin/xrootd -l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-stashcache-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fd5b464b327 in XrdFileCache::IOEntireFile::FSize (this=0x7fd59000a0b0) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:76
76	   return m_file->GetFileSize();
(gdb) bt
#0  0x00007fd5b464b327 in XrdFileCache::IOEntireFile::FSize (this=0x7fd59000a0b0) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:76
#1  0x00007fd5b464b584 in XrdFileCache::IOEntireFile::Read (this=0x7fd59000a0b0, buff=0x7fd4dc718000 "\217\377t\351N\270QG\\Q\375\004oU\234Wy\226\256\345\267\335N\031\320[<\022\337xQO\035\355\235\037\337Q\251\026\256Tƥ\317{w\364Lh\021\354\265\177\260\257\317\362\001\274\070\337\177\064]W\353\365\275w\273r\317\363<_\340\371\177\376\337\035\204\004L\361\263\367\312\317\361\363O\240~\363\\\312\311\345\337", <incomplete sequence \337>, off=246530048, size=1048576) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:174
#2  0x00007fd5b541dd3b in XrdPosixXrootd::Pread (fildes=<optimized out>, buf=0x7fd4dc718000, nbyte=1048576, offset=246530048) at /usr/src/debug/xrootd-4.7.1/src/XrdPosix/XrdPosixXrootd.cc:677
#3  0x00007fd5b5848079 in XrdPssFile::Read (this=<optimized out>, buff=<optimized out>, offset=<optimized out>, blen=<optimized out>) at /usr/src/debug/xrootd-4.7.1/src/XrdPss/XrdPss.cc:758
#4  0x00007fd5bef58c83 in XrdOfsFile::read (this=0x7fd3790ae960, offset=246530048, buff=0x7fd4dc718000 "\217\377t\351N\270QG\\Q\375\004oU\234Wy\226\256\345\267\335N\031\320[<\022\337xQO\035\355\235\037\337Q\251\026\256Tƥ\317{w\364Lh\021\354\265\177\260\257\317\362\001\274\070\337\177\064]W\353\365\275w\273r\317\363<_\340\371\177\376\337\035\204\004L\361\263\367\312\317\361\363O\240~\363\\\312\311\345\337", <incomplete sequence \337>, blen=1048576) at /usr/src/debug/xrootd-4.7.1/src/XrdOfs/XrdOfs.cc:891
#5  0x00007fd5bef4e2be in XrdXrootdProtocol::do_ReadAll (this=this@entry=0x7fd58c5adac8, asyncOK=asyncOK@entry=1) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdXeq.cc:2001
#6  0x00007fd5bef4e6e6 in XrdXrootdProtocol::do_Read (this=this@entry=0x7fd58c5adac8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdXeq.cc:1938
#7  0x00007fd5bef44f80 in XrdXrootdProtocol::Process2 (this=this@entry=0x7fd58c5adac8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdProtocol.cc:463
#8  0x00007fd5bef48820 in XrdXrootdTransit::Process (this=0x7fd58c5adac0, lp=0x7fd5903cf7b8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdTransit.cc:369
#9  0x00007fd5beccc719 in XrdLink::DoIt (this=0x7fd5903cf7b8) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdLink.cc:426
#10 0x00007fd5beccfcff in XrdScheduler::Run (this=0x610e98 <XrdMain::Config+440>) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdScheduler.cc:357
#11 0x00007fd5beccfe49 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdScheduler.cc:87
#12 0x00007fd5bec8c4d7 in XrdSysThread_Xeq (myargs=0x7fd37c0d3810) at /usr/src/debug/xrootd-4.7.1/src/XrdSys/XrdSysPthread.cc:86
#13 0x00007fd5be848e25 in start_thread (arg=0x7fd3196ba700) at pthread_create.c:308
#14 0x00007fd5bdb4e34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) print m_file
$1 = (XrdFileCache::File *) 0x0
```

It can be reproduced with a small Python script that makes 65536 requests on a persistent connection (several can be run in parallel to speed things along):

```python
#!/usr/bin/env python3
import logging

import requests

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

# A Session reuses one persistent HTTP connection for every request.
s = requests.Session()
headers = {"Range": "bytes=0-100"}
url = 'http://stash.example.edu:8000/testfile'

for i in range(0, 65537):
    print("Request #%d" % i)
    r = s.get(url, headers=headers)
    print("Status = %d\n" % r.status_code)
```
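For illustration, here is a minimal Python sketch of what a 16-bit unsigned counter does when it is incremented past 65535. It is not xrootd code: `attach` and `USE_MAX` are hypothetical names, and it assumes `unsigned short` is 16 bits wide, as it is on x86_64 Linux. It only shows the arithmetic wraparound and says nothing about how xrootd reacts afterwards.

```python
# Hypothetical model of a C `unsigned short` counter (assumed 16 bits wide).
USE_MAX = 0xFFFF  # 65535, the largest value the counter can hold


def attach(use):
    """Return the counter after one more attach, wrapping like C unsigned arithmetic."""
    return (use + 1) & USE_MAX


# Start at the first value seen in the log excerpt and attach four more times.
use = 65533
history = []
for _ in range(4):
    use = attach(use)
    history.append(use)

print(history)  # [65534, 65535, 0, 1]
```

After the increment that follows `use=65535`, the modelled counter reads 0 even though tens of thousands of references are notionally still open. That may be related to the crash seen right after `use=65535` in the log, though this sketch cannot confirm it.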