We've been seeing segfaults in XrdFileCache on a StashCache server. It appears to be related to persistent HTTP connections and heavily used files.

There's an unsigned short use counter for the active links: https://github.com/xrootd/xrootd/blob/v4.7.1/src/XrdOfs/XrdOfsHandle.hh#L53

When the counter reaches maximum value, it appears to trigger a segfault:

171113 04:11:36 3933800 unknown.343:[log in to unmask] ofs_open: attach use=65533 fn=/user/eharstad/public/blast_database/nt.fa.nsq
171113 04:11:39 3812035 unknown.463:[log in to unmask] ofs_open: attach use=65534 fn=/user/eharstad/public/blast_database/nt.fa.nsq
171113 04:11:40 3919073 unknown.343:[log in to unmask] ofs_open: attach use=65535 fn=/user/eharstad/public/blast_database/nt.fa.nsq
*segfault*

It seems StashCache clients are keeping long-lived persistent HTTP connections. For each GET request on a file, xrootd increments the use counter for the file, but the counter is not decremented until the HTTP connection is closed. If there's a popular file (we're seeing it happen with BLAST databases in particular) with enough clients requesting various ranges inside the file, the counter eventually runs past 65535, the largest value an unsigned short can hold here, and xrootd crashes.
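
To make the arithmetic concrete, here is a toy sketch of a 16-bit unsigned counter that is only ever incremented. It is not xrootd code: `attach` and `USHRT_MAX` are hypothetical names for illustration, and it says nothing about which code path actually leaves m_file null. It only shows that the attach after use=65535 wraps the counter back to 0.

```python
import ctypes

USHRT_MAX = 0xFFFF  # largest value a 16-bit unsigned short can hold


def attach(use):
    """Return the counter after one more attach (hypothetical helper).

    ctypes integer types do no overflow checking, so the result wraps
    modulo 2**16, the same way unsigned arithmetic wraps in C and C++.
    """
    return ctypes.c_uint16(use + 1).value


use = 0
# One attach per GET; nothing detaches while the connections stay open.
for _ in range(USHRT_MAX):
    use = attach(use)
print("after %d attaches: use=%d" % (USHRT_MAX, use))

# The next attach no longer fits in 16 bits and wraps around.
use = attach(use)
print("after one more attach: use=%d" % use)
```

If the real counter behaves like this, a heavily used file would look unused again right after its 65536th attach, which may be relevant to the null pointer seen in the backtrace below.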

Backtrace of a crash on hcc-stash.unl.edu (xrootd-4.7.1-1.osg33.el7.x86_64):

Core was generated by `/usr/bin/xrootd -l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-stashcache-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fd5b464b327 in XrdFileCache::IOEntireFile::FSize (this=0x7fd59000a0b0) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:76
76       return m_file->GetFileSize();
(gdb) bt
#0  0x00007fd5b464b327 in XrdFileCache::IOEntireFile::FSize (this=0x7fd59000a0b0) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:76
#1  0x00007fd5b464b584 in XrdFileCache::IOEntireFile::Read (this=0x7fd59000a0b0,
    buff=0x7fd4dc718000 "\217\377t\351N\270QG\\Q\375\004oU\234Wy\226\256\345\267\335N\031\320[<\022\337xQO\035\355\235\037\337Q\251\026\256Tƥ\317{w\364Lh\021\354\265\177\260\257\317\362\001\274\070\337\177\064]W\353\365\275w\273r\317\363<_\340\371\177\376\337\035\204\004L\361\263\367\312\317\361\363O\240~\363\\\312\311\345\337", <incomplete sequence \337>, off=246530048, size=1048576) at /usr/src/debug/xrootd-4.7.1/src/XrdFileCache/XrdFileCacheIOEntireFile.cc:174
#2  0x00007fd5b541dd3b in XrdPosixXrootd::Pread (fildes=<optimized out>, buf=0x7fd4dc718000, nbyte=1048576, offset=246530048) at /usr/src/debug/xrootd-4.7.1/src/XrdPosix/XrdPosixXrootd.cc:677
#3  0x00007fd5b5848079 in XrdPssFile::Read (this=<optimized out>, buff=<optimized out>, offset=<optimized out>, blen=<optimized out>) at /usr/src/debug/xrootd-4.7.1/src/XrdPss/XrdPss.cc:758
#4  0x00007fd5bef58c83 in XrdOfsFile::read (this=0x7fd3790ae960, offset=246530048,
    buff=0x7fd4dc718000 "\217\377t\351N\270QG\\Q\375\004oU\234Wy\226\256\345\267\335N\031\320[<\022\337xQO\035\355\235\037\337Q\251\026\256Tƥ\317{w\364Lh\021\354\265\177\260\257\317\362\001\274\070\337\177\064]W\353\365\275w\273r\317\363<_\340\371\177\376\337\035\204\004L\361\263\367\312\317\361\363O\240~\363\\\312\311\345\337", <incomplete sequence \337>, blen=1048576) at /usr/src/debug/xrootd-4.7.1/src/XrdOfs/XrdOfs.cc:891
#5  0x00007fd5bef4e2be in XrdXrootdProtocol::do_ReadAll (this=this@entry=0x7fd58c5adac8, asyncOK=asyncOK@entry=1) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdXeq.cc:2001
#6  0x00007fd5bef4e6e6 in XrdXrootdProtocol::do_Read (this=this@entry=0x7fd58c5adac8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdXeq.cc:1938
#7  0x00007fd5bef44f80 in XrdXrootdProtocol::Process2 (this=this@entry=0x7fd58c5adac8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdProtocol.cc:463
#8  0x00007fd5bef48820 in XrdXrootdTransit::Process (this=0x7fd58c5adac0, lp=0x7fd5903cf7b8) at /usr/src/debug/xrootd-4.7.1/src/XrdXrootd/XrdXrootdTransit.cc:369
#9  0x00007fd5beccc719 in XrdLink::DoIt (this=0x7fd5903cf7b8) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdLink.cc:426
#10 0x00007fd5beccfcff in XrdScheduler::Run (this=0x610e98 <XrdMain::Config+440>) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdScheduler.cc:357
#11 0x00007fd5beccfe49 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd-4.7.1/src/Xrd/XrdScheduler.cc:87
#12 0x00007fd5bec8c4d7 in XrdSysThread_Xeq (myargs=0x7fd37c0d3810) at /usr/src/debug/xrootd-4.7.1/src/XrdSys/XrdSysPthread.cc:86
#13 0x00007fd5be848e25 in start_thread (arg=0x7fd3196ba700) at pthread_create.c:308
#14 0x00007fd5bdb4e34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) print m_file
$1 = (XrdFileCache::File *) 0x0

It can be reproduced with a small Python script that makes more than 65535 requests on a persistent connection (several copies can be run in parallel to speed things along):

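#!/usr/bin/env python

import logging

import requests

# Debug logging shows whether the underlying connection is being reused.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

# A Session keeps the HTTP connection alive across requests.
s = requests.Session()
# Ask for a small byte range so each request is cheap.
headers = {"Range": "bytes=0-100"}

url = 'http://stash.example.edu:8000/testfile'

# 65537 requests, enough to push a 16-bit use counter past 65535.
for i in range(0, 65537):
    print("Request #%d" % i)
    r = s.get(url, headers=headers)
    print("Status = %d\n" % r.status_code)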

