We are using a custom build of XRootD 5.4.3 with 3 extra commits that address bugs affecting some of the more demanding EOS instances at CERN. The 3 extra commits are:
4df4cda
624daad
50da3f0

Unfortunately, with this XRootD 5.4.3++ version we see crashes (SEGV) in seemingly random places in the code, which don't make much sense. We therefore deployed an ASAN-enabled build of EOS on some of the crashing disk servers, and it detected memory corruption while handling pgRead operations. These operations most likely come from new xrdcp commands, which are probably the only ones that trigger the pgRead functionality on the server side.

Below is a sample of the ASAN report:

=================================================================
==33310==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f3ba0b0e000 at pc 0x7f3d1ffe1f5d bp 0x7f3b6e775da0 sp 0x7f3b6e775548
    #0 0x7f3d1ffe1f5c  (/usr/lib64/libasan.so.5+0x57f5c)
    #1 0x7f3d015b8a2d in eos::fst::XrdFstOssFile::Read(void*, long, unsigned long) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/XrdFstOssFile.cc:267
    #2 0x7f3d1f9c349c in XrdOfsFile::read(long long, char*, int) (/opt/eos/xrootd/lib64/libXrdServer.so.3+0x13949c)
    #3 0x7f3d15f5768e in eos::fst::XrdFstOfsFile::readofs(long long, char*, int) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/XrdFstOfsFile.cc:2235
    #4 0x7f3d1618e94a in eos::fst::LocalIo::fileRead(long long, char*, int, unsigned short) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/io/local/LocalIo.cc:114
    #5 0x7f3d14951ae3 in eos::fst::ReplicaParLayout::Read(long long, char*, int, bool) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/layout/ReplicaParLayout.cc:211
    #6 0x7f3d15f5ca5f in eos::fst::XrdFstOfsFile::read(long long, char*, int) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/XrdFstOfsFile.cc:787
    #7 0x7f3d15f4ed0d in eos::fst::XrdFstOfsFile::pgRead(long long, char*, int, unsigned int*, unsigned long) /root/rpmbuild/BUILD/eos-5.0.26-1/fst/XrdFstOfsFile.cc:858
    #8 0x7f3d1f9b9561 in XrdXrootdProtocol::do_PgRIO() (/opt/eos/xrootd/lib64/libXrdServer.so.3+0x12f561)
    #9 0x7f3d1f9ba65f in XrdXrootdProtocol::do_PgRead() (/opt/eos/xrootd/lib64/libXrdServer.so.3+0x13065f)
    #10 0x7f3d1f9735f7 in XrdXrootdProtocol::Process2() (/opt/eos/xrootd/lib64/libXrdServer.so.3+0xe95f7)
    #11 0x7f3d1f58de76 in XrdLinkXeq::DoIt() (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0x216e76)
    #12 0x7f3d1f585319 in XrdLink::setProtocol(XrdProtocol*, bool, bool) (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0x20e319)
    #13 0x7f3d1f595bf5 in XrdScheduler::Run() (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0x21ebf5)
    #14 0x7f3d1f595f28 in XrdStartWorking(void*) (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0x21ef28)
    #15 0x7f3d1f432199 in XrdSysThread_Xeq (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0xbb199)
    #16 0x7f3d1e536ea4 in start_thread (/lib64/libpthread.so.0+0x7ea4)
    #17 0x7f3d1e25fb0c in clone (/lib64/libc.so.6+0xfeb0c)

0x7f3ba0b0e000 is located 0 bytes to the right of 2097152-byte region [0x7f3ba090e000,0x7f3ba0b0e000)
allocated by thread T589561 here:
    #0 0x7f3d200974fd in posix_memalign (/usr/lib64/libasan.so.5+0x10d4fd)
    #1 0x7f3d1f580e2a in XrdBuffManager::Obtain(int) (/opt/eos/xrootd/lib64/libXrdUtils.so.3+0x209e2a)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/lib64/libasan.so.5+0x57f5c)
Shadow bytes around the buggy address:
  0x0fe7f4159bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe7f4159bc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe7f4159bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe7f4159be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe7f4159bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0fe7f4159c00:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe7f4159c10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe7f4159c20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe7f4159c30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe7f4159c40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fe7f4159c50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==33310==ABORTING
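
For anyone cross-checking the numbers in the report above, here is a minimal sketch of the address arithmetic. The three addresses are copied from the report; the shadow mapping (shift by 3, offset 0x7fff8000) is an assumption based on the usual ASAN defaults for x86_64 Linux, and the helper names are made up for illustration only.

```python
# Addresses copied verbatim from the ASAN report.
FAULT_ADDR = 0x7F3BA0B0E000
REGION_START = 0x7F3BA090E000
REGION_END = 0x7F3BA0B0E000  # exclusive upper bound of the allocation

# Assumption: default ASAN shadow mapping on x86_64 Linux,
# shadow = (addr >> 3) + 0x7fff8000.
SHADOW_SCALE = 3
SHADOW_OFFSET = 0x7FFF8000


def region_size(start: int, end: int) -> int:
    """Size in bytes of the half-open region [start, end)."""
    return end - start


def bytes_past_end(addr: int, end: int) -> int:
    """How far addr lies beyond the end of the region (0 = first byte after it)."""
    return addr - end


def shadow_address(addr: int) -> int:
    """Shadow-memory address that describes the given application address."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


if __name__ == "__main__":
    print("region size   :", region_size(REGION_START, REGION_END))
    print("bytes past end:", bytes_past_end(FAULT_ADDR, REGION_END))
    print("shadow address:", hex(shadow_address(FAULT_ADDR)))
```

Under those assumptions the region is exactly 2 MiB, the faulting address is the first byte past the buffer returned by XrdBuffManager::Obtain, and the computed shadow address matches the row marked with `=>` in the shadow dump. This only confirms that the access ran off the end of the buffer; it does not say which layer passed the oversized length.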

Has anyone experienced similar crashes with the latest XRootD 5.4.3? We assume this is not a side effect of any of the 3 extra commits that we are using.

Thanks,
Elvin


Message ID: <xrootd/xrootd/issues/1743@github.com>
