
Versions:
* xrootd-5.1.0-0.rc5.el7.x86_64
* xrootd-hdfs-2.1.8-1.1.osgup.el7.x86_64

An error condition reported by the xrootd-hdfs layer appears to send TPC into unbounded recursion that ends in a segfault.

The inline `Write()` in XrdTpcStream.hh and `TPC::Stream::Write()` in XrdTpcStream.cc call each other:
* https://github.com/xrootd/xrootd/blob/v5.1.0-rc5/src/XrdTpc/XrdTpcStream.hh#L82
* https://github.com/xrootd/xrootd/blob/v5.1.0-rc5/src/XrdTpc/XrdTpcStream.cc#L112
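To make the cycle concrete, here is a minimal, hypothetical C++ model of the pattern the backtrace suggests. It is not the XRootD code: `Entry`, `Stream`, `SinkWrite`, the depth counters and `kGuard` are illustrative assumptions. The model assumes a full 1 MiB buffer sits at the stream's current offset, a small write arrives just past it (as in frame #29846: 16384 bytes at 543162368 = 542113792 + 1048576), and every sink write fails the way `hdfsWrite` does in the logs. Because the failed buffer is neither freed nor advanced, each flush attempt re-enters `Stream::Write` at the same offset.

```cpp
// HYPOTHETICAL, simplified model of the mutual recursion in the backtrace:
// Stream::Write (XrdTpcStream.cc:112) <-> the inline Write (XrdTpcStream.hh:82).
// Names and logic are illustrative; this is not the XRootD implementation.
struct Stream;

struct Entry {
    long offset = 0;  // offset of the buffered chunk
    long size = 0;    // bytes held; 0 means the slot is free
    long Write(Stream &stream);
};

struct Stream {
    long m_offset = 0;                   // next offset the sink expects
    Entry entry;                         // a single buffered chunk, for simplicity
    int depth = 0;                       // current nesting depth (demo only)
    int max_depth = 0;                   // deepest nesting reached (demo only)
    static constexpr int kGuard = 1000;  // demo-only cap; the model otherwise never stops

    // Stand-in for the failing hdfsWrite in the report: always fails.
    long SinkWrite(long, long) { return -1; }

    long Write(long offset, long size) {
        if (++depth > max_depth) max_depth = depth;
        if (depth >= kGuard) { --depth; return -1; }  // demo-only stop
        long rv = size;  // data not at m_offset counts as buffered
        if (offset == m_offset) {
            rv = SinkWrite(offset, size);
            if (rv >= 0) m_offset += rv;
            // On failure nothing stops control from falling through
            // to the flush step below.
        }
        entry.Write(*this);  // flush a buffered chunk now at m_offset
        --depth;
        return rv;
    }
};

long Entry::Write(Stream &stream) {
    if (size == 0 || offset != stream.m_offset) return 0;  // nothing to flush
    long rv = stream.Write(offset, size);  // re-enters Stream::Write
    if (rv != size) return -1;             // failed: slot stays occupied, offset unchanged
    size = 0;                              // flushed: free the slot
    return rv;
}

// A full 1 MiB chunk is buffered at the stream offset, then a 16 KiB write
// lands right after it (compare frames #29846 and #29844).
int LoopDepthDemo() {
    Stream s;
    s.entry.offset = 0;
    s.entry.size = 1048576;
    s.Write(1048576, 16384);
    return s.max_depth;
}
```

`LoopDepthDemo()` returns 1000, which is the demo guard rather than a natural stopping point; without such a cap the recursion in this model only ends when the stack runs out.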

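When `hdfsWrite` reports an error, these two functions can enter this loop and apparently never leave it. The core dump contains nearly 30,000 stack frames (#0 through #29864); the segfault itself occurs in `vfprintf`, called while libhdfs prints the write exception, which is consistent with stack exhaustion.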
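One possible way to break the cycle, sketched under the same kind of hypothetical model (this is an assumption, not necessarily how upstream addressed the issue): latch the first sink error in the stream, return immediately instead of falling through to the flush step, and have the buffer entry refuse to flush once the stream has failed. `LatchedStream`, `LatchedEntry`, `m_failed`, `m_error` and `SinkWrite` are illustrative names only.

```cpp
#include <string>

// HYPOTHETICAL mitigation sketch; names are illustrative, not XRootD's API.
struct LatchedStream;

struct LatchedEntry {
    long offset = 0;  // offset of the buffered chunk
    long size = 0;    // bytes held; 0 means the slot is free
    long Write(LatchedStream &stream);
};

struct LatchedStream {
    long m_offset = 0;      // next offset the sink expects
    bool m_failed = false;  // latched once the sink reports an error
    std::string m_error;    // message that could be returned to the client
    LatchedEntry entry;     // a single buffered chunk, for simplicity
    int depth = 0;          // current nesting depth (demo only)
    int max_depth = 0;      // deepest nesting reached (demo only)

    // Stand-in for the failing hdfsWrite in the report: always fails.
    long SinkWrite(long, long) { return -1; }

    long Write(long offset, long size) {
        if (m_failed) return -1;  // never retry once the sink has failed
        if (++depth > max_depth) max_depth = depth;
        long rv = size;  // data not at m_offset counts as buffered
        if (offset == m_offset) {
            rv = SinkWrite(offset, size);
            if (rv < 0) {
                m_failed = true;
                m_error = "sink write failed";
                --depth;
                return rv;  // do not fall through to the flush step
            }
            m_offset += rv;
        }
        entry.Write(*this);          // flush a buffered chunk now at m_offset
        --depth;
        return m_failed ? -1 : rv;   // propagate an error hit while flushing
    }
};

long LatchedEntry::Write(LatchedStream &stream) {
    if (size == 0 || stream.m_failed || offset != stream.m_offset) return 0;
    long rv = stream.Write(offset, size);
    if (rv != size) return -1;  // entry kept; the stream has latched the failure
    size = 0;                   // flushed: free the slot
    return rv;
}

struct DemoResult { long rv; int max_depth; bool failed; };

// Same scenario as the report: a full 1 MiB chunk buffered at the stream
// offset, then a 16 KiB write that lands immediately after it.
DemoResult LatchedDemo() {
    LatchedStream s;
    s.entry.offset = 0;
    s.entry.size = 1048576;
    long rv = s.Write(1048576, 16384);
    return {rv, s.max_depth, s.m_failed};
}
```

In this model `LatchedDemo()` reports a failed write (`rv == -1`) after only two levels of nesting, and the latched message could then be surfaced to the TPC client instead of the server crashing.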

<details>
  <summary>XRootD logs</summary>

```
210121 07:31:54 30996 sysMacaroonGen: ID=39cac713-6b57-4c32-9a9a-a682a2607e44, resource=/user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94, protocol=https, name=dteam, host=[2001:1458:201:e3::100:568], vorg=dteam, groups=/dteam, endorsements=/dteam/Role=NULL/Capability=NULL, base_activities=activity:READ_METADATA,UPLOAD,DOWNLOAD,DELETE,MANAGE,UPDATE_METADATA,LIST, user_caveat=activity:MANAGE,UPLOAD,DELETE,LIST, expires=2021-01-21T16:08:54Z
210121 07:31:55 13402 XrootdBridge: dteam.38195:[log in to unmask] login as dteam
210121 07:31:55 13402 hdfs_stat: Unable to stat /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; No such file or directory
210121 07:31:55 13402 ofs_stat: dteam.38195:[log in to unmask] Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:31:55 13402 dteam.38195:[log in to unmask] XrootdResponse: sending err 3011: Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:31:55 13402 XrootdXeq: dteam.38195:[log in to unmask] disc 0:00:00 (send failure)
210121 07:31:56 6671 XrootdBridge: dteam.38196:[log in to unmask] login as dteam
210121 07:31:56 6671 hdfs_stat: Unable to stat /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; No such file or directory
210121 07:31:56 6671 ofs_stat: dteam.38196:[log in to unmask] Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:31:56 6671 dteam.38196:[log in to unmask] XrootdResponse: sending err 3011: Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:31:56 6671 XrootdXeq: dteam.38196:[log in to unmask] disc 0:00:01 (send failure)
210121 07:31:57 30399 TPC_PullRequest: event=PULL_START, local=/user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94, remote=https://t2-dpm-01.na.infn.it:443/dpm/na.infn.it/home/dteam/tpc/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94, user=dteam; Starting a push request
210121 07:32:11 30996 XrootdBridge: cmsprod0.38210:778@red-c5108 login as cmsprod004
210121 07:32:11 30996 acc_Audit: cmsprod0.38210:778@red-c5108 grant https cmsprod004@[::ffff:129.93.183.144] create /store/unmerged/RunIISummer20UL16wmLHEGENAPV/DYJetsToLL_Pt-100To250_MatchEWPDG20_TuneCP5_13TeV-amcatnloFXFX-pythia8/LHE/106X_mcRun2_asymptotic_preVFP_v8-v1/260006/BD14AAB6-20C6-164B-9DB8-A4317747AC16.root
210121 07:32:33 10941 XrootdBridge: dteam.38224:[log in to unmask] login as dteam
210121 07:32:33 10941 hdfs_stat: Unable to stat /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; No such file or directory
210121 07:32:33 10941 ofs_stat: dteam.38224:[log in to unmask] Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:32:33 10941 dteam.38224:[log in to unmask] XrootdResponse: sending err 3011: Unable to locate /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; no such file or directory
210121 07:32:33 10941 XrootdXeq: dteam.38224:[log in to unmask] disc 0:00:00 (send failure)
21/01/21 07:32:34 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94 (inode 271670062): File does not exist. Holder DFSClient_NONMAPREDUCE_1534607401_38 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3698)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3499)
...
hdfsWrite: FSDataOutputStream#write error:
RemoteException: No lease on /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94 (inode 271670062): File does not exist. Holder DFSClient_NONMAPREDUCE_1534607401_38 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3698)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94 (inode 271670062): File does not exist. Holder DFSClient_NONMAPREDUCE_1534607401_38 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3698)
...
210121 07:32:34 30399 ofs_write: dteam Unable to write /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94; operation not permitted
hdfsWrite: FSDataOutputStream#write error:
RemoteException: No lease on /user/dteam/domatests/https/domatest/file5_cbbcb6a0-12c6-4eb8-a83c-b176d665bc94 (inode 271670062): File does not exist. Holder DFSClient_NONMAPREDUCE_1534607401_38 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3698)
...
```
</details>

<details>
  <summary>Backtrace</summary>

```
#0  0x00007f72cc69568c in vfprintf () from /lib64/libc.so.6
#1  0x00007f72cc69ae5b in buffered_vfprintf () from /lib64/libc.so.6
#2  0x00007f72cc69581e in vfprintf () from /lib64/libc.so.6
#3  0x00007f72bfb33325 in printExceptionAndFreeV () from /lib64/libhdfs.so.0
#4  0x00007f72bfb334ad in printExceptionAndFree () from /lib64/libhdfs.so.0
#5  0x00007f72bfb37b58 in hdfsWrite () from /lib64/libhdfs.so.0
#6  0x00007f72bfd4705c in XrdHdfsFile::Write (this=0x7f72872cdc00, buff=0x7f71e4800000, offset=<optimized out>, blen=1048576) at /usr/src/debug/xrootd-hdfs-2.1.8/src/XrdHdfs.cc:730
#7  0x00007f72cdb90408 in XrdOfsFile::write (this=0x7f72872c8c00, offset=542113792,
    buff=0x7f71e4800000 "\262\213\352\357p\304\361i!r]\033sߠ\346\001̟\220^\312{\302W\254\256\260\177\255\245\201\202&L\313\317I/TL\210\060P\313\024\352\226ۼ\370\301\213s\305^\303\335\324r-c\005\340\223m|]\234\325հ\346\373\002\016\016\005\212\236\211\200\234Ⴄ\333ܞx*\016n\311\366\023\fn\333\006(/\211\375\256\352\366\201\376\b\201Bf6\354\201p\351Q\261\246/\254\212\252\250E\024\351\265\024\016\062\230ߝ\371D", blen=1048576) at /usr/src/debug/xrootd/xrootd/src/XrdOfs/XrdOfs.cc:1218
#8  0x00007f72bc7eb5ad in TPC::Stream::Write (this=this@entry=0x7f72015f7480, offset=542113792,
    buf=0x7f71e4800000 "\262\213\352\357p\304\361i!r]\033sߠ\346\001̟\220^\312{\302W\254\256\260\177\255\245\201\202&L\313\317I/TL\210\060P\313\024\352\226ۼ\370\301\213s\305^\303\335\324r-c\005\340\223m|]\234\325հ\346\373\002\016\016\005\212\236\211\200\234Ⴄ\333ܞx*\016n\311\366\023\fn\333\006(/\211\375\256\352\366\201\376\b\201Bf6\354\201p\351Q\261\246/\254\212\252\250E\024\351\265\024\016\062\230ߝ\371D", size=size@entry=1048576, force=force@entry=false) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.cc:80
#9  0x00007f72bc7eb309 in Write (force=false, stream=..., this=0x7f72872bfa30) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.hh:82
#10 TPC::Stream::Write (this=this@entry=0x7f72015f7480, offset=542113792,
    buf=0x7f71e4800000 "\262\213\352\357p\304\361i!r]\033sߠ\346\001̟\220^\312{\302W\254\256\260\177\255\245\201\202&L\313\317I/TL\210\060P\313\024\352\226ۼ\370\301\213s\305^\303\335\324r-c\005\340\223m|]\234\325հ\346\373\002\016\016\005\212\236\211\200\234Ⴄ\333ܞx*\016n\311\366\023\fn\333\006(/\211\375\256\352\366\201\376\b\201Bf6\354\201p\351Q\261\246/\254\212\252\250E\024\351\265\024\016\062\230ߝ\371D", size=size@entry=1048576, force=force@entry=false) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.cc:112

...

#29844 TPC::Stream::Write (this=this@entry=0x7f72015f7480, offset=542113792, buf=0x7f71e4800000 "\262\213\352\357p\304\361i!r]\033sߠ\346\001̟\220^\312{\302W\254\256\260\177\255\245\201\202&L\313\317I/TL\210\060P\313\024\352\226ۼ\370\301\213s\305^\303\335\324r-c\005\340\223m|]\234\325հ\346\373\002\016\016\005\212\236\211\200\234Ⴄ\333ܞx*\016n\311\366\023\fn\333\006(/\211\375\256\352\366\201\376\b\201Bf6\354\201p\351Q\261\246/\254\212\252\250E\024\351\265\024\016\062\230ߝ\371D", size=size@entry=1048576, force=force@entry=false) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.cc:112
#29845 0x00007f72bc7eb309 in Write (force=false, stream=..., this=0x7f72872bfa30) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.hh:82
#29846 TPC::Stream::Write (this=0x7f72015f7480, offset=543162368, buf=0x7f7221f71000 "\301Y\253\270#e\031ݹ\215\302\315s^\312\365\001\225\264O\035\230\224\216ذ\032j7ԁ_\311\364V\227!\216;M \225\033\347{Z\020D\341w-\350\036\217\262n\344\n/\265ë\213Z\r\373\271Tv\020a\333\071\372\316\067\327\363\265{\331*VP\030\260\035\n\316:\255\346\224{ys\255&\322S\314\347\244yO\nc\337#\276\277\236J\246g1\260\255k½ŋN\363=͡\341F\365@\302f\rG\225\346\021\303a\334\315\\pH\346\377x\373k\374", <incomplete sequence \327>, size=16384, force=force@entry=false) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcStream.cc:112
#29847 0x00007f72bc7e8be6 in TPC::State::Write (this=this@entry=0x7f72015f74d0, buffer=<optimized out>, size=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcState.cc:192
#29848 0x00007f72bc7e8d2f in TPC::State::WriteCB (buffer=<optimized out>, size=<optimized out>, nitems=<optimized out>, userdata=0x7f72015f74d0) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcState.cc:188
#29849 0x00007f72bc588b60 in Curl_client_write () from /lib64/libcurl.so.4
#29850 0x00007f72bc59d6f5 in Curl_readwrite () from /lib64/libcurl.so.4
#29851 0x00007f72bc5a6c9b in multi_runsingle () from /lib64/libcurl.so.4
#29852 0x00007f72bc5a7641 in curl_multi_perform () from /lib64/libcurl.so.4
#29853 0x00007f72bc7f3d15 in TPC::TPCHandler::RunCurlWithUpdates (this=this@entry=0x7f72be833840, curl=curl@entry=0x7f7203072000, req=..., state=..., rec=...) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcTPC.cc:404
#29854 0x00007f72bc7f8923 in TPC::TPCHandler::ProcessPullReq (this=this@entry=0x7f72be833840, resource=..., req=...) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcTPC.cc:744
#29855 0x00007f72bc7f92b2 in TPC::TPCHandler::ProcessReq (this=0x7f72be833840, req=...) at /usr/src/debug/xrootd/xrootd/src/XrdTpc/XrdTpcTPC.cc:128
#29856 0x00007f72c71dd69c in XrdHttpReq::ProcessHTTPReq (this=this@entry=0x7f728b8213e0) at /usr/src/debug/xrootd/xrootd/src/XrdHttp/XrdHttpReq.cc:996
#29857 0x00007f72c71d55bd in XrdHttpProtocol::Process (this=0x7f728b821280, lp=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/XrdHttp/XrdHttpProtocol.cc:854
#29858 0x00007f72cd8d9d39 in XrdLinkXeq::DoIt (this=0x7f72b7b4a030) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLinkXeq.cc:302
#29859 0x00007f72cd8d66e9 in XrdLink::setProtocol (this=0x7f72b7b4a030, pp=<optimized out>, runit=<optimized out>, push=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:425
#29860 0x00007f72cd8dd02f in XrdScheduler::Run (this=0x614c40 <XrdGlobal::Sched>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:382
#29861 0x00007f72cd8dd179 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:88
#29862 0x00007f72cd89d9c7 in XrdSysThread_Xeq (myargs=0x7f72b70459a0) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#29863 0x00007f72cd449ea5 in start_thread () from /lib64/libpthread.so.0
#29864 0x00007f72cc74b8dd in clone () from /lib64/libc.so.6

```
</details>

https://github.com/xrootd/xrootd/issues/1384