Hi Andy,

hm, that doesn't agree with my observations so far, but I might still be wrong.

I set up xrootd-4.8.3-rc1 with 12 data servers summing up to 620 TB.
At 8:33am the redirector was restarted, and at 8:38am an ls command was sent with no stalling.
At 8:39am an xrdcp job was started and was promptly stalled for 5 sec.

The stalling can be forced by using a new file name that is not yet known to xrootd (caching?).

Yes, you are right that the stalling occurs repeatedly when the redirector has not "resumed" after startup.

Please let me know if I can do some more debugging, since this is still a test setup.

Best
Heiko

The log of the Redirector:

8:33am: Restart of Redirector for 12 data servers with 620TB total space
<snip>
180425 08:32:02 23926 XrdXeq: Admin traffic thread started
180425 08:32:02 23924 XrdXeq: async callback thread started
180425 08:33:32 23923 Receive glogin1 0 bytes on 0
180425 08:33:32 23923 setStatus REDIRECTOR sent resume event
180425 08:33:32 23923 cms_setStatus: Manager REDIRECTOR resumed
<snap>

8:38am: xrdfs rdr ls /xrootd/schroete (no stalling)
<snip>
180425 08:38:22 23916 XrdSched: Now have 3 workers
180425 08:38:22 23916 XrdSched: running main accept inq=0
180425 08:38:22 24065 XrdXeq: Worker thread started
180425 08:38:22 23915 XrdInet: Accepted connection from 7@CLIENT
180425 08:38:22 23915 XrdProtocol: matched protocol xrootd
180425 08:38:22 23915 ?:7@CLIENT XrdPoll: FD 7 attached to poller 0; num=1
180425 08:38:22 23915 ?:7@CLIENT XrootdProtocol: 0000 req=login dlen=108
180425 08:38:22 23915 schroete.64810:7@CLIENT XrootdResponse: 0000 sending 16 data bytes
180425 08:38:22 23915 XrootdXeq: schroete.64810:7@CLIENT pvt IPv4 login
180425 08:38:22 23915 schroete.64810:7@CLIENT XrootdProtocol: 0100 req=locate dlen=18
180425 08:38:22 23915 schroete.64810:7@CLIENT XrootdProtocol: 0100 locate n */xrootd/schroete/
180425 08:38:22 23915 schroete.64810:7@CLIENT ofs_fsctl: fn=*/xrootd/schroete/
180425 08:38:22 23923 Receive glogin1 315 bytes on 3071
180425 08:38:22 23923 Decode glogin1 sent schroete.64810:7@CLIENT 'Sw[::192.168.16.146]:1094 Sw[::192.168.16.147]:1094 Sw[::192.168.16.127]:1094 Sw[::192.168.16.97]:1094 Sw[::192.168.16.120]:1094 Sw[::192.168.16.139]:1094 Sw[::192.168.16.217]:1094 Sw[::192.168.16.144]:1094 Sw[::192.168.16.134]:1094 Sw[::192.168.16.196]:1094 Sw[::192.168.16.195]:1094 Sw[::192.168.16.121]:1094' */xrootd/schroete/
180425 08:38:22 23915 schroete.64810:7@CLIENT XrootdProtocol: 0100 rc=-1024 locate */xrootd/schroete/
180425 08:38:22 23915 schroete.64810:7@CLIENT XrootdResponse: 0100 sending 311 data bytes
180425 08:38:22 23915 XrootdXeq: schroete.64810:7@CLIENT disc 0:00:00
180425 08:38:22 23915 schroete.64810:7@CLIENT XrdPoll: FD 7 detached from poller 0; num=0
<snap>

8:39am: xrdcp and stalling 5sec
<snip>
180425 08:39:07 24065 XrdSched: running main accept inq=0
180425 08:39:07 23916 XrdInet: Accepted connection from 20@CLIENT
180425 08:39:07 23916 XrdProtocol: matched protocol xrootd
180425 08:39:07 23916 ?:20@CLIENT XrdPoll: FD 20 attached to poller 0; num=1
180425 08:39:07 23916 ?:20@CLIENT XrootdProtocol: 0000 req=login dlen=108
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdResponse: 0000 sending 16 data bytes
180425 08:39:07 23916 XrootdXeq: schroete.64831:20@CLIENT pvt IPv4 login
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 req=stat dlen=17
180425 08:39:07 23916 schroete.64831:20@CLIENT ofs_stat: fn=/xrootd/schroete/
180425 08:39:07 23923 Receive glogin1 19 bytes on 4095
180425 08:39:07 23923 Decode glogin1 redirects schroete.64831:20@CLIENT to 192.168.16.144:1094 /xrootd/schroete/
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 rc=-256 stat /xrootd/schroete/
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 redirecting to 192.168.16.144:1094
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdResponse: 0100 sending 18 data bytes; status=4004
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 req=open dlen=43
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 open unmat /xrootd/schroete//hd.tst?oss.asize=52428800
180425 08:39:07 23916 schroete.64831:20@CLIENT ofs_open: 102-40644 fn=/xrootd/schroete/hd.tst
180425 08:39:07 23923 Receive glogin1 4 bytes on 5119
180425 08:39:07 23923 Decode glogin1 delays schroete.64831:20@CLIENT 5 /xrootd/schroete/hd.tst
*******************
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 stalling client for 5 sec
*******************
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdResponse: 0100 sending 4 data bytes; status=4005
180425 08:39:07 23916 schroete.64831:20@CLIENT ofs_close: use=0 fn=dummy
180425 08:39:10 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 request timeout; read 0 of 24 bytes
180425 08:39:10 23916 XrdPoll: Poller 0 enabled schroete.64831:20@CLIENT
180425 08:39:12 23915 XrdSched: running schroete.64831:20@CLIENT inq=0
180425 08:39:12 23915 schroete.64831:20@CLIENT XrootdProtocol: 0100 req=open dlen=43
180425 08:39:12 23915 schroete.64831:20@CLIENT XrootdProtocol: 0100 open unmat /xrootd/schroete//hd.tst?oss.asize=52428800
180425 08:39:12 23915 schroete.64831:20@CLIENT ofs_open: 102-40644 fn=/xrootd/schroete/hd.tst
180425 08:39:12 23923 Receive glogin1 19 bytes on 6143
180425 08:39:12 23923 Decode glogin1 redirects schroete.64831:20@CLIENT to 192.168.16.121:1094 /xrootd/schroete/hd.tst
180425 08:39:12 23915 schroete.64831:20@CLIENT XrootdProtocol: 0100 redirecting to 192.168.16.121:1094
180425 08:39:12 23915 schroete.64831:20@CLIENT XrootdResponse: 0100 sending 18 data bytes; status=4004
180425 08:39:12 23915 schroete.64831:20@CLIENT ofs_close: use=0 fn=dummy
180425 08:39:12 23915 XrootdXeq: schroete.64831:20@CLIENT disc 0:00:05
180425 08:39:12 23915 schroete.64831:20@CLIENT XrdPoll: FD 20 detached from poller 0; num=0
<snap>

On 25.04.2018 at 04:36, Andrew Hanushevsky wrote:
> Hi Heiko,
>
> It would seem that the stalls are occurring because the redirector a)
> has not been up long enough (the default requires 30 seconds to pass)
> or b) does not think it has any working data servers (which will be
> the case if they log in much later than you started the copy).
>
> Andy
>
> -----Original Message-----
> From: Heiko Schröter
> Sent: Thursday, April 19, 2018 5:37 AM
> To: Michal Kamil Simon ; [log in to unmask]
> Cc: [log in to unmask] ; [log in to unmask]
> Subject: Re: Stalling client when copying files (xrdcp 4.8.2)
>
> I spoke too soon. The stalling occurs with 4.8.3-rc1 as well,
> but only for the first connection of a file transfer.
> If you rm and recopy the file, the stalling does not occur.
>
> On 19.04.2018 at 10:06, Michal Kamil Simon wrote:
>> Hi Heiko,
>>
>> That's interesting, could you give me more details on your scenario,
>> are you using xrdcp from a script or XrdCl C++ API (or Python bindings)?
>>
>> Could you also provide client side logs from a run when you observed
>> stalling?
>>
>> Cheers,
>> Michal
>> ________________________________________
>> From: [log in to unmask] [[log in to unmask]] on
>> behalf of Heiko Schröter [[log in to unmask]]
>> Sent: 18 April 2018 20:26
>> To: [log in to unmask]
>> Cc: [log in to unmask]; [log in to unmask]
>> Subject: Re: Stalling client when copying files (xrdcp 4.8.2)
>>
>> This stalling does not occur with 4.8.3-rc1.
>>
>> On 17.04.2018 at 19:24, Heiko Schröter wrote:
>>> Hello,
>>>
>>> we observe that when copying a file the client is stalled for
>>> some time.
>>>
>>> 180417 19:17:28 17252 schroete.97360:7@qc08 XrootdProtocol: 0100 stalling client for 5 sec
>>>
>>> Sometimes it is for 10 sec, and this gets repeated without a recognizable
>>> pattern.
>>>
>>> The client is not stalled when the copied file is removed at once and
>>> recopied.
>>>
>>> It looks like a similar issue to this one:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?A2=ind1203&L=XROOTD-L&P=R598&1=XROOTD-L&9=A&I=-3&J=on&d=No+Match%3BMatch%3BMatches&z=4
>>>
>>> Is this a settable parameter or something we did wrong in our setup?
>>>
>>> We have one redirector and 12 data servers on a 10 Gbit network. Client
>>> access is very limited because this is a test setup.
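
For further debugging, the "stalling client for N sec" entries can be pulled out of a redirector log with a few lines of Python. This is only a rough sketch: the regular expression is derived from the log lines quoted in this thread, the leading six digits are assumed to be a yymmdd date, and the names `STALL_RE`, `find_stalls` and `SAMPLE` are made up for illustration.

```python
import re
from datetime import datetime

# Pattern derived from lines quoted in this thread, for example:
# 180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 stalling client for 5 sec
# Assumption: the first field is a yymmdd date and the third is a thread id.
STALL_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tid>\d+) "
    r"(?P<client>\S+) XrootdProtocol: (?P<stream>\d{4}) "
    r"stalling client for (?P<secs>\d+) sec$"
)


def find_stalls(lines):
    """Return a list of (timestamp, client, seconds) for every stall line."""
    stalls = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                match["date"] + " " + match["time"], "%y%m%d %H:%M:%S"
            )
            stalls.append((stamp, match["client"], int(match["secs"])))
    return stalls


# A few lines copied from the 8:39am excerpt of the redirector log.
SAMPLE = """\
180425 08:39:07 23923 Decode glogin1 delays schroete.64831:20@CLIENT 5 /xrootd/schroete/hd.tst
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdProtocol: 0100 stalling client for 5 sec
180425 08:39:07 23916 schroete.64831:20@CLIENT XrootdResponse: 0100 sending 4 data bytes; status=4005
180425 08:39:12 23915 schroete.64831:20@CLIENT XrootdProtocol: 0100 redirecting to 192.168.16.121:1094
""".splitlines()

if __name__ == "__main__":
    for stamp, client, secs in find_stalls(SAMPLE):
        print(f"{stamp:%Y-%m-%d %H:%M:%S} {client} stalled for {secs} sec")
```

Run over a whole log file (for example `find_stalls(open(path))`, with `path` pointing at the redirector log), this would show whether the stalls cluster right after a restart or keep appearing for new file names.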