Hi Matevz,

OK, then all this probably means is that client a goes after file x and
the proxy is very heavily loaded, so it takes a bit of time to actually
open the file at the remote location. While the open is taking place,
client b tries to open the same file. So, client b is delayed until
client a finishes opening the file. Nothing particularly wrong here.

Andy

On Fri, 27 Jun 2014, Matevz Tadel wrote:

> Thanks Andy,
>
> This is actually the standard proxy; RAL was running 4.0.0-rc1 the
> last time we talked about it.
>
> Andrew, have you upgraded to 4.0.0 yet?
>
> Matevz
>
> On 06/27/14 10:01, Andrew Hanushevsky wrote:
>> Hi Matevz,
>>
>> No need to turn on debugging here. This particular stall occurs
>> because a file is being opened and the OFS has found that the file is
>> already open or being opened by another client. So, it tries to
>> piggy-back the new open on that handle to avoid actually doing
>> another physical open. The problem is that the other client has not
>> yet released the handle for use; it is likely hung up in the proxy
>> code trying to do the open, or perhaps a close. The latter problem I
>> thought was solved by the disk caching proxy by doing the closes in
>> the background, to avoid holding on to the handle lock for long
>> periods of time.
>>
>> This is not a fatal problem; the client will eventually open the
>> file. The ofs layer uses this as congestion control when there is a
>> lot of open/close contention for the same file. I suppose you can
>> trace opens and closes to get a better feeling for how long this
>> takes:
>>
>> ofs.trace open close
>>
>> Assuming this is a disk caching proxy, there may be tracing options
>> for that to see what happens during the open/close sequence.
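The stall Andy describes above can be pictured as a toy model (plain Python threading, not actual XRootD code, with a hypothetical path and made-up delays): the first client holds the per-file handle lock while its slow physical open is in progress, so a second open of the same file piggy-backs on that handle and blocks until the lock is released.

```python
import threading
import time

# Toy model of the OFS behavior described above: one shared handle per
# path; a client opening a file that another client is still opening
# must wait on that handle's lock (this is the observed "stall").

class HandleTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._handles = {}  # path -> per-file handle lock

    def open(self, path, remote_open_delay):
        with self._lock:
            handle = self._handles.setdefault(path, threading.Lock())
        start = time.monotonic()
        with handle:                      # the second opener blocks here
            time.sleep(remote_open_delay)  # slow physical open at the proxy
        return time.monotonic() - start   # total time the client waited

table = HandleTable()
results = {}

def client(name, delay):
    # hypothetical path, for illustration only
    results[name] = table.open("/store/example/file.root", delay)

a = threading.Thread(target=client, args=("a", 0.5))  # slow remote open
b = threading.Thread(target=client, args=("b", 0.0))  # would be instant
a.start()
time.sleep(0.1)  # client b arrives while a's open is still in flight
b.start()
a.join()
b.join()
# client b's elapsed time includes the time left on client a's open,
# even though b's own open would have taken no time at all
```

Nothing is broken in this model either: b's open simply cannot complete before a releases the handle, which is exactly the congestion-control behavior the log messages report.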
>>
>> Andy
>>
>> On Fri, 27 Jun 2014, Matevz Tadel wrote:
>>
>>> Hi,
>>>
>>> At RAL, they see the following on their outgoing proxy servers
>>> (repeating for about a minute before the file open times out at the
>>> application side):
>>>
>>> <<FNORD
>>>
>>> When our xrootd proxy cluster is busy, there are sometimes messages
>>> like this in the logs:
>>>
>>> 140626 16:53:25 24465 ofs_Stall: Stall 3: File
>>> 2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root is being staged; estimated
>>> time to completion 3 seconds for
>>> /store/mc/Fall13/QCD_Pt-5to10_Tune4C_13TeV_pythia8/GEN-SIM/POSTLS162_V1_castor-v1/10000/2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root
>>>
>>> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 XrootdProtocol:
>>> stalling client for 3 sec
>>> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 ofs_close: use=0 fn=dummy
>>>
>>> FNORD
>>>
>>> This probably means that the remote file cannot be opened for some
>>> reason (like being delayed by an external redirector/server)? Would
>>> there be a special error if the socket cannot be opened (due to fd
>>> or firewall limits ... or some other internal limits)? Note that
>>> this only happens when the proxies are already under heavy load.
>>>
>>> What options should they set to debug this?
>>>
>>> pss.memcache debug ???
>>> xrd.trace conn
>>> xrootd.trace redirect
>>>
>>> Matevz
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
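For reference, a minimal sketch of how the trace directives mentioned in this thread could be combined in the proxy's configuration file. Only the directives quoted verbatim above are included; the `pss.memcache debug` option was left as an open question in the thread, so it is omitted here, and the exact accepted trace arguments should be checked against the XRootD reference documentation before use.

```
# Trace opens and closes to see how long the file handle is held
# (Andy's suggestion)
ofs.trace open close

# Connection- and redirect-level tracing (Matevz's candidates)
xrd.trace conn
xrootd.trace redirect
```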