
Hi Matevz,

No need to turn on debugging here. This particular stall occurs because a
file is being opened and the OFS has found that the file is already open
or being opened by another client. So, it tries to piggy-back the new open
on that handle to avoid doing another physical open. The problem is that
the other client has not yet released the handle for use; it is likely
hung up in the proxy code trying to do the open, or perhaps a close. I
thought the latter problem was solved by the disk caching proxy doing the
closes in the background, to avoid holding on to the handle lock for long
periods of time.
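To make the piggy-backing and the stall concrete, here is a toy model of a shared-handle table. It is a minimal sketch under stated assumptions, not the actual OFS code: the names `HandleTable`, `Handle`, `STALL_SECONDS` and the `physical_open`/`physical_close` callbacks are hypothetical. The first opener of a path holds the handle's lock while it does the physical open. Later openers try to take the lock without blocking and, if it is still held (for example by a slow proxy open or close), get a stall reply instead of tying up a server thread.

```python
import threading

# Hypothetical retry delay; 3 s matches the "stalling client for 3 sec" in the log.
STALL_SECONDS = 3


class Handle:
    """One shared file handle per path (toy model, not the real OFS)."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()  # held during a physical open or close
        self.refs = 0                 # clients currently sharing this handle


class HandleTable:
    def __init__(self, physical_open, physical_close):
        self._mutex = threading.Lock()  # protects the path -> Handle map
        self._handles = {}
        self._physical_open = physical_open
        self._physical_close = physical_close

    def open(self, path):
        """Return ("ok", handle) or ("stall", seconds)."""
        with self._mutex:
            h = self._handles.get(path)
            first = h is None
            if first:
                h = Handle(path)
                h.lock.acquire()  # uncontended: nobody else can see h yet
                self._handles[path] = h
            elif not h.lock.acquire(blocking=False):
                # Another client is still opening or closing this file:
                # ask this client to retry later rather than block a thread.
                return ("stall", STALL_SECONDS)
        try:
            if first:
                self._physical_open(path)  # the only real open for this path
            h.refs += 1  # later opens piggy-back on the same handle
            return ("ok", h)
        finally:
            h.lock.release()

    def close(self, h):
        """Drop one reference; the last close does the physical close."""
        with self._mutex:
            if not h.lock.acquire(blocking=False):
                return ("stall", STALL_SECONDS)
        try:
            h.refs -= 1
            if h.refs == 0:
                # A slow close here holds h.lock, so new opens of the path stall.
                self._physical_close(h.path)
                with self._mutex:
                    del self._handles[h.path]
        finally:
            h.lock.release()
        return ("ok", None)
```

Returning a stall instead of waiting on the lock is what turns the retry delay into a form of congestion control. Error handling for a failed physical open is omitted to keep the sketch short.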

This is not a fatal problem; the client will eventually open the file. The
ofs layer uses this as congestion control when there is a lot of
open/close contention for the same file. I suppose you can trace opens and
closes to get a better feeling for how long this takes:

ofs.trace open close

Assuming this is a disk caching proxy, there may be tracing options for
it that show what happens during the open/close sequence.
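As a rough way to see how long clients keep getting stalled on a given file, the ofs_Stall lines in the quoted log can be tallied per path. This is a minimal sketch that assumes the log format shown in that sample (timestamp as YYMMDD HH:MM:SS, then a thread id) and assumes the number after "Stall" is the stall time in seconds; it does not parse the output of ofs.trace, whose format is not shown here. The function name summarize_stalls is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines like:
# 140626 16:53:25 24465  ofs_Stall: Stall 3: File X is being staged; ... for /path/X
STALL_RE = re.compile(
    r"^(\d{6} \d{2}:\d{2}:\d{2}) \d+\s+ofs_Stall: Stall (\d+):.* for (\S+)\s*$"
)


def summarize_stalls(lines):
    """Map path -> (stall count, total stall seconds, first-to-last span in s)."""
    events = defaultdict(list)
    for line in lines:
        m = STALL_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")
            events[m.group(3)].append((ts, int(m.group(2))))
    summary = {}
    for path, evs in events.items():
        times = [ts for ts, _ in evs]
        summary[path] = (
            len(evs),
            sum(secs for _, secs in evs),
            (max(times) - min(times)).total_seconds(),
        )
    return summary
```

A long first-to-last span for one path, with many stalls, points at a handle that some other client held for a long time during its open or close.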

Andy

  On Fri, 27 Jun 2014, Matevz Tadel wrote:

> Hi,
>
> At RAL, they see the following on their outgoing proxy servers (repeating for about a minute before the file-open times-out at the application side):<<FNORD
>
> When our xrootd proxy cluster is busy, there are sometimes messages like this in the logs:
>
> 140626 16:53:25 24465  ofs_Stall: Stall 3: File 2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root is being staged; estimated time to completion 3 seconds for /store/mc/Fall13/QCD_Pt-5to10_Tune4C_13TeV_pythia8/GEN-SIM/POSTLS162_V1_castor-v1/10000/2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root
> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 XrootdProtocol: stalling client for 3 sec
> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 ofs_close: use=0 fn=dummy
>
> FNORD
>
> This probably means that the remote file cannot be opened for some reason (e.g. being delayed by an external redirector/server)? Would there be a special error if the socket cannot be opened (due to fd or firewall limits, or some other internal limit)? Note that this only happens when the proxies are already under heavy load.
>
> What options should they set to debug this?
>
> pss.memcache debug ???
> xrd.trace    conn
> xrootd.trace redirect
>
> Matevz
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-DEV list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>
