Thanks Andy,

This is actually the standard proxy; RAL was running 4.0.0-rc1 the last time we
talked about it.

Andrew, have you upgraded to 4.0.0 yet?

Matevz

On 06/27/14 10:01, Andrew Hanushevsky wrote:
> Hi Matevz,
>
> No need to turn on debugging here. This particular stall occurs because a file
> is being opened and the OFS has found that the file is already open or being
> opened by another client. So, it tries to piggy-back the new open on that handle
> to avoid actually doing another physical open. The problem is that the other
> client has not yet released the handle for use; likely being hung up in the
> proxy code trying to do the open or perhaps a close. The latter problem I
> thought was solved by the disk caching proxy by doing the closes in the
> background to avoid holding on to the handle lock for long periods of time.
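
As an illustration of the handle sharing described above, here is a toy Python
model. It is not XRootD code: the names SharedHandle, HandleTable and open(), and
the fixed 3 second retry value (borrowed from the "Stall 3" log message), are all
assumptions made for this sketch. It only shows why a second open of the same file
is told to stall while another client still holds the handle lock.

```python
import threading

STALL_SECONDS = 3  # assumption: mirrors the "Stall 3" seen in the log excerpt


class SharedHandle:
    """Toy stand-in for a per-file handle shared by several opens."""

    def __init__(self, path):
        self.path = path
        # Held while a (possibly slow) physical open or close is in flight.
        self.lock = threading.Lock()
        self.use_count = 0


class HandleTable:
    """Maps a file path to its single shared handle."""

    def __init__(self):
        self._handles = {}
        self._table_lock = threading.Lock()

    def get(self, path):
        # Create the handle on first use, otherwise return the existing one.
        with self._table_lock:
            return self._handles.setdefault(path, SharedHandle(path))

    def open(self, path):
        """Return ("ok", handle), or ("stall", seconds) if the handle is busy."""
        handle = self.get(path)
        # Do not block: if another client holds the handle, ask the caller
        # to come back later instead of tying up a thread.
        if not handle.lock.acquire(blocking=False):
            return ("stall", STALL_SECONDS)
        try:
            handle.use_count += 1  # piggy-back on the existing handle
        finally:
            handle.lock.release()
        return ("ok", handle)
```

In this model a client that keeps the lock during a slow remote open or close
makes every other opener of the same path see a stall until the lock is released.
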
>
> This is not a fatal problem; the client will eventually open the file. The ofs
> layer uses this as congestion control when there is a lot of open/close
> contention for the same file. I suppose you can trace opens and closes to get
> a better feeling for how long this takes:
>
> ofs.trace open close
>
> Assuming this is a disk caching proxy, there may be tracing options for that to
> see what happens during the open/close sequence.
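
Independently of any extra tracing, the "stalling client" messages already in
the log give a rough measure of how much delay each client sees. A minimal
Python sketch, assuming each log record sits on a single line in the layout of
the excerpt quoted further down (date, time, thread id, client id, message); the
function name stall_seconds_by_client is made up for this example:

```python
import re
from collections import Counter

# Matches e.g.
# 140626 16:53:25 24465 pcms054.6545:147@lcg1353 XrootdProtocol: stalling client for 3 sec
STALL_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tid>\d+) "
    r"(?P<client>\S+) XrootdProtocol: stalling client for (?P<secs>\d+) sec"
)


def stall_seconds_by_client(lines):
    """Sum the announced stall time per client id over an iterable of log lines."""
    totals = Counter()
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            totals[match.group("client")] += int(match.group("secs"))
    return totals
```

Feeding it an open log file (for line in file) and printing totals.most_common()
would list the clients that were stalled the longest first.
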
>
> Andy
>
>   On Fri, 27 Jun 2014, Matevz Tadel wrote:
>
>> Hi,
>>
>> At RAL, they see the following on their outgoing proxy servers (repeating for
>> about a minute before the file open times out on the application side):<<FNORD
>>
>> When our xrootd proxy cluster is busy, there are sometimes messages like this
>> in the logs:
>>
>> 140626 16:53:25 24465  ofs_Stall: Stall 3: File
>> 2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root is being staged; estimated time to
>> completion 3 seconds for
>> /store/mc/Fall13/QCD_Pt-5to10_Tune4C_13TeV_pythia8/GEN-SIM/POSTLS162_V1_castor-v1/10000/2EF5AF84-D65A-E311-AB3F-02163E00A0E1.root
>>
>> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 XrootdProtocol: stalling client
>> for 3 sec
>> 140626 16:53:25 24465 pcms054.6545:147@lcg1353 ofs_close: use=0 fn=dummy
>>
>> FNORD
>>
>> This probably means that the remote file cannot be opened for some reason
>> (like being delayed by an external redirector/server)? Would there be a special
>> error if the socket cannot be opened (due to fd or firewall limits ... or
>> some other internal limits)? Note that this only happens when the proxies are
>> already under heavy load.
>>
>> What options should they set to debug this?
>>
>> pss.memcache debug ???
>> xrd.trace    conn
>> xrootd.trace redirect
>>
>> Matevz
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-DEV list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>
