Hi Brian,

It sort of works that way right now. As long as clients keep asking for a 
file, we keep extending the deadline (an "if this many clients are asking, 
we had better not fail" mentality). So, it might be possible.

Andy
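A minimal sketch of the policy Andy describes, assuming a per-path deadline that every new request for the same file pushes further out by a fixed delay. `LookupDeadline`, `request()` and `expired()` are hypothetical names used for illustration only; this is not the cmsd implementation, and `delay` merely stands in for something like the `cms.delay lookup <n>` value.

```python
import time
from typing import Callable, Dict


class LookupDeadline:
    """Hypothetical per-path deadline that is extended while clients keep asking."""

    def __init__(self, delay: float, clock: Callable[[], float] = time.monotonic):
        self.delay = delay    # assumed stand-in for the lookup delay
        self.clock = clock    # injectable so the policy can be tested without sleeping
        self.deadlines: Dict[str, float] = {}

    def request(self, path: str) -> float:
        """Record one client request for path and return the current deadline."""
        candidate = self.clock() + self.delay
        # Only ever extend the deadline; an earlier deadline never replaces a later one.
        self.deadlines[path] = max(self.deadlines.get(path, 0.0), candidate)
        return self.deadlines[path]

    def expired(self, path: str) -> bool:
        """True once the path has a deadline and the clock has reached it."""
        deadline = self.deadlines.get(path)
        return deadline is not None and self.clock() >= deadline
```

With a fake clock, a request at t=0 with a 5 s delay gives a deadline of 5; a second request at t=4 pushes it to 9, so the lookup has not expired at t=6. The catch is the one Andy hints at: a path that nobody holds stays pending for as long as clients keep asking.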


-----Original Message----- 
From: Brian Bockelman
Sent: Thursday, March 03, 2011 1:12 PM
To: Andrew Hanushevsky
Cc: xrootd-dev
Subject: Re: Delay for SelPrep


On Mar 3, 2011, at 3:02 PM, Andrew Hanushevsky wrote:

> Hi Brian,
>
> You can always increase the lookup delay time (cms.delay lookup <n>). This 
> is sort of a brute force solution but workable as long as the majority of 
> lookups are for files that actually exist somewhere. That said, why is the 
> server (and apparently the only one who has the file) taking 30 seconds to 
> respond? From the redirector's viewpoint the server would likely look dead 
> (indeed it's halfway there as far as the algorithms are concerned).
>

Well, I was "lucky" enough to have about 100 clients start up at the same 
time; at startup each job asks the server to prep all the files it will 
read.  I think we effectively flooded the system, pushing lookups past the 
normal lookup time.

> I suppose we could extend prepare() to allow it to set an extended 
> deadline that would only apply to selects that are preceded by a prepare. 
> But that sort of sounds hokey as well.
>
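Andy's prepare()-based alternative could look roughly like this, assuming the redirector remembers which paths were prepared and gives selects on those paths a longer deadline. `Redirector`, `NORMAL_DELAY`, `PREPARED_DELAY` and both values are hypothetical and chosen only for illustration; this is not how cmsd is actually structured.

```python
import time
from typing import Callable, Set

NORMAL_DELAY = 5.0     # hypothetical ordinary lookup deadline, in seconds
PREPARED_DELAY = 60.0  # hypothetical extended deadline for prepared paths


class Redirector:
    """Hypothetical redirector that remembers prepared paths."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.prepared: Set[str] = set()

    def prepare(self, path: str) -> None:
        # Mark the path so a later select on it gets the longer deadline.
        self.prepared.add(path)

    def select_deadline(self, path: str) -> float:
        # Selects preceded by a prepare wait longer; all others use the normal delay.
        delay = PREPARED_DELAY if path in self.prepared else NORMAL_DELAY
        return self.clock() + delay
```

A real version would also need to expire entries from `prepared`, otherwise the set grows without bound; that bookkeeping is part of why the idea "sounds hokey".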

Is it possible to give the lookup, say, 60s to complete, but ask the client 
to check back every 5 seconds?
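The question amounts to separating the total lookup budget from the client's retry interval. A minimal simulation, assuming a 60 s budget and a 5 s "wait" reply; `respond()`, `client_open()` and `locate()` are hypothetical names, and time is simulated rather than slept.

```python
from typing import Callable, Optional, Tuple, Union

TOTAL_BUDGET = 60.0  # assumed total time a lookup may take ("say, 60s")
RETRY_INTERVAL = 5   # assumed seconds the client waits before asking again

Reply = Tuple[str, Union[str, int]]


def respond(elapsed: float, host: Optional[str]) -> Reply:
    """Hypothetical redirector answer to a single client ask."""
    if host is not None:
        return ("redirect", host)
    if elapsed < TOTAL_BUDGET:
        return ("wait", RETRY_INTERVAL)  # not found yet: tell the client to come back
    return ("error", "No servers are available to read the file.")


def client_open(locate: Callable[[float], Optional[str]]) -> Reply:
    """Simulated client that asks again after every "wait" reply.

    locate(t) returns a host that has reported the file by time t, or None.
    """
    elapsed = 0.0
    while True:
        kind, value = respond(elapsed, locate(elapsed))
        if kind != "wait":
            return (kind, value)
        elapsed += float(value)  # a real client would sleep here
```

If a server reports the file 18 s after the first ask, the client is redirected on its 20 s check instead of failing at 5 s; a file that never shows up still fails, just after 60 s instead of 5 s.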

Brian

> Andy
>
> -----Original Message----- From: Brian Bockelman
> Sent: Thursday, March 03, 2011 10:26 AM
> To: xrootd-dev
> Subject: Delay for SelPrep
>
> Hi,
>
> I'm seeing lots of errors crop up from our Xrootd redirector.  See an 
> example below from a cmsd.
>
> Lots of servers have the file, yet the prep message indicates that the 
> prepare failed.  In some cases, if the client tries to open the file 
> immediately after prep'ing it, it gets an error message.
>
> Is there a knob available to continue to delay the client if there's not 
> yet a result available?  As the timestamps below show, the client would 
> have gotten a redirect if it had waited another 30s or so.
>
> Brian
>
> [bbockelm@xrootd-itb ~]$ grep 12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root /var/log/xrootd/cmsd.log
> 110303 12:17:28 15775 cms125.17871:[log in to unmask] do_PrepAdd: parms: b1ef5d81:0446.4d6fd626:903 * 0 rq /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 Select seeking /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 cms125.17871:[log in to unmask] do_Select: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 WT4File rc=0 path=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:33 15775 SelPrep: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root failed; No servers are available to read the file.
> 110303 12:17:34 15775 cms125.17871:[log in to unmask] do_Select: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15775 cms125.17871:[log in to unmask] do_Select: failed; No servers are available to read the file. /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:46 15775 server.24015:23@red-gridftp4:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:50 15775 server.12835:21@red-gridftp11:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:51 15775 server.13920:22@red-gridftp9:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.2282:26@red-gridftp10:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.12574:25@red-gridftp1:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.1981:27@red-gridftp2:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.28934:16@red-gridftp5:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.3999:20@red-gridftp12:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.31009:19@red-gridftp6:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.16746:18@red-gridftp7:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.12739:17@red-gridftp8:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.21911:24@red-gridftp3:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> [bbockelm@xrootd-itb ~]$ grep 12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root /var/log/xrootd/xrootd.log
> 110303 12:17:28 15632 acc_Audit: cms125.17871:[log in to unmask] grant host *@wngw.ifca.es read /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15632 Prepare Finder: Sending xrootd-itb.unl.edu b1ef5d81:0446.4d6fd626:903 /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15632 cms125.17871:[log in to unmask] ofs_open: 0-644 fn=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:29 15632 Decode xrootd-itb delays cms125.17871:[log in to unmask] 5 /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15632 cms125.17871:[log in to unmask] ofs_open: 0-644 fn=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15632 Decode xrootd-itb gave cms125.17871:[log in to unmask] err -2 'No servers are available to read the file.' /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
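The gap between the failure and the servers' answers can be read straight off the cmsd.log timestamps. A small check using only those timestamps, assuming all entries fall on the same day (the `110303` date prefix is identical on every line); `seconds_between()` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end; both are HH:MM:SS on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the cmsd.log excerpt.
PREPARE = "12:17:28"       # do_PrepAdd
CLIENT_ERROR = "12:17:34"  # do_Select: failed
HAVE_TIMES = ["12:17:46", "12:17:50", "12:17:51"] + ["12:17:52"] * 9  # do_Have replies

first_have = seconds_between(PREPARE, HAVE_TIMES[0])
last_have = seconds_between(PREPARE, HAVE_TIMES[-1])
missed_by = seconds_between(CLIENT_ERROR, HAVE_TIMES[0])
print(f"{len(HAVE_TIMES)} servers replied {first_have}-{last_have}s after the prepare; "
      f"the first reply came {missed_by}s after the client's error")
```

All twelve servers answered between 18 and 24 seconds after the prepare, so a lookup budget somewhat above 24 seconds would have covered every reply in this incident.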