On Mar 3, 2011, at 3:02 PM, Andrew Hanushevsky wrote:

> Hi Brian,
> 
> You can always increase the lookup delay time (cms.delay lookup <n>). This is sort of a brute-force solution, but workable as long as the majority of lookups are for files that actually exist somewhere. That said, why is the server (apparently the only one that has the file) taking 30 seconds to respond? From the redirector's viewpoint the server would likely look dead (indeed, it's halfway there as far as the algorithms are concerned).
> 

Well, I was "lucky" enough to have about 100 clients start up at the same time; at startup each job asks the server to prep all the files it will read.  I think we effectively flooded the system, which made it take longer than the normal lookup time.

> I suppose we could extend prepare() to allow it to set an extended deadline that would only apply to selects that are preceded by a prepare. But that sort of sounds hokey as well.
> 

Is it possible to allow, say, 60s for the response, but ask the client to check back every 5 seconds?
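
To make the idea concrete, here is a minimal sketch of "overall deadline plus periodic check-back", written as a standalone Python simulation. It is not xrootd or cmsd code: `wait_for_location`, `FakeClock`, and `make_lookup` are hypothetical names invented for illustration, and the 18-second readiness time is simply the gap between the 12:17:28 request and the first do_Have at 12:17:46 in the log quoted below.

```python
import time


def wait_for_location(lookup, deadline_s=60.0, retry_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll lookup() every retry_s seconds until it returns a server
    or deadline_s seconds have elapsed.

    Returns (server, attempts); server is None if the deadline passed.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        server = lookup()
        if server is not None:
            return server, attempts
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return None, attempts
        # Never sleep past the overall deadline.
        sleep(min(retry_s, remaining))


class FakeClock:
    """Simulated clock (hypothetical helper) so the demo runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def make_lookup(clock, ready_at, server):
    """Build a lookup that only succeeds once `ready_at` seconds have passed."""
    def lookup():
        return server if clock.time() >= ready_at else None
    return lookup


# Illustrative run: the first server answers 18 s after the request.
clock = FakeClock()
lookup = make_lookup(clock, ready_at=18.0, server="red-gridftp4:1094")
server, attempts = wait_for_location(lookup, clock=clock.time, sleep=clock.sleep)
print(server, attempts, clock.now)
```

In this simulation the client checks at 0, 5, 10, 15 and 20 seconds and is given a server on the fifth attempt, instead of receiving "No servers are available" after the first 5-second delay.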

Brian

> Andy
> 
> -----Original Message----- From: Brian Bockelman
> Sent: Thursday, March 03, 2011 10:26 AM
> To: xrootd-dev
> Subject: Delay for SelPrep
> 
> Hi,
> 
> I'm seeing lots of errors crop up from our Xrootd redirector.  See an example below from a cmsd.
> 
> Lots of servers have the file, yet the prep message indicates the file failed.  In some cases, if the client tries to open the file immediately after prep'ing it, it gets an error message.
> 
> Is there a knob available to continue to delay the client if there's not yet a result available?  If you see the timestamps below, the client would have gotten a redirect if it had waited another 30s or so.
> 
> Brian
> 
> [bbockelm@xrootd-itb ~]$ grep 12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root /var/log/xrootd/cmsd.log
> 110303 12:17:28 15775 cms125.17871:[log in to unmask] do_PrepAdd: parms: b1ef5d81:0446.4d6fd626:903 * 0 rq /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 Select seeking /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 cms125.17871:[log in to unmask] do_Select: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15775 WT4File rc=0 path=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:33 15775 SelPrep: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root failed; No servers are available to read the file.
> 110303 12:17:34 15775 cms125.17871:[log in to unmask] do_Select: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15775 cms125.17871:[log in to unmask] do_Select: failed; No servers are available to read the file. /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:46 15775 server.24015:23@red-gridftp4:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:50 15775 server.12835:21@red-gridftp11:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:51 15775 server.13920:22@red-gridftp9:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.2282:26@red-gridftp10:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.12574:25@red-gridftp1:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.1981:27@red-gridftp2:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.28934:16@red-gridftp5:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.3999:20@red-gridftp12:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.31009:19@red-gridftp6:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.16746:18@red-gridftp7:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.12739:17@red-gridftp8:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:52 15775 server.21911:24@red-gridftp3:1094 do_Have: /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> [bbockelm@xrootd-itb ~]$ grep 12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root /var/log/xrootd/xrootd.log
> 110303 12:17:28 15632 acc_Audit: cms125.17871:[log in to unmask] grant host *@wngw.ifca.es read /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15632 Prepare Finder: Sending xrootd-itb.unl.edu b1ef5d81:0446.4d6fd626:903 /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:28 15632 cms125.17871:[log in to unmask] ofs_open: 0-644 fn=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:29 15632 Decode xrootd-itb delays cms125.17871:[log in to unmask] 5 /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15632 cms125.17871:[log in to unmask] ofs_open: 0-644 fn=/store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root
> 110303 12:17:34 15632 Decode xrootd-itb gave cms125.17871:[log in to unmask] err -2 'No servers are available to read the file.' /store/test/xrootd/T2_US_Nebraska/store/data/Run2010B/Mu/AOD/Nov4ReReco_v1/0001/12A4A7E2-02EA-DF11-8A5B-E0CB4E29C4D0.root