Good work Manny.

Could you check that, if you create the parent directories of the file
you're trying to stage in and touch the DIR_LOCK file in that directory
before you try to stage it, it is then happy to stage the file in?

Thanks,
Chris.

> -----Original Message-----
> From: Emmanuel Olaiya [mailto:[log in to unmask]]
> Sent: 07 June 2005 22:31
> To: Andrew Hanushevsky
> Cc: Adye, TJ (Tim); Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
> Subject: Re: PreStage Problems
>
> Hi Andy, Bill
>
> I took the versions of mps_Stage and mps_prep from
> /afs/slac/package/xrd/xrootd/utils. These are mps_Stage and mps_prep
> versions 1.9 and 1.8 respectively.
>
> I still see the problem Chris reported. After restarting the directors
> and the server (with prestaging on the server), I get the following
> messages in the prestage log when asking for a file that doesn't exist
> at RAL:
>
> Starting new cycle, pstg proc = 0
> 21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.0.lock, flags 2
> 21:17:41 [ 17543] getlock: locking file +</opt/xrootd/stageQ/PreStageQ.0.old, flags 2
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.old
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.lock
> 21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.1.lock, flags 2
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.1.lock
> 21:21:29 [ 17772] mps_Stage: cannot create 'store' in '/store/PRskims/R14/16.1.1b/BToPPP/58/'; Permission denied
> 21:21:29 [ 17772] mps_Stage: Invalid file system path, '/store/PRskims/R14/16.1.1b/BToPPP/58/'.
> 21:21:29 [ 17772] do_stagein: xfr failed for /store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root, rc=4, retry=1
>
> Meanwhile, my job just hangs. If I take the log file literally, it is
> trying to write to /store when it should be trying to write to
> /base_directory/store.
>
> Doing further tests, I can reproduce the problem I reported earlier.
> Whilst still asking for the above file, I turn off staging and restart
> the directors and servers, and the request for the file continues to
> hang (is told to wait). Then I make another request for the same file
> and this request is also continually told to wait:
>
> 050607 21:55:13 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> 050607 21:55:14 2915 odc_Locate: olaiya.23507:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> 050607 21:55:18 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> ...
>
> It is only after I kill the first request that any more requests for
> this file return correctly, with a message indicating that the file
> cannot be found.
>
> cheers
>
> Manny
>
> Andrew Hanushevsky wrote:
> > Hi Tim,
> >
> > Bill Weeks should have the fix available. You can also find the fixed
> > mps scripts in /afs/slac/package/xrd/xrootd/utils (I think you just
> > need an update for mps_Stage and mps_prep).
> >
> > Otherwise, the earliest time I can get together with Manny is Monday.
> > How about the afternoon, say 1:30pm?
> >
> > Andy
> >
> > On Tue, 7 Jun 2005, Adye, TJ (Tim) wrote:
> >
> >>Hi Guys,
> >>
> >>Did you manage to sort something out, despite the cancellation of the
> >>meeting? These are serious problems for us.
> >>
> >>Tim.
> >>
> >>>-----Original Message-----
> >>>From: [log in to unmask]
> >>>[mailto:[log in to unmask]] On Behalf Of Emmanuel Olaiya
> >>>Sent: 06 June 2005 22:57
> >>>To: Andy Hanushevsky
> >>>Cc: Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
> >>>Subject: Re: PreStage Problems
> >>>
> >>>Hi Andy
> >>>
> >>>Yes, it would be good if you could have a look at this with me. We
> >>>can arrange a time in the xrootd meeting tomorrow.
> >>>
> >>>cheers
> >>>
> >>>Manny
> >>>
> >>>Andy Hanushevsky wrote:
> >>>
> >>>>Hi Manny,
> >>>>
> >>>>I find this quite mysterious, as this should never be the case and,
> >>>>frankly, appears to violate causality. I suspect something else is
> >>>>going on. If this is reproducible then why don't we run a test with
> >>>>all debugging turned on. Yes?
> >>>>
> >>>>Andy
> >>>>
> >>>>----- Original Message -----
> >>>>From: "Emmanuel Olaiya" <[log in to unmask]>
> >>>>To: "Andrew Hanushevsky" <[log in to unmask]>
> >>>>Cc: "Brew, CAJ (Chris)" <[log in to unmask]>;
> >>>><[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> >>>>Sent: Monday, June 06, 2005 1:41 PM
> >>>>Subject: Re: PreStage Problems
> >>>>
> >>>>>Hi Andy
> >>>>>
> >>>>>I should have mentioned that we also removed the prestage queue and
> >>>>>restarted both the server and redirector. However, the old request
> >>>>>to wait did not change. Moreover, any similar new requests were also
> >>>>>told to wait until the old request was terminated.
> >>>>>
> >>>>>cheers
> >>>>>
> >>>>>Manny
> >>>>>
> >>>>>Andrew Hanushevsky wrote:
> >>>>>
> >>>>>>Hi Manny,
> >>>>>>
> >>>>>>Yes, but who is telling the client to wait? The redirector, or the
> >>>>>>server that originally wanted to stage the file in? When you
> >>>>>>restart the redirector it
> >>>>>>loses all its memory but the data server does not.
> >>>>>>So, it will happily tell the redirector that it has the file even
> >>>>>>though the file is merely in the pre-stage queue. As long as the
> >>>>>>file is in the prestage queue and not on disk, the only option is
> >>>>>>to direct clients to where the file will be staged in, and then the
> >>>>>>clients simply wait for the file (which in this case will never
> >>>>>>appear). So, if you remove staging you also need to remove the
> >>>>>>prestage queue and restart the data server.
> >>>>>>
> >>>>>>Andy
> >>>>>>
> >>>>>>On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
> >>>>>>
> >>>>>>>Hi Andy
> >>>>>>>
> >>>>>>>One other issue we have spotted at RAL. We removed the staging
> >>>>>>>capabilities and restarted the director and server. However, we
> >>>>>>>found that previous requests for a file that had been told to wait
> >>>>>>>continued being told to wait. We also found that if somebody else
> >>>>>>>asked for this same file, which was not on disk, they were also
> >>>>>>>told to wait rather than being told the file could not be found.
> >>>>>>>We needed to kill the previous request and restart the server and
> >>>>>>>director for xrootd to know the file was not on disk.
> >>>>>>>
> >>>>>>>cheers
> >>>>>>>
> >>>>>>>Manny
> >>>>>>>
> >>>>>>>Andrew Hanushevsky wrote:
> >>>>>>>
> >>>>>>>>Hi Chris,
> >>>>>>>>
> >>>>>>>>Oh yeah, different problem. I think that Bill Weeks fixed that.
> >>>>>>>>Bill, did you fix that problem?
> >>>>>>>>
> >>>>>>>>Andy
> >>>>>>>>
> >>>>>>>>On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>>
> >>>>>>>>>Hi,
> >>>>>>>>>
> >>>>>>>>>I might be being stupid but I don't see how this relates to the
> >>>>>>>>>problem.
> >>>>>>>>>The files I wanted were on a different disk server which then
> >>>>>>>>>went down. The server in question was registered with the OLB
> >>>>>>>>>as being able to stage in the name space, so the request was
> >>>>>>>>>redirected to it. If mps_Stage is used without the PreStage
> >>>>>>>>>queuing system, everything works as expected. If we try to go
> >>>>>>>>>through the PreStage queue to limit the number of concurrent
> >>>>>>>>>accesses to the tapestore, the stage-in fails, apparently
> >>>>>>>>>because the DIR_LOCK file does not exist (which it doesn't,
> >>>>>>>>>since the file, and its directory structure, have never existed
> >>>>>>>>>on this server).
> >>>>>>>>>
> >>>>>>>>>Yours,
> >>>>>>>>>Chris.
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: Andrew Hanushevsky [mailto:[log in to unmask]]
> >>>>>>>>>>Sent: 28 May 2005 07:39
> >>>>>>>>>>To: Brew, CAJ (Chris)
> >>>>>>>>>>Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
> >>>>>>>>>>Subject: RE: PreStage Problems
> >>>>>>>>>>
> >>>>>>>>>>Hi Chris,
> >>>>>>>>>>
> >>>>>>>>>>This was traced to overzealous testing. The system does not
> >>>>>>>>>>put a new entry in the pre-stage queue until after about
> >>>>>>>>>>10-20 minutes have elapsed since the last time the entry was
> >>>>>>>>>>added. So, this is not a bug but a test case that was not
> >>>>>>>>>>"real". Generally, files live in the disk cache for at
> >>>>>>>>>>least 10-20 minutes.
> >>>>>>>>>>
> >>>>>>>>>>Andy
> >>>>>>>>>>
> >>>>>>>>>>On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>
> >>>>>>>>>>>At the meeting a couple of weeks ago, it was said that
> >>>>>>>>>>>someone was looking into this but I haven't heard anything
> >>>>>>>>>>>back. Is there any news?
> >>>>>>>>>>>
> >>>>>>>>>>>Thanks,
> >>>>>>>>>>>Chris.
> >>>>>>>>>>>
> >>>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>>From: Brew, CAJ (Chris)
> >>>>>>>>>>>>Sent: 17 May 2005 13:50
> >>>>>>>>>>>>To: [log in to unmask]; abh
> >>>>>>>>>>>>Cc: Olaiya, EO (Emmanuel)
> >>>>>>>>>>>>Subject: PreStage Problems
> >>>>>>>>>>>>
> >>>>>>>>>>>>Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>>I've been running some more tests of the staging at RAL and
> >>>>>>>>>>>>have run into a problem somewhere in the
> >>>>>>>>>>>>mps_Stage/PreStage/prep system.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Everything works fine when staging a file that was on the
> >>>>>>>>>>>>system and has been deleted, but if I try to stage in a
> >>>>>>>>>>>>file that was on a different server, so that the directory
> >>>>>>>>>>>>structure for the file does not exist on the staging
> >>>>>>>>>>>>server, it fails and I see the following error in the
> >>>>>>>>>>>>PreStage log file:
> >>>>>>>>>>>>
> >>>>>>>>>>>>12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
> >>>>>>>>>>>>12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
> >>>>>>>>>>>>12:45:45 [ 3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
> >>>>>>>>>>>>
> >>>>>>>>>>>>If I create the directories and the DIR_LOCK file before
> >>>>>>>>>>>>running the import, everything works.
> >>>>>>>>>>>>
> >>>>>>>>>>>>The config file I'm using on the server is below.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Is there some setting I'm missing which is needed to create
> >>>>>>>>>>>>the directories/DIR_LOCK file or does the code need fixing?
> >>>>>>>>>>>>
> >>>>>>>>>>>>Thanks,
> >>>>>>>>>>>>Chris
> >>>>>>>>>>>>
> >>>>>>>>>>>>--
> >>>>>>>>>>>>Chris Brew ([log in to unmask]) +44 1235 446326
> >>>>>>>>>>>>Particle Physics Department
> >>>>>>>>>>>>Rutherford Appleton Laboratory
> >>>>>>>>>>>>Chilton, Didcot. Oxfordshire.
> >>>>>>>>>>>>OX11 0QX. United Kingdom.
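
The workaround described in this thread (creating the missing directories
and the DIR_LOCK file before the stage-in runs) could be scripted roughly as
follows. This is only a sketch in Python, not anything taken from mps_Stage
itself: the function name prepare_stage_dir and the example file name are
hypothetical, and it assumes the physical location is the stage root followed
by the logical /store path, as the '/stage/bdata-data50/kanga//store/...'
path in the log suggests.

```python
import os
import tempfile


def prepare_stage_dir(stage_root: str, logical_path: str) -> str:
    """Create the parent directories of a file and touch DIR_LOCK in them.

    ASSUMPTION: the physical path is stage_root + logical_path. This mapping
    is inferred from the log excerpt, not from the mps scripts.
    Returns the path of the DIR_LOCK file.
    """
    # Strip the leading '/' so os.path.join does not discard stage_root.
    relative = logical_path.lstrip("/")
    parent = os.path.dirname(os.path.join(stage_root, relative))
    # Same effect as 'mkdir -p': create every missing parent directory.
    os.makedirs(parent, exist_ok=True)
    lock_path = os.path.join(parent, "DIR_LOCK")
    # Same effect as 'touch': create the file if absent, keep it if present.
    with open(lock_path, "a"):
        os.utime(lock_path, None)
    return lock_path


if __name__ == "__main__":
    # Run against a throwaway directory rather than a real stage area.
    with tempfile.TemporaryDirectory() as root:
        lock = prepare_stage_dir(
            root,
            # Hypothetical file name under the directory seen in the log.
            "/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/example.root",
        )
        print(os.path.isfile(lock))
```

Calling the function a second time for the same file is harmless, since
both the directory creation and the touch tolerate existing entries.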