Hi Andy, Bill,

I took the versions of mps_Stage and mps_prep from /afs/slac/package/xrd/xrootd/utils. These are versions 1.9 and 1.8 respectively. I still see the problem Chris reported. After restarting the redirectors and the server (with prestaging enabled on the server), I get the following messages in the prestage log when asking for a file that doesn't exist at RAL:

Starting new cycle, pstg proc = 0
21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.0.lock, flags 2
21:17:41 [ 17543] getlock: locking file +</opt/xrootd/stageQ/PreStageQ.0.old, flags 2
21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.old
21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.lock
21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.1.lock, flags 2
21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.1.lock
21:21:29 [ 17772] mps_Stage: cannot create 'store' in '/store/PRskims/R14/16.1.1b/BToPPP/58/'; Permission denied
21:21:29 [ 17772] mps_Stage: Invalid file system path, '/store/PRskims/R14/16.1.1b/BToPPP/58/'.
21:21:29 [ 17772] do_stagein: xfr failed for /store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root, rc=4, retry=1

Meanwhile my job just hangs. Taking the log file literally, it is trying to write to /store when it should be trying to write to /base_directory/store.

In further tests I can reproduce the problem I reported earlier. While still asking for the above file, I turn off staging and restart the redirectors and servers, and the request for the file continues to hang (it is told to wait).
Then I make another request for the same file, and this request is also continually told to wait:

050607 21:55:13 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
050607 21:55:14 2915 odc_Locate: olaiya.23507:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
050607 21:55:18 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
...

It is only after I kill the first request that any more requests for this file return correctly, with a message indicating that the file cannot be found.

cheers

Manny

Andrew Hanushevsky wrote:
> Hi Tim,
>
> Bill Weeks should have the fix available. You can also find the fixed mps
> scripts in /afs/slac/package/xrd/xrootd/utils (I think you just need an
> update for mps_Stage and mps_prep).
>
> Otherwise, the earliest time I can get together with Manny is Monday. How
> about the afternoon, say 1:30pm?
>
> Andy
>
> On Tue, 7 Jun 2005, Adye, TJ (Tim) wrote:
>
>>Hi Guys,
>>
>>Did you manage to sort something out, despite the cancellation of the
>>meeting? These are serious problems for us.
>>
>>Tim.
>>
>>>-----Original Message-----
>>>From: [log in to unmask]
>>>[mailto:[log in to unmask]] On Behalf Of Emmanuel Olaiya
>>>Sent: 06 June 2005 22:57
>>>To: Andy Hanushevsky
>>>Cc: Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
>>>Subject: Re: PreStage Problems
>>>
>>>Hi Andy
>>>
>>>Yes, it would be good if you could have a look at this with me. We can
>>>arrange a time in the xrootd meeting tomorrow.
>>>
>>>cheers
>>>
>>>Manny
>>>
>>>Andy Hanushevsky wrote:
>>>
>>>>Hi Manny,
>>>>
>>>>I find this quite mysterious, as this should never be the case and,
>>>>frankly, appears to violate causality. I suspect something else is
>>>>going on. If this is reproducible, then why don't we run a test with
>>>>all debugging turned on. Yes?
>>>>
>>>>Andy
>>>>
>>>>----- Original Message -----
>>>>From: "Emmanuel Olaiya" <[log in to unmask]>
>>>>To: "Andrew Hanushevsky" <[log in to unmask]>
>>>>Cc: "Brew, CAJ (Chris)" <[log in to unmask]>;
>>>><[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
>>>>Sent: Monday, June 06, 2005 1:41 PM
>>>>Subject: Re: PreStage Problems
>>>>
>>>>>Hi Andy
>>>>>
>>>>>I should have mentioned that we also removed the prestage queue and
>>>>>restarted both the server and redirector. However, the old request to
>>>>>wait did not change. Moreover, any similar new requests were also told
>>>>>to wait until the old request was terminated.
>>>>>
>>>>>cheers
>>>>>
>>>>>Manny
>>>>>
>>>>>Andrew Hanushevsky wrote:
>>>>>
>>>>>>Hi Manny,
>>>>>>
>>>>>>Yes, but who is telling the client to wait? The redirector, or the
>>>>>>server that originally wanted to stage the file in? When you restart
>>>>>>the redirector it loses all its memory, but the data server does not.
>>>>>>So, it will happily tell the redirector that it has the file even
>>>>>>though the file is merely in the pre-stage queue. As long as the file
>>>>>>is in the prestage queue and not on disk, the only option is to direct
>>>>>>clients to where the file will be staged in, and then the clients
>>>>>>simply wait for the file (which in this case will never appear). So,
>>>>>>if you remove staging you also need to remove the prestage queue and
>>>>>>restart the data server.
>>>>>>
>>>>>>Andy
>>>>>>
>>>>>>On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
>>>>>>
>>>>>>>Hi Andy
>>>>>>>
>>>>>>>One other issue we have spotted at RAL. We removed the staging
>>>>>>>capabilities and restarted the redirector and server. However, we
>>>>>>>found that previous requests for a file that were told to wait
>>>>>>>continued being told to wait. We also found that if somebody else
>>>>>>>asked for this same file that was not on disk, they were also told
>>>>>>>to wait rather than being told the file could not be found. We
>>>>>>>needed to kill the previous request and restart the server and
>>>>>>>redirector for xrootd to know the file was not on disk.
>>>>>>>
>>>>>>>cheers
>>>>>>>
>>>>>>>Manny
>>>>>>>
>>>>>>>Andrew Hanushevsky wrote:
>>>>>>>
>>>>>>>>Hi Chris,
>>>>>>>>
>>>>>>>>Oh yeah, different problem. I think that Bill Weeks fixed that.
>>>>>>>>Bill, did you fix that problem?
>>>>>>>>
>>>>>>>>Andy
>>>>>>>>
>>>>>>>>On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>I might be being stupid but I don't see how this relates to the
>>>>>>>>>problem. The files I wanted were on a different disk server which
>>>>>>>>>then went down. The server in question was registered with the OLB
>>>>>>>>>as being able to stage in the name space, so the request was
>>>>>>>>>redirected to it. If mps_Stage is used without the PreStage
>>>>>>>>>queuing system, everything works as expected. If we try to go
>>>>>>>>>through the PreStage queue to limit the number of concurrent
>>>>>>>>>accesses to the tapestore, the stage-in fails, apparently because
>>>>>>>>>the DIR_LOCK file does not exist (which it doesn't, since the
>>>>>>>>>file, and its directory structure, have never existed on this
>>>>>>>>>server).
>>>>>>>>>
>>>>>>>>>Yours,
>>>>>>>>>Chris.
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>>>>>Sent: 28 May 2005 07:39
>>>>>>>>>>To: Brew, CAJ (Chris)
>>>>>>>>>>Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
>>>>>>>>>>Subject: RE: PreStage Problems
>>>>>>>>>>
>>>>>>>>>>Hi Chris,
>>>>>>>>>>
>>>>>>>>>>This was traced to overzealous testing. The system does not put
>>>>>>>>>>a new entry in the pre-stage queue until about 10-20 minutes
>>>>>>>>>>have elapsed since the last time the entry was added. So, this
>>>>>>>>>>is not a bug but a test case that was not "real". Generally,
>>>>>>>>>>files live in the disk cache for at least 10-20 minutes.
>>>>>>>>>>
>>>>>>>>>>Andy
>>>>>>>>>>
>>>>>>>>>>On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>
>>>>>>>>>>>At the meeting a couple of weeks ago, it was said that someone
>>>>>>>>>>>was looking into this but I haven't heard anything back. Is
>>>>>>>>>>>there any news?
>>>>>>>>>>>
>>>>>>>>>>>Thanks,
>>>>>>>>>>>Chris.
>>>>>>>>>>>
>>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>>From: Brew, CAJ (Chris)
>>>>>>>>>>>>Sent: 17 May 2005 13:50
>>>>>>>>>>>>To: [log in to unmask]; abh
>>>>>>>>>>>>Cc: Olaiya, EO (Emmanuel)
>>>>>>>>>>>>Subject: PreStage Problems
>>>>>>>>>>>>
>>>>>>>>>>>>Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>I've been running some more tests of the staging at RAL and
>>>>>>>>>>>>have run into a problem somewhere in the
>>>>>>>>>>>>mps_Stage/PreStage/prep system.
>>>>>>>>>>>>
>>>>>>>>>>>>Everything works fine when staging a file that was on the
>>>>>>>>>>>>system and has been deleted, but if I try to stage in a file
>>>>>>>>>>>>that was on a different server, so that the directory
>>>>>>>>>>>>structure for the file does not exist on the staging server,
>>>>>>>>>>>>it fails and I see the following error in the PreStage log
>>>>>>>>>>>>file:
>>>>>>>>>>>>
>>>>>>>>>>>>12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
>>>>>>>>>>>>12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
>>>>>>>>>>>>12:45:45 [ 3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
>>>>>>>>>>>>
>>>>>>>>>>>>If I create the directories and the DIR_LOCK file before
>>>>>>>>>>>>running the import, everything works.
>>>>>>>>>>>>
>>>>>>>>>>>>The config file I'm using on the server is below.
>>>>>>>>>>>>
>>>>>>>>>>>>Is there some setting I'm missing which is needed to create
>>>>>>>>>>>>the directories/DIR_LOCK file, or does the code need fixing?
>>>>>>>>>>>>
>>>>>>>>>>>>Thanks,
>>>>>>>>>>>>Chris
>>>>>>>>>>>>
>>>>>>>>>>>>--
>>>>>>>>>>>>Chris Brew ([log in to unmask]) +44 1235 446326
>>>>>>>>>>>>Particle Physics Department
>>>>>>>>>>>>Rutherford Appleton Laboratory
>>>>>>>>>>>>Chilton, Didcot. Oxfordshire.
>>>>>>>>>>>>OX11 0QX. United Kingdom.
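
The workaround described in the original report above (creating the directories and the DIR_LOCK file before running the import) could be scripted along these lines. This is only a minimal Python sketch and not part of the mps scripts: the function name ensure_stage_dir and the base directory are hypothetical, and it assumes that an empty DIR_LOCK file in the file's own directory is all the stager needs.

```python
import os
import tempfile


def ensure_stage_dir(base_dir: str, logical_path: str) -> str:
    """Create the directory tree for logical_path under base_dir and make
    sure a DIR_LOCK file exists in the leaf directory.

    Hypothetical helper: the name, arguments and the assumption that an
    empty DIR_LOCK is sufficient are illustrative only.
    Returns the path of the DIR_LOCK file.
    """
    # Strip the leading "/" so the logical path is joined *under* base_dir;
    # os.path.join would otherwise discard base_dir for an absolute path.
    rel_dir = os.path.dirname(logical_path).lstrip("/")
    target_dir = os.path.join(base_dir, rel_dir)

    # Create all missing directories; do nothing if they already exist.
    os.makedirs(target_dir, exist_ok=True)

    # Append mode creates the file if missing and never truncates it.
    lock_path = os.path.join(target_dir, "DIR_LOCK")
    with open(lock_path, "a"):
        pass
    return lock_path


if __name__ == "__main__":
    # Demonstration in a throwaway directory standing in for the real
    # staging area (for example the /stage/... prefix seen in the log).
    with tempfile.TemporaryDirectory() as base:
        lock = ensure_stage_dir(
            base,
            "/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/example.root",
        )
        print("DIR_LOCK present:", os.path.isfile(lock))
```

Stripping the leading slash before joining keeps the logical /store/... path under the base directory, so the sketch never tries to create a top-level /store.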