Good work, Manny.
Could you check whether, if you create the parent directories of the file
you're trying to stage in and touch the DIR_LOCK file in that directory
before you try to stage it, it is then happy to stage the file in?
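A minimal sketch of that workaround (create the parent directories, then touch DIR_LOCK, before staging) in Python, doing the equivalent of `mkdir -p` plus `touch` under a staging base directory. This is not the mps_Stage code: `prepare_stage_dir` is a hypothetical helper, and placing the logical file name directly under the base directory is an assumption based on the DIR_LOCK path in the PreStage log quoted further down.

```python
from pathlib import Path
import tempfile

def prepare_stage_dir(base_dir: str, lfn: str) -> Path:
    """Hypothetical helper: create the parent directories of `lfn` under
    `base_dir` and touch a DIR_LOCK file there, returning the lock's path."""
    # Drop the leading '/' so the logical file name lands *under* base_dir.
    target = Path(base_dir) / lfn.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    lock = target.parent / "DIR_LOCK"
    lock.touch(exist_ok=True)                          # like `touch`
    return lock

if __name__ == "__main__":
    lfn = "/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root"
    with tempfile.TemporaryDirectory() as base:
        print(prepare_stage_dir(base, lfn))
```

Running it twice on the same path is harmless, since both steps tolerate existing directories and an existing DIR_LOCK.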
Thanks,
Chris.
> -----Original Message-----
> From: Emmanuel Olaiya [mailto:[log in to unmask]]
> Sent: 07 June 2005 22:31
> To: Andrew Hanushevsky
> Cc: Adye, TJ (Tim); Brew, CAJ (Chris);
> [log in to unmask]; Bill Weeks
> Subject: Re: PreStage Problems
>
> Hi Andy, Bill
>
> I took the versions of mps_Stage and mps_prep from
> /afs/slac/package/xrd/xrootd/utils. These are mps_Stage and mps_prep
> versions 1.9 and 1.8 respectively.
>
> I still see the problem Chris reported. Restarting the directors and the
> server (with prestaging on the server), I get the following message in
> the prestage log when asking for a file that doesn't exist at RAL:
>
> Starting new cycle, pstg proc = 0
> 21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.0.lock, flags 2
> 21:17:41 [ 17543] getlock: locking file +</opt/xrootd/stageQ/PreStageQ.0.old, flags 2
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.old
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.0.lock
> 21:17:41 [ 17543] getlock: locking file >>/opt/xrootd/stageQ/PreStageQ.1.lock, flags 2
> 21:17:41 [ 17543] unlock: unlocking file /opt/xrootd/stageQ/PreStageQ.1.lock
> 21:21:29 [ 17772] mps_Stage: cannot create 'store' in '/store/PRskims/R14/16.1.1b/BToPPP/58/'; Permission denied
> 21:21:29 [ 17772] mps_Stage: Invalid file system path, '/store/PRskims/R14/16.1.1b/BToPPP/58/'.
> 21:21:29 [ 17772] do_stagein: xfr failed for /store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root, rc=4, retry=1
>
> My job, meanwhile, just hangs. If I take the log file literally, it is
> trying to write to /store when it should be trying to write to
> /base_directory/store.
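Read literally, the log suggests the logical path is not being prefixed with the server's base directory. The mps scripts are not shown in this thread, so the following is only a hedged Python illustration of that class of bug, not a diagnosis of mps_Stage: joining a base directory with an absolute logical file name silently discards the base, whereas stripping the leading slash first keeps it. `BASE` is a placeholder for the real base directory.

```python
import posixpath

BASE = "/base_directory"  # placeholder for the server's real base directory
LFN = "/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root"

def local_path_buggy(base: str, lfn: str) -> str:
    # posixpath.join discards everything before an absolute component,
    # so this returns the bare logical name under /store.
    return posixpath.join(base, lfn)

def local_path(base: str, lfn: str) -> str:
    # Strip the leading '/' so the logical name is resolved under base.
    return posixpath.join(base, lfn.lstrip("/"))

if __name__ == "__main__":
    print(local_path_buggy(BASE, LFN))  # /store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
    print(local_path(BASE, LFN))        # /base_directory/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
```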
>
> Doing further tests, I can reproduce the problem I reported earlier.
> Whilst still asking for the above file, I turn off staging and restart
> the directors and servers, and the request for the file continues to
> hang (is told to wait). Then I make another request for the same file,
> and this request is also continually told to wait:
>
> 050607 21:55:13 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> 050607 21:55:14 2915 odc_Locate: olaiya.23507:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> 050607 21:55:18 2915 odc_Locate: olaiya.8042:[log in to unmask] asked to wait 5 by xrootd107 path=/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root
> ...
>
>
> It is only after I kill the first request that any further requests for
> this file return correctly, with a message indicating that the file
> cannot be found.
>
> cheers
>
> Manny
>
> Andrew Hanushevsky wrote:
> > Hi Tim,
> >
> > Bill Weeks should have the fix available. You can also find the fixed
> > mps scripts in /afs/slac/package/xrd/xrootd/utils (I think you just
> > need an update for mps_Stage and mps_prep).
> >
> > Otherwise, the earliest time I can get together with Manny is Monday.
> > How about the afternoon, say 1:30pm?
> >
> > Andy
> >
> > On Tue, 7 Jun 2005, Adye, TJ (Tim) wrote:
> >
> >
> >>Hi Guys,
> >>
> >>Did you manage to sort something out, despite the cancellation of the
> >>meeting? These are serious problems for us.
> >>
> >>Tim.
> >>
> >>
> >>>-----Original Message-----
> >>>From: [log in to unmask]
> >>>[mailto:[log in to unmask]] On Behalf Of
> >>>Emmanuel Olaiya
> >>>Sent: 06 June 2005 22:57
> >>>To: Andy Hanushevsky
> >>>Cc: Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
> >>>Subject: Re: PreStage Problems
> >>>
> >>>Hi Andy
> >>>
> >>>Yes, it would be good if you could have a look at this with me. We can
> >>>arrange a time in the xrootd meeting tomorrow.
> >>>
> >>>cheers
> >>>
> >>>Manny
> >>>
> >>>Andy Hanushevsky wrote:
> >>>
> >>>>Hi Manny,
> >>>>
> >>>>I find this quite mysterious, as this should never be the case and,
> >>>>frankly, appears to violate causality. I suspect something else is
> >>>>going on. If this is reproducible, why don't we run a test with all
> >>>>debugging turned on? Yes?
> >>>>
> >>>>Andy
> >>>>
> >>>>----- Original Message -----
> >>>>From: "Emmanuel Olaiya" <[log in to unmask]>
> >>>
> >>>>To: "Andrew Hanushevsky" <[log in to unmask]>
> >>>>Cc: "Brew, CAJ (Chris)" <[log in to unmask]>;
> >>>><[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> >>>>Sent: Monday, June 06, 2005 1:41 PM
> >>>>Subject: Re: PreStage Problems
> >>>>
> >>>>
> >>>>
> >>>>>Hi Andy
> >>>>>
> >>>>>I should have mentioned that we also removed the prestage queue and
> >>>>>restarted both the server and the redirector. However, the old request
> >>>>>was still told to wait. Moreover, any similar new requests were also
> >>>>>told to wait until the old request was terminated.
> >>>>>
> >>>>>cheers
> >>>>>
> >>>>>Manny
> >>>>>
> >>>>>Andrew Hanushevsky wrote:
> >>>>>
> >>>>>
> >>>>>>Hi Manny,
> >>>>>>
> >>>>>>Yes, but who is telling the client to wait: the redirector, or the
> >>>>>>server that originally wanted to stage the file in? When you restart
> >>>>>>the redirector it loses all its memory, but the data server does not.
> >>>>>>So the data server will happily tell the redirector that it has the
> >>>>>>file even though the file is merely in the pre-stage queue. As long as
> >>>>>>the file is in the pre-stage queue and not on disk, the only option is
> >>>>>>to direct clients to where the file will be staged in, and the clients
> >>>>>>then simply wait for the file (which in this case will never appear).
> >>>>>>So, if you remove staging you also need to remove the pre-stage queue
> >>>>>>and restart the data server.
> >>>>>>
> >>>>>>Andy
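Andy's explanation can be captured in a toy model. This is a hedged Python sketch, not xrootd or olbd code: the class names, the in-memory location cache and the "online"/"pending"/"absent" states are all hypothetical. It shows why restarting only the redirector does not help while the data server still holds the file in its pre-stage queue, and why clients get "not found" only once that entry is gone and the redirector has forgotten its earlier answer.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class DataServer:
    # Hypothetical model: this state survives a redirector restart.
    on_disk: set[str] = field(default_factory=set)
    prestage_queue: set[str] = field(default_factory=set)

    def has_file(self, path: str) -> str:
        if path in self.on_disk:
            return "online"
        if path in self.prestage_queue:
            return "pending"  # claims the file although it is only queued
        return "absent"

@dataclass
class Redirector:
    servers: list[DataServer]
    cache: dict[str, str] = field(default_factory=dict)  # in memory only

    def restart(self) -> None:
        self.cache.clear()  # the redirector forgets everything it knew

    def locate(self, path: str) -> str:
        if path not in self.cache:
            answer = "not found"
            for server in self.servers:
                state = server.has_file(path)
                if state == "online":
                    answer = "redirect"
                    break
                if state == "pending":
                    answer = "wait"  # client waits for a file that may never arrive
            self.cache[path] = answer
        return self.cache[path]

if __name__ == "__main__":
    lfn = "/store/PRskims/R14/16.1.1b/BToPPP/58/BToPPP_5831.01.root"
    server = DataServer(prestage_queue={lfn})
    redirector = Redirector([server])
    print(redirector.locate(lfn))  # wait
    redirector.restart()
    print(redirector.locate(lfn))  # wait: the data server still has the queue entry
    server.prestage_queue.clear()  # remove the pre-stage queue (data server restarted)
    redirector.restart()
    print(redirector.locate(lfn))  # not found
```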
> >>>>>>
> >>>>>>On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Hi Andy
> >>>>>>>
> >>>>>>>One other issue we have spotted at RAL: we removed the staging
> >>>>>>>capabilities and restarted the director and server. However, we found
> >>>>>>>that previous requests for a file that had been told to wait continued
> >>>>>>>to be told to wait. We also found that if somebody else asked for the
> >>>>>>>same file, which was not on disk, they were also told to wait rather
> >>>>>>>than being told the file could not be found. We needed to kill the
> >>>>>>>previous request and restart the server and director for xrootd to
> >>>>>>>know the file was not on disk.
> >>>>>>>
> >>>>>>>cheers
> >>>>>>>
> >>>>>>>Manny
> >>>>>>>
> >>>>>>>Andrew Hanushevsky wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>>Hi Chris,
> >>>>>>>>
> >>>>>>>>Oh yeah, different problem. I think that Bill Weeks fixed that.
> >>>>>>>>Bill, did you fix that problem?
> >>>>>>>>
> >>>>>>>>Andy
> >>>>>>>>
> >>>>>>>>On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>Hi,
> >>>>>>>>>
> >>>>>>>>>I might be being stupid, but I don't see how this relates to the
> >>>>>>>>>problem. The files I wanted were on a different disk server, which
> >>>>>>>>>then went down. The server in question was registered with the OLB
> >>>>>>>>>as being able to stage in the name space, so the request was
> >>>>>>>>>redirected to it. If mps_Stage is used without the PreStage queuing
> >>>>>>>>>system, everything works as expected. If we try to go through the
> >>>>>>>>>PreStage queue to limit the number of concurrent accesses to the
> >>>>>>>>>tapestore, the stage-in fails, apparently because the DIR_LOCK file
> >>>>>>>>>does not exist (which it doesn't, since the file, and its directory
> >>>>>>>>>structure, has never existed on this server).
> >>>>>>>>>
> >>>>>>>>>Yours,
> >>>>>>>>>Chris.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: Andrew Hanushevsky [mailto:[log in to unmask]]
> >>>>>>>>>>Sent: 28 May 2005 07:39
> >>>>>>>>>>To: Brew, CAJ (Chris)
> >>>>>>>>>>Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
> >>>>>>>>>>Subject: RE: PreStage Problems
> >>>>>>>>>>
> >>>>>>>>>>Hi Chris,
> >>>>>>>>>>
> >>>>>>>>>>This was traced to overzealous testing. The system does not put a
> >>>>>>>>>>new entry in the pre-stage queue until about 10-20 minutes have
> >>>>>>>>>>elapsed since the entry was last added. So this is not a bug, but a
> >>>>>>>>>>test case that was not "real". Generally, files live in the disk
> >>>>>>>>>>cache for at least 10-20 minutes.
> >>>>>>>>>>
> >>>>>>>>>>Andy
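The rule Andy describes, where a repeated request for the same file does not create a new pre-stage entry until roughly 10-20 minutes after the previous one, amounts to time-window de-duplication. A minimal Python sketch, assuming a single fixed interval (15 minutes here, chosen only for illustration) and an injectable clock so it can be tested; `PrestageQueue` is hypothetical and not the mps_prep implementation.

```python
from __future__ import annotations
import time
from typing import Callable

class PrestageQueue:
    """Hypothetical model of time-window de-duplication for pre-stage requests."""

    def __init__(self, min_interval: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval  # seconds; 15 min is illustrative
        self.clock = clock
        self.last_added: dict[str, float] = {}
        self.entries: list[str] = []

    def add(self, path: str) -> bool:
        """Queue `path` unless it was queued less than min_interval ago."""
        now = self.clock()
        last = self.last_added.get(path)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: the repeat request adds no new entry
        self.last_added[path] = now
        self.entries.append(path)
        return True

if __name__ == "__main__":
    fake_now = [0.0]
    q = PrestageQueue(clock=lambda: fake_now[0])
    print(q.add("/store/f.root"))  # True
    fake_now[0] = 60.0
    print(q.add("/store/f.root"))  # False (only 1 minute later)
    fake_now[0] = 16 * 60.0
    print(q.add("/store/f.root"))  # True
```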
> >>>>>>>>>>
> >>>>>>>>>>On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>
> >>>>>>>>>>>At the meeting a couple of weeks ago, it was said that someone was
> >>>>>>>>>>>looking into this, but I haven't heard anything back. Is there any
> >>>>>>>>>>>news?
> >>>>>>>>>>>
> >>>>>>>>>>>Thanks,
> >>>>>>>>>>>Chris.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>>From: Brew, CAJ (Chris)
> >>>>>>>>>>>>Sent: 17 May 2005 13:50
> >>>>>>>>>>>>To: [log in to unmask]; abh
> >>>>>>>>>>>>Cc: Olaiya, EO (Emmanuel)
> >>>>>>>>>>>>Subject: PreStage Problems
> >>>>>>>>>>>>
> >>>>>>>>>>>>Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>>I've been running some more tests of the staging at RAL and have run
> >>>>>>>>>>>>into a problem somewhere in the mps_Stage/PreStage/prep system.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Everything works fine when staging a file that was on the system and
> >>>>>>>>>>>>has since been deleted, but if I try to stage in a file that was on a
> >>>>>>>>>>>>different server (so the directory structure for the file does not
> >>>>>>>>>>>>exist on the staging server), it fails and I see the following error
> >>>>>>>>>>>>in the PreStage log file:
> >>>>>>>>>>>>
> >>>>>>>>>>>>12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
> >>>>>>>>>>>>12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
> >>>>>>>>>>>>12:45:45 [ 3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
> >>>>>>>>>>>>
> >>>>>>>>>>>>If I create the directories and the DIR_LOCK file before running the
> >>>>>>>>>>>>import, everything works.
> >>>>>>>>>>>>
> >>>>>>>>>>>>The config file I'm using on the server is below.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Is there some setting I'm missing which is needed to create the
> >>>>>>>>>>>>directories/DIR_LOCK file or does the code need fixing?
> >>>>>>>>>>>>
> >>>>>>>>>>>>Thanks,
> >>>>>>>>>>>>Chris
> >>>>>>>>>>>>
> >>>>>>>>>>>>--
> >>>>>>>>>>>>Chris Brew ([log in to unmask]) +44 1235 446326
> >>>>>>>>>>>>Particle Physics Department
> >>>>>>>>>>>>Rutherford Appleton Laboratory
> >>>>>>>>>>>>Chilton, Didcot. Oxfordshire.
> >>>>>>>>>>>>OX11 0QX. United Kingdom.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>
>