Hi Guys,

Did you manage to sort something out, despite the cancellation of the
meeting? These are serious problems for us.

Tim. 

> -----Original Message-----
> From: [log in to unmask] 
> [mailto:[log in to unmask]] On Behalf Of 
> Emmanuel Olaiya
> Sent: 06 June 2005 22:57
> To: Andy Hanushevsky
> Cc: Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
> Subject: Re: PreStage Problems
> 
> Hi Andy
> 
> Yes, it would be good if you could have a look at this with me. We can
> arrange a time in the xrootd meeting tomorrow.
> 
> cheers
> 
> Manny
> 
> Andy Hanushevsky wrote:
> > Hi Manny,
> > 
> > I find this quite mysterious, as this should never be the case and,
> > frankly, appears to violate causality. I suspect something else is going
> > on. If this is reproducible then why don't we run a test with all
> > debugging turned on. Yes?
> > 
> > Andy
> > 
> > ----- Original Message -----
> > From: "Emmanuel Olaiya" <[log in to unmask]>
> > To: "Andrew Hanushevsky" <[log in to unmask]>
> > Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; 
> > <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> > Sent: Monday, June 06, 2005 1:41 PM
> > Subject: Re: PreStage Problems
> > 
> > 
> >> Hi Andy
> >>
> >> I should have mentioned that we also removed the prestage queue and
> >> restarted both the server and redirector. However, the old request to
> >> wait did not change. Moreover, any similar new requests were also told
> >> to wait until the old request was terminated.
> >>
> >> cheers
> >>
> >> Manny
> >>
> >> Andrew Hanushevsky wrote:
> >>
> >>> Hi Manny,
> >>>
> >>> Yes, but who is telling the client to wait? The redirector, or the
> >>> server that originally wanted to stage the file in? When you restart
> >>> the redirector it loses all its memory, but the data server does not.
> >>> So, it will happily tell the redirector that it has the file even
> >>> though the file is merely in the pre-stage queue. As long as the file
> >>> is in the prestage queue and not on disk, the only option is to direct
> >>> clients to where the file will be staged in, and then the clients
> >>> simply wait for the file (which in this case will never appear). So, if
> >>> you remove staging you also need to remove the prestage queue and
> >>> restart the data server.
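
The explanation above can be sketched as a toy model. This is not xrootd or OLB code: the class names, method names, and the example path are hypothetical, and the only behaviour modelled is the one described in the message, namely that the data server reports a file as available while it is merely queued for staging, and that restarting the redirector alone does not change that.

```python
class DataServer:
    """Toy data server: tracks files on disk and files in the prestage queue."""

    def __init__(self):
        self.on_disk = set()
        self.prestage_queue = set()

    def has_file(self, path):
        # A queued file is reported as present, as described in the message.
        return path in self.on_disk or path in self.prestage_queue

    def open(self, path):
        if path in self.on_disk:
            return "ok"
        if path in self.prestage_queue:
            return "wait"  # the client waits for a stage-in
        return "not found"


class Redirector:
    """Toy redirector: keeps no state of its own beyond the server list."""

    def __init__(self, servers):
        self.servers = servers

    def locate(self, path):
        for server in self.servers:
            if server.has_file(path):
                return server.open(path)
        return "not found"


path = "/store/example.root"  # hypothetical file name
server = DataServer()
server.prestage_queue.add(path)  # request queued before staging was removed

# "Restarting" the redirector is modelled as building a fresh one.
redirector = Redirector([server])
before_cleanup = redirector.locate(path)

# Removing the prestage queue contents models the cleanup plus server restart.
server.prestage_queue.clear()
after_cleanup = redirector.locate(path)

print(before_cleanup, after_cleanup)
```

In this model the client keeps being told to wait until the queue entry on the data server is gone, regardless of how often the redirector is restarted.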
> >>>
> >>> Andy
> >>>
> >>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
> >>>
> >>>
> >>>> Hi Andy
> >>>>
> >>>> One other issue we have spotted at RAL. We removed the staging
> >>>> capabilities and restarted the redirector and server. However, we
> >>>> found that previous requests for a file that were told to wait
> >>>> continued being told to wait. We also found that if somebody else
> >>>> asked for this same file, which was not on disk, they were also told
> >>>> to wait rather than being told the file could not be found. We needed
> >>>> to kill the previous request and restart the server and redirector
> >>>> for xrootd to know the file was not on disk.
> >>>>
> >>>> cheers
> >>>>
> >>>> Manny
> >>>>
> >>>> Andrew Hanushevsky wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Oh yeah, different problem. I think that Bill Weeks fixed that.
> >>>>> Bill, did you fix that problem?
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I might be being stupid but I don't see how this relates to the
> >>>>>> problem. The files I wanted were on a different disk server, which
> >>>>>> then went down. The server in question was registered with the OLB
> >>>>>> as being able to stage in the name space, so the request was
> >>>>>> redirected to it. If mps_Stage is used without the PreStage queuing
> >>>>>> system, everything works as expected. If we try to go through the
> >>>>>> PreStage queue to limit the number of concurrent accesses to the
> >>>>>> tapestore, the stage-in fails, apparently because the DIR_LOCK file
> >>>>>> does not exist (which it doesn't, since the file, and its directory
> >>>>>> structure, has never existed on this server).
> >>>>>>
> >>>>>> Yours,
> >>>>>> Chris.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
> >>>>>>> Sent: 28 May 2005 07:39
> >>>>>>> To: Brew, CAJ (Chris)
> >>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
> >>>>>>> Subject: RE: PreStage Problems
> >>>>>>>
> >>>>>>> Hi Chris,
> >>>>>>>
> >>>>>>> This was traced to overzealous testing. The system does not put a
> >>>>>>> new entry in the pre-stage queue until about 10-20 minutes have
> >>>>>>> elapsed since the last time the entry was added. So, this is not a
> >>>>>>> bug but a test case that was not "real". Generally, files live in
> >>>>>>> the disk cache for at least 10-20 minutes.
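
The queueing rule described above can be illustrated with a small sketch. It is a toy model, not the PreStage code: the class name, the method name, and the 600-second window are assumptions (the message only says "about 10-20 minutes"), and the example path is hypothetical.

```python
class PreStageQueue:
    """Toy queue that ignores repeat requests for a file inside a time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.last_added = {}  # path -> time the entry was last added

    def request(self, path, now):
        """Return True if a new queue entry is made, False if it is suppressed."""
        last = self.last_added.get(path)
        if last is not None and now - last < self.window_seconds:
            return False  # too soon after the previous entry for this file
        self.last_added[path] = now
        return True


queue = PreStageQueue(window_seconds=600)  # assumed 10-minute window
path = "/store/example.root"               # hypothetical file name

first = queue.request(path, now=0)      # first request is queued
repeat = queue.request(path, now=300)   # repeated 5 minutes later
later = queue.request(path, now=900)    # repeated 15 minutes after the first

print(first, repeat, later)
```

A test that deletes a file and asks for it again within the window would see its second request ignored, which matches the "overzealous testing" diagnosis.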
> >>>>>>>
> >>>>>>> Andy
> >>>>>>>
> >>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
> >>>>>>>> looking into this but I haven't heard anything back. Is there any
> >>>>>>>> news?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Chris.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Brew, CAJ (Chris)
> >>>>>>>>> Sent: 17 May 2005 13:50
> >>>>>>>>> To: [log in to unmask]; abh
> >>>>>>>>> Cc: Olaiya, EO (Emmanuel)
> >>>>>>>>> Subject: PreStage Problems
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I've been running some more tests of the staging at RAL and
> >>>>>>>>> have run into a problem somewhere in the
> >>>>>>>>> mps_Stage/PreStage/prep system.
> >>>>>>>>>
> >>>>>>>>> Everything works fine when staging a file that was on the system
> >>>>>>>>> and has been deleted, but if I try to stage in a file that was on
> >>>>>>>>> a different server, so that the directory structure for the file
> >>>>>>>>> does not exist on the staging server, it fails and I see the
> >>>>>>>>> following error in the PreStage log file:
> >>>>>>>>>
> >>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
> >>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
> >>>>>>>>> 12:45:45 [  3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
> >>>>>>>>>
> >>>>>>>>> If I create the directories and the DIR_LOCK file before
> >>>>>>>>> running the import, everything works.
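
The workaround described above (creating the directories and the DIR_LOCK file before the import) might be scripted roughly as follows. This is a minimal sketch, not part of mps_Stage: the helper name is hypothetical, and it assumes that an empty DIR_LOCK file is sufficient, which the message does not state. The demo writes under a temporary directory rather than the real stage area.

```python
import os
import tempfile


def ensure_dir_lock(stage_root, logical_path):
    """Create the directory for logical_path under stage_root, plus DIR_LOCK."""
    # Strip the leading slash so os.path.join does not discard stage_root.
    relative_dir = os.path.dirname(logical_path).lstrip("/")
    target_dir = os.path.join(stage_root, relative_dir)
    os.makedirs(target_dir, exist_ok=True)

    lock_path = os.path.join(target_dir, "DIR_LOCK")
    # Append mode creates the file if missing and leaves an existing one alone.
    with open(lock_path, "a"):
        pass
    return lock_path


# Demo against a temporary directory standing in for the stage area.
demo_root = tempfile.mkdtemp()
logical_path = (
    "/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/"
    "BtoKKKL_001005_3247.01.root"
)
lock_path = ensure_dir_lock(demo_root, logical_path)
print(lock_path)
```

Calling the helper twice for the same file is harmless, so it could be run for every file before the import is attempted.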
> >>>>>>>>>
> >>>>>>>>> The config file I'm using on the server is below.
> >>>>>>>>>
> >>>>>>>>> Is there some setting I'm missing which is needed to create
> >>>>>>>>> the directories/DIR_LOCK file or does the code need fixing?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Chris
> >>>>>>>>>
> >>>>>>>>> -- 
> >>>>>>>>> Chris Brew  ([log in to unmask])  +44 1235 446326
> >>>>>>>>> Particle Physics Department
> >>>>>>>>> Rutherford Appleton Laboratory
> >>>>>>>>> Chilton, Didcot. Oxfordshire.
> >>>>>>>>> OX11 0QX. United Kingdom.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>
> > 
>