Hi Guys,
Did you manage to sort something out, despite the cancellation of the
meeting? These are serious problems for us.
Tim.
> -----Original Message-----
> From: [log in to unmask]
> [mailto:[log in to unmask]] On Behalf Of
> Emmanuel Olaiya
> Sent: 06 June 2005 22:57
> To: Andy Hanushevsky
> Cc: Brew, CAJ (Chris); [log in to unmask]; Bill Weeks
> Subject: Re: PreStage Problems
>
> Hi Andy
>
> Yes, it would be good if you could have a look at this with me. We can
> arrange a time in the xrootd meeting tomorrow.
>
> cheers
>
> Manny
>
> Andy Hanushevsky wrote:
> > Hi Manny,
> >
> > I find this quite mysterious, as this should never be the case and,
> > frankly, appears to violate causality. I suspect something else is going
> > on. If this is reproducible, then why don't we run a test with all
> > debugging turned on? Yes?
> >
> > Andy
> >
> > ----- Original Message -----
> > From: "Emmanuel Olaiya" <[log in to unmask]>
> > To: "Andrew Hanushevsky" <[log in to unmask]>
> > Cc: "Brew, CAJ (Chris)" <[log in to unmask]>;
> > <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> > Sent: Monday, June 06, 2005 1:41 PM
> > Subject: Re: PreStage Problems
> >
> >
> >> Hi Andy
> >>
> >> I should have mentioned that we also removed the prestage queue and
> >> restarted both the server and redirector. However, the old request to
> >> wait did not change. Moreover, any similar new requests were also told
> >> to wait until the old request was terminated.
> >>
> >> cheers
> >>
> >> Manny
> >>
> >> Andrew Hanushevsky wrote:
> >>
> >>> Hi Manny,
> >>>
> >>> Yes, but who is telling the client to wait: the redirector or the server
> >>> that originally wanted to stage the file in? When you restart the
> >>> redirector it loses all its memory, but the data server does not. So it
> >>> will happily tell the redirector that it has the file even though the
> >>> file is merely in the pre-stage queue. As long as the file is in the
> >>> prestage queue and not on disk, the only option is to direct clients to
> >>> where the file will be staged in, and then the clients simply wait for
> >>> the file (which in this case will never appear). So, if you remove
> >>> staging you also need to remove the prestage queue and restart the data
> >>> server.
> >>>
> >>> Andy
> >>>
> >>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
> >>>
> >>>
> >>>> Hi Andy
> >>>>
> >>>> One other issue we have spotted at RAL. We removed the staging
> >>>> capabilities and restarted the redirector and server. However, we found
> >>>> that previous requests for a file that had been told to wait continued
> >>>> being told to wait. We also found that if somebody else asked for this
> >>>> same file, which was not on disk, they were also told to wait rather
> >>>> than being told the file could not be found. We needed to kill the
> >>>> previous request and restart the server and redirector for xrootd to
> >>>> know the file was not on disk.
> >>>>
> >>>> cheers
> >>>>
> >>>> Manny
> >>>>
> >>>> Andrew Hanushevsky wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Oh yeah, different problem. I think that Bill Weeks fixed that.
> >>>>> Bill, did you fix that problem?
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I might be being stupid but I don't see how this relates to the
> >>>>>> problem. The files I wanted were on a different disk server, which
> >>>>>> then went down. The server in question was registered with the OLB
> >>>>>> as being able to stage in the name space, so the request was
> >>>>>> redirected to it. If mps_Stage is used without the PreStage queuing
> >>>>>> system, everything works as expected. If we try to go through the
> >>>>>> PreStage queue to limit the number of concurrent accesses to the
> >>>>>> tapestore, the stage-in fails, apparently because the DIR_LOCK file
> >>>>>> does not exist (which it doesn't, since the file, and its directory
> >>>>>> structure, has never existed on this server).
> >>>>>>
> >>>>>> Yours,
> >>>>>> Chris.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
> >>>>>>> Sent: 28 May 2005 07:39
> >>>>>>> To: Brew, CAJ (Chris)
> >>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
> >>>>>>> Subject: RE: PreStage Problems
> >>>>>>>
> >>>>>>> Hi Chris,
> >>>>>>>
> >>>>>>> This was traced to overzealous testing. The system does not put a
> >>>>>>> new entry in the pre-stage queue until about 10-20 minutes have
> >>>>>>> elapsed since the last time the entry was added. So, this is not a
> >>>>>>> bug but a test case that was not "real". Generally, files live in
> >>>>>>> the disk cache for at least 10-20 minutes.
> >>>>>>>
> >>>>>>> Andy
> >>>>>>>
> >>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
> >>>>>>>> looking into this but I haven't heard anything back. Is there any
> >>>>>>>> news?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Chris.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Brew, CAJ (Chris)
> >>>>>>>>> Sent: 17 May 2005 13:50
> >>>>>>>>> To: [log in to unmask]; abh
> >>>>>>>>> Cc: Olaiya, EO (Emmanuel)
> >>>>>>>>> Subject: PreStage Problems
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I've been running some more tests of the staging at RAL and
> >>>>>>>>> have run into a problem somewhere in the
> >>>>>>>>> mps_Stage/PreStage/prep system.
> >>>>>>>>>
> >>>>>>>>> Everything works fine when staging a file that was on the system
> >>>>>>>>> and has been deleted, but if I try to stage in a file that was on
> >>>>>>>>> a different server, so that the directory structure for the file
> >>>>>>>>> does not exist on the staging server, it fails and I see the
> >>>>>>>>> following error in the PreStage log file:
> >>>>>>>>>
> >>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
> >>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
> >>>>>>>>> 12:45:45 [ 3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
> >>>>>>>>>
> >>>>>>>>> If I create the directories and the DIR_LOCK file before
> >>>>>>>>> running the import, everything works.
> >>>>>>>>>
> >>>>>>>>> The config file I'm using on the server is below.
> >>>>>>>>>
> >>>>>>>>> Is there some setting I'm missing which is needed to create
> >>>>>>>>> the directories/DIR_LOCK file or does the code need fixing?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Chris
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Chris Brew ([log in to unmask]) +44 1235 446326
> >>>>>>>>> Particle Physics Department
> >>>>>>>>> Rutherford Appleton Laboratory
> >>>>>>>>> Chilton, Didcot. Oxfordshire.
> >>>>>>>>> OX11 0QX. United Kingdom.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>
> >
>