Hi Manny,

No problem other than the rest of the week is tied up -- Seattle Tue/Wed, Oakland/Thu, home Fri. Though looking at log files is always possible here.

Andy

----- Original Message -----
From: "Emmanuel Olaiya" <[log in to unmask]>
To: "Andy Hanushevsky" <[log in to unmask]>
Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
Sent: Monday, June 06, 2005 2:57 PM
Subject: Re: PreStage Problems

> Hi Andy
>
> Yes, it would be good if you could have a look at this with me. We can
> arrange a time in the xrootd meeting tomorrow.
>
> cheers
>
> Manny
>
> Andy Hanushevsky wrote:
>> Hi Manny,
>>
>> I find this quite mysterious as this should never be the case and,
>> frankly, appears to violate causality. I suspect something else is going
>> on. If this is reproducible then why don't we run a test with all
>> debugging turned on. Yes?
>>
>> Andy
>>
>> ----- Original Message -----
>> From: "Emmanuel Olaiya" <[log in to unmask]>
>> To: "Andrew Hanushevsky" <[log in to unmask]>
>> Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
>> Sent: Monday, June 06, 2005 1:41 PM
>> Subject: Re: PreStage Problems
>>
>>> Hi Andy
>>>
>>> I should have mentioned that we also removed the prestage queue and
>>> restarted both the server and redirector. However, the old request to
>>> wait did not change. Moreover, any similar new requests were also told
>>> to wait until the old request was terminated.
>>>
>>> cheers
>>>
>>> Manny
>>>
>>> Andrew Hanushevsky wrote:
>>>
>>>> Hi Manny,
>>>>
>>>> Yes, but who is telling the client to wait? The redirector or the server
>>>> that originally wanted to stage the file in? When you restart the
>>>> redirector it loses all its memory but the data server does not. So, it
>>>> will happily tell the redirector that it has the file even though the
>>>> file is merely in the pre-stage queue.
>>>> As long as the file is in the prestage queue and not on disk, the only
>>>> option is to direct clients to where the file will be staged in and then
>>>> the clients simply wait for the file (which in this case will never
>>>> appear). So, if you remove staging you also need to remove the prestage
>>>> queue and restart the data server.
>>>>
>>>> Andy
>>>>
>>>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
>>>>
>>>>> Hi Andy
>>>>>
>>>>> One other issue we have spotted at RAL. We removed the staging
>>>>> capabilities and restarted the redirector and server. However, we found
>>>>> that previous requests for a file that were told to wait continued
>>>>> being told to wait. We also found that if somebody else asked for this
>>>>> same file that was not on disk they were also told to wait rather than
>>>>> being told the file could not be found. We needed to kill the previous
>>>>> request and restart the server and redirector for xrootd to know the
>>>>> file was not on disk.
>>>>>
>>>>> cheers
>>>>>
>>>>> Manny
>>>>>
>>>>> Andrew Hanushevsky wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> Oh yeah, different problem. I think that Bill Weeks fixed that. Bill,
>>>>>> did you fix that problem?
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I might be being stupid but I don't see how this relates to the
>>>>>>> problem. The files I wanted were on a different disk server which
>>>>>>> then went down. The server in question was registered with the OLB
>>>>>>> as being able to stage in the name space so the request was
>>>>>>> redirected to it. If mps_Stage is used without the PreStage queuing
>>>>>>> system everything works as expected. If we try to go through the
>>>>>>> PreStage queue to limit the number of concurrent accesses to the
>>>>>>> tapestore the stage in fails, apparently because the DIR_LOCK file
>>>>>>> does not exist (which it doesn't, since the file, and its directory
>>>>>>> structure, have never existed on this server).
>>>>>>>
>>>>>>> Yours,
>>>>>>> Chris.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>>> Sent: 28 May 2005 07:39
>>>>>>>> To: Brew, CAJ (Chris)
>>>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
>>>>>>>> Subject: RE: PreStage Problems
>>>>>>>>
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> This was traced to overzealous testing. The system does not put a
>>>>>>>> new entry in the pre-stage queue until after about 10-20 minutes
>>>>>>>> have elapsed since the last time the entry was added. So, this is
>>>>>>>> not a bug but a test case that was not "real". Generally, files
>>>>>>>> live in the disk cache for at least 10-20 minutes.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
>>>>>>>>> looking into this but I haven't heard anything back. Is there any
>>>>>>>>> news?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Chris.
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Brew, CAJ (Chris)
>>>>>>>>>> Sent: 17 May 2005 13:50
>>>>>>>>>> To: [log in to unmask]; abh
>>>>>>>>>> Cc: Olaiya, EO (Emmanuel)
>>>>>>>>>> Subject: PreStage Problems
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've been running some more tests of the staging at RAL and
>>>>>>>>>> have run into a problem somewhere in the
>>>>>>>>>> mps_Stage/PreStage/prep system.
>>>>>>>>>>
>>>>>>>>>> Everything works fine when staging a file that was on the system
>>>>>>>>>> and has been deleted, but if I try to stage in a file that was on
>>>>>>>>>> a different server, so that the directory structure for the file
>>>>>>>>>> does not exist on the staging server, it fails and I see the
>>>>>>>>>> following error in the PreStage log file:
>>>>>>>>>>
>>>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
>>>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
>>>>>>>>>> 12:45:45 [ 3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
>>>>>>>>>>
>>>>>>>>>> If I create the directories and the DIR_LOCK file before
>>>>>>>>>> running the import, everything works.
>>>>>>>>>>
>>>>>>>>>> The config file I'm using on the server is below.
>>>>>>>>>>
>>>>>>>>>> Is there some setting I'm missing which is needed to create
>>>>>>>>>> the directories/DIR_LOCK file or does the code need fixing?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Chris Brew ([log in to unmask]) +44 1235 446326
>>>>>>>>>> Particle Physics Department
>>>>>>>>>> Rutherford Appleton Laboratory
>>>>>>>>>> Chilton, Didcot. Oxfordshire.
>>>>>>>>>> OX11 0QX. United Kingdom.