Hi Andy

Yes, it would be good if you could have a look at this with me. We can 
arrange a time in the xrootd meeting tomorrow.

cheers

Manny

Andy Hanushevsky wrote:
> Hi Manny,
> 
> I find this quite mysterious, as this should never be the case and,
> frankly, appears to violate causality. I suspect something else is going
> on. If this is reproducible, why don't we run a test with all debugging
> turned on. Yes?
> 
> Andy
> 
> ----- Original Message ----- From: "Emmanuel Olaiya" <[log in to unmask]>
> To: "Andrew Hanushevsky" <[log in to unmask]>
> Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; 
> <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> Sent: Monday, June 06, 2005 1:41 PM
> Subject: Re: PreStage Problems
> 
> 
>> Hi Andy
>>
>> I should have mentioned that we also removed the prestage queue and
>> restarted both the server and the redirector. However, the old request was
>> still told to wait. Moreover, any similar new requests were also told to
>> wait until the old request was terminated.
>>
>> cheers
>>
>> Manny
>>
>> Andrew Hanushevsky wrote:
>>
>>> Hi Manny,
>>>
>>> Yes, but who is telling the client to wait: the redirector, or the server
>>> that originally wanted to stage the file in? When you restart the
>>> redirector it loses all its memory, but the data server does not. So the
>>> data server will happily tell the redirector that it has the file, even
>>> though the file is merely in the pre-stage queue. As long as the file is in
>>> the pre-stage queue and not on disk, the only option is to direct clients
>>> to where the file will be staged in, and the clients then simply wait for
>>> the file (which in this case will never appear). So, if you remove staging,
>>> you also need to remove the pre-stage queue and restart the data server.
>>>
>>> Andy
>>>
>>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
>>>
>>>
>>>> Hi Andy
>>>>
>>>> One other issue we have spotted at RAL: we removed the staging
>>>> capabilities and restarted the redirector and server. However, we found
>>>> that previous requests for a file that had been told to wait continued
>>>> to be told to wait. We also found that if somebody else asked for the same
>>>> file, which was not on disk, they were also told to wait rather than being
>>>> told that the file could not be found. We needed to kill the previous
>>>> request and restart the server and redirector for xrootd to know that the
>>>> file was not on disk.
>>>>
>>>> cheers
>>>>
>>>> Manny
>>>>
>>>> Andrew Hanushevsky wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> Oh yeah, different problem. I think that Bill Weeks fixed that. Bill, did
>>>>> you fix that problem?
>>>>>
>>>>> Andy
>>>>>
>>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I might be being stupid, but I don't see how this relates to the problem.
>>>>>> The files I wanted were on a different disk server, which then went down.
>>>>>> The server in question was registered with the OLB as being able to
>>>>>> stage in the name space, so the request was redirected to it. If
>>>>>> mps_Stage is used without the PreStage queuing system, everything works
>>>>>> as expected. If we try to go through the PreStage queue to limit the
>>>>>> number of concurrent accesses to the tapestore, the stage-in fails,
>>>>>> apparently because the DIR_LOCK file does not exist (which it doesn't,
>>>>>> since the file, and its directory structure, has never existed on this
>>>>>> server).
>>>>>>
>>>>>> Yours,
>>>>>> Chris.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>> Sent: 28 May 2005 07:39
>>>>>>> To: Brew, CAJ (Chris)
>>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
>>>>>>> Subject: RE: PreStage Problems
>>>>>>>
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> This was traced to overzealous testing. The system does not put a new
>>>>>>> entry in the pre-stage queue until about 10-20 minutes have elapsed
>>>>>>> since the entry was last added. So, this is not a bug but a test case
>>>>>>> that was not "real". Generally, files live in the disk cache for at
>>>>>>> least 10-20 minutes.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
>>>>>>>> looking into this, but I haven't heard anything back. Is there any news?
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Chris.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Brew, CAJ (Chris)
>>>>>>>>> Sent: 17 May 2005 13:50
>>>>>>>>> To: [log in to unmask]; abh
>>>>>>>>> Cc: Olaiya, EO (Emmanuel)
>>>>>>>>> Subject: PreStage Problems
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've been running some more tests of the staging at RAL and
>>>>>>>>> have run into a problem somewhere in the
>>>>>>>>> mps_Stage/PreStage/prep system.
>>>>>>>>>
>>>>>>>>> Everything works fine when staging a file that was on the system and
>>>>>>>>> has since been deleted, but if I try to stage in a file that was on
>>>>>>>>> a different server (so the directory structure for the file does not
>>>>>>>>> exist on the staging server), it fails and I see the following error
>>>>>>>>> in the PreStage log file:
>>>>>>>>>
>>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
>>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
>>>>>>>>> 12:45:45 [  3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
>>>>>>>>>
>>>>>>>>> If I create the directories and the DIR_LOCK file before
>>>>>>>>> running the import, everything works.
>>>>>>>>>
>>>>>>>>> The config file I'm using on the server is below.
>>>>>>>>>
>>>>>>>>> Is there some setting I'm missing that is needed to create the
>>>>>>>>> directories/DIR_LOCK file, or does the code need fixing?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> Chris Brew  ([log in to unmask])  +44 1235 446326
>>>>>>>>> Particle Physics Department
>>>>>>>>> Rutherford Appleton Laboratory
>>>>>>>>> Chilton, Didcot. Oxfordshire.
>>>>>>>>> OX11 0QX. United Kingdom.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>