Hi Manny,

No problem, other than that the rest of the week is tied up: Seattle Tue/Wed,
Oakland Thu, home Fri. Though looking at log files is always possible here.

Andy

----- Original Message ----- 
From: "Emmanuel Olaiya" <[log in to unmask]>
To: "Andy Hanushevsky" <[log in to unmask]>
Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; <[log in to unmask]>; 
"Bill Weeks" <[log in to unmask]>
Sent: Monday, June 06, 2005 2:57 PM
Subject: Re: PreStage Problems


> Hi Andy
>
> Yes, it would be good if you could have a look at this with me. We can 
> arrange a time in the xrootd meeting tomorrow.
>
> cheers
>
> Manny
>
> Andy Hanushevsky wrote:
>> Hi Manny,
>>
>> I find this quite mysterious, as this should never be the case and,
>> frankly, appears to violate causality. I suspect something else is going
>> on. If this is reproducible, why don't we run a test with all debugging
>> turned on?
>>
>> Andy
>>
>> ----- Original Message -----
>> From: "Emmanuel Olaiya" <[log in to unmask]>
>> To: "Andrew Hanushevsky" <[log in to unmask]>
>> Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; 
>> <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
>> Sent: Monday, June 06, 2005 1:41 PM
>> Subject: Re: PreStage Problems
>>
>>
>>> Hi Andy
>>>
>>> I should have mentioned that we also removed the prestage queue and
>>> restarted both the server and the redirector. However, the old request was
>>> still told to wait. Moreover, any similar new requests were also told to
>>> wait until the old request was terminated.
>>>
>>> cheers
>>>
>>> Manny
>>>
>>> Andrew Hanushevsky wrote:
>>>
>>>> Hi Manny,
>>>>
>>>> Yes, but who is telling the client to wait? The redirector, or the server
>>>> that originally wanted to stage the file in? When you restart the
>>>> redirector it loses all its memory, but the data server does not. So the
>>>> data server will happily tell the redirector that it has the file, even
>>>> though the file is merely in the prestage queue. As long as the file is
>>>> in the prestage queue and not on disk, the only option is to direct
>>>> clients to where the file will be staged in, and the clients then simply
>>>> wait for the file (which in this case will never appear). So, if you
>>>> remove staging you also need to remove the prestage queue and restart
>>>> the data server.
>>>>
>>>> Andy
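A toy model of the state Andy describes may help when deciding what has to be restarted. It is a sketch under stated assumptions, not xrootd or olbd code: the names DataServer, Redirector and client_open are hypothetical, the redirector is modelled as a purely in-memory cache, and the data server is assumed to claim any file that is either on disk or sitting in its (persistent) prestage queue.

```python
"""Toy model of the redirector / data-server interaction described above.

Hypothetical names throughout; this is not xrootd or olbd code.
"""
from dataclasses import dataclass, field


@dataclass
class DataServer:
    on_disk: set = field(default_factory=set)
    # Assumption: the prestage queue is persistent, so it survives restarts
    # unless it is explicitly removed.
    prestage_queue: set = field(default_factory=set)

    def has_file(self, path: str) -> bool:
        # The server claims the file if it is on disk OR merely queued for stage-in.
        return path in self.on_disk or path in self.prestage_queue

    def open(self, path: str) -> str:
        if path in self.on_disk:
            return "ok"
        if path in self.prestage_queue:
            return "wait"  # client waits for a stage-in that may never happen
        return "not found"

    def restart(self, clear_queue: bool) -> None:
        if clear_queue:
            self.prestage_queue.clear()


@dataclass
class Redirector:
    servers: list
    cache: dict = field(default_factory=dict)  # in-memory location cache only

    def locate(self, path: str):
        if path not in self.cache:
            self.cache[path] = next(
                (s for s in self.servers if s.has_file(path)), None)
        return self.cache[path]

    def restart(self) -> None:
        self.cache.clear()  # the redirector "loses all its memory"


def client_open(redirector: Redirector, path: str) -> str:
    server = redirector.locate(path)
    return "not found" if server is None else server.open(path)


if __name__ == "__main__":
    path = "/store/example.root"  # hypothetical file name
    server = DataServer(prestage_queue={path})
    redirector = Redirector([server])
    print(client_open(redirector, path))  # wait
    redirector.restart()
    print(client_open(redirector, path))  # wait (server re-reports the queued file)
    server.restart(clear_queue=True)
    redirector.restart()
    print(client_open(redirector, path))  # not found
```

In this model, restarting only the redirector leaves the client waiting, because the data server re-reports the queued file; only clearing the queue and restarting the data server turns "wait" into "not found".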
>>>>
>>>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
>>>>
>>>>
>>>>> Hi Andy
>>>>>
>>>>> One other issue we have spotted at RAL: we removed the staging
>>>>> capabilities and restarted the redirector and server. However, we found
>>>>> that previous requests for a file that had been told to wait continued
>>>>> to be told to wait. We also found that if somebody else asked for the
>>>>> same file, which was not on disk, they were also told to wait rather
>>>>> than being told the file could not be found. We had to kill the previous
>>>>> request and restart the server and redirector for xrootd to know the
>>>>> file was not on disk.
>>>>>
>>>>> cheers
>>>>>
>>>>> Manny
>>>>>
>>>>> Andrew Hanushevsky wrote:
>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> Oh yeah, different problem. I think that Bill Weeks fixed that. Bill,
>>>>>> did you fix that problem?
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I might be being stupid, but I don't see how this relates to the
>>>>>>> problem. The files I wanted were on a different disk server, which then
>>>>>>> went down. The server in question was registered with the OLB as being
>>>>>>> able to stage in the namespace, so the request was redirected to it. If
>>>>>>> mps_Stage is used without the PreStage queuing system, everything works
>>>>>>> as expected. If we go through the PreStage queue to limit the number of
>>>>>>> concurrent accesses to the tapestore, the stage-in fails, apparently
>>>>>>> because the DIR_LOCK file does not exist (which it doesn't, since the
>>>>>>> file and its directory structure have never existed on this server).
>>>>>>>
>>>>>>> Yours,
>>>>>>> Chris.
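Chris gives the reason for routing stage-ins through PreStage: capping the number of concurrent tapestore accesses. A generic bounded-concurrency sketch of that idea follows. It is not how PreStage is implemented; run_stage_queue is a hypothetical name, stage_in is a stand-in for the real transfer (for example an mps_Stage call), and the default limit of 2 is arbitrary.

```python
"""Generic sketch: run stage-in requests with a cap on concurrent transfers.

Not PreStage code; `stage_in` is any callable taking a path.
"""
from concurrent.futures import ThreadPoolExecutor


def run_stage_queue(paths, stage_in, max_concurrent=2):
    """Run stage_in(path) for each path, never more than max_concurrent at once.

    Returns {path: result}. Paths are assumed unique.
    """
    # The pool size is the cap: at most max_concurrent worker threads exist,
    # so at most that many stage_in calls (tape accesses) run simultaneously.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {path: pool.submit(stage_in, path) for path in paths}
        # .result() re-raises any exception from a failed stage-in.
        return {path: fut.result() for path, fut in futures.items()}
```

Requests beyond the cap simply wait in the executor's internal queue until a worker frees up, which is the queuing behaviour the thread relies on.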
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>>> Sent: 28 May 2005 07:39
>>>>>>>> To: Brew, CAJ (Chris)
>>>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
>>>>>>>> Subject: RE: PreStage Problems
>>>>>>>>
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> This was traced to overzealous testing. The system does not add a new
>>>>>>>> entry to the pre-stage queue until about 10-20 minutes have elapsed
>>>>>>>> since the last time that entry was added. So this is not a bug but a
>>>>>>>> test case that was not "real". Generally, files live in the disk cache
>>>>>>>> for at least 10-20 minutes.
>>>>>>>>
>>>>>>>> Andy
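Andy's explanation amounts to per-file de-duplication over a time window. A minimal sketch, assuming a single fixed window (15 minutes here, picked from his "about 10-20 minutes"; the real value and mechanism are not given in the thread) and a hypothetical PrestageQueue class:

```python
"""Sketch of time-windowed de-duplication of prestage requests.

PrestageQueue and the 15-minute default are illustrative assumptions,
not the real PreStage implementation or setting.
"""
import time
from typing import Dict, List, Optional


class PrestageQueue:
    """Ignores repeat requests for a file made within min_interval_s
    of the last accepted request for that same file."""

    def __init__(self, min_interval_s: float = 15 * 60) -> None:
        self.min_interval_s = min_interval_s
        self.entries: List[str] = []             # accepted requests, in order
        self._last_added: Dict[str, float] = {}  # path -> time of last accepted request

    def request(self, path: str, now: Optional[float] = None) -> bool:
        """Return True if a new entry was queued, False if it was suppressed."""
        now = time.monotonic() if now is None else now
        last = self._last_added.get(path)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon: treated as a duplicate
        self._last_added[path] = now
        self.entries.append(path)
        return True


if __name__ == "__main__":
    q = PrestageQueue()
    print(q.request("/store/a.root", now=0))     # True
    print(q.request("/store/a.root", now=300))   # False: 5 minutes later
    print(q.request("/store/a.root", now=1200))  # True: 20 minutes later
```

In this model, a test that stages a file, deletes it, and re-requests it a few minutes later sees the second request silently dropped, which matches the "overzealous testing" case; in normal use the file would still be in the disk cache within that window.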
>>>>>>>>
>>>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
>>>>>>>>> looking into this, but I haven't heard anything back. Is there any
>>>>>>>>> news?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Chris.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Brew, CAJ (Chris)
>>>>>>>>>> Sent: 17 May 2005 13:50
>>>>>>>>>> To: [log in to unmask]; abh
>>>>>>>>>> Cc: Olaiya, EO (Emmanuel)
>>>>>>>>>> Subject: PreStage Problems
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've been running some more tests of staging at RAL and have run into
>>>>>>>>>> a problem somewhere in the mps_Stage/PreStage/prep system.
>>>>>>>>>>
>>>>>>>>>> Everything works fine when staging a file that was on the system and
>>>>>>>>>> has been deleted. But if I try to stage in a file that was on a
>>>>>>>>>> different server, so that the directory structure for the file does
>>>>>>>>>> not exist on the staging server, it fails and I see the following
>>>>>>>>>> error in the PreStage log file:
>>>>>>>>>>
>>>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
>>>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
>>>>>>>>>> 12:45:45 [  3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
>>>>>>>>>>
>>>>>>>>>> If I create the directories and the DIR_LOCK file before
>>>>>>>>>> running the import, everything works.
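Chris's workaround can be scripted. A minimal sketch, assuming (1) an empty DIR_LOCK file is sufficient, since the thread does not say what it should contain, and (2) the local path is the cache prefix joined with the logical file name, as the mps_Stage log line ('/stage/bdata-data50/kanga//store/...') suggests. The function name prepare_stage_dir is hypothetical.

```python
"""Sketch of the manual workaround: pre-create the directory tree and an
empty DIR_LOCK file on the staging server before the stage-in runs.

Assumptions: an empty DIR_LOCK suffices, and local path = prefix + LFN.
"""
import os


def prepare_stage_dir(cache_prefix: str, lfn: str) -> str:
    """Create the parent directories of `lfn` under `cache_prefix` plus an
    empty DIR_LOCK file in the leaf directory; return the DIR_LOCK path."""
    # Strip the leading "/" so os.path.join does not discard cache_prefix.
    local_dir = os.path.join(cache_prefix, os.path.dirname(lfn).lstrip("/"))
    os.makedirs(local_dir, exist_ok=True)  # no error if it already exists
    lock_path = os.path.join(local_dir, "DIR_LOCK")
    # Append mode creates the file if missing and never truncates an existing one.
    with open(lock_path, "a"):
        pass
    return lock_path
```

For the failing request above, the call would be prepare_stage_dir("/stage/bdata-data50/kanga", "/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root"), creating the .../001005/200002/DIR_LOCK file that mps_Stage tried to open. Running it twice is harmless.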
>>>>>>>>>>
>>>>>>>>>> The config file I'm using on the server is below.
>>>>>>>>>>
>>>>>>>>>> Is there some setting I'm missing which is needed to create
>>>>>>>>>> the directories/DIR_LOCK file or does the code need fixing?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Chris Brew  ([log in to unmask])  +44 1235 446326
>>>>>>>>>> Particle Physics Department
>>>>>>>>>> Rutherford Appleton Laboratory
>>>>>>>>>> Chilton, Didcot. Oxfordshire.
>>>>>>>>>> OX11 0QX. United Kingdom.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>
>>
>