Hi Andy

Ok, we can try and provide log files.

Will you be at the meeting tomorrow?

cheers

Manny

Andy Hanushevsky wrote:
> Hi Manny,
> 
> No problem other than the rest of the week is tied up -- Seattle 
> Tue/Wed, Oakland/Thu, home Fri. Though looking at log files is always 
> possible here.
> 
> Andy
> 
> ----- Original Message ----- From: "Emmanuel Olaiya" <[log in to unmask]>
> To: "Andy Hanushevsky" <[log in to unmask]>
> Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; 
> <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
> Sent: Monday, June 06, 2005 2:57 PM
> Subject: Re: PreStage Problems
> 
> 
>> Hi Andy
>>
>> Yes, it would be good if you could have a look at this with me. We can 
>> arrange a time in the xrootd meeting tomorrow.
>>
>> cheers
>>
>> Manny
>>
>> Andy Hanushevsky wrote:
>>
>>> Hi Manny,
>>>
>>> I find this quite mysterious, as this should never be the case and, 
>>> frankly, appears to violate causality. I suspect something else is 
>>> going on. If this is reproducible then why don't we run a test with 
>>> all debugging turned on. Yes?
>>>
>>> Andy
>>>
>>> ----- Original Message ----- From: "Emmanuel Olaiya" <[log in to unmask]>
>>> To: "Andrew Hanushevsky" <[log in to unmask]>
>>> Cc: "Brew, CAJ (Chris)" <[log in to unmask]>; 
>>> <[log in to unmask]>; "Bill Weeks" <[log in to unmask]>
>>> Sent: Monday, June 06, 2005 1:41 PM
>>> Subject: Re: PreStage Problems
>>>
>>>
>>>> Hi Andy
>>>>
>>>> I should have mentioned that we also removed the prestage queue and 
>>>> restarted both the server and redirector. However the old request to 
>>>> wait did not change. Moreover, any similar new requests were also 
>>>> told to wait until the old request was terminated.
>>>>
>>>> cheers
>>>>
>>>> Manny
>>>>
>>>> Andrew Hanushevsky wrote:
>>>>
>>>>> Hi Manny,
>>>>>
>>>>> Yes, but who is telling the client to wait? The redirector, or the
>>>>> server that originally wanted to stage the file in? When you restart
>>>>> the redirector it loses all its memory, but the data server does not.
>>>>> So, it will happily tell the redirector that it has the file even
>>>>> though the file is merely in the pre-stage queue. As long as the file
>>>>> is in the prestage queue and not on disk, the only option is to direct
>>>>> clients to where the file will be staged in, and then the clients
>>>>> simply wait for the file (which in this case will never appear). So, if
>>>>> you remove staging you also need to remove the prestage queue and
>>>>> restart the data server.
>>>>>
>>>>> Andy
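
The behaviour Andy describes above can be illustrated with a toy model. This is only a sketch of the explanation in his message, not xrootd or OLB code: the class names, the `has_file`/`open` methods and the string results are all hypothetical, chosen for illustration.

```python
class DataServer:
    """Hypothetical stand-in for a data server with a pre-stage queue."""

    def __init__(self):
        self.on_disk = set()
        # Assumption: the queue lives on the data server, so a redirector
        # restart does not clear it.
        self.prestage_queue = set()

    def has_file(self, path):
        # As described in the message: a file that is only queued for
        # staging is still reported to the redirector as available here.
        return path in self.on_disk or path in self.prestage_queue

    def open(self, path):
        if path in self.on_disk:
            return "ok"
        if path in self.prestage_queue:
            return "wait"  # the client is told to wait for the stage-in
        return "not found"


class Redirector:
    """Hypothetical redirector that remembers which server has a file."""

    def __init__(self, servers):
        self.servers = servers
        self.cache = {}

    def restart(self):
        # A restart loses all of the redirector's memory.
        self.cache.clear()

    def locate(self, path):
        if path not in self.cache:
            for server in self.servers:
                if server.has_file(path):
                    self.cache[path] = server
                    break
        return self.cache.get(path)


def client_open(redirector, path):
    server = redirector.locate(path)
    return server.open(path) if server else "not found"
```

In this model, restarting only the redirector changes nothing: it asks the data server again, is told the file is there, and the client is sent back to wait. The client only gets "not found" once the queue entry is gone from the data server as well.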
>>>>>
>>>>> On Fri, 3 Jun 2005, Emmanuel Olaiya wrote:
>>>>>
>>>>>
>>>>>> Hi Andy
>>>>>>
>>>>>> One other issue we have spotted at RAL. We removed the staging
>>>>>> capabilities and restarted the redirector and server. However, we found
>>>>>> that previous requests for a file that had been told to wait continued
>>>>>> to be told to wait. We also found that if somebody else asked for this
>>>>>> same file, which was not on disk, they were also told to wait rather
>>>>>> than being told the file could not be found. We needed to kill the
>>>>>> previous request and restart the server and redirector for xrootd to
>>>>>> know the file was not on disk.
>>>>>>
>>>>>> cheers
>>>>>>
>>>>>> Manny
>>>>>>
>>>>>> Andrew Hanushevsky wrote:
>>>>>>
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> Oh yeah, different problem. I think that Bill Weeks fixed that.
>>>>>>> Bill, did you fix that problem?
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On Mon, 30 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I might be being stupid but I don't see how this relates to the
>>>>>>>> problem. The files I wanted were on a different disk server which
>>>>>>>> then went down. The server in question was registered with the OLB
>>>>>>>> as being able to stage in the name space so the request was
>>>>>>>> redirected to it. If mps_Stage is used without the PreStage queuing
>>>>>>>> system everything works as expected. If we try to go through the
>>>>>>>> PreStage queue to limit the number of concurrent accesses to the
>>>>>>>> tapestore the stage in fails, apparently because the DIR_LOCK file
>>>>>>>> does not exist (which it doesn't, since the file, and its directory
>>>>>>>> structure, has never existed on this server).
>>>>>>>>
>>>>>>>> Yours,
>>>>>>>> Chris.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Andrew Hanushevsky [mailto:[log in to unmask]]
>>>>>>>>> Sent: 28 May 2005 07:39
>>>>>>>>> To: Brew, CAJ (Chris)
>>>>>>>>> Cc: [log in to unmask]; abh; Olaiya, EO (Emmanuel)
>>>>>>>>> Subject: RE: PreStage Problems
>>>>>>>>>
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> This was traced to overzealous testing. The system does not put a
>>>>>>>>> new entry in the pre-stage queue until about 10-20 minutes have
>>>>>>>>> elapsed since the last time the entry was added. So, this is not a
>>>>>>>>> bug but a test case that was not "real". Generally, files live in
>>>>>>>>> the disk cache for at least 10-20 minutes.
>>>>>>>>>
>>>>>>>>> Andy
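
The re-queue rule described above (no new pre-stage entry for a file until some minutes after its last entry) can be sketched as follows. This is a minimal illustration, not the PreStage implementation: the class, its method names and the 600-second default are assumptions, picked from the low end of the 10-20 minute range quoted in the message.

```python
import time


class PreStageQueue:
    """Hypothetical queue that suppresses repeat requests for the same file."""

    def __init__(self, requeue_after_s=600, clock=time.monotonic):
        # requeue_after_s is an assumed value; the message only says
        # "about 10-20 minutes".
        self.requeue_after_s = requeue_after_s
        self.clock = clock
        self.last_added = {}  # path -> time the entry was last queued
        self.pending = []     # entries waiting to be staged

    def request(self, path):
        """Queue `path` unless it was queued too recently. Returns True if queued."""
        now = self.clock()
        last = self.last_added.get(path)
        if last is not None and now - last < self.requeue_after_s:
            return False  # too soon: the repeat request is silently dropped
        self.last_added[path] = now
        self.pending.append(path)
        return True
```

Under this rule, a test that stages a file, deletes it and immediately asks for it again would see the second request ignored, which matches the "overzealous testing" diagnosis.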
>>>>>>>>>
>>>>>>>>> On Fri, 27 May 2005, Brew, CAJ (Chris) wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> At the meeting a couple of weeks ago, it was said that someone was
>>>>>>>>>> looking into this but I haven't heard anything back. Is there any
>>>>>>>>>> news?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Chris.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Brew, CAJ (Chris)
>>>>>>>>>>> Sent: 17 May 2005 13:50
>>>>>>>>>>> To: [log in to unmask]; abh
>>>>>>>>>>> Cc: Olaiya, EO (Emmanuel)
>>>>>>>>>>> Subject: PreStage Problems
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've been running some more tests of the staging at RAL and
>>>>>>>>>>> have run into a problem somewhere in the
>>>>>>>>>>> mps_Stage/PreStage/prep system.
>>>>>>>>>>>
>>>>>>>>>>> Everything works fine staging a file that was on the system and
>>>>>>>>>>> has been deleted, but if I try to stage in a file that was on
>>>>>>>>>>> a different server, so that the directory structure for the
>>>>>>>>>>> file does not exist on the staging server, it fails and I see
>>>>>>>>>>> the following error in the PreStage log file:
>>>>>>>>>>>
>>>>>>>>>>> 12:45:43 [ 10859] mps_Stage: Open '/stage/bdata-data50/kanga//store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/DIR_LOCK' r/w failed; No such file or directory.
>>>>>>>>>>> 12:45:43 [ 10859] do_stagein: xfr failed for /store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_001005_3247.01.root, rc=4, retry=1
>>>>>>>>>>> 12:45:45 [  3255] file=/store/SPskims/R12/16.0.2e/BtoKKKL/001005/200002/BtoKKKL_0010053247.01.root, rc=1024, reqid=ef000001:1cd2.425d27e1:3762
>>>>>>>>>>>
>>>>>>>>>>> If I create the directories and the DIR_LOCK file before
>>>>>>>>>>> running the import, everything works.
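
The workaround Chris mentions (creating the directories and the DIR_LOCK file before the import) could be scripted roughly like this. It is a sketch under stated assumptions: the helper name is hypothetical, and it assumes that an empty file named DIR_LOCK in the target directory is all that mps_Stage needs, which the thread does not confirm.

```python
import os


def ensure_dir_lock(stage_root, lfn):
    """Create the directory for `lfn` under `stage_root`, plus a DIR_LOCK file.

    `stage_root` is the local stage area and `lfn` the logical file name
    (for example a path starting with /store/...). Returns the lock path.
    """
    # Strip the leading slash so os.path.join does not discard stage_root.
    target_dir = os.path.join(stage_root, os.path.dirname(lfn).lstrip("/"))
    os.makedirs(target_dir, exist_ok=True)
    lock_path = os.path.join(target_dir, "DIR_LOCK")
    # Append mode creates the file if it is missing and leaves an
    # existing one untouched.
    with open(lock_path, "a"):
        pass
    return lock_path
```

Calling the helper once per file before the stage-in request is idempotent, so it is harmless when the directory and lock file already exist.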
>>>>>>>>>>>
>>>>>>>>>>> The config file I'm using on the server is below.
>>>>>>>>>>>
>>>>>>>>>>> Is there some setting I'm missing which is needed to create
>>>>>>>>>>> the directories/DIR_LOCK file or does the code need fixing?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> Chris Brew  ([log in to unmask])  +44 1235 446326
>>>>>>>>>>> Particle Physics Department
>>>>>>>>>>> Rutherford Appleton Laboratory
>>>>>>>>>>> Chilton, Didcot. Oxfordshire.
>>>>>>>>>>> OX11 0QX. United Kingdom.
>>>>>>>>>>>