This is exactly the case for which I originally introduced the force option ;-)

-- Fons
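
The situation discussed in this thread (a crashed client leaves the server believing the file is still open for write, so a retry is refused unless the open is forced) can be sketched roughly as follows. This is only a toy model under stated assumptions: WriteOpenTable, OpenResult and openForWrite are hypothetical names invented for illustration and are not taken from the xrootd sources, and the real server logic is certainly more involved.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result codes, not taken from the xrootd sources.
enum class OpenResult { Ok, FileBusy };

// Toy model of a server-side table of files believed to be open for write.
class WriteOpenTable {
public:
    // Try to open `path` for writing. Without `force`, the open is refused
    // while the table still holds a (possibly stale) entry for the path.
    OpenResult openForWrite(const std::string& path, bool force = false) {
        assert(!path.empty());
        if (writers_.count(path) != 0 && !force) {
            return OpenResult::FileBusy;
        }
        writers_.insert(path);  // no-op when a stale entry is taken over
        return OpenResult::Ok;
    }

    // Called on a clean close; a crashed client never gets here.
    void close(const std::string& path) { writers_.erase(path); }

private:
    std::set<std::string> writers_;
};
```

In this sketch force is off by default and has to be requested explicitly, which matches the suggestion below to keep it as an option. The model also shows the risk raised in the thread: with force set, the table cannot tell a stale entry from a second live writer.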


abh wrote:
> Hi Fabrizio,
> 
> Yes, I think you should not use the flag (of course you should provide 
> for an option to turn it on). I understand the problem Alvise had. His 
> client crashed and never closed the connection. So, the server thought 
> the file was still opened for write and refused to let him rewrite it. 
> It is also possible for this to happen if you close a connection and 
> immediately try to open a new one. I think having the option to force the 
> copy is probably sufficient.
> 
> Andy
> 
> ----- Original Message ----- From: "Fabrizio Furano" 
> <[log in to unmask]>
> To: "Wilko Kroeger" <[log in to unmask]>
> Cc: "Peter Elmer" <[log in to unmask]>; <[log in to unmask]>; 
> "Andrew Hanushevsky" <[log in to unmask]>
> Sent: Monday, March 07, 2005 2:20 AM
> Subject: Re: crashes in xrdcp
> 
> 
>> Hi,
>>
>>  about kXR_force: I remember that I put it into the flags to 
>> handle a situation in which a write retry could not succeed 
>> because the former server had not yet noticed that the previous 
>> connection was down. That was one of a bunch of little problems 
>> spotted by Alvise in his tests.
>>
>>  Andy, do you agree that I should drop that flag?
>>
>> Fabrizio
>>
>> Wilko Kroeger wrote:
>>
>>> Hello Pete
>>>
>>> Ok, thanks. I will try out the head.
>>> If possible, we should also fix the problem that kXR_force
>>> is used. It seems quite dangerous to me that two clients
>>> can write to the same file, or that one could overwrite pieces
>>> of an existing file.
>>>
>>> Cheers,
>>>   Wilko
>>>
>>>
>>> On Thu, 3 Mar 2005, Peter Elmer wrote:
>>>
>>>
>>>>  Hi Wilko,
>>>>
>>>>  Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>>>
>>>> On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>>>
>>>>> Hi again,
>>>>
>>>>
>>>> <...>
>>>>
>>>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>>> on the client/server machine. I am still investigating the occasional
>>>>> CPU eating, but that seems more difficult, since in my tests the
>>>>> problem disappears when the client-side log is enabled, and for some
>>>>> strange reason I am not able to spot it by attaching gdb to the process.
>>>>>
>>>>> Fabrizio
>>>>
>>>>
>>>>                                   Pete
>>>>
>>>>
>>>> On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>>>
>>>>> Hello Fabrizio
>>>>>
>>>>> I ran the xrdcp test again and I can reproduce crashes in xrdcp
>>>>> (sometimes it takes 30-60 mins).
>>>>> I used the xrootd version 20050226-0825 and xrdcp is running on a RHEL3
>>>>> machine. I read the same file over and over:
>>>>>  xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test -  > /dev/null
>>>>>
>>>>> The size of the small.test file is:
>>>>>
>>>>> > ls -l small.test
>>>>> rw-r--r--   1 wilko  ec  31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>>> which is 30 MB (30*1024*1024)
>>>>>
>>>>> I used debugLevel 1 and 2.
>>>>>
>>>>> You can find the core file and the debug output files in:
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>>>
>>>>> Each directory contains a core file and the debug output file
>>>>> (wk_log...). The ending d1 or d2 means debugLevel 1 or 2.
>>>>>
>>>>> With debugLevel=1, gdb shows:
>>>>> #0  0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>>> #1  0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>>>>    long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>>>>    begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>>>
>>>>> whereas with debugLevel=2, gdb shows:
>>>>>
>>>>> #0  0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>>> #1  0x00a4d018 in free () from /lib/tls/libc.so.6
>>>>> #2  0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>>>>    XrdClientReadCache.cc:40
>>>>>
>>>>>
>>>>> On the xrootd side I see the error:
>>>>> 050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>>>>       connection reset by peer
>>>>> 050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03 (link
>>>>>       read error)
>>>>>
>>>>> (the corresponding client crash was around 23:50)
>>>>>
>>>>>
>>>>> Thanks for looking into this,
>>>>>
>>>>> Wilko
>>>>>
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------------------------
>>>> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
>>>> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>>> -------------------------------------------------------------------------
>>>>
>>
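
In the debugLevel=1 backtrace quoted above, memcpy is reached from GetPartialInterval with begin_offs=31457280, which equals the size of small.test as reported by ls, so the requested interval appears to start right at the end of the file. Whether that was the actual cause of the crash is not stated in the thread. As a minimal sketch of the general defensive technique, assuming a hypothetical CachedBlock type and copyOverlap helper (neither is the real XrdClientReadCacheItem code), a requested interval can be clamped to the cached bytes before copying:

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical cached block: `data` holds the bytes for file offsets
// [begin, begin + data.size()). Not the real XrdClientReadCacheItem.
struct CachedBlock {
    long long begin;
    std::vector<char> data;
};

// Copy the part of [reqBegin, reqEnd] (inclusive offsets) that overlaps the
// block to the start of `dst` and return the number of bytes copied.
// Returns 0 when there is no overlap, so memcpy never reads past the block.
long long copyOverlap(const CachedBlock& blk, char* dst,
                      long long reqBegin, long long reqEnd) {
    assert(dst != nullptr);
    const long long blkEnd =
        blk.begin + static_cast<long long>(blk.data.size()) - 1;
    const long long from = std::max(reqBegin, blk.begin);
    const long long to = std::min(reqEnd, blkEnd);
    if (from > to) {
        return 0;  // request lies entirely outside the cached bytes
    }
    const long long len = to - from + 1;
    std::memcpy(dst, blk.data.data() + (from - blk.begin),
                static_cast<std::size_t>(len));
    return len;
}
```

With this helper, a block whose last byte sits at offset 31457279 and a request for offsets 31457280 to 31714559 would copy nothing and return 0, instead of handing memcpy a range beyond the cached data.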

-- 
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: [log in to unmask]              Phone: +41 22 7679248
WWW:    http://www.rademakers.org/fons/      Fax:   +41 22 7679480