Hi,

  about kXR_force, I remember that I put it into the flags to
work around a situation in which a write retry could not succeed because
the former server had not yet noticed that the previous
connection was down. That was one of a bunch of little problems spotted
by Alvise in his tests.

  Andy, do you agree that I should remove that flag?

Fabrizio

Wilko Kroeger wrote:
> Hello Pete
> 
> Ok, thanks. I will try out the head.
> If possible, we should also fix the problem that kXR_force
> is used. It seems quite dangerous to me that two clients
> can write to the same file, or that one could overwrite pieces
> of an existing file.
> 
> Cheers,
>   Wilko
> 
> 
> On Thu, 3 Mar 2005, Peter Elmer wrote:
> 
> 
>>  Hi Wilko,
>>
>>  Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>
>>On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>
>>>Hi again,
>>
>><...>
>>
>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>on the client/server machine. I am still investigating the occasional
>>>excessive CPU usage, but that seems more difficult, since in my tests
>>>the problem disappears when I enable the client-side log, and for some
>>>strange reason I cannot spot it by attaching gdb to the process.
>>>
>>>Fabrizio
>>
>>                                   Pete
>>
>>
>>On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>
>>>Hello Fabrizio
>>>
>>>I ran the xrdcp test again and I can reproduce the crashes in xrdcp
>>>(sometimes it takes 30-60 minutes).
>>>I used xrootd version 20050226-0825, and xrdcp is running on a RHEL3
>>>machine. I read the same file over and over:
>>>  xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test - > /dev/null
>>>
>>>The size of the small.test file is:
>>>
>>>>ls -l small.test
>>>
>>>rw-r--r--   1 wilko  ec  31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>which is 30 MB (30*1024*1024)
>>>
>>>I used debugLevel 1 and 2.
>>>
>>>You can find the core file and the debug output files in:
>>>~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>
>>>Each directory contains a core file and the debug output file
>>>(wk_log...). The ending d1 or d2 means debug level 1 or 2.
>>>
>>>With debug option = 1, gdb shows:
>>>#0  0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>#1  0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>>    long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>>    begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>
>>>whereas with debugLevel=2, gdb shows:
>>>
>>>#0  0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>#1  0x00a4d018 in free () from /lib/tls/libc.so.6
>>>#2  0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>>    XrdClientReadCache.cc:40
>>>
>>>
>>>On the xrootd side I see the error:
>>>050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>>       connection reset by peer
>>>050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03 (link
>>>       read error)
>>>
>>>(the corresponding client crash was around 23:50)
>>>
>>>
>>>Thanks for looking into this,
>>>
>>>Wilko
>>>
>>
>>
>>
>>-------------------------------------------------------------------------
>>Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>-------------------------------------------------------------------------
>>
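
A note on the numbers in the debug-level-1 backtrace quoted above: begin_offs (31457280) is equal to the file size reported by ls (31457280 = 30*1024*1024), so the requested interval starts exactly at the end of the file. Whether this is related to the crash in memcpy is not established in this thread. The sketch below only illustrates the arithmetic; BytesInsideFile is a hypothetical helper written for this note and is not part of XrdClient.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not XrdClient code):
// returns how many bytes of the inclusive interval [begin_offs, end_offs]
// lie inside a file that is file_size bytes long.
std::int64_t BytesInsideFile(std::int64_t begin_offs,
                             std::int64_t end_offs,
                             std::int64_t file_size) {
    // An empty, inverted, or fully out-of-range interval covers no file data.
    if (begin_offs < 0 || end_offs < begin_offs || begin_offs >= file_size) {
        return 0;
    }
    // Clamp the last byte to the last valid offset of the file.
    const std::int64_t last =
        (end_offs < file_size - 1) ? end_offs : file_size - 1;
    return last - begin_offs + 1;
}
```

With the values from the backtrace, the request spans 31714559 - 31457280 + 1 = 257280 bytes, while BytesInsideFile(31457280, 31714559, 31457280) is 0: none of the requested bytes lie inside the 30 MB file.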