Hi,

about the kXR_force, I remember that I put it into the flags to work
around a situation in which a write retry could not succeed because the
former server had not yet understood that the previous connection was
down. That was one of a bunch of little problems spotted by Alvise in
his tests.

Andy, do you agree that I should remove that flag?

Fabrizio

Wilko Kroeger wrote:
> Hello Pete
>
> Ok, thanks. I will try out the head.
> If possible we should also fix the problem that the kXR_force
> is used. It seems to me quite dangerous that two clients
> can write to the same file or one could over write piece
> of an existing file.
>
> Cheers,
> Wilko
>
>
> On Thu, 3 Mar 2005, Peter Elmer wrote:
>
>
>> Hi Wilko,
>>
>> Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>
>>On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>
>>>Hi again,
>>
>><...>
>>
>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>in the client/server machine. I am still investigating on the occasional
>>>cpu eating, but it seems that that's more difficult, since in my tests,
>>>the problem disappears when enabling the client side log, and for some
>>>strange reason I am not able to spot it by attaching gdb to the process.
>>>
>>>Fabrizio
>>
>> Pete
>>
>>
>>On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>
>>>Hello Fabrizio
>>>
>>>I run the xrdcp test again and I can reproduce crashes in xrdcp
>>>(some times it take 30-60 mins).
>>>I used the xrootd version 20050226-0825 and xrdcp is running on a RHEL3
>>>machine. I read the same file over and over:
>>> xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test - > /dev/null
>>>
>>>The size of the small.test file is:
>>>
>>>>ls -l small.test
>>>
>>>rw-r--r-- 1 wilko ec 31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>which is 30 MB (30*1024*1024)
>>>
>>>I used debugLevel 1 and 2.
>>>
>>>You can find the core file and the debug output files in:
>>>~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>
>>>each directory contains a core file and the debug output file
>>>(wk_log...). The ending d1 or d2 means debuglevel 1 or 2.
>>>
>>>With debug option = 1, gdb shows:
>>>#0 0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>#1 0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>> long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>> begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>
>>>whereas with debugLevel=2, gdb shows:
>>>
>>>#0 0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>#1 0x00a4d018 in free () from /lib/tls/libc.so.6
>>>#2 0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>> XrdClientReadCache.cc:40
>>>
>>>
>>>On the xrootd site I see the error:
>>>050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>> connection reset by peer
>>>050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03 (link
>>> read error)
>>>
>>>(the corresponding client crash was around 23:50)
>>>
>>>
>>>Thanks for looking into this,
>>>
>>>Wilko
>>>
>>
>>
>>
>>-------------------------------------------------------------------------
>>Peter Elmer E-mail: [log in to unmask] Phone: +41 (22) 767-4644
>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>-------------------------------------------------------------------------
>>