This was exactly the case why I originally had introduced the force
option ;-)

-- Fons

abh wrote:
> Hi Fabrizio,
>
> Yes, I think you should not use the flag (of course you should provide
> for an option to turn it on). I understand the problem Alvise had. His
> client crashed and never closed the connection. So, the server thought
> the file was still opened for write and refused to let him rewrite it.
> It is also possible for this to happen if you close a connection and
> immediately try to open a new one. I think having the open to force the
> copy is probably sufficient.
>
> Andy
>
> ----- Original Message ----- From: "Fabrizio Furano"
> <[log in to unmask]>
> To: "Wilko Kroeger" <[log in to unmask]>
> Cc: "Peter Elmer" <[log in to unmask]>; <[log in to unmask]>;
> "Andrew Hanushevsky" <[log in to unmask]>
> Sent: Monday, March 07, 2005 2:20 AM
> Subject: Re: crashes in xrdcp
>
>
>> Hi,
>>
>> about the kXR_force, I remember that I put it into the flags to
>> override some situation in which a write retry could not succeed
>> because the former server had not already understood that the previous
>> connection was down. That was part of a bunch of little problems
>> spotted by Alvise in his tests.
>>
>> Andy, do you agree for me to cut that flag off?
>>
>> Fabrizio
>>
>> Wilko Kroeger wrote:
>>
>>> Hello Pete
>>>
>>> Ok, thanks. I will try out the head.
>>> If possible we should also fix the problem that the kXR_force
>>> is used. It seems to me quite dangerous that two clients
>>> can write to the same file or one could overwrite a piece
>>> of an existing file.
>>>
>>> Cheers,
>>> Wilko
>>>
>>>
>>> On Thu, 3 Mar 2005, Peter Elmer wrote:
>>>
>>>
>>>> Hi Wilko,
>>>>
>>>> Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>>>
>>>> On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>>>
>>>>> Hi again,
>>>>
>>>>
>>>> <...>
>>>>
>>>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>>> in the client/server machine. I am still investigating on the occasional
>>>>> cpu eating, but it seems that that's more difficult, since in my tests,
>>>>> the problem disappears when enabling the client side log, and for some
>>>>> strange reason I am not able to spot it by attaching gdb to the process.
>>>>>
>>>>> Fabrizio
>>>>
>>>>
>>>> Pete
>>>>
>>>>
>>>> On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>>>
>>>>> Hello Fabrizio
>>>>>
>>>>> I ran the xrdcp test again and I can reproduce crashes in xrdcp
>>>>> (sometimes it takes 30-60 mins).
>>>>> I used the xrootd version 20050226-0825 and xrdcp is running on a RHEL3
>>>>> machine. I read the same file over and over:
>>>>> xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test - > /dev/null
>>>>>
>>>>> The size of the small.test file is:
>>>>>
>>>>>> ls -l small.test
>>>>>
>>>>>
>>>>> rw-r--r-- 1 wilko ec 31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>>> which is 30 MB (30*1024*1024)
>>>>>
>>>>> I used debugLevel 1 and 2.
>>>>>
>>>>> You can find the core file and the debug output files in:
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>>> ~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>>>
>>>>> each directory contains a core file and the debug output file
>>>>> (wk_log...). The ending d1 or d2 means debuglevel 1 or 2.
>>>>>
>>>>> With debug option = 1, gdb shows:
>>>>> #0 0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>>> #1 0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>>>>    long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>>>>    begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>>>
>>>>> whereas with debugLevel=2, gdb shows:
>>>>>
>>>>> #0 0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>>> #1 0x00a4d018 in free () from /lib/tls/libc.so.6
>>>>> #2 0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>>>>    XrdClientReadCache.cc:40
>>>>>
>>>>>
>>>>> On the xrootd site I see the error:
>>>>> 050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>>>>    connection reset by peer
>>>>> 050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03
>>>>>    (link read error)
>>>>>
>>>>> (the corresponding client crash was around 23:50)
>>>>>
>>>>>
>>>>> Thanks for looking into this,
>>>>>
>>>>> Wilko
>>>>>
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------------------------
>>>> Peter Elmer   E-mail: [log in to unmask]   Phone: +41 (22) 767-4644
>>>> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>>> -------------------------------------------------------------------------
>>>>
>>>>
>>

--
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: [log in to unmask]   Phone: +41 22 7679248
WWW:    http://www.rademakers.org/fons/   Fax: +41 22 7679480
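
A quick arithmetic check on the numbers quoted in Wilko's report may help when reading the first backtrace. The sketch below uses only values copied from the thread (the file size from `ls -l` and the offsets from gdb frame #1). The helper name `interval_vs_file` is made up for illustration, and treating `end_offs` as an inclusive bound is an assumption; the thread does not say. It only shows where the requested interval sits relative to the end of the file and does not identify the cause of the crash.

```python
# Values copied from the thread: file size from `ls -l`,
# offsets from gdb frame #1 (GetPartialInterval).
FILE_SIZE = 31457280
BEGIN_OFFS = 31457280
END_OFFS = 31714559


def interval_vs_file(begin_offs, end_offs, file_size):
    """Split the interval [begin_offs, end_offs] into bytes that fall
    inside the file and bytes that fall past its end.

    Assumption (not stated in the thread): end_offs is inclusive.
    """
    length = end_offs - begin_offs + 1
    last_valid = file_size - 1
    inside = max(0, min(end_offs, last_valid) - begin_offs + 1)
    return inside, length - inside


if __name__ == "__main__":
    inside, outside = interval_vs_file(BEGIN_OFFS, END_OFFS, FILE_SIZE)
    print("file size matches 30*1024*1024:", FILE_SIZE == 30 * 1024 * 1024)
    print("bytes inside file:", inside)
    print("bytes past end of file:", outside)
```

Under that assumption, `begin_offs` equals the file size, so the whole interval in that frame lies at or beyond the end of the 30 MB file.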