Hi Andy,

the old rootd never allowed two or more clients to open the same file for
writing. We have the following rules:

A file can only be opened for writing if the file is not already open for
reading or writing by any other process.

A file can only be opened for reading if the file is not already open for
writing (where writing of course includes "update" mode in ROOT parlance).

To be a full drop-in replacement, xrootd must be able to support the file
update mode. I think this could be easily achieved if we allow r/w pools
where the olbd will never trigger a replication. In such a pool the olbd
can be used to keep track of the status of a file: if it is open for
reading, it cannot be opened by another process for writing, and if it is
open for writing, it cannot be opened by anybody else.

Cheers, Fons.

Andrew Hanushevsky wrote:
> Hi Pete,
>
> It's part of the root protocol. That part was extended to xroot as well.
>
> Andy
>
> On Sat, 5 Mar 2005, Peter Elmer wrote:
>
>
>> Hi Wilko,
>>
>>On Fri, Mar 04, 2005 at 12:22:00PM -0800, Wilko Kroeger wrote:
>>
>>>Ok, thanks. I will try out the head.
>>>If possible we should also fix the problem that the kXR_force
>>>is used. It seems to me quite dangerous that two clients
>>>can write to the same file or one could over write piece
>>>of an existing file.
>>
>> I agree completely. I am hoping Fabrizio will address this problem
>>as soon as possible, but am actually surprised that the server allows this...
>>
>> Pete
>>
>>
>>>On Thu, 3 Mar 2005, Peter Elmer wrote:
>>>
>>>
>>>> Hi Wilko,
>>>>
>>>> Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>>>
>>>>On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>>>
>>>>>Hi again,
>>>>
>>>><...>
>>>>
>>>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>>>in the client/server machine.
>>>>>I am still investigating on the occasional
>>>>>cpu eating, but it seems that that's more difficult, since in my tests,
>>>>>the problem disappears when enabling the client side log, and for some
>>>>>strange reason I am not able to spot it by attaching gdb to the process.
>>>>>
>>>>>Fabrizio
>>>>
>>>> Pete
>>>>
>>>>
>>>>On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>>>
>>>>>Hello Fabrizio
>>>>>
>>>>>I run the xrdcp test again and I can reproduce crashes in xrdcp
>>>>>(some times it take 30-60 mins).
>>>>>I used the xrootd version 20050226-0825 and xrdcp is running on a RHEL3
>>>>>machine. I read the same file over and over:
>>>>> xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test - > /dev/null
>>>>>
>>>>>The size of the small.test file is:
>>>>>
>>>>>>ls -l small.test
>>>>>
>>>>>rw-r--r-- 1 wilko ec 31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>>>which is 30 MB (30*1024*1024)
>>>>>
>>>>>I used debugLevel 1 and 2.
>>>>>
>>>>>You can find the core file and the debug output files in:
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>>>
>>>>>each directory contains a core file and the debug output file
>>>>>(wk_log...). The ending d1 or d2 means debuglevel 1 or 2.
>>>>>
>>>>>With debug option = 1, gdb shows:
>>>>>#0 0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>>>#1 0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>>>> long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>>>> begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>>>
>>>>>whereas with debugLevel=2, gdb shows:
>>>>>
>>>>>#0 0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>>>#1 0x00a4d018 in free () from /lib/tls/libc.so.6
>>>>>#2 0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>>>> XrdClientReadCache.cc:40
>>>>>
>>>>>
>>>>>On the xrootd site I see the error:
>>>>>050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>>>> connection reset by peer
>>>>>050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03 (link
>>>>> read error)
>>>>>
>>>>>(the corresponding client crash was around 23:50)
>>>>>
>>>>>
>>>>>Thanks for looking into this,
>>>>>
>>>>>Wilko
>>>>>
>>
>>
>>
>>-------------------------------------------------------------------------
>>Peter Elmer E-mail: [log in to unmask] Phone: +41 (22) 767-4644
>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>-------------------------------------------------------------------------
>>

--
Org: CERN, European Laboratory for Particle Physics.
Mail: 1211 Geneve 23, Switzerland
E-Mail: [log in to unmask] Phone: +41 22 7679248
WWW: http://www.rademakers.org/fons/ Fax: +41 22 7679480
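
For illustration only, the two open rules stated at the top of this message (no writer while anyone else has the file open, no reader while a writer has it open) could be sketched as a small in-memory table. This is a Python sketch under stated assumptions, not rootd, xrootd or olbd code: the class name `OpenTable`, its methods and the mode strings are all hypothetical, and every `open` call is assumed to come from a distinct process.

```python
class OpenTable:
    """Hypothetical per-path open-state tracker (not an actual olbd interface)."""

    def __init__(self):
        self._readers = {}     # path -> number of handles open for reading
        self._writers = set()  # paths currently open for write or update

    def can_open(self, path, mode):
        # "update" counts as writing, as in ROOT parlance.
        if mode not in ("read", "write", "update"):
            raise ValueError("unknown mode: " + mode)
        if path in self._writers:
            return False  # an open writer blocks every other open
        if mode == "read":
            return True   # any number of concurrent readers is fine
        # write/update additionally requires that nobody is reading
        return self._readers.get(path, 0) == 0

    def open(self, path, mode):
        # Returns True and records the open, or False if the rules forbid it.
        if not self.can_open(path, mode):
            return False
        if mode == "read":
            self._readers[path] = self._readers.get(path, 0) + 1
        else:
            self._writers.add(path)
        return True

    def close(self, path, mode):
        # Assumes the caller passes the same mode it used for open().
        if mode == "read":
            left = self._readers.get(path, 0) - 1
            if left > 0:
                self._readers[path] = left
            else:
                self._readers.pop(path, None)
        else:
            self._writers.discard(path)
```

With such a table, two readers of the same path are admitted, an update open is refused until both readers have closed, and once a writer is in, further opens of either kind are refused until it closes.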