Hi Andy,

   the old rootd never allowed two or more clients to open the same file 
for writing. We have the following rules:

A file can only be opened for writing if the file is not already open for 
reading or writing by any other process.
A file can only be opened for reading if the file is not already open for 
writing (where writing of course includes "update" mode in ROOT parlance).

To be a full drop-in replacement, xrootd must support the file update 
mode. I think this could easily be achieved if we allow r/w pools in 
which the olbd never triggers a replication. In such a pool the olbd 
can keep track of the status of a file: if it is open for reading, it 
cannot be opened by another process for writing, and if it is open for 
writing, it cannot be opened by anybody else.
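
As a rough illustration of the rules above and of the bookkeeping such a 
pool would need, here is a minimal sketch in Python. It is not olbd or 
xrootd code: the class name FileStatusTracker, its methods and the mode 
strings are all hypothetical, and for simplicity it treats every open as 
coming from a different process.

```python
class FileStatusTracker:
    """Hypothetical per-pool record of which files are open (not olbd code)."""

    def __init__(self):
        self._readers = {}      # path -> number of clients reading it
        self._writers = set()   # paths open for writing or update

    def try_open(self, path, mode):
        """Record the open and return True if the rules allow it."""
        if mode not in ("read", "write", "update"):
            raise ValueError(f"unknown mode: {mode}")
        if path in self._writers:
            # Open for writing: nobody else may open the file at all.
            return False
        if mode == "read":
            self._readers[path] = self._readers.get(path, 0) + 1
            return True
        # "write" and "update" both count as writing.
        if self._readers.get(path, 0) > 0:
            # Open for reading: cannot be opened for writing.
            return False
        self._writers.add(path)
        return True

    def close(self, path, mode):
        """Forget one open of the given mode."""
        if mode == "read":
            count = self._readers.get(path, 0)
            if count <= 1:
                self._readers.pop(path, None)
            else:
                self._readers[path] = count - 1
        else:
            self._writers.discard(path)
```

With this sketch, several readers can share a file, a writer is refused 
while any reader has the file open, and a file held by a writer is 
refused to everybody else until it is closed.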


Cheers, Fons.



Andrew Hanushevsky wrote:
> Hi Pete,
> 
> It's part of the root protocol. That part was extended to xroot as well.
> 
> Andy
> 
> On Sat, 5 Mar 2005, Peter Elmer wrote:
> 
> 
>>  Hi Wilko,
>>
>>On Fri, Mar 04, 2005 at 12:22:00PM -0800, Wilko Kroeger wrote:
>>
>>>Ok, thanks. I will try out the head.
>>>If possible we should also fix the problem that kXR_force
>>>is used. It seems quite dangerous to me that two clients
>>>can write to the same file, or that one could overwrite a piece
>>>of an existing file.
>>
>>  I agree completely. I am hoping Fabrizio will address this problem
>>as soon as possible, but am actually surprised that the server allows this...
>>
>>                                   Pete
>>
>>
>>>On Thu, 3 Mar 2005, Peter Elmer wrote:
>>>
>>>
>>>>  Hi Wilko,
>>>>
>>>>  Just for the record, Fabrizio just wrote (as part of a CVS commit):
>>>>
>>>>On Thu, Mar 03, 2005 at 07:33:42PM +0000, Fabrizio Furano wrote:
>>>>
>>>>>Hi again,
>>>>
>>>><...>
>>>>
>>>>> With this one I am no longer able to make xrdcp crash under heavy load
>>>>>on the client/server machine. I am still investigating the occasional
>>>>>CPU eating, but that seems more difficult, since in my tests
>>>>>the problem disappears when the client-side log is enabled, and for some
>>>>>strange reason I am not able to spot it by attaching gdb to the process.
>>>>>
>>>>>Fabrizio
>>>>
>>>>                                   Pete
>>>>
>>>>
>>>>On Mon, Feb 28, 2005 at 12:26:48AM -0800, Wilko Kroeger wrote:
>>>>
>>>>>Hello Fabrizio
>>>>>
>>>>>I ran the xrdcp test again and I can reproduce crashes in xrdcp
>>>>>(sometimes it takes 30-60 mins).
>>>>>I used xrootd version 20050226-0825, and xrdcp is running on a RHEL3
>>>>>machine. I read the same file over and over:
>>>>>  xrdcp -DIDebugLevel 2 root://${xrdhost}:2094///prod/test/small.test - > /dev/null
>>>>>
>>>>>The size of the small.test file is:
>>>>>
>>>>>>ls -l small.test
>>>>>
>>>>>rw-r--r--   1 wilko  ec  31457280 Feb 27 18:09 /u1/wilko/kanga/prod/test/small.test
>>>>>which is 30 MB (30*1024*1024)
>>>>>
>>>>>I used debugLevel 1 and 2.
>>>>>
>>>>>You can find the core file and the debug output files in:
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2233_d1/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2302_d1/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2314_d2/
>>>>>~wilko/bbdev/work/xrootd/core/20050227_2350_d2/
>>>>>
>>>>>each directory contains a core file and the debug output file
>>>>>(wk_log...). The ending d1 or d2 means debuglevel 1 or 2.
>>>>>
>>>>>With debug option = 1, gdb shows:
>>>>>#0  0x0018b17c in memcpy () from /lib/tls/libc.so.6
>>>>>#1  0x0806edbc in XrdClientReadCacheItem::GetPartialInterval(void const*,
>>>>>    long long, long long) (this=0x9f107d0, buffer=0xb5750d08,
>>>>>    begin_offs=31457280, end_offs=31714559) at XrdClientReadCache.hh:93
>>>>>
>>>>>whereas with debugLevel=2, gdb shows:
>>>>>
>>>>>#0  0x00a4e027 in _int_free () from /lib/tls/libc.so.6
>>>>>#1  0x00a4d018 in free () from /lib/tls/libc.so.6
>>>>>#2  0x0806d984 in ~XrdClientReadCacheItem (this=0x96b3db8) at
>>>>>    XrdClientReadCache.cc:40
>>>>>
>>>>>
>>>>>On the xrootd side I see the error:
>>>>>050227 23:54:39 064 XrdLink: Unable to receive from wilko.30110:17@tori0001;
>>>>>       connection reset by peer
>>>>>050227 23:54:39 064 XrootdXeq: wilko.30110:17@tori0001 disc 1:02:03 (link
>>>>>       read error)
>>>>>
>>>>>(the corresponding client crash was around 23:50)
>>>>>
>>>>>
>>>>>Thanks for looking into this,
>>>>>
>>>>>Wilko
>>>>>
>>
>>
>>
>>-------------------------------------------------------------------------
>>Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
>>Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
>>-------------------------------------------------------------------------
>>

-- 
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: [log in to unmask]              Phone: +41 22 7679248
WWW:    http://www.rademakers.org/fons/      Fax:   +41 22 7679480