Hi,

well, I don't know this. From this it seems that the problem comes from
the root-native client. I'll try to reproduce it anyway, even though the log
shown includes only the error messages and not the activity.
If I find something wrong I'll try to fix it, but I have no idea about the
best way to propagate the fix for ROOT to the BABAR sw release.
Since that client version is quite old, I am a bit puzzled about not having
seen other similar feedback.

Fabrizio

On Tue, 2005-10-11 at 17:42 -0700, Andy Hanushevsky wrote:
> Hi Andreas,
>
> Don't know. Fabrizio?
>
> Andy
>
> ----- Original Message -----
> From: "Andreas Petzold" <[log in to unmask]>
> To: "Andy Hanushevsky" <[log in to unmask]>
> Cc: "Gregory Schott" <[log in to unmask]>; <[log in to unmask]>; "xrootd
> mailing list" <[log in to unmask]>
> Sent: Tuesday, October 11, 2005 4:09 PM
> Subject: Re: your mail
>
>
> > Hi,
> >
> > Andy Hanushevsky wrote:
> >> Hi Gregory,
> >>
> >> This is a client problem. You are right, you should have been able to
> >> restart the server with no problems. Fabrizio, do you see what happened
> >> here? The file was opened, the server was restarted, the connection was
> >> remade to that server, but the file was not re-opened. Instead, the
> >> original file handle was used for the read. Apparently, there is a small
> >> timing window where that could happen and that causes the job to crash.
> >> Two solutions: a) (the better one) close the timing window, b) (the
> >> sloppier one) re-open the file if you get that particular error.
> >
> > hmm, does that mean we don't have a chance of fixing this for the current
> > BABAR sw releases?
> >
> > Cheers,
> >
> > Andreas
> >
> >>
> >> Andy
> >>
> >> ----- Original Message -----
> >> From: "Gregory Schott" <[log in to unmask]>
> >> To: "Miriam Fritsch" <[log in to unmask]>
> >> Cc: "xrootd mailing list" <[log in to unmask]>; "SkimSOS"
> >> <[log in to unmask]>
> >> Sent: Tuesday, October 11, 2005 11:14 AM
> >> Subject: Re: your mail
> >>
> >>
> >>> Hello Miriam,
> >>>
> >>> OK. This was at the time one of the servers was restarted (it was offline
> >>> for just a second or two). Andreas thought that in this case the currently
> >>> reading processes would reconnect to the redirector for re-assignment
> >>> of a dataserver. Apparently the job crashes instead.
> >>>
> >>> I am forwarding to the xrootd experts to ask them for their opinion. We
> >>> are using the latest (July) production version and the config files
> >>> look like:
> >>>
> >>> $ cat config/redirector.cf
> >>> olb.allow host babar2.gridka.de
> >>> olb.allow host f01-014-108.gridka.de
> >>> olb.allow host f01-016-102.gridka.de
> >>> olb.allow host f01-016-101.gridka.de
> >>> olb.allow host f01-014-106.gridka.de
> >>> olb.allow host f01-016-108.gridka.de
> >>> olb.allow host f01-016-109.gridka.de
> >>> olb.allow host f01-016-106.gridka.de
> >>> olb.allow host f01-016-107.gridka.de
> >>> olb.allow host f01-014-103.gridka.de
> >>> olb.allow host f01-014-107.gridka.de
> >>> olb.allow host f01-005-151.gridka.de
> >>> olb.allow host f01-010-110.gridka.de
> >>> olb.allow host f01-005-115.gridka.de
> >>> olb.allow host f01-010-107.gridka.de
> >>> olb.allow host l01-001-122.gridka.de
> >>> olb.port 3121
> >>>
> >>> odc.manager l01-001-122.gridka.de 3121
> >>>
> >>> xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
> >>> xrootd.export /prod
> >>> xrootd.export /store
> >>>
> >>> odc.trace redirect
> >>> ---
> >>> $ cat config/dataserver.cfg
> >>> odc.manager l01-001-122.gridka.de 3121
> >>>
> >>> olb.allow host babar2.gridka.de
> >>> olb.allow host f01-014-108.gridka.de
> >>> olb.allow host f01-016-102.gridka.de
> >>> olb.allow host f01-016-101.gridka.de
> >>> olb.allow host f01-014-106.gridka.de
> >>> olb.allow host f01-016-108.gridka.de
> >>> olb.allow host f01-016-109.gridka.de
> >>> olb.allow host f01-016-106.gridka.de
> >>> olb.allow host f01-016-107.gridka.de
> >>> olb.allow host f01-014-103.gridka.de
> >>> olb.allow host f01-014-107.gridka.de
> >>> olb.allow host f01-005-151.gridka.de
> >>> olb.allow host 10.65.10.110
> >>> olb.allow host f01-010-110.gridka.de
> >>> olb.allow host 10.65.5.115
> >>> olb.allow host f01-005-115.gridka.de
> >>> olb.allow host f01-010-107.gridka.de
> >>> olb.allow host l01-001-122.gridka.de
> >>>
> >>> olb.path r /store
> >>> olb.path w /prod
> >>> olb.port 3121
> >>> olb.sched cpu 100
> >>> olb.subscribe l01-001-122.gridka.de
> >>> olb.wait
> >>>
> >>> ofs.redirect remote if l01-001-122.gridka.de
> >>> ofs.redirect target
> >>>
> >>> oss.alloc * * 80
> >>> oss.fdlimit * max
> >>> oss.localroot /home/xrootd/disk/kanga-export/EventStore/
> >>>
> >>> xrd.protocol xrootd *
> >>> xrootd.async off
> >>> xrootd.export /prod
> >>> xrootd.export /store
> >>> xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
> >>> xrootd.chksum crc32 /home/xrootd/bin/getCRC32.sh
> >>>
> >>> odc.trace redirect
> >>> ---
> >>>
> >>> Did anything also happen at 18:33 or 18:45 when the redirector got
> >>> reset? In principle nothing happened from your point of view.
> >>>
> >>> Cheers,
> >>>
> >>> -- Gregory
> >>>
> >>>
> >>> On Tue, 11 Oct 2005, Miriam Fritsch wrote:
> >>>
> >>>>
> >>>> Hi Gregory,
> >>>>
> >>>> some jobs crash with the following error message:
> >>>>
> >>>> ---------------------------------------------------------------------------
> >>>>
> >>>> 18:21:37.524 EvtCounter: processing event # 12085 [ 1d:ffffffff:04ee72/3f73bb1d:V ]
> >>>> 2005-10-11 18:21:37 19228 Err : TXMessage::ReadRaw - Error reading 8 bytes
> >>>> 2005-10-11 18:21:37 19228 Err : ReadPartialAnswer - Error reading msg from connmgr (server [f01-010-107.gridka.de:1094]).
> >>>> 18:21:44.575 EvtCounter: processing event # 12086 [ 1d:ffffffff:04ee72/3f73be86:J ]
> >>>> 2005-10-11 18:21:44 19228 Err : TXNetFile::ReadBuffer - Server [f01-010-107.gridka.de:1094] did not return OK message for last request.
> >>>> 2005-10-11 18:21:44 19228 Err : SendGenCommand - Server declared error 3004: 'read does not refer to an open file'
> >>>> -- JOB DONE --------------------------------------------------------------
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Miriam
> >>>>
> >>>>
> >>>> *************************************************************************
> >>>>
> >>>> Dr. Miriam Fritsch
> >>>>
> >>>> Institut fuer Experimentalphysik I
> >>>> Ruhr-Universitaet Bochum, Germany    email: [log in to unmask]
> >>>> c/o SLAC                             tel: +1 (650) 926-3565
> >>>> 2575 Sand Hill Road #34              fax: +1 (650) 926-3882
> >>>> Menlo Park, CA 94025, USA            home: +1 (650) 324-2813
> >>>>
> >>>> *************************************************************************
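
For illustration only, Andy's option (b) in the thread (re-open the file if the server answers with error 3004, 'read does not refer to an open file') could look roughly like the toy model below. This is a sketch under stated assumptions, not the ROOT or xrootd client code: FakeServer, ReopeningFile and StaleHandleError are hypothetical names made up for this example, and the "server" is just an in-memory dictionary that forgets its open handles on restart. It does nothing about option (a), closing the timing window itself.

```python
"""Toy model of option (b): re-open the file when a read hits a stale handle.

All names here (FakeServer, ReopeningFile, StaleHandleError) are hypothetical
and only mimic the behaviour described in the thread; this is not xrootd code.
"""

ERR_NOT_OPEN = 3004  # error code quoted in the job log


class StaleHandleError(Exception):
    """Raised by the fake server when a handle is unknown to it."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class FakeServer:
    """In-memory stand-in for a data server whose handles die on restart."""

    def __init__(self, files):
        self._files = files      # path -> bytes
        self._handles = {}       # handle -> path
        self._next_handle = 0

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = path
        return handle

    def read(self, handle, offset, length):
        if handle not in self._handles:
            raise StaleHandleError(
                ERR_NOT_OPEN, "read does not refer to an open file")
        data = self._files[self._handles[handle]]
        return data[offset:offset + length]

    def restart(self):
        # A restart forgets every open handle but keeps the files themselves.
        self._handles.clear()


class ReopeningFile:
    """Client-side wrapper that re-opens once if the handle is rejected."""

    def __init__(self, server, path):
        self._server = server
        self._path = path
        self._handle = server.open(path)
        self.reopen_count = 0

    def read(self, offset, length):
        try:
            return self._server.read(self._handle, offset, length)
        except StaleHandleError as err:
            if err.code != ERR_NOT_OPEN:
                raise
            # The handle predates the restart: get a fresh one, retry once.
            self._handle = self._server.open(self._path)
            self.reopen_count += 1
            return self._server.read(self._handle, offset, length)


if __name__ == "__main__":
    server = FakeServer({"/store/demo.root": b"0123456789"})
    f = ReopeningFile(server, "/store/demo.root")
    print(f.read(0, 4))
    server.restart()  # simulates the brief data server restart
    print(f.read(4, 4), "reopens:", f.reopen_count)
```

The retry is deliberately limited to a single re-open, so a file that really is gone still surfaces as an error instead of looping.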