Hi,

in principle I agree. Maybe some introduced bug, maybe not.
Which client are you using, Gregory? Are you using the one which is
coming from Root 4?

Fabrizio

On Tue, 2005-10-11 at 14:48 -0700, Andy Hanushevsky wrote:
> Hi Gregory,
>
> This is a client problem. You are right, you should have been able to
> restart the server with no problems. Fabrizio, do you see what happened
> here? The file was opened, the server was restarted, the connection was
> remade to that server, but the file was not re-opened. Instead, the original
> file handle was used for the read. Apparently, there is a small timing
> window where that could happen and that causes the job to crash. Two
> solutions: a) (the better one) close the timing window, b) (the sloppier one)
> re-open the file if you get that particular error.
>
> Andy
>
> ----- Original Message -----
> From: "Gregory Schott" <[log in to unmask]>
> To: "Miriam Fritsch" <[log in to unmask]>
> Cc: "xrootd mailing list" <[log in to unmask]>; "SkimSOS"
> <[log in to unmask]>
> Sent: Tuesday, October 11, 2005 11:14 AM
> Subject: Re: your mail
>
>
> > Hello Miriam,
> >
> > OK. This was at the time one of the servers was restarted (it was offline
> > for just a second or two). Andreas thought that in this case the currently
> > reading processes would reconnect to the redirector for re-assignment of
> > a dataserver. Apparently it crashes instead.
> >
> > I am forwarding to the xrootd experts to ask them for their opinion. We
> > are using the latest (July) production version and the config files look
> > like:
> >
> > $ cat config/redirector.cf
> > olb.allow host babar2.gridka.de
> > olb.allow host f01-014-108.gridka.de
> > olb.allow host f01-016-102.gridka.de
> > olb.allow host f01-016-101.gridka.de
> > olb.allow host f01-014-106.gridka.de
> > olb.allow host f01-016-108.gridka.de
> > olb.allow host f01-016-109.gridka.de
> > olb.allow host f01-016-106.gridka.de
> > olb.allow host f01-016-107.gridka.de
> > olb.allow host f01-014-103.gridka.de
> > olb.allow host f01-014-107.gridka.de
> > olb.allow host f01-005-151.gridka.de
> > olb.allow host f01-010-110.gridka.de
> > olb.allow host f01-005-115.gridka.de
> > olb.allow host f01-010-107.gridka.de
> > olb.allow host l01-001-122.gridka.de
> > olb.port 3121
> >
> > odc.manager l01-001-122.gridka.de 3121
> >
> > xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
> > xrootd.export /prod
> > xrootd.export /store
> >
> > odc.trace redirect
> > ---
> > $ cat config/dataserver.cfg
> > odc.manager l01-001-122.gridka.de 3121
> >
> > olb.allow host babar2.gridka.de
> > olb.allow host f01-014-108.gridka.de
> > olb.allow host f01-016-102.gridka.de
> > olb.allow host f01-016-101.gridka.de
> > olb.allow host f01-014-106.gridka.de
> > olb.allow host f01-016-108.gridka.de
> > olb.allow host f01-016-109.gridka.de
> > olb.allow host f01-016-106.gridka.de
> > olb.allow host f01-016-107.gridka.de
> > olb.allow host f01-014-103.gridka.de
> > olb.allow host f01-014-107.gridka.de
> > olb.allow host f01-005-151.gridka.de
> > olb.allow host 10.65.10.110
> > olb.allow host f01-010-110.gridka.de
> > olb.allow host 10.65.5.115
> > olb.allow host f01-005-115.gridka.de
> > olb.allow host f01-010-107.gridka.de
> > olb.allow host l01-001-122.gridka.de
> >
> > olb.path r /store
> > olb.path w /prod
> > olb.port 3121
> > olb.sched cpu 100
> > olb.subscribe l01-001-122.gridka.de
> > olb.wait
> >
> > ofs.redirect remote if l01-001-122.gridka.de
> > ofs.redirect target
> >
> > oss.alloc * * 80
> > oss.fdlimit * max
> > oss.localroot /home/xrootd/disk/kanga-export/EventStore/
> >
> > xrd.protocol xrootd *
> > xrootd.async off
> > xrootd.export /prod
> > xrootd.export /store
> > xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
> > xrootd.chksum crc32 /home/xrootd/bin/getCRC32.sh
> >
> > odc.trace redirect
> > ---
> >
> > Did anything also happen at 18:33 or 18:45 when the redirector got reset?
> > In principle nothing happened from your point of view.
> >
> > Cheers,
> >
> > -- Gregory
> >
> >
> >
> > On Tue, 11 Oct 2005, Miriam Fritsch wrote:
> >
> >>
> >> Hi Gregory,
> >>
> >> some jobs crash with the following error message:
> >>
> >> ---------------------------------------------------------------------------
> >> 18:21:37.524 EvtCounter: processing event # 12085 [ 1d:ffffffff:04ee72/3f73bb1d:V ]
> >> 2005-10-11 18:21:37 19228 Err : TXMessage::ReadRaw - Error reading 8 bytes
> >> 2005-10-11 18:21:37 19228 Err : ReadPartialAnswer - Error reading msg from connmgr (server [f01-010-107.gridka.de:1094]).
> >> 18:21:44.575 EvtCounter: processing event # 12086 [ 1d:ffffffff:04ee72/3f73be86:J ]
> >> 2005-10-11 18:21:44 19228 Err : TXNetFile::ReadBuffer - Server [f01-010-107.gridka.de:1094] did not return OK message for last request.
> >> 2005-10-11 18:21:44 19228 Err : SendGenCommand - Server declared error 3004: 'read does not refer to an open file'
> >> -- JOB DONE --------------------------------------------------------------
> >>
> >> Cheers,
> >>
> >> Miriam
> >>
> >>
> >> *************************************************************************
> >>
> >> Dr. Miriam Fritsch
> >>
> >> Institut fuer Experimentalphysik I
> >> Ruhr-Universitaet Bochum, Germany   email: [log in to unmask]
> >> c/o SLAC                            tel:   +1 (650) 926-3565
> >> 2575 Sand Hill Road #34             fax:   +1 (650) 926-3882
> >> Menlo Park, CA 94025, USA           home:  +1 (650) 324-2813
> >>
> >> *************************************************************************
> >>
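For illustration, solution (b) from Andy's reply (re-open the file when the server reports error 3004, 'read does not refer to an open file') might look roughly like the sketch below. This is a toy model only, not the xrootd or ROOT client code: `FakeServer`, `ServerError`, `ReopeningFile` and the `max_reopens` parameter are hypothetical names invented for this example, and the only detail taken from the thread is the error code 3004 and its message.

```python
"""Toy sketch of option (b): re-open the file when a read hits a stale handle.

All class and parameter names here are hypothetical; none come from the
xrootd or ROOT client. Only the error code 3004 is taken from the job log.
"""

ERR_FILE_NOT_OPEN = 3004  # 'read does not refer to an open file'


class ServerError(Exception):
    """Error reported by the (toy) data server, carrying a numeric code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class FakeServer:
    """Stand-in for a data server whose file handles do not survive a restart."""

    def __init__(self, files):
        self._files = files          # path -> bytes
        self._handles = {}           # handle -> path
        self._next_handle = 0

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = path
        return handle

    def read(self, handle, offset, length):
        if handle not in self._handles:
            raise ServerError(ERR_FILE_NOT_OPEN,
                              "read does not refer to an open file")
        data = self._files[self._handles[handle]]
        return data[offset:offset + length]

    def restart(self):
        # A restart forgets every open handle, as in the incident above.
        self._handles.clear()


class ReopeningFile:
    """Client-side file that re-opens itself if its handle has gone stale."""

    def __init__(self, server, path, max_reopens=1):
        self._server = server
        self._path = path
        self._max_reopens = max_reopens
        self._handle = server.open(path)
        self.reopen_count = 0

    def read(self, offset, length):
        attempts = 0
        while True:
            try:
                return self._server.read(self._handle, offset, length)
            except ServerError as err:
                # Only the stale-handle error is retried; anything else,
                # or too many retries, is passed on to the caller.
                if err.code != ERR_FILE_NOT_OPEN or attempts >= self._max_reopens:
                    raise
                attempts += 1
                self._handle = self._server.open(self._path)
                self.reopen_count += 1


if __name__ == "__main__":
    server = FakeServer({"/store/example": b"0123456789"})
    f = ReopeningFile(server, "/store/example")
    print(f.read(0, 4))   # read with the original handle
    server.restart()      # handle is now stale on the server side
    print(f.read(4, 3))   # triggers one re-open, then succeeds
```

Bounding the number of re-opens keeps a genuinely missing file from turning into an endless retry loop. As Andy notes, this only papers over the problem; closing the timing window in the client (solution a) would be the cleaner fix.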