[I've taken this off the skimsos-hn]

Hi Fabrizio,

Fabrizio Furano wrote:
> Hi,
>
> in principle I agree. Maybe some introduced bug, maybe not.
>
> Which client are you using, Gregory? Are you using the one which is
> coming from Root 4?

we are using the 18-series (18.2.1b) BABAR releases, so I guess that means
we are using the client from ROOT 4.03-02.

Cheers,
Andreas

>
> Fabrizio
>
> On Tue, 2005-10-11 at 14:48 -0700, Andy Hanushevsky wrote:
>
>>Hi Gregory,
>>
>>This is a client problem. You are right, you should have been able to
>>restart the server with no problems. Fabrizio, do you see what happened
>>here? The file was opened, the server was restarted, the connection was
>>remade to that server, but the file was not re-opened. Instead, the original
>>file handle was used for the read. Apparently, there is a small timing
>>window where that could happen and that causes the job to crash. Two
>>solutions: a) (the better one) close the timing window, b) (the sloppier one)
>>re-open the file if you get that particular error.
>>
>>Andy
>>
>>----- Original Message -----
>>From: "Gregory Schott" <[log in to unmask]>
>>To: "Miriam Fritsch" <[log in to unmask]>
>>Cc: "xrootd mailing list" <[log in to unmask]>; "SkimSOS"
>><[log in to unmask]>
>>Sent: Tuesday, October 11, 2005 11:14 AM
>>Subject: Re: your mail
>>
>>
>>>Hello Miriam,
>>>
>>>OK. This was at the time one of the servers was restarted (it went offline
>>>for just a second or two). Andreas thought that in this case the currently
>>>reading processes would reconnect to the redirector for re-assignment of
>>>a dataserver. Apparently it crashes instead.
>>>
>>>I am forwarding to the xrootd experts to ask them for their opinion.
>>>We are using the latest (July) production version and the config files look
>>>like:
>>>
>>>$ cat config/redirector.cf
>>>olb.allow host babar2.gridka.de
>>>olb.allow host f01-014-108.gridka.de
>>>olb.allow host f01-016-102.gridka.de
>>>olb.allow host f01-016-101.gridka.de
>>>olb.allow host f01-014-106.gridka.de
>>>olb.allow host f01-016-108.gridka.de
>>>olb.allow host f01-016-109.gridka.de
>>>olb.allow host f01-016-106.gridka.de
>>>olb.allow host f01-016-107.gridka.de
>>>olb.allow host f01-014-103.gridka.de
>>>olb.allow host f01-014-107.gridka.de
>>>olb.allow host f01-005-151.gridka.de
>>>olb.allow host f01-010-110.gridka.de
>>>olb.allow host f01-005-115.gridka.de
>>>olb.allow host f01-010-107.gridka.de
>>>olb.allow host l01-001-122.gridka.de
>>>olb.port 3121
>>>
>>>odc.manager l01-001-122.gridka.de 3121
>>>
>>>xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
>>>xrootd.export /prod
>>>xrootd.export /store
>>>
>>>odc.trace redirect
>>>---
>>>$ cat config/dataserver.cfg
>>>odc.manager l01-001-122.gridka.de 3121
>>>
>>>olb.allow host babar2.gridka.de
>>>olb.allow host f01-014-108.gridka.de
>>>olb.allow host f01-016-102.gridka.de
>>>olb.allow host f01-016-101.gridka.de
>>>olb.allow host f01-014-106.gridka.de
>>>olb.allow host f01-016-108.gridka.de
>>>olb.allow host f01-016-109.gridka.de
>>>olb.allow host f01-016-106.gridka.de
>>>olb.allow host f01-016-107.gridka.de
>>>olb.allow host f01-014-103.gridka.de
>>>olb.allow host f01-014-107.gridka.de
>>>olb.allow host f01-005-151.gridka.de
>>>olb.allow host 10.65.10.110
>>>olb.allow host f01-010-110.gridka.de
>>>olb.allow host 10.65.5.115
>>>olb.allow host f01-005-115.gridka.de
>>>olb.allow host f01-010-107.gridka.de
>>>olb.allow host l01-001-122.gridka.de
>>>
>>>olb.path r /store
>>>olb.path w /prod
>>>olb.port 3121
>>>olb.sched cpu 100
>>>olb.subscribe l01-001-122.gridka.de
>>>olb.wait
>>>
>>>ofs.redirect remote if l01-001-122.gridka.de
>>>ofs.redirect target
>>>
>>>oss.alloc * * 80
>>>oss.fdlimit * max
>>>oss.localroot /home/xrootd/disk/kanga-export/EventStore/
>>>
>>>xrd.protocol xrootd *
>>>xrootd.async off
>>>xrootd.export /prod
>>>xrootd.export /store
>>>xrootd.fslib /home/xrootd/software/current/lib/libXrdOfs.so
>>>xrootd.chksum crc32 /home/xrootd/bin/getCRC32.sh
>>>
>>>odc.trace redirect
>>>---
>>>
>>>Did anything also happen at 18:33 or 18:45 when the redirector got reset?
>>>In principle nothing happened from your point of view.
>>>
>>>Cheers,
>>>
>>>-- Gregory
>>>
>>>
>>>On Tue, 11 Oct 2005, Miriam Fritsch wrote:
>>>
>>>>Hi Gregory,
>>>>
>>>>some jobs crash with the following error message:
>>>>
>>>>---------------------------------------------------------------------------
>>>>18:21:37.524 EvtCounter: processing event # 12085 [ 1d:ffffffff:04ee72/3f73bb1d:V ]
>>>>2005-10-11 18:21:37 19228 Err : TXMessage::ReadRaw - Error reading 8 bytes
>>>>2005-10-11 18:21:37 19228 Err : ReadPartialAnswer - Error reading msg from connmgr (server [f01-010-107.gridka.de:1094]).
>>>>18:21:44.575 EvtCounter: processing event # 12086 [ 1d:ffffffff:04ee72/3f73be86:J ]
>>>>2005-10-11 18:21:44 19228 Err : TXNetFile::ReadBuffer - Server [f01-010-107.gridka.de:1094] did not return OK message for last request.
>>>>2005-10-11 18:21:44 19228 Err : SendGenCommand - Server declared error 3004: 'read does not refer to an open file'
>>>>-- JOB DONE --------------------------------------------------------------
>>>>
>>>>Cheers,
>>>>
>>>>Miriam
>>>>
>>>>
>>>>*************************************************************************
>>>>
>>>>Dr. Miriam Fritsch
>>>>
>>>>Institut fuer Experimentalphysik I
>>>>Ruhr-Universitaet Bochum, Germany    email: [log in to unmask]
>>>>c/o SLAC                             tel:   +1 (650) 926-3565
>>>>2575 Sand Hill Road #34              fax:   +1 (650) 926-3882
>>>>Menlo Park, CA 94025, USA            home:  +1 (650) 324-2813
>>>>
>>>>*************************************************************************
>>>>
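
Andy's option (b) in the thread above, re-opening the file when the server reports error 3004 ('read does not refer to an open file'), could look roughly like the sketch below. This is not the ROOT/xrootd client code: FakeServer, ServerError, ReopeningFile and the path /store/example.root are all hypothetical names made up for illustration, and the only detail taken from the thread is the error code and its message. The fake server simply forgets all open handles on restart, which is the situation described in the log.

```python
ERR_FILE_NOT_OPEN = 3004  # error code reported in the job log above


class ServerError(Exception):
    """Hypothetical error type carrying a server-declared error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


class FakeServer:
    """Hypothetical stand-in for a data server that loses handles on restart."""

    def __init__(self, files):
        self.files = files      # path -> file contents (bytes)
        self.handles = {}       # handle -> path, for currently open files
        self.next_handle = 1

    def open(self, path):
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = path
        return handle

    def read(self, handle, offset, length):
        if handle not in self.handles:
            raise ServerError(ERR_FILE_NOT_OPEN,
                              "read does not refer to an open file")
        data = self.files[self.handles[handle]]
        return data[offset:offset + length]

    def restart(self):
        # A restarted server no longer knows any of the old file handles.
        self.handles.clear()


class ReopeningFile:
    """Client-side wrapper that re-opens once when the handle is stale."""

    def __init__(self, server, path):
        self.server = server
        self.path = path
        self.handle = server.open(path)

    def read(self, offset, length):
        try:
            return self.server.read(self.handle, offset, length)
        except ServerError as err:
            if err.code != ERR_FILE_NOT_OPEN:
                raise  # any other error is passed on to the caller
            # Stale handle: open the file again and retry the read once.
            self.handle = self.server.open(self.path)
            return self.server.read(self.handle, offset, length)


server = FakeServer({"/store/example.root": b"0123456789"})
f = ReopeningFile(server, "/store/example.root")
first = f.read(0, 4)    # normal read on the original handle
server.restart()        # handle becomes invalid on the server side
second = f.read(4, 4)   # triggers error 3004, re-open, then retry
```

The retry is limited to a single attempt so that a server which keeps rejecting the handle still produces an error instead of a loop. As Andy notes, this only papers over the problem; closing the timing window in the client, option (a), is the better fix.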