Hi Gregory,

A simple gcore on the one you kept is sufficient. I see that you are running 
20050328. I believe we have had some fixes in that area since that release. We 
are close to certifying a new production release that may very well be 
better to run overall. Anyway, if you can, put the gcore in AFS along with 
the executable and shared library that you are using; thanks.

Andy

----- Original Message ----- 
From: "Gregory Schott" <[log in to unmask]>
To: "xrootd mailing list" <[log in to unmask]>
Sent: Thursday, June 30, 2005 2:29 AM
Subject: Manager appears to be dead


>
> Hello,
>
> While doing some checks on the NAS boxes at GridKa, I noticed from the log 
> files that 3 of the 7 file servers were stalled. I restarted 2 of these and 
> kept one stalled in case you want me to run some tests. I have appended the 
> log files for the relevant period below. After that time, the "Manager 
> appears to be dead" message continues until June 30th.
>
> I have another question: what is the status of xrd monitoring? Where can I 
> find documentation on how to use it and would it have detected that the 
> dataserver was stalled?
>
> --------
>
> dataserver (f01-016-106): seems to have been stalled since 25/06/2005. I 
> kept the server stalled in case the xrd experts want me to run any tests 
> to understand this issue
>
> 050625 00:00:00 1124 olb_Config: (c) 2004 SLAC olbd version 
> 20050328-0656_dbg executing as Server
> 050625 07:26:53 1130 olb_Manager: Manager l01-001-122.gridka.de appears to 
> be dead.
> 050625 07:29:06 1130 olb_Manager: Manager appears to be dead.
>
> redirector l01-001-122.gridka.de: olb log for this period (nothing in xrd 
> log):
>
> 050625 07:07:29 14906 olb_Server: f01-014-103.gridka.de:1094 load=0; cpu=0 
> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
> 050625 07:27:30 14906 olb_Manager: f01-016-108.gridka.de:1094 scheduled 
> for removal; not responding
> 050625 07:27:30 14906 olb_Manager: f01-014-107.gridka.de:1094 scheduled 
> for removal; not responding
> 050625 07:27:30 14906 olb_Manager: f01-016-106.gridka.de:1094 scheduled 
> for removal; not responding
> 050625 07:27:30 14906 olb_Manager: f01-014-106.gridka.de:1094 scheduled 
> for removal; not responding
> 050625 07:37:30 14906 olb_Server: f01-016-108.gridka.de:1094 dropped.
> 050625 07:37:30 14906 olb_Server: f01-014-107.gridka.de:1094 dropped.
> 050625 07:37:30 14906 olb_Server: f01-016-106.gridka.de:1094 dropped.
> 050625 07:37:30 14906 olb_Server: f01-014-106.gridka.de:1094 dropped.
> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection 
> timed out
> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection 
> timed out
> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection 
> timed out
> 050625 07:56:30 14906 olb_Manager: server f01-016-106.gridka.de:1094 
> forced out.
> 050625 07:56:30 14906 olb_Manager: server f01-014-107.gridka.de:1094 
> forced out.
> 050625 07:56:30 14906 olb_Manager: server f01-016-108.gridka.de:1094 
> forced out.
> 050625 07:56:39 14906 olb_GetLine: Unable to read request; connection 
> timed out
> 050625 07:56:39 14906 olb_Manager: server f01-014-106.gridka.de:1094 
> forced out.
> 050625 08:57:30 14906 olb_Server: 10.65.5.115:1094 load=0; cpu=0 i/o=0 
> inq=0 mem=0 pag=0 dsk=0 tot=0
> 050625 08:57:30 14906 olb_Server: f01-016-109.gridka.de:1094 load=0; cpu=0 
> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
>
> --------
>
> Cheers,
>   Gregory
>
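
The log excerpts quoted above could be scanned for stalled servers with a small script. The following is only a minimal sketch, not an xrootd or olbd tool: it assumes the line layout visible in the quoted excerpts (date, time, pid, olb component, message), and the function names and the list of warning phrases are hypothetical choices made for illustration.

```python
import re

# Assumed layout, inferred only from the quoted excerpts:
#   YYMMDD HH:MM:SS <pid> olb_<component>: <message>
LINE_START = re.compile(r"^\d{6} \d{2}:\d{2}:\d{2} ")
LINE_RE = re.compile(r"^(\d{6}) (\d{2}:\d{2}:\d{2}) (\d+) (olb_\w+): (.*)$")

# Phrases taken from the quoted logs; this list is an assumption, not exhaustive.
WARNING_PHRASES = ("appears to be dead", "not responding", "forced out")


def unwrap(lines):
    """Strip '>' quoting and rejoin records that the mail client wrapped."""
    records, current = [], None
    for raw in lines:
        text = raw.lstrip("> ").strip()
        if LINE_START.match(text):
            # A new timestamp starts a new record.
            if current is not None:
                records.append(current)
            current = text
        elif text and current is not None:
            # Non-empty line without a timestamp: continuation of the record.
            current += " " + text
        else:
            # Blank line (or prose before any record) closes the open record.
            if current is not None:
                records.append(current)
            current = None
    if current is not None:
        records.append(current)
    return records


def find_problems(records):
    """Return (date, time, component, message) for records with a warning phrase."""
    problems = []
    for record in records:
        match = LINE_RE.match(record)
        if match is None:
            continue
        date, time, _pid, component, message = match.groups()
        if any(phrase in message for phrase in WARNING_PHRASES):
            problems.append((date, time, component, message))
    return problems
```

Running find_problems(unwrap(lines)) over the dataserver excerpt would report the two "appears to be dead" records from 050625. A check like this only notices what the daemon itself logs, so it is no substitute for proper monitoring.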