Hello,
While checking the NAS boxes at GridKa, I noticed from the log files
that 3 of the 7 file servers were stalled. I restarted 2 of them and left one
stalled in case you want me to run any tests. I have appended the log files
for the relevant period below. After that time, the "Manager appears to be dead"
message continues until June 30th.
I have another question: what is the status of xrd monitoring? Where can I
find documentation on how to use it and would it have detected that the
dataserver was stalled?
--------
dataserver (f01-016-106): seems to have been stalled since 25/06/2005. I have
kept the server stalled in case the xrd experts want me to run any tests to
understand this issue.
050625 00:00:00 1124 olb_Config: (c) 2004 SLAC olbd version 20050328-0656_dbg executing as Server
050625 07:26:53 1130 olb_Manager: Manager l01-001-122.gridka.de appears to be dead.
050625 07:29:06 1130 olb_Manager: Manager appears to be dead.
redirector l01-001-122.gridka.de: olb log for this period (nothing in xrd log):
050625 07:07:29 14906 olb_Server: f01-014-103.gridka.de:1094 load=0; cpu=0 i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
050625 07:27:30 14906 olb_Manager: f01-016-108.gridka.de:1094 scheduled for removal; not responding
050625 07:27:30 14906 olb_Manager: f01-014-107.gridka.de:1094 scheduled for removal; not responding
050625 07:27:30 14906 olb_Manager: f01-016-106.gridka.de:1094 scheduled for removal; not responding
050625 07:27:30 14906 olb_Manager: f01-014-106.gridka.de:1094 scheduled for removal; not responding
050625 07:37:30 14906 olb_Server: f01-016-108.gridka.de:1094 dropped.
050625 07:37:30 14906 olb_Server: f01-014-107.gridka.de:1094 dropped.
050625 07:37:30 14906 olb_Server: f01-016-106.gridka.de:1094 dropped.
050625 07:37:30 14906 olb_Server: f01-014-106.gridka.de:1094 dropped.
050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed out
050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed out
050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed out
050625 07:56:30 14906 olb_Manager: server f01-016-106.gridka.de:1094 forced out.
050625 07:56:30 14906 olb_Manager: server f01-014-107.gridka.de:1094 forced out.
050625 07:56:30 14906 olb_Manager: server f01-016-108.gridka.de:1094 forced out.
050625 07:56:39 14906 olb_GetLine: Unable to read request; connection timed out
050625 07:56:39 14906 olb_Manager: server f01-014-106.gridka.de:1094 forced out.
050625 08:57:30 14906 olb_Server: 10.65.5.115:1094 load=0; cpu=0 i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
050625 08:57:30 14906 olb_Server: f01-016-109.gridka.de:1094 load=0; cpu=0 i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
--------
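Until I know what xrd monitoring offers, a simple scan of the olbd logs would probably have flagged this stall much earlier. Below is a minimal sketch in Python, assuming the line layout and message wording seen in the excerpts above; the function name scan_olb_log and the regular expressions are my own and are not part of xrootd or olbd.

```python
import re

# Line layout assumed from the excerpts above:
#   "YYMMDD HH:MM:SS pid component: message"
LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) (\d+) (\w+): (.*)$")

# Redirector-side messages that name a problem server
# (wording copied from the log above, not from any olbd documentation).
SERVER_RE = re.compile(
    r"(?:server )?(\S+):\d+ "
    r"(?:scheduled for removal; not responding|dropped\.|forced out\.)$"
)

# Server-side message seen when the data server loses its manager.
DEAD_MANAGER = "appears to be dead"


def scan_olb_log(lines):
    """Return (problem_servers, manager_dead_times) found in olbd log lines.

    problem_servers maps host name -> timestamp of its first problem message.
    manager_dead_times lists timestamps of "appears to be dead" messages.
    """
    problem_servers = {}
    manager_dead_times = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip separators and anything that is not a log line
        stamp, _pid, _component, message = match.groups()
        if DEAD_MANAGER in message:
            manager_dead_times.append(stamp)
            continue
        server = SERVER_RE.search(message)
        if server:
            # keep only the first time each host shows up
            problem_servers.setdefault(server.group(1), stamp)
    return problem_servers, manager_dead_times
```

Run from cron over the redirector and data server logs, something like this could send a mail as soon as either result is non-empty. It only matches the messages listed here, so other failure modes would go unnoticed.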
Cheers,
Gregory