  Hi Gregory,

   Regarding which process to gcore, it is probably 7040, but try running
'pstree' and take the one at the top of the tree.
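
  If pstree is not handy, the same check can be sketched from the ps
listing itself. What follows is only a rough illustration, not anything
shipped with xrootd: the function name find_tree_roots is made up here,
it assumes the column order of the listing quoted below (F S UID PID
PPID ...), and the sample command lines are shortened from that listing.

```python
def find_tree_roots(ps_lines, command="xrootd"):
    """Return PIDs of matching processes whose parent is not itself a match."""
    procs = {}
    for line in ps_lines:
        fields = line.split()
        # Assumed column order, as in the quoted listing: F S UID PID PPID ...
        if len(fields) < 5 or not fields[3].isdigit() or not fields[4].isdigit():
            continue
        # Match on the executable name only; the UID column is also "xrootd".
        if not any(f == command or f.endswith("/" + command) for f in fields[5:]):
            continue
        procs[int(fields[3])] = int(fields[4])
    # A root of the tree is a process whose parent is outside the matched set.
    return sorted(pid for pid, ppid in procs.items() if ppid not in procs)


# Sample lines shortened from the quoted listing (command arguments cut off).
sample = """\
000 S xrootd    7040     1  0  85  10    -  4780 schedu Jul04 ?        00:00:01 /home/xrootd/software/current/bin/xrootd -p 1094
000 S xrootd    7041     1  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s
040 S xrootd    7044  7040  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094
040 S xrootd    7045  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094
"""

print(find_tree_roots(sample.splitlines()))
```

  On these sample lines it reports 7040 for xrootd, which matches the
guess above; passing command="olbd" would pick out the olbd tree instead.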

  (Andy will have to respond regarding stopping/starting and additional
tests.)

                                   Pete

On Thu, Jul 07, 2005 at 07:10:15PM +0200, Gregory Schott wrote:
> Hello Andy,
> 
> Just to let you know, I couldn't do the gcore yet, as it was not installed 
> on these NAS boxes. The GridKa admins were apparently busy... I found out 
> today that gcore has existed only since Red Hat 8.0, and these NAS boxes run
> Red Hat 7.3. I tried to compile it from source but I am still missing some
> dependencies.
> 
> I have two questions:
> - on which of the processes should I run gcore? (there are many running)
> 
> 000 S xrootd    7040     1  0  85  10    -  4780 schedu Jul04 ?        00:00:01 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 000 S xrootd    7041     1  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7042  7041  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7043  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7044  7040  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7045  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7046  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7048  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7049  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7050  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7051  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7052  7044  0  90  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7053  7044  0  90  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7054  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7055  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7056  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7057  7042  0  90  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7058  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7062  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd   25073  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:01:34 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd   28474  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:01:30 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    6991  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:31 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    9026  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:10 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    9486  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:07 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 
> - can I stop and restart xrootd after the gcore, or do you need some 
> additional tests?
> 
> I'll send you the gcore once I have it.
> 
> Cheers,
>    Gregory
> 
> 
> On Thu, 30 Jun 2005, Andy Hanushevsky wrote:
> 
> > Hi Gregory,
> >
> > A simple gcore on the one you kept is sufficient. I see that you are running 
> > 20050328. I believe we had some fixes in that area since that release. We are 
> > close to certifying a new production release that may very well be better to 
> > run overall. Anyway, if you can, put the gcore in afs along with the 
> > executable and shared library that you are using; thanks.
> >
> > Andy
> >
> > ----- Original Message ----- From: "Gregory Schott" <[log in to unmask]>
> > To: "xrootd mailing list" <[log in to unmask]>
> > Sent: Thursday, June 30, 2005 2:29 AM
> > Subject: Manager appears to be dead
> >
> >
> >> 
> >> Hello,
> >> 
> >> Doing some checks on the NAS boxes at GridKa I noticed from the log files 
> >> that 3/7 file servers were stalled. I restarted 2 of these and kept one 
> >> stalled in case you want me to run some tests. I appended below the logfiles 
> >> for the relevant period. After that time, the "Manager appears to be dead" 
> >> message continues till June 30th.
> >> 
> >> I have another question: what is the status of xrd monitoring? Where can I 
> >> find documentation on how to use it and would it have detected that the 
> >> dataserver was stalled?
> >> 
> >> --------
> >> 
> >> dataserver (f01-016-106): seems to be stalled since 25/06/2005. I kept the 
> >> server stalled in case the xrd experts ask me to run any tests to 
> >> understand this issue
> >> 
> >> 050625 00:00:00 1124 olb_Config: (c) 2004 SLAC olbd version 
> >> 20050328-0656_dbg executing as Server
> >> 050625 07:26:53 1130 olb_Manager: Manager l01-001-122.gridka.de appears to 
> >> be dead.
> >> 050625 07:29:06 1130 olb_Manager: Manager appears to be dead.
> >> 
> >> redirector l01-001-122.gridka.de: old log for this period (nothing in xrd 
> >> log):
> >> 
> >> 050625 07:07:29 14906 olb_Server: f01-014-103.gridka.de:1094 load=0; cpu=0 
> >> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
> >> 050625 07:27:30 14906 olb_Manager: f01-016-108.gridka.de:1094 scheduled for 
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-014-107.gridka.de:1094 scheduled for 
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-016-106.gridka.de:1094 scheduled for 
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-014-106.gridka.de:1094 scheduled for 
> >> removal; not responding
> >> 050625 07:37:30 14906 olb_Server: f01-016-108.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-014-107.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-016-106.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-014-106.gridka.de:1094 dropped.
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed 
> >> out
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed 
> >> out
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed 
> >> out
> >> 050625 07:56:30 14906 olb_Manager: server f01-016-106.gridka.de:1094 forced 
> >> out.
> >> 050625 07:56:30 14906 olb_Manager: server f01-014-107.gridka.de:1094 forced 
> >> out.
> >> 050625 07:56:30 14906 olb_Manager: server f01-016-108.gridka.de:1094 forced 
> >> out.
> >> 050625 07:56:39 14906 olb_GetLine: Unable to read request; connection timed 
> >> out
> >> 050625 07:56:39 14906 olb_Manager: server f01-014-106.gridka.de:1094 forced 
> >> out.
> >> 050625 08:57:30 14906 olb_Server: 10.65.5.115:1094 load=0; cpu=0 i/o=0 
> >> inq=0 mem=0 pag=0 dsk=0 tot=0
> >> 050625 08:57:30 14906 olb_Server: f01-016-109.gridka.de:1094 load=0; cpu=0 
> >> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
> >> 
> >> --------
> >> 
> >> Cheers,
> >>   Gregory
> >> 
> >



-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------