Hi Gregory,

Normally, run gcore on the one with the lowest process number, though any
should do if Linux is working right (each process in that list is actually
a thread; that's been "fixed" in later releases). You can restart xrootd
right after. Just make sure that a) the executable was compiled after a
configure with "--build=debug" and b) you supply the executable; thanks.

Andy

On Thu, 7 Jul 2005, Gregory Schott wrote:

> Hello Andy,
>
> Just to let you know, I haven't been able to run gcore yet as it was not
> installed on these NAS boxes. The GridKa admins were apparently busy... I
> found out today that gcore has only existed since RedHat 8.0 and these NAS
> boxes are RedHat 7.3. I tried to compile it from source but I am still
> missing some dependencies.
>
> I have two questions:
> - on which of the processes should I run gcore (there are many running)
>
> 000 S xrootd    7040     1  0  85  10    -  4780 schedu Jul04 ?        00:00:01 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 000 S xrootd    7041     1  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7042  7041  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7043  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7044  7040  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7045  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7046  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7048  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7049  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7050  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7051  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7052  7044  0  90  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7053  7044  0  90  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7054  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7055  7044  0  85  10    -  4780 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    7056  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7057  7042  0  90  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7058  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd    7062  7042  0  85  10    -  3417 schedu Jul04 ?        00:00:00 /home/xrootd/software/current/bin/olbd -s -l /tmp/f01-014-106.olblog -c config/d
> 040 S xrootd   25073  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:01:34 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd   28474  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:01:30 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    6991  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:31 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    9026  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:10 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
> 040 S xrootd    9486  7044  0  85  10    -  4780 rt_sig Jul06 ?        00:00:07 /home/xrootd/software/current/bin/xrootd -p 1094 -l /tmp/f01-014-106.xrdlog -c c
>
> - can I stop and restart xrootd after the gcore or do you need any
> additional tests?
>
> I'll send you the gcore once I have it.
>
> Cheers,
>    Gregory
>
>
> On Thu, 30 Jun 2005, Andy Hanushevsky wrote:
>
> > Hi Gregory,
> >
> > A simple gcore on the one you kept is sufficient. I see that you are running
> > 20050328. I believe we have had some fixes in that area since that release. We
> > are close to certifying a new production release that may very well be better
> > to run overall. Anyway, if you can, put the gcore in afs along with the
> > executable and shared library that you are using; thanks.
> >
> > Andy
> >
> > ----- Original Message -----
> > From: "Gregory Schott" <[log in to unmask]>
> > To: "xrootd mailing list" <[log in to unmask]>
> > Sent: Thursday, June 30, 2005 2:29 AM
> > Subject: Manager appears to be dead
> >
> >
> >>
> >> Hello,
> >>
> >> Doing some checks on the NAS boxes at GridKa I noticed from the log files
> >> that 3 of the 7 file servers were stalled. I restarted 2 of these and kept
> >> one stalled in case you want me to run some tests. I have appended the log
> >> files for the relevant period below. After that time, the "Manager appears
> >> to be dead" message continues until June 30th.
> >>
> >> I have another question: what is the status of xrd monitoring? Where can I
> >> find documentation on how to use it and would it have detected that the
> >> dataserver was stalled?
> >>
> >> --------
> >>
> >> dataserver (f01-016-106): seems to have been stalled since 25/06/2005. I
> >> kept the server stalled in case the xrd experts ask me to run any tests to
> >> understand this issue
> >>
> >> 050625 00:00:00 1124 olb_Config: (c) 2004 SLAC olbd version
> >> 20050328-0656_dbg executing as Server
> >> 050625 07:26:53 1130 olb_Manager: Manager l01-001-122.gridka.de appears to
> >> be dead.
> >> 050625 07:29:06 1130 olb_Manager: Manager appears to be dead.
> >>
> >> redirector l01-001-122.gridka.de: olb log for this period (nothing in xrd
> >> log):
> >>
> >> 050625 07:07:29 14906 olb_Server: f01-014-103.gridka.de:1094 load=0; cpu=0
> >> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
> >> 050625 07:27:30 14906 olb_Manager: f01-016-108.gridka.de:1094 scheduled for
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-014-107.gridka.de:1094 scheduled for
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-016-106.gridka.de:1094 scheduled for
> >> removal; not responding
> >> 050625 07:27:30 14906 olb_Manager: f01-014-106.gridka.de:1094 scheduled for
> >> removal; not responding
> >> 050625 07:37:30 14906 olb_Server: f01-016-108.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-014-107.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-016-106.gridka.de:1094 dropped.
> >> 050625 07:37:30 14906 olb_Server: f01-014-106.gridka.de:1094 dropped.
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed
> >> out
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed
> >> out
> >> 050625 07:56:30 14906 olb_GetLine: Unable to read request; connection timed
> >> out
> >> 050625 07:56:30 14906 olb_Manager: server f01-016-106.gridka.de:1094 forced
> >> out.
> >> 050625 07:56:30 14906 olb_Manager: server f01-014-107.gridka.de:1094 forced
> >> out.
> >> 050625 07:56:30 14906 olb_Manager: server f01-016-108.gridka.de:1094 forced
> >> out.
> >> 050625 07:56:39 14906 olb_GetLine: Unable to read request; connection timed
> >> out
> >> 050625 07:56:39 14906 olb_Manager: server f01-014-106.gridka.de:1094 forced
> >> out.
> >> 050625 08:57:30 14906 olb_Server: 10.65.5.115:1094 load=0; cpu=0 i/o=0
> >> inq=0 mem=0 pag=0 dsk=0 tot=0
> >> 050625 08:57:30 14906 olb_Server: f01-016-109.gridka.de:1094 load=0; cpu=0
> >> i/o=0 inq=0 mem=0 pag=0 dsk=0 tot=0
> >>
> >> --------
> >>
> >> Cheers,
> >>   Gregory
> >>
> >
>