  Hi Dan,

  Andy is travelling today and tomorrow, so he'll probably be able to take
a look at this on Monday. Were there any messages in the (redirector) xrootd 
log file? I see the core file is time-stamped "Apr  6 20:01", but the log for 
20050406 isn't in the "logs" area in your afs area.

  One thing occurs to me already: how many clients are now hitting your
redirector? Is it possible that you are hitting the default file descriptor 
limit?

  http://xrootd.slac.stanford.edu/hardware_os_config.html

In that case, you might see some messages in the xrootd logs about problems 
starting new threads, for example.
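A rough way to test the descriptor-limit idea on a running daemon is sketched below. It assumes a Linux host with /proc mounted and permission to read the target process's entries; the function name `fd_usage` and the idea of passing the xrootd PID are illustrative, not part of xrootd itself.

```python
import os


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a process, read from Linux /proc.

    soft_limit is None if the limit is reported as "unlimited" or the
    "Max open files" row cannot be found.
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                if soft.isdigit():
                    soft_limit = int(soft)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    # Here we inspect this script's own process; for the redirector one
    # would pass the PID of the xrootd process instead.
    used, limit = fd_usage(os.getpid())
    print(f"open descriptors: {used}, soft limit: {limit}")
```

If the first number is close to the second around the time of a crash, the limit becomes a more plausible suspect.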

  BTW, I found the logs in your area: you are doing a reasonable number of 
file opens. (261k on 20050326, for example.) Very nice. The average rate
isn't huge, but presumably there are some peaks as things start in bunches
and whatnot. It also looks like there are about 2k files being opened, so
most of them are probably cached in memory, too. Is it all pileup for MC? At 
least the redirects seem fairly balanced over the servers:

  30589 s5n01.hep.wisc.edu:1094
  32969 s5n03.hep.wisc.edu:1094
  32960 s5n04.hep.wisc.edu:1094
  32938 s5n05.hep.wisc.edu:1094
  32942 s5n06.hep.wisc.edu:1094
  32936 s5n07.hep.wisc.edu:1094
  32948 s5n08.hep.wisc.edu:1094
  32939 s5n09.hep.wisc.edu:1094
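To put a number on "fairly balanced", the counts above can be compared against their average. This is only a sketch over the eight lines quoted in this message; the helper name `redirect_balance` is made up for illustration.

```python
from statistics import mean

# Redirect counts per server, copied from the listing above.
REDIRECTS = """\
30589 s5n01.hep.wisc.edu:1094
32969 s5n03.hep.wisc.edu:1094
32960 s5n04.hep.wisc.edu:1094
32938 s5n05.hep.wisc.edu:1094
32942 s5n06.hep.wisc.edu:1094
32936 s5n07.hep.wisc.edu:1094
32948 s5n08.hep.wisc.edu:1094
32939 s5n09.hep.wisc.edu:1094
"""


def redirect_balance(text):
    """Return ({server: relative deviation from the mean}, total count)."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        counts[parts[1]] = int(parts[0])
    avg = mean(counts.values())
    deviations = {host: (n - avg) / avg for host, n in counts.items()}
    return deviations, sum(counts.values())


if __name__ == "__main__":
    deviations, total = redirect_balance(REDIRECTS)
    print(f"total redirects: {total}")
    for host, dev in sorted(deviations.items()):
        print(f"{host:28s} {dev:+.1%}")
```

On these numbers s5n01 comes out roughly 6% below the average and the other servers about 1% above it.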

  I also see from the 20050405 log file that there were something like 381 
different machines connecting from all over campus (and perhaps some may have 
had more than one application connecting if they are dual-cpu). That is still 
a bit short of 1024 even if I put in the factor of 2, but it depends on what 
else is happening on the machine. You can probably check this from the xrootd 
log and with 'lsof'.
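The back-of-the-envelope estimate above can be written down as follows. It assumes one descriptor per client connection, which is a simplification, and `other_fds` is a hypothetical placeholder for whatever else the process holds open (log files, sockets to the data servers, and so on).

```python
def fd_headroom(machines, clients_per_machine=2, fd_limit=1024, other_fds=0):
    """Descriptors left under fd_limit, assuming one descriptor per client.

    A negative result means the estimated demand exceeds the limit.
    """
    expected = machines * clients_per_machine + other_fds
    return fd_limit - expected


if __name__ == "__main__":
    # 381 machines with a factor of 2 for dual-CPU hosts, limit of 1024.
    print(fd_headroom(381))
```

With 381 machines and the factor of 2 this leaves 262 descriptors, so the margin disappears if the process has a few hundred other descriptors open.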

                                   Pete

On Fri, Apr 08, 2005 at 03:46:39PM -0500, Dan Bradley wrote:
> I am getting occasional crashes of xrootd on the redirector.  I am 
> running version 20050328-0656 under Scientific Linux 3.0.4.
> 
> The redirector crashes with the following stack dump:
> 
> #0  0x080845cf in typeinfo name for XrdXrootdPrepare ()
> #1  0x0807266f in XrdProtocol_Select::Process (this=0x80917d8, lp=0x83df5dc) at XrdProtocol.cc:165
> #2  0x0806d622 in XrdLink::DoIt (this=0x83df5dc) at XrdLink.cc:296
> #3  0x080739bc in XrdScheduler::Run (this=0x8091640) at XrdScheduler.cc:293
> #4  0x08072a1c in XrdStartWorking (carg=0x8091640) at XrdScheduler.cc:82
> #5  0x0807f4be in XrdOucThread_Xeq (myargs=0x839eba0) at XrdOucPthread.cc:80
> #6  0x00f4adec in start_thread () from /lib/tls/libpthread.so.0
> #7  0x0032ea2a in clone () from /lib/tls/libc.so.6
> 
> You may find the core file here:
> 
> /afs/hep.wisc.edu/cms/sw/xrootd/debug/core.5250
> 
> The binary is here:
> 
> /afs/hep.wisc.edu/cms/sw/xrootd/bin/xrootd



-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------