Hi Wen, OK. Does this only happen in SLC5 or does it happen in all kernels. Andy On Fri, 9 Oct 2009, wen guan wrote: > Hi Andrew, > > For newly started XrdCnsd, when doing a xrdcp, the > disappeared "poll" thread will appear again and an shadow will be > created in cns server. But for old Xrdcnsd, when doing xrdcp, no > "poll" thread will be started. no shadow will be created in cns > server. > > Wen > > On Fri, Oct 9, 2009 at 12:58 AM, Andrew Hanushevsky <[log in to unmask]> wrote: >> Hi Wen, >> >> I guess I don't understand the actual problem. When you say 'not respond" >> what do you actually see to come to that conclusion. As for the analysis, >> and stack traces, these are all perfectly normal. The "poll" thread need not >> exist at all times and, in fact, will disappear after a long period of >> inactivity. I will look some more but I need to know what "no response" >> appears like. >> >> Andy >> >> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >> To: <[log in to unmask]> >> Sent: Tuesday, October 06, 2009 12:46 PM >> Subject: XrdCnsd problem in slc5 with version xrootd-20090623 >> >> >>> Hi, >>> >>> The XrdCnsd in xrootd-20090623 failed to create cns frequestly. >>> When starting XrdCnsd, it works fine. But when trying it next day, it >>> will not response. Below is the test result. >>> >>> It seems the "poll" thread cannot be started for old XrdCnsd(started >>> more than one day and there is no active in long time(maybe 780s)). >>> For (*), Process 10816 was an old XrdCnsd(started more than one day) >>> process which didn't response. After killed it, a new process 13665 >>> was created which worked fine. From here we can see process 13665 had >>> a thread "poll"(thread 2) but process 10816 didn't have it. >>> For (**), When there was no files copied to xrootd for 10 minutes, >>> the "poll" thread in Process 13665 disappeared. But when I did a >>> xrdcp to this server. We will saw this "poll" thread again. But for >>> process which has on active for long time, the "poll" thread cannot be >>> restart again. >>> >>> (*) >>> [root@pcuwdata12 ~]# ps -ef|grep xrootd-2 >>> xrootd 10798 1 0 Oct02 ? 00:21:21 >>> /opt/xrootd-20090623/bin//xrootd -b -d -R xrootd -l >>> /tmp/log/xrootd/xrd.log -c /opt/xrootd-20090623/conf/xrdcluster.cfg >>> xrootd 10809 1 0 Oct02 ? 00:00:05 >>> /opt/xrootd-20090623/bin//cmsd -b -d -R xrootd -l >>> /tmp/log/xrootd/cms.log -c /opt/xrootd-20090623/conf/xrdcluster.cfg >>> xrootd 10816 10798 0 Oct02 ? 00:00:00 >>> /opt/xrootd-20090623/bin/XrdCnsd -d -l /tmp/log/xrootd/cnsd.log >>> root://pcuwsun04.cern.ch:1095 >>> root 13452 13340 0 20:59 pts/1 00:00:00 grep xrootd-2 >>> [root@pcuwdata12 ~]# pstack 10816 >>> Thread 7 (Thread 0x410cf940 (LWP 10826)): >>> #0 0x000000370e40ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 () >>> #1 0x0000000000479707 in XrdSysCondVar::Wait () >>> #2 0x000000000047b8e9 in XrdScheduler::TimeSched () >>> #3 0x000000000047a21d in XrdStartTSched () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 6 (Thread 0x416ed940 (LWP 10827)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000047ae11 in XrdScheduler::Run () >>> #3 0x000000000047a241 in XrdStartWorking () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 5 (Thread 0x417ee940 (LWP 10828)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000047ae11 in XrdScheduler::Run () >>> #3 0x000000000047a241 in XrdStartWorking () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 4 (Thread 0x418ef940 (LWP 10834)): >>> #0 0x000000370e40d5cb in read () from /lib64/libpthread.so.0 >>> #1 0x00000000004736b9 in XrdOucStream::GetLine () >>> #2 0x000000000041eb06 in XrdCnsDaemon::getEvents () >>> #3 0x000000000041dcfc in CnsEvents () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 3 (Thread 0x419f0940 (LWP 10835)): >>> #0 0x000000370e40d174 in __lll_lock_wait () from /lib64/libpthread.so.0 >>> #1 0x000000370e408aca in _L_lock_1034 () from /lib64/libpthread.so.0 >>> #2 0x000000370e40898c in pthread_mutex_lock () from >>> /lib64/libpthread.so.0 >>> #3 0x000000000041ff5b in XrdSysMutex::Lock () >>> #4 0x000000000041f6d4 in XrdCnsEvent::Queue () >>> #5 0x000000000041ee43 in XrdCnsDaemon::getEvents () >>> #6 0x000000000041dcfc in CnsEvents () >>> #7 0x0000000000479637 in XrdSysThread_Xeq () >>> #8 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #9 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 2 (Thread 0x41af1940 (LWP 11659)): >>> #0 0x000000370d89a0b1 in nanosleep () from /lib64/libc.so.6 >>> #1 0x000000370d899ed4 in sleep () from /lib64/libc.so.6 >>> #2 0x000000000044117a in GarbageCollectorThread () >>> #3 0x000000000045413a in XrdClientThreadDispatcher () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 1 (Thread 0x2b53e8698db0 (LWP 10816)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000041f8bd in XrdCnsEvent::Remove () >>> #3 0x000000000041ea05 in XrdCnsDaemon::doRequests () >>> #4 0x00000000004202d1 in main () >>> [root@pcuwdata12 ~]# kill 10816 >>> [root@pcuwdata12 ~]# kill 10816 >>> -bash: kill: (10816) - No such process >>> [root@pcuwdata12 ~]# ps -ef|grep xrootd-2 >>> xrootd 10798 1 0 Oct02 ? 00:21:21 >>> /opt/xrootd-20090623/bin//xrootd -b -d -R xrootd -l >>> /tmp/log/xrootd/xrd.log -c /opt/xrootd-20090623/conf/xrdcluster.cfg >>> xrootd 10809 1 0 Oct02 ? 00:00:05 >>> /opt/xrootd-20090623/bin//cmsd -b -d -R xrootd -l >>> /tmp/log/xrootd/cms.log -c /opt/xrootd-20090623/conf/xrdcluster.cfg >>> xrootd 13665 10798 0 21:03 ? 00:00:00 >>> /opt/xrootd-20090623/bin/XrdCnsd -d -l /tmp/log/xrootd/cnsd.log >>> root://pcuwsun04.cern.ch:1095 >>> root 13678 13340 0 21:03 pts/1 00:00:00 grep xrootd-2 >>> [root@pcuwdata12 ~]# pstack 13665 >>> Thread 8 (Thread 0x4074b940 (LWP 13666)): >>> #0 0x000000370e40ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 () >>> #1 0x0000000000479707 in XrdSysCondVar::Wait () >>> #2 0x000000000047b8e9 in XrdScheduler::TimeSched () >>> #3 0x000000000047a21d in XrdStartTSched () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 7 (Thread 0x41b0f940 (LWP 13667)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000047ae11 in XrdScheduler::Run () >>> #3 0x000000000047a241 in XrdStartWorking () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 6 (Thread 0x41d8c940 (LWP 13668)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000047ae11 in XrdScheduler::Run () >>> #3 0x000000000047a241 in XrdStartWorking () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 5 (Thread 0x41e8d940 (LWP 13669)): >>> #0 0x000000370e40d5cb in read () from /lib64/libpthread.so.0 >>> #1 0x00000000004736b9 in XrdOucStream::GetLine () >>> #2 0x000000000041eb06 in XrdCnsDaemon::getEvents () >>> #3 0x000000000041dcfc in CnsEvents () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 4 (Thread 0x41f8e940 (LWP 13670)): >>> #0 0x000000370e40d5cb in read () from /lib64/libpthread.so.0 >>> #1 0x00000000004736b9 in XrdOucStream::GetLine () >>> #2 0x000000000041eb06 in XrdCnsDaemon::getEvents () >>> #3 0x000000000041dcfc in CnsEvents () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 3 (Thread 0x4208f940 (LWP 13671)): >>> #0 0x000000370d89a0b1 in nanosleep () from /lib64/libc.so.6 >>> #1 0x000000370d899ed4 in sleep () from /lib64/libc.so.6 >>> #2 0x000000000044117a in GarbageCollectorThread () >>> #3 0x000000000045413a in XrdClientThreadDispatcher () >>> #4 0x0000000000479637 in XrdSysThread_Xeq () >>> #5 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #6 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 2 (Thread 0x41279940 (LWP 13672)): >>> #0 0x000000370d8cae46 in poll () from /lib64/libc.so.6 >>> #1 0x000000000046b945 in XrdClientSock::RecvRaw () >>> #2 0x000000000044e08c in XrdClientPhyConnection::ReadRaw () >>> #3 0x000000000044ac3e in XrdClientMessage::ReadRaw () >>> #4 0x000000000044e84c in XrdClientPhyConnection::BuildMessage () >>> #5 0x000000000044b730 in SocketReaderThread () >>> #6 0x000000000045413a in XrdClientThreadDispatcher () >>> #7 0x0000000000479637 in XrdSysThread_Xeq () >>> #8 0x000000370e4064a7 in start_thread () from /lib64/libpthread.so.0 >>> #9 0x000000370d8d3c2d in clone () from /lib64/libc.so.6 >>> Thread 1 (Thread 0x2b06bb478db0 (LWP 13665)): >>> #0 0x000000370e40c9b1 in sem_wait () from /lib64/libpthread.so.0 >>> #1 0x000000000041ffb7 in XrdSysSemaphore::Wait () >>> #2 0x000000000041f8bd in XrdCnsEvent::Remove () >>> #3 0x000000000041ea05 in XrdCnsDaemon::doRequests () >>> #4 0x00000000004202d1 in main () >>> [root@pcuwdata12 ~]# >>> [root@pcuwdata12 ~]# >>> >>> >>> (***) cnsd.log >>> 091006 20:56:10 1828 XrdCnsSched: running underused thread monitor inq=0 >>> 091006 20:56:10 1828 XrdCnsSched: 2 threads; 1 idle >>> 091006 20:56:10 1828 XrdCnsSched: scheduling underused thread monitor >>> in 780 seconds >>> >>> >>> Thanks >>> Wen >>> >> >> >