And here is another one -- this one ran for 4 hours before crashing, the first one died immediately on start.

Core was generated by `/opt/xrootd/bin/x86_64_linux_26/cmsd -b -l /var/log/xrootd/cmsd.log -k 10 50m -'.
Program terminated with signal 6, Aborted.
#0  0x0000003470c30265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003470c30265 in raise () from /lib64/libc.so.6
#1  0x0000003470c31d10 in abort () from /lib64/libc.so.6
#2  0x00002aaaab757fd7 in os::abort(bool) () from /usr/lib64/libjvm.so
#3  0x00002aaaab8ab05d in VMError::report_and_die() () from /usr/lib64/libjvm.so
#4  0x00002aaaab75e655 in JVM_handle_linux_signal () from /usr/lib64/libjvm.so
#5  0x00002aaaab75abae in signalHandler(int, siginfo*, void*) () from /usr/lib64/libjvm.so
#6  <signal handler called>
#7  0x00002aaaab836362 in SystemDictionary::resolve_or_null(symbolHandle, Handle, Handle, Thread*) () from /usr/lib64/libjvm.so
#8  0x00002aaaab835fc4 in SystemDictionary::resolve_or_fail(symbolHandle, Handle, Handle, bool, Thread*) () from /usr/lib64/libjvm.so
#9  0x00002aaaab836322 in SystemDictionary::resolve_or_fail(symbolHandle, bool, Thread*) () from /usr/lib64/libjvm.so
#10 0x00002aaaab864a39 in JavaThread::allocate_threadObj(Handle, char*, bool, Thread*) () from /usr/lib64/libjvm.so
#11 0x00002aaaab573322 in attach_current_thread () from /usr/lib64/libjvm.so
#12 0x00002aaaaaf1653f in getJNIEnv () from /usr/lib64/libhdfs.so
#13 0x00002aaaaaf15c90 in hdfsConnectAsUser () from /usr/lib64/libhdfs.so
#14 0x00002aaaaace0b76 in XrdHdfsSys::Stat(char const*, stat*, int) () from /usr/lib64/libXrdHdfs.so
#15 0x000000000041e182 in XrdCmsBaseFS::Exists (this=0x689840,
    Path=0x79d4100 "/store/user/appeltel/HIAllPhysics/Flow_Skim_Run2010_HIAllPhysics-batch1/2337c3cdf19221b3c4af10fbcbd13096/flowskim_1911_2_7Fb.root",
    fnPos=-129, UpAT=<value optimized out>) at XrdCmsBaseFS.cc:146
#16 0x0000000000430ad5 in XrdCmsNode::do_State (this=0x79c97e0, Arg=...) at XrdCmsNode.cc:1145
#17 0x0000000000437245 in XrdCmsProtocol::Execute (this=0x79c7570, Arg=...) at XrdCmsProtocol.cc:195
#18 0x000000000042b90e in XrdCmsJob::DoIt (this=0x79d3b10) at XrdCmsJob.cc:84
#19 0x000000000045a123 in XrdScheduler::Run (this=0x6a89e0) at XrdScheduler.cc:292
#20 0x000000000045a2a9 in XrdStartWorking (carg=0x0) at XrdScheduler.cc:73
#21 0x000000000046cac7 in XrdSysThread_Xeq (myargs=<value optimized out>) at XrdSysPthread.cc:87
#22 0x000000347180673d in start_thread () from /lib64/libpthread.so.0
#23 0x0000003470cd44bd in clone () from /lib64/libc.so.6

On 07/11/11 13:24, Andrew Hanushevsky wrote:
> Hi Matevz,
>
> I suspect that this is occurring in one of the plug-ins. This should cause an
> abort and a core file (make sure you have not set the core limit to 0) and would
> provide a trace-back. Alternatively, you can run this under gdb to get it to
> stop when this occurs. Did any code related to the plug-ins change?
>
> Andy
>
> -----Original Message----- From: Matevz Tadel
> Sent: Monday, July 11, 2011 1:00 PM
> To: xrootd-l
> Subject: cmsd head disappearing
>
> Hi,
>
> I noticed this a couple of times now ... cmsd process (running from trunk on
> Centos 5) just disappears, nothing useful in the logs. E.g., here is a log from
> cmsd that died within seconds of being started:
> http://matevz.web.cern.ch/matevz/tmp/cmsd.log
>
> Ah, just got one on another machine which wrote some stuff out:
>
> 110711 12:41:25 2947 Admin_Login initial request: 'login p 15979 port 1094'
> 110711 12:41:25 2947 Update FrontEnd Parm1=1 Parm2=1094
> 110711 12:41:25 2947 State: Status changed to active
> 110711 12:41:25 2947 Inform xrootd status
> 110711 12:41:25 2947 do_Login:: Primary server 15979 logged in; data port is 1094
> 110711 12:41:26 2947 Dispatch manager.0:22@xrootd for state dlen=90
> 110711 12:41:26 2947 manager.0:22@xrootd do_State:
> /store/data/Run2010B/Mu/AOD/Apr21ReReco-v1/0003/6E8B9B8A-EC70-E011-B079-E0CB4E4408F7.root
>
> pure virtual method called
> terminate called without an active exception
>
> Any clues? Can I do something to help trace it down?
>
> Cheers,
> Matevž