Caught it this time :) Indeed it crashes through the HDFS plug-in ... or even in Java itself, blimey. This seems strange:

[1219] root@uaf-5 /tmp# rpm -qf /usr/lib64/libjvm.so
hadoop-fuse-0.19.1-17.el5

I thought we were using the Hadoop libraries directly.

Cheers,
Matevz

Core was generated by `/opt/xrootd/bin/x86_64_linux_26/cmsd -b -l /var/log/xrootd/cmsd.log -k 10 50m -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so
(gdb) bt
#0  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so
#1  0x00002aaaab323d6a in Chunk::operator new(unsigned long, unsigned long) () from /usr/lib64/libjvm.so
#2  0x00002aaaab32404e in Arena::Arena() () from /usr/lib64/libjvm.so
#3  0x00002aaaab863ba8 in Thread::Thread() () from /usr/lib64/libjvm.so
#4  0x00002aaaab8657ff in JavaThread::JavaThread(bool) () from /usr/lib64/libjvm.so
#5  0x00002aaaab5731f2 in attach_current_thread () from /usr/lib64/libjvm.so
#6  0x00002aaaaaf1653f in getJNIEnv () from /usr/lib64/libhdfs.so
#7  0x00002aaaaaf15c90 in hdfsConnectAsUser () from /usr/lib64/libhdfs.so
#8  0x00002aaaaace0b76 in XrdHdfsSys::Stat(char const*, stat*, int) () from /usr/lib64/libXrdHdfs.so
#9  0x000000000041e182 in XrdCmsBaseFS::Exists (this=0x689840, Path=0x1ca4e900 "/store/mc/Summer11/DYToEE_M-20_TuneZ2_7TeV-pythia6/GEN-SIM-RECO/PU_S3_START42_V11-v2/0001/86959A4D-707C-E011-9011-003048C692F2.root", fnPos=-131, UpAT=<value optimized out>) at XrdCmsBaseFS.cc:146
#10 0x0000000000430ad5 in XrdCmsNode::do_State (this=0x1ca215f0, Arg=...) at XrdCmsNode.cc:1145
#11 0x0000000000437245 in XrdCmsProtocol::Execute (this=0x1ca21240, Arg=...)
    at XrdCmsProtocol.cc:195
#12 0x000000000042b90e in XrdCmsJob::DoIt (this=0x1ca20e10) at XrdCmsJob.cc:84
#13 0x000000000045a123 in XrdScheduler::Run (this=0x6a89e0) at XrdScheduler.cc:292
#14 0x000000000045a2a9 in XrdStartWorking (carg=0x1) at XrdScheduler.cc:73
#15 0x000000000046cac7 in XrdSysThread_Xeq (myargs=<value optimized out>) at XrdSysPthread.cc:87
#16 0x000000347180673d in start_thread () from /lib64/libpthread.so.0
#17 0x0000003470cd44bd in clone () from /lib64/libc.so.6
(gdb) info thr
  17 Thread 16141  0x000000347180db3b in accept () from /lib64/libpthread.so.0
  16 Thread 16142  0x000000347180b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 16143  0x000000347180b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14 Thread 16144  0x0000003470ccb696 in poll () from /lib64/libc.so.6
  13 Thread 16145  0x0000003470cc6050 in __open_nocancel () from /lib64/libc.so.6
  12 Thread 16146  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  11 Thread 16147  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  10 Thread 16148  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  9  Thread 16149  0x000000347180d91b in read () from /lib64/libpthread.so.0
  8  Thread 16150  0x000000347180d91b in read () from /lib64/libpthread.so.0
  7  Thread 16151  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  6  Thread 16152  0x000000347180db3b in accept () from /lib64/libpthread.so.0
  5  Thread 16154  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  4  Thread 16177  0x000000347180d91b in read () from /lib64/libpthread.so.0
  3  Thread 16178  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  2  Thread 16181  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
* 1  Thread 16180  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so

On 07/11/11 13:24, Andrew Hanushevsky wrote:
> Hi Matevz,
>
> I suspect that this is occurring in one of the plug-ins.
> This should cause an abort and a core file (make sure you have not set the core limit to 0) and would provide a trace-back. Alternatively, you can run this under gdb to get it to stop when this occurs. Did any code related to the plug-ins change?
>
> Andy
>
> -----Original Message-----
> From: Matevz Tadel
> Sent: Monday, July 11, 2011 1:00 PM
> To: xrootd-l
> Subject: cmsd head disappearing
>
> Hi,
>
> I've noticed this a couple of times now ... the cmsd process (running from trunk on CentOS 5) just disappears, with nothing useful in the logs. E.g., here is the log from a cmsd that died within seconds of being started:
> http://matevz.web.cern.ch/matevz/tmp/cmsd.log
>
> Ah, just got one on another machine which wrote some stuff out:
>
> 110711 12:41:25 2947 Admin_Login initial request: 'login p 15979 port 1094'
> 110711 12:41:25 2947 Update FrontEnd Parm1=1 Parm2=1094
> 110711 12:41:25 2947 State: Status changed to active
> 110711 12:41:25 2947 Inform xrootd status
> 110711 12:41:25 2947 do_Login:: Primary server 15979 logged in; data port is 1094
> 110711 12:41:26 2947 Dispatch manager.0:22@xrootd for state dlen=90
> 110711 12:41:26 2947 manager.0:22@xrootd do_State: /store/data/Run2010B/Mu/AOD/Apr21ReReco-v1/0003/6E8B9B8A-EC70-E011-B079-E0CB4E4408F7.root
>
> pure virtual method called
> terminate called without an active exception
>
> Any clues? Can I do something to help trace it down?
>
> Cheers,
> Matevž