Caught it this time :) Indeed, it crashes via the hdfs plugin ... or even in Java itself, blimey.

This seems strange:
  [1219] root@uaf-5 /tmp# rpm -qf /usr/lib64/libjvm.so
  hadoop-fuse-0.19.1-17.el5
I thought we were using the hadoop libraries directly.

Cheers,
Matevz


Core was generated by `/opt/xrootd/bin/x86_64_linux_26/cmsd -b -l /var/log/xrootd/cmsd.log -k 10 50m -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so
(gdb) bt
#0  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so
#1  0x00002aaaab323d6a in Chunk::operator new(unsigned long, unsigned long) () from /usr/lib64/libjvm.so
#2  0x00002aaaab32404e in Arena::Arena() () from /usr/lib64/libjvm.so
#3  0x00002aaaab863ba8 in Thread::Thread() () from /usr/lib64/libjvm.so
#4  0x00002aaaab8657ff in JavaThread::JavaThread(bool) () from /usr/lib64/libjvm.so
#5  0x00002aaaab5731f2 in attach_current_thread () from /usr/lib64/libjvm.so
#6  0x00002aaaaaf1653f in getJNIEnv () from /usr/lib64/libhdfs.so
#7  0x00002aaaaaf15c90 in hdfsConnectAsUser () from /usr/lib64/libhdfs.so
#8  0x00002aaaaace0b76 in XrdHdfsSys::Stat(char const*, stat*, int) () from /usr/lib64/libXrdHdfs.so
#9  0x000000000041e182 in XrdCmsBaseFS::Exists (this=0x689840, Path=0x1ca4e900 "/store/mc/Summer11/DYToEE_M-20_TuneZ2_7TeV-pythia6/GEN-SIM-RECO/PU_S3_START42_V11-v2/0001/86959A4D-707C-E011-9011-003048C692F2.root", fnPos=-131, UpAT=<value optimized out>)
    at XrdCmsBaseFS.cc:146
#10 0x0000000000430ad5 in XrdCmsNode::do_State (this=0x1ca215f0, Arg=...) at XrdCmsNode.cc:1145
#11 0x0000000000437245 in XrdCmsProtocol::Execute (this=0x1ca21240, Arg=...) at XrdCmsProtocol.cc:195
#12 0x000000000042b90e in XrdCmsJob::DoIt (this=0x1ca20e10) at XrdCmsJob.cc:84
#13 0x000000000045a123 in XrdScheduler::Run (this=0x6a89e0) at XrdScheduler.cc:292
#14 0x000000000045a2a9 in XrdStartWorking (carg=0x1) at XrdScheduler.cc:73
#15 0x000000000046cac7 in XrdSysThread_Xeq (myargs=<value optimized out>) at XrdSysPthread.cc:87
#16 0x000000347180673d in start_thread () from /lib64/libpthread.so.0
#17 0x0000003470cd44bd in clone () from /lib64/libc.so.6
(gdb) info thr
  17 Thread 16141  0x000000347180db3b in accept () from /lib64/libpthread.so.0
  16 Thread 16142  0x000000347180b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 16143  0x000000347180b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14 Thread 16144  0x0000003470ccb696 in poll () from /lib64/libc.so.6
  13 Thread 16145  0x0000003470cc6050 in __open_nocancel () from /lib64/libc.so.6
  12 Thread 16146  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  11 Thread 16147  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  10 Thread 16148  0x0000003470cd48a8 in epoll_wait () from /lib64/libc.so.6
  9 Thread 16149  0x000000347180d91b in read () from /lib64/libpthread.so.0
  8 Thread 16150  0x000000347180d91b in read () from /lib64/libpthread.so.0
  7 Thread 16151  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  6 Thread 16152  0x000000347180db3b in accept () from /lib64/libpthread.so.0
  5 Thread 16154  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  4 Thread 16177  0x000000347180d91b in read () from /lib64/libpthread.so.0
  3 Thread 16178  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
  2 Thread 16181  0x000000347180cd01 in sem_wait () from /lib64/libpthread.so.0
* 1 Thread 16180  0x00002aaaab324566 in ChunkPool::allocate(unsigned long) () from /usr/lib64/libjvm.so


On 07/11/11 13:24, Andrew Hanushevsky wrote:
> Hi Matevz,
> 
> I suspect that this is occurring in one of the plug-ins. This should cause an
> abort and a core file (make sure you have not set the core limit to 0) and would
> provide a trace-back. Alternatively, you can run this under gdb to get it to
> stop when this occurs. Did any code related to the plug-ins change?
> 
> Andy
> 
> -----Original Message----- From: Matevz Tadel
> Sent: Monday, July 11, 2011 1:00 PM
> To: xrootd-l
> Subject: cmsd head disappearing
> 
> Hi,
> 
> I noticed this a couple of times now ... cmsd process (running from trunk on
> CentOS 5) just disappears, nothing useful in the logs. E.g., here is a log from
> cmsd that died within seconds of being started:
>  http://matevz.web.cern.ch/matevz/tmp/cmsd.log
> 
> Ah, just got one on another machine which wrote some stuff out:
> 
> 110711 12:41:25 2947 Admin_Login initial request: 'login p 15979 port 1094'
> 110711 12:41:25 2947 Update FrontEnd Parm1=1 Parm2=1094
> 110711 12:41:25 2947 State: Status changed to active
> 110711 12:41:25 2947 Inform xrootd status
> 110711 12:41:25 2947 do_Login:: Primary server 15979 logged in; data port is 1094
> 110711 12:41:26 2947 Dispatch manager.0:22@xrootd for state dlen=90
> 110711 12:41:26 2947 manager.0:22@xrootd do_State: /store/data/Run2010B/Mu/AOD/Apr21ReReco-v1/0003/6E8B9B8A-EC70-E011-B079-E0CB4E4408F7.root
> 
> pure virtual method called
> terminate called without an active exception
> 
> Any clues? Can I do something to help trace it down?
> 
> Cheers,
> Matevž
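
For reference, Andy's suggestion above (make sure the core limit is not 0, then get a trace-back from the core) could be sketched roughly as follows. This is only an illustration, not something taken from the thread: the helper name core_limit_status and the CORE_FILE placeholder are made up here, and the cmsd path is simply the one shown in the "Core was generated by" line.

```shell
#!/bin/sh
# Hypothetical helper: report whether core dumps are enabled in this shell.
core_limit_status() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

echo "core dumps: $(core_limit_status) (ulimit -c = $(ulimit -c))"

# To enable core files before starting the daemon from this shell:
#   ulimit -c unlimited
# To inspect a core afterwards (CORE_FILE is a placeholder, not a real path):
#   gdb /opt/xrootd/bin/x86_64_linux_26/cmsd CORE_FILE
#   (gdb) bt
#   (gdb) info threads
```

The limit is per process and inherited, so it has to be raised in the shell (or init script) that actually launches cmsd.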