Hi Alden,

Looks like the SEGV happens in the ROOT package. Given that this happens 
only occasionally and that the call stack at the time is quite deep, I 
suspect that the thread stack size may be too small. There is a way to 
increase that in ROOT, but I am not a ROOT expert, so others should weigh 
in.
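One way to test the stack-size theory from the Python side, before touching ROOT itself, is sketched below. It is only a sketch under assumptions: the helper names `main_stack_limit` and `run_with_stack` are hypothetical, and it assumes the PyROOT/CINT work (e.g. `RunThisStep` from the traceback) is safe to run off the main thread, which has not been verified. The main thread's stack is normally sized at process start from the shell's `ulimit -s`, so raising it there is the other option; `threading.stack_size()` only affects threads created afterwards.

```python
import resource
import threading


def main_stack_limit():
    """Return the (soft, hard) RLIMIT_STACK in bytes.

    The main thread's stack is usually sized from the soft limit when the
    process starts, so changing it generally means running `ulimit -s`
    in the shell before launching python.
    """
    return resource.getrlimit(resource.RLIMIT_STACK)


def run_with_stack(stack_bytes, func, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread with a larger stack.

    Hypothetical helper: stack_bytes must be at least 32 KiB and a multiple
    of the page size on platforms that require it.
    """
    result = {}

    def target():
        try:
            result["value"] = func(*args, **kwargs)
        except BaseException as exc:  # hand any error back to the caller
            result["error"] = exc

    old = threading.stack_size(stack_bytes)  # applies to threads created next
    try:
        worker = threading.Thread(target=target)
        worker.start()  # returns once the thread (and its stack) exists
    finally:
        threading.stack_size(old)  # restore the previous default
    worker.join()
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

For example, `run_with_stack(64 * 1024 * 1024, RunThisStep, NEvents, SampleNumber)` would run the analysis step with a 64 MiB stack; if the segfaults disappear, that supports the stack-size explanation.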

Andy

On Wed, 22 Sep 2010, Alden Stradling wrote:

> We're trying to track down some errors on our OS X cluster (10.6.4, Mac Pro Early 2009). Most of the runs go great, but we are seeing occasional segfaults, as seen below. I will have a larger assortment of error logs soon, but in the meantime -- does this look familiar to anyone?
>
>
> Thanks,
>
> Alden
>
> ===========================================================
> There was a crash.
> This is the entire stack trace of all threads:
> ===========================================================
>
> Thread 5 (process 807):
> #0  0x00007fff845f0eca in poll ()
> #1  0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2  0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3  0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4  0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5  0x000000010577012b in SocketReaderThread ()
> #6  0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7  0x00007fff845a7456 in _pthread_start ()
> #8  0x00007fff845a7309 in thread_start ()
>
> Thread 4 (process 807):
> #0  0x00007fff845f0eca in poll ()
> #1  0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2  0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3  0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4  0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5  0x000000010577012b in SocketReaderThread ()
> #6  0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7  0x00007fff845a7456 in _pthread_start ()
> #8  0x00007fff845a7309 in thread_start ()
>
> Thread 3 (process 807):
> #0  0x00007fff845f0eca in poll ()
> #1  0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2  0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3  0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4  0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5  0x000000010577012b in SocketReaderThread ()
> #6  0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7  0x00007fff845a7456 in _pthread_start ()
> #8  0x00007fff845a7309 in thread_start ()
>
> Thread 2 (process 807):
> #0  0x00007fff845a8eb6 in __semwait_signal ()
> #1  0x00007fff845a8d45 in nanosleep ()
> #2  0x00007fff845f5b14 in sleep ()
> #3  0x00000001057667ac in GarbageCollectorThread ()
> #4  0x000000010579bcf6 in XrdSysThread_Xeq ()
> #5  0x00007fff845a7456 in _pthread_start ()
> #6  0x00007fff845a7309 in thread_start ()
>
> Thread 1 (process 807):
> #0  0x00007fff845ebc90 in wait4 ()
> #1  0x00007fff8460023e in system ()
> #2  0x000000010110b782 in TUnixSystem::StackTrace ()
> #3  0x000000010110a26a in TUnixSystem::DispatchSignals ()
> #4  <signal handler called>
> #5  0x00000001020950cb in TKey::Create ()
> #6  0x000000010209849d in TKey::TKey ()
> #7  0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8  0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9  0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> #11 0x0000000101293f7e in G__G__Base2_10_0_53 ()
> #12 0x000000010190d9ea in Cint::G__CallFunc::Execute ()
> #13 0x00000001005b38c1 in PyROOT::TIntExecutor::Execute ()
> #14 0x00000001005b9762 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::CallSafe ()
> #15 0x00000001005b9916 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute ()
> #16 0x00000001005b73f5 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator() ()
> #17 0x00000001005bdf1b in PyROOT::(anonymous namespace)::mp_call ()
> #18 0x000000010000aff3 in PyObject_Call ()
> #19 0x000000010008a51a in PyEval_EvalFrameEx ()
> #20 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #21 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #22 0x000000010008acce in PyEval_EvalCodeEx ()
> #23 0x000000010008935e in PyEval_EvalFrameEx ()
> #24 0x000000010008acce in PyEval_EvalCodeEx ()
> #25 0x000000010008935e in PyEval_EvalFrameEx ()
> #26 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #27 0x000000010008acce in PyEval_EvalCodeEx ()
> #28 0x000000010008ad61 in PyEval_EvalCode ()
> #29 0x00000001000a265a in Py_CompileString ()
> #30 0x00000001000a2723 in PyRun_FileExFlags ()
> #31 0x0000000100083196 in _PyBuiltin_Init ()
> #32 0x0000000100089187 in PyEval_EvalFrameEx ()
> #33 0x000000010008acce in PyEval_EvalCodeEx ()
> #34 0x000000010008ad61 in PyEval_EvalCode ()
> #35 0x00000001000a265a in Py_CompileString ()
> #36 0x00000001000a2723 in PyRun_FileExFlags ()
> #37 0x00000001000a423d in PyRun_SimpleFileExFlags ()
> #38 0x00000001000b0286 in Py_Main ()
> #39 0x0000000100000e6c in start ()
> ===========================================================
> The lines below might hint at the cause of the crash.
> If they do not help you then please submit a bug report at
> http://root.cern.ch/bugs. Please post the ENTIRE stack trace
> from above as an attachment in addition to anything else
> that might help us fixing this issue.
> ===========================================================
> #5  0x00000001020950cb in TKey::Create ()
> #6  0x000000010209849d in TKey::TKey ()
> #7  0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8  0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9  0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> ===========================================================
>
>
>
> ==> 18632.arnor.cern.ch.afarbin.output <==
> 11800
>
> ==> 18824.arnor.cern.ch.afarbin.output <==
> 4400
>
> ==> 18519.arnor.cern.ch.afarbin.output <==
> Traceback (most recent call last):
>  File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/bin/PyRootBatch", line 70, in <module>
>    execfile(sys.argv[1])
>  File "Do0LeptonAnalysis.py", line 434, in <module>
>    RH= RunThisStep(NEvents,SampleNumber)
>  File "Do0LeptonAnalysis.py", line 428, in RunThisStep
>    RH.Loop(MaxEntries=NEvents, doPickle=True, pickleDir=NFSPath+out_dir)
>  File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/RunHandler.py", line 62, in Loop
>    res = self.Algo.Loop(newSample, MaxEntries, gd, firstEntry)
>  File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 377, in Loop
>    if not self.finalize(TheSample, AllEntriesData, GlobalData):
>  File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 236, in finalize
>    if not Alg.finalize(TheSample, AllEntriesData, GlobalData):
>  File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/WriterAlgorithm.py", line 179, in finalize
>    self.file.Write()
> TypeError: none of the 2 overloaded methods succeeded. Full details:
>  problem in C++; program state has been reset
>  problem in C++; program state has been reset
>
>