Hi Alden,
It looks like the SEGV happens in the ROOT package. Given that it occurs
only occasionally and that the call stack at the time is quite deep, I
suspect that the thread stack size may be too small. There is a way to
increase that in ROOT, but I am not a ROOT expert, so others should weigh
in.
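
As a quick check of the stack-size theory, here is a minimal sketch that
reports the stack limit the Python process inherited from its shell. It
uses only Python's standard resource module; the helper names
(describe_stack_limit, format_limit) are made up for illustration and are
not part of ROOT or PyROOT. It only reads the limit. Whether raising it
(for example with "ulimit -s" before launching the job) cures the crash is
an assumption to test, not something the trace proves.

```python
import resource


def describe_stack_limit():
    """Return (soft, hard) stack limits in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_bytes(value):
        # RLIM_INFINITY is the sentinel for "no limit".
        return None if value == resource.RLIM_INFINITY else value

    return to_bytes(soft), to_bytes(hard)


def format_limit(value):
    """Render a limit in KiB, or 'unlimited' when no limit is set."""
    return "unlimited" if value is None else "%d KiB" % (value // 1024)


soft, hard = describe_stack_limit()
print("stack soft limit:", format_limit(soft))
print("stack hard limit:", format_limit(hard))
```

Running this at the top of the job script on a node that crashes and on
one that does not would show whether the limits differ between them.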
Andy
On Wed, 22 Sep 2010, Alden Stradling wrote:
> We're trying to track down some errors on our OS X cluster (10.6.4, Mac Pro Early 2009). Most of the runs go great, but we are seeing occasional segfaults, as seen below. I will have a larger assortment of error logs soon, but in the meantime -- does this look familiar to anyone?
>
>
> Thanks,
>
> Alden
>
> ===========================================================
> There was a crash.
> This is the entire stack trace of all threads:
> ===========================================================
>
> Thread 5 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 4 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 3 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 2 (process 807):
> #0 0x00007fff845a8eb6 in __semwait_signal ()
> #1 0x00007fff845a8d45 in nanosleep ()
> #2 0x00007fff845f5b14 in sleep ()
> #3 0x00000001057667ac in GarbageCollectorThread ()
> #4 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #5 0x00007fff845a7456 in _pthread_start ()
> #6 0x00007fff845a7309 in thread_start ()
>
> Thread 1 (process 807):
> #0 0x00007fff845ebc90 in wait4 ()
> #1 0x00007fff8460023e in system ()
> #2 0x000000010110b782 in TUnixSystem::StackTrace ()
> #3 0x000000010110a26a in TUnixSystem::DispatchSignals ()
> #4 <signal handler called>
> #5 0x00000001020950cb in TKey::Create ()
> #6 0x000000010209849d in TKey::TKey ()
> #7 0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8 0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9 0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> #11 0x0000000101293f7e in G__G__Base2_10_0_53 ()
> #12 0x000000010190d9ea in Cint::G__CallFunc::Execute ()
> #13 0x00000001005b38c1 in PyROOT::TIntExecutor::Execute ()
> #14 0x00000001005b9762 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::CallSafe ()
> #15 0x00000001005b9916 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute ()
> #16 0x00000001005b73f5 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator() ()
> #17 0x00000001005bdf1b in PyROOT::(anonymous namespace)::mp_call ()
> #18 0x000000010000aff3 in PyObject_Call ()
> #19 0x000000010008a51a in PyEval_EvalFrameEx ()
> #20 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #21 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #22 0x000000010008acce in PyEval_EvalCodeEx ()
> #23 0x000000010008935e in PyEval_EvalFrameEx ()
> #24 0x000000010008acce in PyEval_EvalCodeEx ()
> #25 0x000000010008935e in PyEval_EvalFrameEx ()
> #26 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #27 0x000000010008acce in PyEval_EvalCodeEx ()
> #28 0x000000010008ad61 in PyEval_EvalCode ()
> #29 0x00000001000a265a in Py_CompileString ()
> #30 0x00000001000a2723 in PyRun_FileExFlags ()
> #31 0x0000000100083196 in _PyBuiltin_Init ()
> #32 0x0000000100089187 in PyEval_EvalFrameEx ()
> #33 0x000000010008acce in PyEval_EvalCodeEx ()
> #34 0x000000010008ad61 in PyEval_EvalCode ()
> #35 0x00000001000a265a in Py_CompileString ()
> #36 0x00000001000a2723 in PyRun_FileExFlags ()
> #37 0x00000001000a423d in PyRun_SimpleFileExFlags ()
> #38 0x00000001000b0286 in Py_Main ()
> #39 0x0000000100000e6c in start ()
> ===========================================================
> The lines below might hint at the cause of the crash.
> If they do not help you then please submit a bug report at
> http://root.cern.ch/bugs. Please post the ENTIRE stack trace
> from above as an attachment in addition to anything else
> that might help us fixing this issue.
> ===========================================================
> #5 0x00000001020950cb in TKey::Create ()
> #6 0x000000010209849d in TKey::TKey ()
> #7 0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8 0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9 0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> ===========================================================
>
>
>
> ==> 18632.arnor.cern.ch.afarbin.output <==
> 11800
>
> ==> 18519.arnor.cern.ch.afarbin.output <==
> Traceback (most recent call last):
> File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/bin/PyRootBatch", line 70, in <module>
> execfile(sys.argv[1])
> File "Do0LeptonAnalysis.py", line 434, in <module>
> RH= RunThisStep(NEvents,SampleNumber)
> File "Do0LeptonAnalysis.py", line 428, in RunThisStep
> RH.Loop(MaxEntries=NEvents, doPickle=True, pickleDir=NFSPath+out_dir)
> File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/RunHandler.py", line 62, in Loop
> res = self.Algo.Loop(newSample, MaxEntries, gd, firstEntry)
> File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 377, in Loop
>
> ==> 18824.arnor.cern.ch.afarbin.output <==
> 4400
>
> ==> 18519.arnor.cern.ch.afarbin.output <==
> if not self.finalize(TheSample, AllEntriesData, GlobalData):
> File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 236, in finalize
> if not Alg.finalize(TheSample, AllEntriesData, GlobalData):
> File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/WriterAlgorithm.py", line 179, in finalize
> self.file.Write()
> TypeError: none of the 2 overloaded methods succeeded. Full details:
> problem in C++; program state has been reset
> problem in C++; program state has been reset
>
>