Hi Alden,

Looks like the SEGV happens in the ROOT package. Given that this happens only occasionally and that the call stack at the time is quite deep, I suspect that the thread stack size may be too small. There is a way to increase that in ROOT, but I am not a ROOT expert, so others should weigh in.

Andy

On Wed, 22 Sep 2010, Alden Stradling wrote:

> We're trying to track down some errors on our OS X cluster (10.6.4, Mac Pro Early 2009). Most of the runs go great, but we are seeing occasional segfaults, as seen below. I will have a larger assortment of error logs soon, but in the meantime -- does this look familiar to anyone?
>
>
> Thanks,
>
> Alden
>
> ===========================================================
> There was a crash.
> This is the entire stack trace of all threads:
> ===========================================================
>
> Thread 5 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 4 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 3 (process 807):
> #0 0x00007fff845f0eca in poll ()
> #1 0x000000010574f312 in XrdClientSock::RecvRaw ()
> #2 0x00000001057703b5 in XrdClientPhyConnection::ReadRaw ()
> #3 0x000000010576d5c4 in XrdClientMessage::ReadRaw ()
> #4 0x000000010576fa48 in XrdClientPhyConnection::BuildMessage ()
> #5 0x000000010577012b in SocketReaderThread ()
> #6 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #7 0x00007fff845a7456 in _pthread_start ()
> #8 0x00007fff845a7309 in thread_start ()
>
> Thread 2 (process 807):
> #0 0x00007fff845a8eb6 in __semwait_signal ()
> #1 0x00007fff845a8d45 in nanosleep ()
> #2 0x00007fff845f5b14 in sleep ()
> #3 0x00000001057667ac in GarbageCollectorThread ()
> #4 0x000000010579bcf6 in XrdSysThread_Xeq ()
> #5 0x00007fff845a7456 in _pthread_start ()
> #6 0x00007fff845a7309 in thread_start ()
>
> Thread 1 (process 807):
> #0 0x00007fff845ebc90 in wait4 ()
> #1 0x00007fff8460023e in system ()
> #2 0x000000010110b782 in TUnixSystem::StackTrace ()
> #3 0x000000010110a26a in TUnixSystem::DispatchSignals ()
> #4 <signal handler called>
> #5 0x00000001020950cb in TKey::Create ()
> #6 0x000000010209849d in TKey::TKey ()
> #7 0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8 0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9 0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> #11 0x0000000101293f7e in G__G__Base2_10_0_53 ()
> #12 0x000000010190d9ea in Cint::G__CallFunc::Execute ()
> #13 0x00000001005b38c1 in PyROOT::TIntExecutor::Execute ()
> #14 0x00000001005b9762 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::CallSafe ()
> #15 0x00000001005b9916 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::Execute ()
> #16 0x00000001005b73f5 in PyROOT::TMethodHolder<PyROOT::TScopeAdapter, PyROOT::TMemberAdapter>::operator() ()
> #17 0x00000001005bdf1b in PyROOT::(anonymous namespace)::mp_call ()
> #18 0x000000010000aff3 in PyObject_Call ()
> #19 0x000000010008a51a in PyEval_EvalFrameEx ()
> #20 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #21 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #22 0x000000010008acce in PyEval_EvalCodeEx ()
> #23 0x000000010008935e in PyEval_EvalFrameEx ()
> #24 0x000000010008acce in PyEval_EvalCodeEx ()
> #25 0x000000010008935e in PyEval_EvalFrameEx ()
> #26 0x00000001000892e1 in PyEval_EvalFrameEx ()
> #27 0x000000010008acce in PyEval_EvalCodeEx ()
> #28 0x000000010008ad61 in PyEval_EvalCode ()
> #29 0x00000001000a265a in Py_CompileString ()
> #30 0x00000001000a2723 in PyRun_FileExFlags ()
> #31 0x0000000100083196 in _PyBuiltin_Init ()
> #32 0x0000000100089187 in PyEval_EvalFrameEx ()
> #33 0x000000010008acce in PyEval_EvalCodeEx ()
> #34 0x000000010008ad61 in PyEval_EvalCode ()
> #35 0x00000001000a265a in Py_CompileString ()
> #36 0x00000001000a2723 in PyRun_FileExFlags ()
> #37 0x00000001000a423d in PyRun_SimpleFileExFlags ()
> #38 0x00000001000b0286 in Py_Main ()
> #39 0x0000000100000e6c in start ()
> ===========================================================
> The lines below might hint at the cause of the crash.
> If they do not help you then please submit a bug report at
> http://root.cern.ch/bugs. Please post the ENTIRE stack trace
> from above as an attachment in addition to anything else
> that might help us fixing this issue.
> ===========================================================
> #5 0x00000001020950cb in TKey::Create ()
> #6 0x000000010209849d in TKey::TKey ()
> #7 0x0000000102077075 in TDirectoryFile::WriteKeys ()
> #8 0x00000001020768fb in TDirectoryFile::SaveSelf ()
> #9 0x0000000102079337 in TDirectoryFile::Write ()
> #10 0x00000001020815f0 in TFile::Write ()
> ===========================================================
>
>
>
> ==> 18632.arnor.cern.ch.afarbin.output <==
> 11800
>
> ==> 18519.arnor.cern.ch.afarbin.output <==
> Traceback (most recent call last):
>   File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/bin/PyRootBatch", line 70, in <module>
>     execfile(sys.argv[1])
>   File "Do0LeptonAnalysis.py", line 434, in <module>
>     RH= RunThisStep(NEvents,SampleNumber)
>   File "Do0LeptonAnalysis.py", line 428, in RunThisStep
>     RH.Loop(MaxEntries=NEvents, doPickle=True, pickleDir=NFSPath+out_dir)
>   File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/RunHandler.py", line 62, in Loop
>     res = self.Algo.Loop(newSample, MaxEntries, gd, firstEntry)
>   File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 377, in Loop
>
> ==> 18824.arnor.cern.ch.afarbin.output <==
> 4400
>
> ==> 18519.arnor.cern.ch.afarbin.output <==
>     if not self.finalize(TheSample, AllEntriesData, GlobalData):
>   File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/TTreeAlgorithm.py", line 236, in finalize
>     if not Alg.finalize(TheSample, AllEntriesData, GlobalData):
>   File "/Volumes/DataA_1/afarbin/Runs/SPyRoot/trunk/python/WriterAlgorithm.py", line 179, in finalize
>     self.file.Write()
> TypeError: none of the 2 overloaded methods succeeded. Full details:
> problem in C++; program state has been reset
> problem in C++; program state has been reset
>
>
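
On the stack-size suspicion at the top of this message: one rough way to test it from the Python side is to check the process stack limit and then rerun the failing step in a thread created with a larger stack. The sketch below uses only the Python standard library and is not ROOT's own mechanism. The helper names `current_stack_limit` and `run_with_stack` and the 16 MiB default are assumptions made for illustration. Note that the crash above is reported on Thread 1, the main thread, whose stack is normally bounded by the process stack limit (`ulimit -s` in the shell) rather than by any per-thread setting.

```python
import resource
import threading


def current_stack_limit():
    """Return the (soft, hard) stack limit of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_STACK)


def run_with_stack(func, stack_bytes=16 * 1024 * 1024):
    """Run func() in a new thread created with the requested stack size.

    Hypothetical helper: the 16 MiB default is an arbitrary illustration,
    not a value recommended by ROOT or PyROOT.
    """
    result = {}

    def target():
        result["value"] = func()

    # stack_size() affects only threads created after this call
    # and returns the size that was in effect before.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result.get("value")
```

If the segfault goes away when the step that calls `self.file.Write()` is run through something like `run_with_stack`, that would support the stack-size explanation; if it persists, the cause is probably elsewhere.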