Hello,
I have been trying to produce some VubRecoilUser
ntuples. Unfortunately, a very large fraction
of my jobs crashed.
My code is in the analysis-30 test release:
~penguin/vubrecoil/vub30
I did edit VubXlnu.cc a bit to make it keep events
even if there was no best lepton, so that I could
study the breco sample before and after the lepton
requirement. However, the code did compile and link,
and SOME of my jobs ran OK, so I don't think that's the
problem.
For SP-1235 and SP-1237, most of the errors were
exit code 134 (128 + 6, i.e. SIGABRT). This usually means "aborted and core dumped."
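For reference, here is a minimal sketch of how an exit code like 134 can be decoded. It assumes the batch system reports the usual shell convention of 128 plus the signal number; the function name describe_exit_code is only illustrative and is not part of any BaBar tool.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the shell convention 128 + signal number.

    Assumption: the job wrapper reports statuses the way a POSIX shell does.
    """
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with error status {code}"


if __name__ == "__main__":
    print(describe_exit_code(134))
```

On a typical Linux machine, 134 should decode to SIGABRT, which is what abort() or a failed assert raises, consistent with "aborted and core dumped."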
I have posted a sample of my core dump messages at:
http://www.slac.stanford.edu/~penguin/cores.html
The most common pre-core-dump message was:
VubXlnu::VubRecoilHelper.cc(256):reco/recoil MC association is mixed
However, this message also appears frequently in the log files
of the successful jobs, so it may not be related to the crashes.
For SP-2575, SP-3037, SP-6333, SP-6334, SP-3429, and SP-1005,
the most common error was that the job simply exited without
processing any events. A ROOT file is produced, but it is empty.
Most of the data jobs ran successfully.
My log files are in:
~penguin/vubrecoil/vub30/workdir/log
You can see the results of all the jobs in:
~penguin/vub30/workdir/chklog.txt
which is the output of the chklog script in:
/nfs/farm/babar/AWG11/PID/users/penguin/owl/workdir/chklog
run over my log files.
I tried debugging one of the core-dumped jobs,
but since I had already removed the actual core files, I had to
rerun the job interactively in gdb. After
two hours it still hadn't crashed, so I killed it.
Does anyone know why so many of my jobs crashed?
Thanks,
sheila