Hello,

I have been trying to produce some VubRecoilUser 
ntuples.  Unfortunately, a very large fraction 
of my jobs crashed.

My code is in the analysis-30 test release:

~penguin/vubrecoil/vub30

I did edit VubXlnu.cc a bit to make it keep events 
even if there was no best lepton, so that I could 
study the breco sample before and after the lepton 
requirement.  However, the code did compile and link, 
and SOME of my jobs ran OK, so I don't think that's the 
problem.

For SP-1235 and SP-1237, most of the failed jobs ended 
with exit code 134, which usually means the job aborted 
(SIGABRT) and dumped core.
I have posted a sample of my core dump messages at:

http://www.slac.stanford.edu/~penguin/cores.html
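
For anyone who wants to check my reading of the exit code, here is a minimal sketch of the decoding. It assumes the usual shell convention that a status above 128 is 128 plus the number of the signal that killed the job; I have not confirmed that the batch system reports the code this way. The function names are my own, not part of any BaBar tool.

```python
import signal


def decode_exit_status(status):
    """Split a shell-style exit status into (kind, value).

    Assumption: statuses above 128 follow the sh/bash convention
    "128 + N", meaning the process was killed by signal N.
    """
    if status > 128:
        return ("signal", status - 128)
    return ("exit", status)


def signal_name(number):
    """Return the symbolic name of a signal number, or None if unknown."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a valid signal on this platform.
        return None


kind, value = decode_exit_status(134)
if kind == "signal":
    print(f"exit code 134 -> signal {value} ({signal_name(value)})")
```

On Linux, signal 6 is SIGABRT, which is what abort() and failed assert() calls raise.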

The most common pre-core-dump message was:

VubXlnu::VubRecoilHelper.cc(256):reco/recoil MC association is mixed

However, this message appears frequently in the log files 
for the successful jobs, as well.
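
To compare how often the message shows up in crashed and successful jobs, something like the following could be run over the log directory. It is only a sketch: it assumes the logs are plain text files matching "*.log" (a hypothetical pattern, adjust to the real file names), and it only counts lines, it does not know which jobs crashed.

```python
from pathlib import Path

# The message text as it appears in my logs.
MESSAGE = "reco/recoil MC association is mixed"


def count_message(log_dir, message=MESSAGE, pattern="*.log"):
    """Return {log file name: number of lines containing message}.

    The "*.log" pattern is an assumption about how the files are named.
    """
    counts = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has stray bytes.
        with open(path, errors="replace") as handle:
            counts[path.name] = sum(message in line for line in handle)
    return counts


def print_counts(log_dir):
    """Print one line per log file: count, then file name."""
    for name, n in count_message(log_dir).items():
        print(f"{n:6d}  {name}")
```

Calling print_counts() on the log directory gives one count per job, which can then be set against the per-job results in chklog.txt.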

For SP-2575, SP-3037, SP-6333, SP-6334, SP-3429, and SP-1005, 
the most common failure was that the job simply exited without 
processing any events.  A ROOT file was produced, but it was empty.

Most of the data jobs ran successfully.

My log files are in:

~penguin/vubrecoil/vub30/workdir/log

You can see the results of all the jobs in:

~penguin/vub30/workdir/chklog.txt

which is the output of the chklog script in:

/nfs/farm/babar/AWG11/PID/users/penguin/owl/workdir/chklog

run over my log files.

I tried debugging one of the core-dumped jobs.  
Since I had already removed the actual core files, 
this meant rerunning the job interactively in gdb.  
After two hours it still hadn't crashed, so I killed it.

Does anyone know why so many of my jobs crashed?

Thanks,

sheila