Hi Sheila,
I had a quick look at your log files and noticed again that you are
running over more than 130000 events per job. I believe there are just
too many memory leaks, and VubRecoilUserApp can't handle that many events.
This is the end of your log/SP-1235-BSemiExcl-R18b-34.log:
Exited with exit code 134.
Resource usage summary:
CPU time : 6107.12 sec.
Max Memory : 1760 MB
Max Swap : 2945 MB
Those numbers are definitely too large.
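As an aside, exit code 134 fits the "Aborted" message in the failed jobs: shells report a process killed by signal N as 128 + N, and 134 - 128 = 6, which is SIGABRT on Linux. A minimal Python sketch of that decoding, where the helper name describe_exit_code is made up for illustration:

```python
import signal


def describe_exit_code(code):
    """Return the signal name for a shell-style 128+N exit code, else None.

    Hypothetical helper: it only applies the 128+N convention and knows
    nothing about the batch system itself.
    """
    if code <= 128:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(describe_exit_code(134))
```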
My suggestion is to reduce the number of events per job to ~70000.
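Just to illustrate the arithmetic, here is a rough Python sketch of cutting one large job into pieces of at most 70000 events. The function name and the (first event, number of events) layout are assumptions for illustration only; how the ranges are actually passed to VubRecoilUserApp depends on your tcl setup, which I am not showing here.

```python
def split_events(total_events, max_per_job=70000):
    """Yield (first_event, n_events) pairs covering total_events.

    Each chunk holds at most max_per_job events; the last one may be smaller.
    """
    first = 0
    while first < total_events:
        n = min(max_per_job, total_events - first)
        yield first, n
        first += n


# Example with a 130000-event job (illustrative total, not taken from a log)
for first, n in split_events(130000):
    print(f"job starting at event {first}: {n} events")
```

With these numbers one 130000-event job becomes two jobs, one of 70000 and one of 60000 events.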
Cheers,
Roberto
> Hello everyone,
>
> I am still having trouble making ntuples.
>
> The details are about the same as before:
>
> Test release: ~penguin/vubrecoil/vub30
> Edits: VubXlnu.cc edited to keep events even if they
> don't have a best lepton
>
> Record of which jobs failed/succeeded: workdir/ck3.txt
> CPU times for failed jobs: workdir/cputimes.txt
>
> Error message for failed jobs is always something
> very similar to:
>
> /nfs/farm/lsb_spool/1139883549.791383: line 8: 1222 Aborted
> VubRecoilUserApp VubXlnu.tcl SP-1235-BSemiExcl-R18b-2 MC
>
> I started a gdb session for one of the jobs. It took a few hours
> before it finally crashed. You can read what happened in:
>
> ~penguin/vubrecoi/vub30/workdir/gdb.txt
>
> What confuses me is that a lot of the jobs ran OK.
> If it was a problem with the code, I would have thought
> all the jobs would crash. I'm convinced it's not CPU time
> (see workdir/cputimes.txt).
>
> I am not sure if it could be one of those "memory leaks"
> because I don't understand them very well. I will look
> into that. But in the meantime, any help/suggestions
> are welcome.
>
> Thanks,
>
> sheila
>
>