Hi Sheila,

I had a quick look at your log files and noticed again that you are 
running over 130000 events per job. I believe there are just too many 
memory leaks, and VubRecoilUserApp can't handle that many events.
This is the end of your log/SP-1235-BSemiExcl-R18b-34.log:

Exited with exit code 134.

Resource usage summary:

     CPU time   :   6107.12 sec.
     Max Memory :      1760 MB
     Max Swap   :      2945 MB

Those numbers are definitely too large.
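
As a side note, exit code 134 is consistent with the "Aborted" message 
in your error output: by the usual shell convention, an exit code above 
128 means the process was killed by a signal numbered (code - 128), and 
134 - 128 = 6, which is SIGABRT on Linux. A minimal Python sketch of 
that decoding (the helper name decode_exit_code is hypothetical, just 
for illustration):

```python
import signal

def decode_exit_code(code):
    """Return the signal name implied by a 128+N exit code, else None.

    Assumes the usual shell convention that exit codes above 128
    mean "terminated by signal (code - 128)".
    """
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # No signal with that number on this platform.
            return None
    return None

print(decode_exit_code(134))
```

An abort like this is what you would typically see when an allocation 
fails and the resulting exception is never caught, which fits the 
memory numbers above, although the log alone does not prove it.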

My suggestion is to reduce the number of events per job to ~70000.
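
A rough sketch of how the event ranges could be split, assuming each 
job can be told its first event and how many events to process (how 
that is actually passed to VubRecoilUserApp or the tcl file is not 
shown here, and the function name split_events and the total of 
130000 are only illustrative):

```python
def split_events(total_events, max_per_job=70000):
    """Yield (first_event, n_events) chunks of at most max_per_job events."""
    first = 0
    while first < total_events:
        n = min(max_per_job, total_events - first)
        yield first, n
        first += n

# Illustrative total only; use the real event count of the collection.
for first, n in split_events(130000):
    print(first, n)
```

The last chunk simply takes whatever is left, so no events are dropped 
and no job exceeds the limit.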

Cheers,

Roberto

> Hello everyone,
> 
> I am still having trouble making ntuples.
> 
> The details are about the same as before:
> 
> Test release: ~penguin/vubrecoil/vub30
> Edits: VubXlnu.cc edited to keep events even if they 
> don't have a best lepton
> 
> Record of which jobs failed/succeeded: workdir/ck3.txt
> CPU times for failed jobs: workdir/cputimes.txt
> 
> Error message for failed jobs is always something 
> very similar to:
> 
> /nfs/farm/lsb_spool/1139883549.791383: line 8:  1222 Aborted
> VubRecoilUserApp VubXlnu.tcl SP-1235-BSemiExcl-R18b-2 MC
> 
> I started a gdb session for one of the jobs.  It took a few hours
> before it finally crashed.  You can read what happened in:
> 
> ~penguin/vubrecoi/vub30/workdir/gdb.txt
> 
> What confuses me is that a lot of the jobs ran OK.  
> If it was a problem with the code, I would have thought 
> all the jobs would crash.  I'm convinced it's not CPU time 
> (see workdir/cputimes.txt).
> 
> I am not sure if it could be one of those "memory leaks" 
> because I don't understand them very well.  I will look 
> into that.  But in the meantime, any help/suggestions 
> are welcome.
> 
> Thanks,
> 
> sheila
> 
>