	Hi Daniele,

did you really run 300 ROOT jobs over the same partition? This is
nuts, IMHO, unless your jobs are completely CPU-limited. Looking at
your logfile, your job ran for 12 mins while using 92 sec of CPU
time. In other words, CPU utilization was ~13%. This is not good. I
see that you ran on a bronco -- so the NFS client problems were
probably milder. You probably had to push the server really hard to
start seeing these errors. 

It is clear that the way you run your jobs is not optimal. You run
many jobs, all in the long queue, and each uses a couple of minutes
of CPU time while being I/O-limited. You would be much better served
(and more efficient, too) by running a few jobs in parallel, each
using a few hours of CPU time. You can get that either by chaining
the files (something like the sketch below) or by writing smarter
macros. 
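
For example, a chained job could look roughly like the macro below.
This is only a sketch: the tree name "AlleEvents" and the wildcard
path are my guesses -- adjust them to whatever your ntuples actually
contain.

   // chainjob.C -- run one long job over many files with a TChain
   void chainjob()
   {
      // the tree name "AlleEvents" is assumed; use your ntuple's name
      TChain chain("AlleEvents");

      // TChain::Add accepts wildcards, so one call picks up a whole set of files
      chain.Add("/nfs/farm/babar/AWG18/ISL/sx-080702/data/2000/output/outputdir/AlleEvents_2000_on-*.root");

      // set branch addresses for the quantities you need, then loop once
      Long64_t nentries = chain.GetEntries();
      for (Long64_t i = 0; i < nentries; ++i) {
         chain.GetEntry(i);
         // ... your per-event analysis goes here ...
      }
   }

One such job reads hundreds of files sequentially instead of hundreds
of jobs each hammering the same partition at once.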

I do not know what the difference between sulky09 (AWG8 disk) and
sulky25 (AWG18) is (Fabrizio could find out). They use somewhat
different diskpacks, I guess -- one partition is 500 GB and the other
is 600 GB, so it is not inconceivable that the AWG18 disk is slower.
Such disks are usually better optimized for direct access, though, so
again, you will see much better throughput if you convert your trees
to non-split mode with one (or a few) branches and run fewer parallel
jobs. A conversion macro could look like the sketch below.
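
This is only a rough sketch of what I mean by converting to non-split
mode: I am guessing the tree name ("AlleEvents"), the branch name
("event") and the event class ("MyEvent"), and the class dictionary
has to be loaded for it to work.

   // nonsplit.C -- rewrite a split tree as a single non-split branch
   void nonsplit(const char* infile, const char* outfile)
   {
      TFile fin(infile);
      TTree* tin = (TTree*)fin.Get("AlleEvents");

      MyEvent* event = 0;
      tin->SetBranchAddress("event", &event);

      TFile fout(outfile, "RECREATE");
      TTree* tout = new TTree("AlleEvents", "non-split copy");
      // splitlevel = 0: the whole event object goes into one branch,
      // so reading it back is sequential rather than random access
      tout->Branch("event", "MyEvent", &event, 64000, 0);

      Long64_t n = tin->GetEntries();
      for (Long64_t i = 0; i < n; ++i) {
         tin->GetEntry(i);
         tout->Fill();
      }
      fout.Write();
   }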

								Yury

On Thu, Oct 31, 2002 at 10:23:36AM -0800, Daniele del Re wrote:
> 
> Hi Yury,
> 
>  one example is
> 
>  ~daniele/scra/newchains_1030/data-2
> 
>  and the typical message is
> 
>  Error in <TFile::TFile>: file /nfs/farm/babar/AWG18/ISL/sx-080702/data/2000/output/outputdir/AlleEvents_2000_on-1095.root does not exist
> 
>  on AWG8 this pathology happened just a few times when there were >~300
> jobs reading the same disk, if I remember correctly.
> 
>  Do you know what the difference between AWG8 and AWG18 is?
> 
>  My proposal is to split things across different disks, if possible.
> 
>  Thanks a lot,
> 
>  Daniele
> 
> On Thu, 31 Oct 2002, Yury G. Kolomensky wrote:
> 
> > 	Hi Daniele,
> >
> > do you have an example of a log file for these jobs? I do not know
> > exactly which servers these disks have been installed on, but we
> > noticed in E158, where most of the data were sitting on one
> > (relatively slow) server, that jobs were limited by I/O throughput
> > to about 2 MB/sec. This limit comes from the random access pattern
> > that split ROOT trees produce. If your job is sufficiently fast,
> > you can saturate the I/O limit quite quickly -- with 2-3 jobs. If
> > you submit too many jobs (tens or even hundreds), the server will
> > thrash to the point that the clients receive NFS timeouts. ROOT
> > usually does not like that -- you may see error messages in the log
> > file about files not found (when the files are actually on disk),
> > or about problems uncompressing branches. These are usually more
> > severe on Linux clients, where the NFS client implementation is not
> > very robust.
> >
> > There are several ways to cope with this problem:
> >
> > 1) Submit fewer jobs at one time. I would not submit more than 10
> >    I/O-limited jobs in parallel.
> > 2) Place your data on different servers. Spreading it across
> >    different sulky servers is best. Even if you are on the same
> >    sulky server but split your data onto different partitions, you
> >    still get the benefit of parallelizing disk access.
> > 3) Re-write your jobs to first copy the data onto a local disk on the
> >    batch worker (for instance, /tmp), then run on the local copy, then
> >    delete the local copy. The benefit is that the cp command accesses
> >    the file in direct-access mode (with 10-20 MB/sec throughput,
> >    depending on the network interface).
> > 4) Make your ntuples non-split (very highly recommended). This usually
> >    increases the throughput by a factor of 10-20. If your typical job
> >    reads most of the branches of the tree, splitting the tree makes
> >    no sense. Non-split trees provide direct access to disk, which is
> >    much more efficient.
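
On item 3 above: the local-copy staging can also be done from inside
the ROOT macro itself. The sketch below is only illustrative -- the
tree name "AlleEvents" is my guess, and /tmp is simply the obvious
local scratch area on the batch workers.

   // localcopy.C -- stage a file to local /tmp, run on it, clean up
   void localcopy(const char* nfsfile)
   {
      TString local = "/tmp/";
      local += gSystem->BaseName(nfsfile);

      // plain cp reads the file sequentially over NFS, so it is fast
      gSystem->Exec(Form("cp %s %s", nfsfile, local.Data()));

      TFile f(local);
      TTree* tree = (TTree*)f.Get("AlleEvents");   // tree name assumed
      // ... run the analysis on "tree" ...

      f.Close();
      gSystem->Unlink(local);   // clean up /tmp on the batch worker
   }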
> >
> > 							Yury
> >
> >
> > On Thu, Oct 31, 2002 at 09:26:08AM -0800, Daniele del Re wrote:
> > >
> > > Hi all,
> > >
> > >  in the last two days I tried to run on data and MC on the new disk AWG18.
> > > No way. I got problems in 80% of the jobs. Some crashed, and most
> > > of them failed to read a large number of ROOT files (which are
> > > actually there).
> > >
> > >  This problem seems to be worse than ever. Do we have to contact
> > > computing people about this?
> > >
> > >  Daniele
> > >
> > >
> >
> >
> 
>