>  My proposal is to split things on different disks, if possible.

In my last email yesterday I already mentioned that, as soon as we have
the new disk available, the generic 'good' will be moved...
Alessio

>
>  Thanks a lot,
>
>  Daniele
>
> On Thu, 31 Oct 2002, Yury G. Kolomensky wrote:
>
> > 	Hi Daniele,
> >
> > do you have an example of a log file for these jobs? I do not know
> > exactly which servers these disks have been installed on, but we
> > noticed in E158, where most of the data were sitting on one
> > (relatively slow) server, that jobs were limited by I/O throughput to
> > about 2 MB/sec. This limit comes from the random access pattern that
> > split ROOT trees produce. If your job is sufficiently fast, you can
> > saturate the I/O limit quite quickly -- with 2-3 jobs. If you submit
> > too many jobs (tens or even hundreds), the server will thrash to the
> > point that the clients receive NFS timeouts. ROOT usually does not
> > handle that well -- you may see error messages in the log file about
> > files not being found (when the files are actually on disk), or about
> > problems uncompressing branches. These errors are usually more severe
> > on Linux clients, where the NFS client implementation is not very
> > robust.
> >
> > There are several ways to cope with this problem:
> >
> > 1) Submit fewer jobs at one time. I would not submit more than 10
> >    I/O-limited jobs in parallel.
> > 2) Place your data on different servers; spreading it across
> >    different sulky servers is best. Even if you stay on the same
> >    sulky server but split your data onto different partitions, you
> >    still get the benefit of parallelized disk access.
> > 3) Re-write your jobs to first copy the data onto a local disk on
> >    the batch worker (for instance, /tmp), then run on the local
> >    copy, and finally delete the local copy (see the sketch after
> >    this list). The benefit is that the cp command accesses the file
> >    in direct-access mode, with 10-20 MB/sec throughput depending on
> >    the network interface.
> > 4) Make your ntuples non-split (very highly recommended). This
> >    usually increases the throughput by a factor of 10-20. If your
> >    typical job reads most of the branches of the tree, splitting the
> >    tree makes no sense. Non-split trees provide direct access to
> >    disk, which is much more efficient (see the second sketch below).
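> >
> > As an illustration of point 3, here is a rough sketch of the staging
> > idea written as a ROOT macro. The paths and the function name are
> > placeholders, not our actual locations:
> >
> >    #include "TFile.h"
> >    #include "TSystem.h"
> >
> >    void run_from_local_copy()
> >    {
> >       // Hypothetical NFS path and scratch location -- adjust.
> >       const char *remote = "/nfs/server/data.root";
> >       const char *local  = "/tmp/data.root";
> >
> >       // Sequential cp-style copy: gets the full network throughput
> >       // instead of thrashing the server with random reads.
> >       gSystem->CopyFile(remote, local, kTRUE);
> >
> >       TFile f(local);
> >       // ... run the analysis on the local copy here ...
> >       f.Close();
> >
> >       // Free the scratch space when done.
> >       gSystem->Unlink(local);
> >    }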
> >
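> > Similarly for point 4, a minimal sketch of writing a tree non-split:
> > the last argument of TTree::Branch is the split level, and 0 stores
> > the whole object in a single branch. TLorentzVector here is just a
> > stand-in for whatever event class your ntuples actually use:
> >
> >    #include "TFile.h"
> >    #include "TTree.h"
> >    #include "TLorentzVector.h"
> >
> >    void write_nonsplit()
> >    {
> >       TFile f("ntuple.root", "RECREATE");
> >       TTree tree("events", "example non-split tree");
> >
> >       TLorentzVector *p4 = new TLorentzVector();
> >       // Split level 0 (last argument): the whole object is streamed
> >       // into one branch, so reading it back is a sequential scan.
> >       tree.Branch("p4", "TLorentzVector", &p4, 64000, 0);
> >
> >       for (int i = 0; i < 1000; ++i) {
> >          p4->SetPtEtaPhiM(0.1 * i, 0., 0., 0.139);
> >          tree.Fill();
> >       }
> >       tree.Write();
> >    }
> >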
> > 							Yury
> >
> >
> > On Thu, Oct 31, 2002 at 09:26:08AM -0800, Daniele del Re wrote:
> > >
> > > Hi all,
> > >
> > >  In the last two days I tried to run on data and MC on the new
> > > disk AWG18. No way: I got problems in about 80% of the jobs. Some
> > > crashed, and most of the others failed to read a large number of
> > > ROOT files (which are actually there).
> > >
> > >  This problem seems to be worse than ever. Should we contact the
> > > computing people about it?
> > >
> > >  Daniele
> > >
> > >
> >