  Hi Andy and Enrica,

On Fri, Feb 04, 2005 at 06:15:14PM -0800, Andrew Hanushevsky wrote:
> My understanding was that the xrootd version being used was incompatible
> with the particular Linux version being used. Once that problem was
> resolved, things were working fine. Are you sure this isn't the problem
> here?

  We saw strange things like this when Gregory tried to use the RH72/gcc2.95.3
binary on a RHEL3 system. Once he switched to using the RHEL3 binary on RHEL3
the problem disappeared (and we didn't investigate further). I should note,
however, that it appeared even when accessing files with a >single< client.
The fact that Enrica reports seeing problems only when running
with many clients probably indicates that there really is some memory issue 
that comes up with many clients (250-300 jobs).

> >    More specifically, I'd be also interested to hear about:
> >   - is there a rough formula which allows to compute, given a machine
> >     with N GB of RAM and M GB of swap, the maximum number of connections
> >     which can be tolerated? Apart from performance issues, is it only the
> >     sum of physical memory and swap space which matters?
> It's really only the sum of memory and swap that matters. I recommend 16GB of
> swap space in a mindless way (i.e., just give it 16GB no matter what the
> memory size). Yes, you can fine tune it but it's hardly worth it since 16GB
> of disk space is a pretty small impact for the normal sized server.

  I note that the canonical 16GB number (often for machines with 2GB of 
real memory) came from some stupidity/feature of virtual memory management 
in solaris, no? A priori there is no known reason (yet) why linux needs such 
a large ratio of swap/real memory, is that correct? (That said, I agree that 
it isn't a huge amount of swap space for a server these days. Trying to 
increase it would be worthwhile. If that doesn't help, then we know something 
else is probably wrong.)
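
  Just to put rough numbers on the sizing question (the per-client figure
below is purely a guess for illustration, not something we have measured),
the back-of-envelope estimate would simply be:

    # back-of-envelope sizing, all numbers hypothetical
    ram_gb        = 2        # physical memory on the server
    swap_gb       = 16       # Andy's suggested swap
    per_client_mb = 40       # ASSUMED average footprint per connection
    max_clients   = (ram_gb + swap_gb) * 1024 / per_client_mb   # ~460

If 250-300 clients really are enough to exhaust something like 18GB of
memory+swap, that would correspond to a 60-70MB per-client footprint, which
the memory trace Andy mentions should be able to confirm or rule out.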

> >    At CNAF we are planning for next week a massive sparsification of
> > files across our data servers to help reduce the chance of a single
> > machine being hit too hard, yet it would still be nice to improve our
> > xrootd configuration.
> We are fine-tuning some algorithms in xrootd in order to improve memory
> management. The current development release looks very hopeful (I'm doing
> the analysis of a memory trace right now). The latest test release has
> significantly reduced the xrootd memory footprint without significantly
> increasing the overhead. I'd say that would be your best bet (once we sign
> off on it).
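
  In the meantime it would be easy enough to keep an eye on the xrootd
footprint on the CNAF servers while the jobs are running. Something as
trivial as the sketch below (the pid and sampling interval are obviously
placeholders) would tell us whether the growth tracks the client count:

    # sketch: sample the xrootd virtual/resident sizes once a minute
    import time

    def vm_sizes(pid):
        sizes = {}
        for line in open("/proc/%d/status" % pid):
            if line.startswith("VmSize") or line.startswith("VmRSS"):
                key, value = line.split(":", 1)
                sizes[key] = value.strip()
        return sizes

    xrootd_pid = 12345                  # fill in the real server pid
    while True:
        print time.ctime(), vm_sizes(xrootd_pid)
        time.sleep(60)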

  We really need to get the xrootd testbed hardware set up so that we
can dream up some real tests instead of deploying the latest development
version in the production system at CNAF.... One of the machines is
a linux server, is that correct? We could easily duplicate this sort
of situation at SLAC (250-300++ clients hitting various versions of xrootd
with various amounts of swap space). While SLAC still has solaris servers, 
most of the rest of the world is more likely to be running linux servers 
these days... 
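
  For the SLAC side, reproducing the client load shouldn't need anything
fancy. A throwaway driver along these lines (the server name and file list
below are made up, of course) ought to be enough to generate the 250-300
concurrent reads:

    # crude load generator: NCLIENTS concurrent xrdcp reads
    import os, random

    SERVER   = "root://testsrv.slac.stanford.edu/"   # hypothetical test server
    FILES    = ["/store/test/file%03d.root" % i for i in range(100)]
    NCLIENTS = 300

    for i in range(NCLIENTS):
        if os.fork() == 0:
            f = random.choice(FILES)
            # read the file and throw the data away (-f to overwrite /dev/null)
            os.execvp("xrdcp", ["xrdcp", "-f", SERVER + f, "/dev/null"])

    for i in range(NCLIENTS):
        os.wait()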

  Andy, assuming that the swap space and memory issues are dealt with (and
ignoring the random I/O and file descriptor limits), is there anything
else in xrootd itself that you expect will limit the number of clients?

                                   Pete

-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------