Hi Fulvio,
Did you try upping the swap to 15GB?
regards,
Stephen.
On Fri, 4 Feb 2005, Fulvio Galeazzi wrote:
>
> Hi,
> maybe I got lost among my emails, but I could not find any reply to
> this thread. I think there was an email exchange at some point, but I
> am still missing the whole picture.
>
> So, could somebody please explain how to deal with memory exhaustion
> on the data servers?
> More specifically, I'd also be interested to hear about:
> - Is there a rough formula to compute, for a machine with N GB of RAM
> and M GB of swap, the maximum number of connections that can be
> tolerated? Apart from performance issues, is it only the sum of
> physical memory and swap space that matters? Could somebody post a
> working configuration used at some site (maybe SLAC) to serve as a
> starting point?
> - Related to the point above: in order to avoid having xrootd get
> into a funny state, would it be possible/advisable to limit the
> number of incoming connections? What will happen once the limit is
> hit? Will the extra incoming connections just wait (sent back to the
> load-balancer, I guess) without complaining? Would this cause too
> much traffic around the load-balancer at some point?
>
> At CNAF we are planning a massive sparsification of files across our
> data servers for next week, to help reduce the chance of a single
> machine being hit too hard, yet it would still be nice to improve our
> xrootd configuration.
> At CNAF we have ~20 machines each serving between 1 and 1.5 TB of
> BaBar data, and these have never given us trouble. Where we do have
> trouble is with the two "diskservers" (now three) serving 30 TB of
> data (8-12 TB each). These diskservers have, as Enrica already noted,
> 2 GB RAM and 4 GB swap each.
>
> Thanks a lot for your help!
>
> Ciao ciao
>
> Fulvio
>
> Enrica Antonioli wrote:
> > Hi all,
> >
> > we found a problem that seems to be the cause of failures of jobs
> > submitted to the CNAF farm, similar to the one reported by Gregory
> > Schott, but this case involves a data server, not a redirector.
> >
> > We have a disk server with 20 TB of data (full of collections), 2 GB
> > of memory and 4 GB of swap.
> > After submitting jobs that need access to collections stored on this
> > disk server (~250-300 jobs running), xrootd started to show problems
> > in accessing data, as you can see from the following piece of xrdlog:
> >
> > [...]
> > 050113 18:45:19 1260 XrootdXeq: User logged in as
> > kflood.27447:[log in to unmask]
> > 050113 18:45:48 1260 XrootdXeq: User logged in as
> > kflood.6771:[log in to unmask]
> > 050113 18:46:44 1260 XrootdXeq: User logged in as
> > kflood.26481:[log in to unmask]
> > 050113 18:46:46 1260 XrdScheduler: Unable to create worker thread ;
> > cannot allocate memory
> > 050113 18:46:53 1260 XrdScheduler: Unable to create worker thread ;
> > cannot allocate memory
> > 050113 18:46:53 1260 XrootdXeq: User logged in as
> > kflood.19414:[log in to unmask]
> > 050113 18:47:25 1260 XrdScheduler: Unable to create worker thread ;
> > cannot allocate memory
> > 050113 18:47:33 1260 XrdScheduler: Unable to create worker thread ;
> > cannot allocate memory
> > [...]
> >
> > This message is repeated continuously and xrootd no longer responds;
> > the only way to recover is to restart the olbd service.
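[Editor's note: thread creation failing with "cannot allocate memory" while top reports 0k of swap used is often a sign of virtual address space exhaustion rather than physical memory pressure, if these are 32-bit hosts. A minimal sketch, assuming a ~3 GB user address space and a pthread default stack of ~10 MB typical of Linux at the time, of how to check the stack limit and the thread ceiling it implies:]

```python
import resource

# Assumed 32-bit Linux user address space of ~3 GB; adjust for your kernel.
ADDR_SPACE = 3 * 1024**3

soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
# RLIM_INFINITY means the pthread default stack size applies instead;
# 8-10 MB per thread was typical on Linux of that era.
stack = soft if soft != resource.RLIM_INFINITY else 10 * 1024**2

print(f"stack size per thread: {stack // 1024**2} MB")
print(f"rough thread ceiling:  {ADDR_SPACE // stack}")
```

If this is the bottleneck, lowering the stack limit (e.g. `ulimit -s` in the shell that starts the daemon) raises the thread ceiling without adding RAM or swap, which may explain why extra swap alone does not help.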
> >
> > This is what top shows now that everything seems OK (at the moment
> > I'm not able to post the output of top from when the problem is
> > present):
> >
> > CPU states: cpu user nice system irq softirq iowait idle
> > total 0.2% 0.0% 1.9% 0.2% 1.3% 25.0% 71.3%
> > cpu00 0.0% 0.0% 2.0% 1.0% 3.0% 15.4% 78.6%
> > cpu01 0.2% 0.0% 1.6% 0.0% 0.6% 35.0% 62.6%
> > cpu02 0.0% 0.0% 2.0% 0.0% 0.2% 17.6% 80.2%
> > cpu03 0.6% 0.0% 2.2% 0.0% 1.4% 32.0% 63.8%
> > Mem: 2061104k av, 2043568k used, 17536k free, 0k shrd, 77072k buff
> > 1549820k actv, 191940k in_d, 30736k in_c
> > Swap: 4096532k av, 0k used, 4096532k free 1683772k cached
> >
> > We also read /var/log/messages, but we didn't find anything related
> > to the moment the problem first appeared.
> >
> > Do you have any suggestions or ideas?
> >
> > Cheers,
> > Enrica
> >
> >
>
--
/------------------------------------+-------------------------\
|Stephen J. Gowdy | SLAC, MailStop 34, |
|http://www.slac.stanford.edu/~gowdy/ | 2575 Sand Hill Road, |
|http://calendar.yahoo.com/gowdy | Menlo Park CA 94025, USA |
|EMail: [log in to unmask] | Tel: +1 650 926 3144 |
\------------------------------------+-------------------------/