A bit more info to throw into the mix.
Because I was going to be away yesterday and not around to restart
services, and since we didn't see the problem until we added the new
servers, on Wednesday evening I stopped the xrootd and olbd services on
the data servers we had not yet started importing data to.
This reduced the number of servers in our cluster from 21 to 10. 44
hours later the problem has not returned. So it does look like it is
some sort of scaling issue.
I'll leave it running like this for the weekend, then restart the servers
and turn on debug output on some of the olbds on Monday.
Chris Brew +44 1235 446326
Particle Physics Department
Rutherford Appleton Laboratory
Chilton, Didcot. Oxfordshire.
OX11 0QX. United Kingdom.