I am running the XRootD 3.0.5 release from the yum repo and am seeing
rather severe performance issues. I have a system of 12 active batch
machines with a total of just over 200 batch slots. When the batch
system is full of jobs -- and this happens with a number of different
types of jobs -- I see very poor performance from the xrootd server.
Load averages run from 140 to 200 or more, and a high percentage of
CPU time is spent in I/O wait (70-80%) along with a large number of
interrupts (10-15K), both as reported by dstat.
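(For reference, dstat summarizes the same kernel counters that /proc
exposes directly; this is only a sketch of that kind of cross-check,
with field positions as on a standard Linux /proc:)

```shell
# Rough cross-check of the dstat figures straight from /proc
# (no extra tools assumed; field positions are standard Linux):
cat /proc/loadavg                     # 1-, 5- and 15-minute load averages
grep -c '^processor' /proc/cpuinfo    # core count, to put 140-200 in context
# Cumulative %iowait since boot (field 6 of the aggregate "cpu" line):
awk '/^cpu /{t=0; for(i=2;i<=8;i++) t+=$i;
             printf "iowait since boot: %.1f%%\n", 100*$6/t}' /proc/stat
# Total interrupts serviced since boot (two snapshots a few seconds
# apart give the rate dstat reports):
awk '/^intr/{print "interrupts since boot:", $2}' /proc/stat
```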
The server is a Dell R710 with MD1000 disk arrays attached via SAS
interconnects. It has a 10G network interface, and I have measured
over 800 MB/s of total network throughput by running xrdcp
simultaneously on 10 machines.
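(For scale, some back-of-the-envelope arithmetic on what the link can
sustain per client; the ~1250 MB/s line rate for 10 GbE is an
assumption about usable throughput, and the client and slot counts
are the ones above:)

```shell
# Capacity arithmetic for the setup described (assumptions: 10 GbE
# line rate of ~1250 MB/s, 10 concurrent xrdcp clients, 200 slots):
link_mbs=1250    # 10 Gb/s is roughly 1250 MB/s before protocol overhead
clients=10
slots=200
echo "per-client share at saturation: $((link_mbs / clients)) MB/s"
echo "per-slot share at saturation:   $((link_mbs / slots)) MB/s"
```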
There are two namespaces, managed by two xrootd servers. When the main
namespace gets clogged up by batch jobs, the second namespace still
performs well.
Is it expected that an xrootd process would not be able to handle 200
simultaneous data flows? If not, how should I go about debugging this
system?
Thanks.
Paul T. Keener
Department of Physics and Astronomy
University of Pennsylvania