Hi Patrick,
Please do *not* limit the threads via the xrd.sched directive. What was
the reason behind doing this? By limiting the threads you make it
difficult, if not impossible, for the server to handle varying loads.
As for the broken pipe, I assume that xroot.18727:63 is a client. If the
client crashes or simply disconnects, the broken pipe message will appear.
I suspect that because the threads were limited, the xrootd server could
not get back to the client within the window that the client was willing
to wait. So the client disconnected and reconnected, thinking there were
server issues. This scenario (often caused by limiting the threads) leads
to a death spiral in which more and more clients disconnect and then
reconnect, eating up even more threads and making it impossible for the
server to do anything but service reconnections.
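To make the spiral concrete, here is a toy Python model. It is only a sketch under stated assumptions, not a description of how xrootd actually schedules work: the function name simulate, the arrival rate, the client timeout, and the rule that a reconnect costs two requests (a login plus the retried request) are all made up for illustration. Only the cap of 25 threads comes from your configuration.

```python
from collections import deque


def simulate(threads, arrivals, timeout, ticks):
    """Toy model of a capped thread pool (hypothetical, not xrootd's scheduler).

    Each tick, `arrivals` new requests join a FIFO queue and the server
    completes at most `threads` of them. A request left waiting after
    `timeout` ticks is abandoned: the client reconnects, which is assumed
    to cost two new requests (a login plus the retried request).
    Returns (final backlog, total reconnects).
    """
    queue = deque()  # each entry is how many ticks that request has waited
    reconnects = 0
    for _ in range(ticks):
        queue.extend([0] * arrivals)
        # Serve the oldest requests first, up to the thread limit.
        for _ in range(min(threads, len(queue))):
            queue.popleft()
        waiting = deque()
        fresh = 0
        for waited in queue:
            if waited >= timeout:
                reconnects += 1
                fresh += 2  # assumed cost of a reconnect: login + retry
            else:
                waiting.append(waited + 1)
        waiting.extend([0] * fresh)
        queue = waiting
    return len(queue), reconnects


if __name__ == "__main__":
    for threads in (25, 100):
        backlog, reconnects = simulate(threads, arrivals=30, timeout=3, ticks=60)
        print(f"threads={threads}: backlog={backlog}, reconnects={reconnects}")
```

In this model, the capped pool falls further behind on every tick, and once clients start timing out each reconnect adds more work than it removes. With enough threads to cover the arrival rate, the queue drains every tick and no client ever times out.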
Andy
On Fri, 25 Mar 2011, Patrick McGuigan wrote:
> Hi,
>
> At least one of our data servers is running into issues with load.
>
> The data server is started with the
> xrd.sched mint 8 maxt 25 avlt 4
>
> In the xrootd log I am seeing entries like:
>
> 110325 19:11:34 25864 XrootdXeq: xroot.18727:63@gk03 login
> 110325 19:11:38 25864 ofs_stat: xroot.18727:80@gk03 Unable to locate /xrd/datadisk/step09/ESD/SWT2_CPB_DATADISK; No such file or directory
> 110325 19:11:38 25864 XrdLink: Unable to send to xroot.18727:80@gk03; broken pipe
> 110325 19:11:38 25864 XrootdXeq: xroot.18727:80@gk03 disc 0:01:12 (send failure)
>
> I am wondering what can be/should be done about the broken pipe/send failures. What are these?
>
>
> Since the data server is getting swamped, I am also seeing:
> 110325 19:11:59 25864 XrdScheduler: Thread limit has been reached!
>
> I am interested in learning what is the best way to diagnose the problem to determine what to change on the system.
>
> Regards,
>
> Patrick
>