Hi Patrick,

Please do *not* limit the threads via the xrd.sched directive. What was 
the reason for doing this? By limiting the threads you make it 
difficult, if not impossible, for the server to handle varying loads.

As for the broken pipe, I assume that xroot.18727:63 is a client. If the 
client crashes or simply disconnects, the broken pipe message will appear. 
I suspect that because the threads were limited, the xrootd server could 
not get back to the client within the window that the client was willing 
to wait. So the client disconnected and reconnected, thinking there were 
server issues. This scenario (often caused by limiting the threads) leads 
to a death spiral: more and more clients disconnect and then reconnect, 
eating up even more threads, until the server can do nothing but service 
reconnections.
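
One rough way to check whether this is what is happening is to count, 
minute by minute, the "Thread limit has been reached!" warnings and the 
"(send failure)" disconnects in the xrootd log and see whether they rise 
together. The following is only a minimal sketch, not a supported tool. 
It assumes every log line has the layout seen in the excerpt quoted 
below (date, time, pid, component, message); the function name 
summarize is made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout, taken from the quoted log excerpt:
#   YYMMDD HH:MM:SS <pid> <component>: <message>
LINE_RE = re.compile(r"^(\d{6}) (\d{2}:\d{2}):\d{2} \d+ (\w+): (.*)$")


def summarize(lines):
    """Count thread-limit warnings and send-failure disconnects per minute.

    Returns two Counters keyed by "YYMMDD HH:MM".
    """
    thread_limit = Counter()
    send_failure = Counter()
    for line in lines:
        # Drop mail-quote markers ("> ") and trailing whitespace so that
        # excerpts pasted from an email parse as well as raw log lines.
        m = LINE_RE.match(line.lstrip("> ").rstrip())
        if not m:
            continue  # continuation lines and anything unrecognized
        date, minute, component, message = m.groups()
        key = f"{date} {minute}"
        if component == "XrdScheduler" and "Thread limit has been reached" in message:
            thread_limit[key] += 1
        elif component == "XrootdXeq" and message.endswith("(send failure)"):
            send_failure[key] += 1
    return thread_limit, send_failure
```

Calling summarize(open(path)) on the data server's log file (the path 
depends on your installation) gives two counters that can be printed 
side by side. Minutes in which both counts climb would be consistent 
with the reconnect spiral described above, though they do not prove it.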

Andy

On Fri, 25 Mar 2011, Patrick McGuigan wrote:

> Hi,
>
> At least one of our data servers is running into issues with load.
>
> The data server is started with the
> xrd.sched mint 8 maxt 25 avlt 4
>
> In the xrootd log I am seeing entries like:
>
> 110325 19:11:34 25864 XrootdXeq: xroot.18727:63@gk03 login
> 110325 19:11:38 25864 ofs_stat: xroot.18727:80@gk03 Unable to locate /xrd/datadisk/step09/ESD/SWT2_CPB_DATADISK; No such file or directory
> 110325 19:11:38 25864 XrdLink: Unable to send to xroot.18727:80@gk03; broken pipe
> 110325 19:11:38 25864 XrootdXeq: xroot.18727:80@gk03 disc 0:01:12 (send failure)
>
> I am wondering what can be/should be done about the broken pipe/send failures.  What are these?
>
>
> Since the data server is getting swamped, I am also seeing:
> 110325 19:11:59 25864 XrdScheduler: Thread limit has been reached!
>
> I am interested in learning what is the best way to diagnose the problem to determine what to change on the system.
>
> Regards,
>
> Patrick
>