Hi Erik,

We had problems with CRLs at UCSD over the last couple of days, which also 
affected xrootd, but they never caused the servers to lock up.

I see you are running 3.3.6 on all servers except cms-a026.rcac.purdue.edu, 
which is running 3.3.1:
http://xrootd.t2.ucsd.edu/dump_cache.jsp?pred=%25%2FCMS%3A%3APurdue%3A%3AXrdReport%2F%25%2Fver&submit=Filter

When did this start? Is it correlated with an upgrade of some sort?

You say the servers stop logging anything and the only solution is to restart 
them ... does that mean the state is unrecoverable? Do you see any process 
activity at all?

What would really help is the output of gcore, i.e. a core dump of one of the 
hung xrootd processes.

Cheers,
Matevz

On 05/01/14 08:38, Erik Gough wrote:
> Hello,
>
> We have been experiencing an issue at Purdue where our redirector and
> servers will stop responding to client requests to read data.  It looks
> like the issue happens during the authentication process.  The xrootd
> process stops logging anything during this time.  The only solution I
> have found is to restart the xrootd process.  After that, things start
> working normally again.
>
> I attached output from strace, netstat, lsof and limits.  Strace shows a
> bunch of read/writes for what looks like lcmaps logging.  Netstat shows
> a ton of connections in a CLOSE_WAIT state, but not to the point where
> the process is going to run out of FDs.
>
> Also I attached two attempts at xrdcp from an unresponsive server.  One
> when getting redirected from our redirector and one when copying
> directly from the server.
>
> Other processes on the same server like gridftp are still authenticating
> with gums properly during this time.
>
> Can you help?  We are facing servers becoming unresponsive on a daily
> basis.  Please let me know if you need more information.
>
> There is also an OSG ticket on this issue:
> https://ticket.grid.iu.edu/20867
>
> Thanks,
> -Erik
>
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>
