Hello,

At our site (Wisconsin CMS T2), we have more than 64 data servers, so we 
configured some supervisors.  We added more than currently needed, under 
the assumption that this couldn't hurt.  However, after a recent upgrade 
(and restart) of xrootd, I find that the xrd locateall command now hangs 
forever.  It gets a response from the central redirector, but then it 
connects to one of the supervisors and hangs.  The supervisor logs 
indicate that it is stalling the client over and over:

XrootdProtocol: stalling client for 10 sec
XrootdProtocol: stalling client for 10 sec
XrootdProtocol: stalling client for 10 sec

By querying the supervisors directly with locateall, I find that some of 
them respond and some of them keep stalling.  I'm guessing that the ones 
that are stalling don't have any data servers associated with them.  The 
supervisors that do respond to locateall show me that there is a very 
uneven distribution of data servers associated with each supervisor: 61, 
26, 16, 8.  The other five supervisors stall and therefore presumably 
have none.
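
For reference, the counts above can be tallied with a short Python sketch. The per-supervisor numbers are the ones reported by locateall; the fan-out of 64 is an assumption based on the limit that led us to configure supervisors in the first place, and the names (FANOUT, summarize) are purely illustrative. Data servers subscribed directly to the redirector would not appear in these counts, so the result is only a rough lower bound.

```python
import math

# Data-server counts per responding supervisor, as reported by locateall.
responding = [61, 26, 16, 8]

# Supervisors that keep stalling; assumed (not verified) to have no servers.
stalled_supervisors = 5

# Assumed maximum number of direct subscribers per redirector/supervisor.
FANOUT = 64


def summarize(counts, idle, fanout=FANOUT):
    """Tally visible servers and estimate the fewest supervisors required."""
    total = sum(counts)
    return {
        "servers_seen": total,
        "supervisors_configured": len(counts) + idle,
        "min_supervisors_needed": math.ceil(total / fanout),
    }


summary = summarize(responding, stalled_supervisors)
print(summary)
```

Under these assumptions, far fewer supervisors would be needed than the nine configured, which is consistent with several of them ending up with no data servers.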

I do not know why this behavior did not show up before the upgrade.  
Perhaps it is a feature of the new version?  Or perhaps it is an effect 
of the startup order?

We are currently running xrootd-server-3.1.1-1.osg.el5.xu.

Any ideas/suggestions?  I can easily reduce the number of supervisors, 
but I'd like to know if the behavior I am seeing is expected.

Thanks,
--Dan
