Hi Dan,

OK, you have uncovered a bug. Indeed, the stalls are coming from 
supervisors that don't have any data servers. It's a problem with the 
xrdadmin locate command: it tries to do a deep locate irrespective of 
whether the file is present. Normal stats, opens, etc., won't show this 
behaviour; it's isolated to the locate command.

As you guessed, the distribution of servers to supervisors is random (it's 
designed that way), and you happened to end up in a situation where one or 
more supervisors didn't get any subscribers. This becomes more likely as you 
increase the number of supervisors beyond what is necessary. The immediate 
workaround is to reduce the number of supervisors; the real solution is to 
allow supervisors to have no subscribers at all. Could you post a bug report 
about this problem so we don't forget to fix it? Thanks for bringing this to 
our attention.
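The failure mode and the proposed fix can be sketched as a toy model. Everything here is hypothetical: `Supervisor`, `locate_current`, `locate_proposed`, and `client_locateall` are illustrative names, not xrootd or cmsd APIs, and a stall is modelled simply as a `None` reply. Unlike the real client, which per the report keeps retrying after every 10-second stall, this model gives up after `max_stalls` so that it terminates.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Supervisor:
    """Toy model of a supervisor and the data servers subscribed to it."""
    name: str
    servers: List[str] = field(default_factory=list)


def locate_current(sup: Supervisor) -> Optional[List[str]]:
    # Modelled reported behaviour: a supervisor with no subscribers tells
    # the client to stall instead of answering. None stands for a stall.
    if not sup.servers:
        return None
    return list(sup.servers)


def locate_proposed(sup: Supervisor) -> List[str]:
    # Hypothetical fix: a supervisor without subscribers just answers
    # "nothing here" with an empty list.
    return list(sup.servers)


def client_locateall(sups, locate, max_stalls: int = 3) -> Tuple[List[str], List[str]]:
    """Collect servers from every supervisor; give up on one after max_stalls stalls."""
    found: List[str] = []
    unanswered: List[str] = []
    for sup in sups:
        for _ in range(max_stalls):
            reply = locate(sup)
            if reply is not None:  # a real answer, possibly empty
                found.extend(reply)
                break
        else:  # never got an answer: this supervisor only ever stalled
            unanswered.append(sup.name)
    return found, unanswered
```

With one populated and one empty supervisor, `locate_current` leaves the empty one unanswered, while `locate_proposed` lets the locate complete with the same set of servers.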

Andy

On Fri, 30 Mar 2012, Dan Bradley wrote:

> Hello,
>
> At our site (Wisconsin CMS T2), we have more than 64 data servers, so we 
> configured some supervisors.  We added more than currently needed, under the 
> assumption that this couldn't hurt.  However, after a recent upgrade (and 
> restart) of xrootd, I find that the xrd locateall command now hangs forever. 
> It gets a response from the central redirector, but then it connects to one 
> of the supervisors and hangs.  The supervisor logs indicate that it is 
> stalling the client over and over:
>
> XrootdProtocol: stalling client for 10 sec
> XrootdProtocol: stalling client for 10 sec
> XrootdProtocol: stalling client for 10 sec
>
> By querying the supervisors directly with locateall, I find that some of them 
> respond and some of them keep stalling.  I'm guessing that the ones that are 
> stalling don't have any data servers associated with them.  The supervisors 
> that do respond to locateall show me that there is a very uneven distribution 
> of data servers associated with each supervisor: 61, 26, 16, 8.  The other 
> five supervisors stall and therefore presumably have none.
>
> I do not know why this behavior did not show up before the upgrade.  Perhaps 
> it is a feature of the new version?  Or perhaps it is an effect of the 
> startup order?
>
> We are currently running xrootd-server-3.1.1-1.osg.el5.xu.
>
> Any ideas/suggestions?  I can easily reduce the number of supervisors, but 
> I'd like to know if the behavior I am seeing is expected.
>
> Thanks,
> --Dan
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>
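The check described above, querying each supervisor directly and noting which ones answer and which keep stalling, can be run with a single deadline so that one stalling supervisor doesn't block the whole survey. This is a minimal sketch, assuming a hypothetical `probes` mapping from supervisor name to a callable that performs the query and returns the number of data servers it reports. How that callable talks to a supervisor is left open; it is not an xrootd API.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Optional


def survey_supervisors(
    probes: Dict[str, Callable[[], int]], timeout_s: float
) -> Dict[str, Optional[int]]:
    """Map each supervisor to its reported server count, or None if it did not answer in time."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(probes)))
    try:
        # Run every probe concurrently so one stalling supervisor cannot delay the others.
        futures = {pool.submit(probe): name for name, probe in probes.items()}
        done, _ = cf.wait(futures, timeout=timeout_s)
        # Probes still running at the deadline are reported as None (presumed stalling).
        # A probe that raised an exception re-raises it here via result().
        return {name: (fut.result() if fut in done else None) for fut, name in futures.items()}
    finally:
        # Don't block on stalled probes; return without waiting for their threads.
        pool.shutdown(wait=False)
```

With the counts from the report (61, 26, 16, 8) plus five stalling supervisors, the result would hold four integers and five `None` entries. Python joins leftover worker threads at interpreter exit, so a probe that can hang indefinitely should also carry its own timeout.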