Hello,
At our site (Wisconsin CMS T2), we have more than 64 data servers, so we
configured some supervisors. We added more than currently needed, under
the assumption that this couldn't hurt. However, after a recent upgrade
(and restart) of xrootd, I find that the xrd locateall command now hangs
forever. It gets a response from the central redirector, but then it
connects to one of the supervisors and hangs. The supervisor logs
indicate that it is stalling the client over and over:

    XrootdProtocol: stalling client for 10 sec
    XrootdProtocol: stalling client for 10 sec
    XrootdProtocol: stalling client for 10 sec

By querying the supervisors directly with locateall, I find that some of
them respond and some of them keep stalling. I'm guessing that the ones
that are stalling don't have any data servers associated with them. The
supervisors that do respond to locateall show me that there is a very
uneven distribution of data servers associated with each supervisor: 61,
26, 16, 8. The other five supervisors stall and therefore presumably
have none.
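
To put numbers on the imbalance, here is a rough sketch that tallies the counts above. It rests on assumptions: the stalling supervisors are treated as having zero data servers (my guess, not confirmed), only servers seen through a supervisor are counted, the limit of 64 is the per-node limit mentioned at the top, and the function and variable names are made up for illustration.

```python
import math

# Assumption: each redirector or supervisor can take at most 64 direct subscribers.
FANOUT_LIMIT = 64

# Data-server counts reported by the supervisors that answered locateall.
responding = [61, 26, 16, 8]
# Supervisors that only stall; assumed (not verified) to have no data servers.
stalling = 5


def summarize(counts, idle, limit=FANOUT_LIMIT):
    """Summarize how data servers are spread across supervisors."""
    total = sum(counts)
    n_supervisors = len(counts) + idle
    return {
        "total_servers": total,
        "supervisors": n_supervisors,
        # Fewest supervisors that could hold this many servers at the limit.
        "min_supervisors_needed": math.ceil(total / limit),
        "mean_per_supervisor": total / n_supervisors,
        "idle_supervisors": idle,
    }


summary = summarize(responding, stalling)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under those assumptions, the servers visible through supervisors would fit under two supervisors, while nine are configured and five of them appear to be idle.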
I do not know why this behavior did not show up before the upgrade.
Perhaps it is a feature of the new version? Or perhaps it is an effect
of the startup order?
We are currently running xrootd-server-3.1.1-1.osg.el5.xu.
Any ideas/suggestions? I can easily reduce the number of supervisors,
but I'd like to know if the behavior I am seeing is expected.
Thanks,
--Dan