
On Jul 6, 2011, at 6:05 PM, Andrew Hanushevsky wrote:

> Hi Brian,
> 
> OK, so at the moment there really isn't anything you can do. The message looks more ominous than it needs to. It just means that an internal queue of requests is building up, so things aren't as responsive as they could be. The message is repeated every 4K tries of getting a new thread. It would be interesting to see how often the message goes out.
> 
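To illustrate the throttling Andrew describes, here is a minimal Python sketch that warns on the first failed attempt to get a thread and then once per 4,096 further failures (taking "4K" as 4096). The `ThrottledWarning` class and its method names are hypothetical; this is not the xrootd implementation.

```python
import threading


class ThrottledWarning:
    """Warn on the first failure, then once per `every` further failures.

    Hypothetical helper; it only mimics the "repeated every 4K tries"
    behaviour described above and is not taken from the xrootd source.
    """

    def __init__(self, every=4096):
        self.every = every
        self.failures = 0
        self.emitted = []  # warnings that were actually "logged"
        self._lock = threading.Lock()

    def thread_unavailable(self):
        """Record one failed attempt to obtain a thread; maybe return a warning."""
        with self._lock:
            self.failures += 1
            # Attempts 1, 1 + every, 1 + 2*every, ... produce a message.
            if (self.failures - 1) % self.every == 0:
                msg = f"thread limit reached; {self.failures} failed attempts, requests are queuing"
                self.emitted.append(msg)
                return msg
            return None


if __name__ == "__main__":
    w = ThrottledWarning()
    for _ in range(10_000):
        w.thread_unavailable()
    print(len(w.emitted))  # 3 (attempts 1, 4097 and 8193)
```

Counting with `(failures - 1) % every` rather than `failures % every == 1` keeps the helper correct even when `every` is 1.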

Just once: it set off the Nagios alert as tests stopped working, and we took corrective action.

> The longer-term solution is to run more than one global redirector and set them up in load-balancing mode.
> 

Noted.  Will put on the TODO list.

> An even longer-term solution is to not run each select in a separate thread but simply have a fixed pool of threads that execute that code. This is the first time I've seen you overrun the redirector, which means you should be getting thousands of requests per second.
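The "fixed pool of threads" idea can be sketched generically in Python: a set number of workers pull requests from a shared queue, so a burst of load lengthens the queue instead of spawning more threads. `handle_request` is a hypothetical placeholder for the redirector's per-request work; this shows the general pattern only, not how xrootd would implement it.

```python
import queue
import threading


def handle_request(req):
    """Hypothetical stand-in for the per-request work a redirector does."""
    return req * 2


def run_fixed_pool(requests, num_workers=8):
    """Process `requests` with a fixed number of worker threads.

    Extra load waits in the queue instead of causing new threads to be
    spawned, so the thread count stays at `num_workers` however busy it gets.
    """
    work = queue.Queue()
    results = [None] * len(requests)

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this thread
                return
            idx, req = item
            results[idx] = handle_request(req)  # each index is written by one thread only

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for idx, req in enumerate(requests):
        work.put((idx, req))
    for _ in threads:
        work.put(None)  # one sentinel per worker, queued after all real work
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_fixed_pool(list(range(5)), num_workers=2))  # [0, 2, 4, 6, 8]
```

Because the queue is FIFO and the sentinels are enqueued last, every real request is handled before any worker exits.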

Yeah, that sounds about right: from "lsof", there were 500 or so clients.  Each client will make multiple queries at startup.

> Can you send me two summary statistics snapshots taken about 10 seconds apart? Use the xrd command to connect to the redirector xrootd and issue "query 1 a".

We fixed the underlying problem; I bet the statistics below are quite boring.

Brian

<statistics tod="1309993779" ver="v20110517-c443225" src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" pid="17133"><stats id="info"><host>xrootd.unl.edu</host><port>1094</port><name>MAIN</name></stats><stats id="buff"><reqs>177314</reqs><mem>1901568</mem><buffs>1685</buffs><adj>0</adj></stats><stats id="link"><num>460</num><maxn>0</maxn><tot>177318</tot><in>2968025505</in><out>3941314254</out><ctime>592626354</ctime><tmo>158307</tmo><stall>0</stall><sfps>1957955834</sfps></stats><stats id="poll"><att>459</att><en>20341649</en><ev>20341295</ev><int>0</int></stats><stats id="proc"><usr><s>246</s><u>95380</u></usr><sys><s>456</s><u>928556</u></sys></stats><stats id="xrootd"><num>177300</num><ops><open>20412434</open><rf>2</rf><rd>0</rd><pr>0</pr><wr>0</wr><sync>0</sync><getf>0</getf><putf>0</putf><misc>323027</misc></ops><aio><num>0</num><max>0</max><rej>0</rej></aio></stats><stats id="ofs"><role>meta manager</role><opr>0</opr><opw>0</opw><opp>0</opp><ups>0</ups><han>0</han><rdr>229863</rdr><bx

root://xrootd.unl.edu:1094//>query 1 a
<statistics tod="1309993784" ver="v20110517-c443225" src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" pid="17133"><stats id="info"><host>xrootd.unl.edu</host><port>1094</port><name>MAIN</name></stats><stats id="buff"><reqs>177315</reqs><mem>1901568</mem><buffs>1685</buffs><adj>0</adj></stats><stats id="link"><num>461</num><maxn>0</maxn><tot>177319</tot><in>2968025505</in><out>3941314254</out><ctime>592626354</ctime><tmo>158307</tmo><stall>0</stall><sfps>1957955834</sfps></stats><stats id="poll"><att>460</att><en>20341988</en><ev>20341634</ev><int>0</int></stats><stats id="proc"><usr><s>246</s><u>111381</u></usr><sys><s>456</s><u>936556</u></sys></stats><stats id="xrootd"><num>177301</num><ops><open>20412773</open><rf>2</rf><rd>0</rd><pr>0</pr><wr>0</wr><sync>0</sync><getf>0</getf><putf>0</putf><misc>323030</misc></ops><aio><num>0</num><max>0</max><rej>0</rej></aio></stats><stats id="ofs"><role>meta manager</role><opr>0</opr><opw>0</opw><opp>0</opp><ups>0</ups><han>0</han><rdr>229864</rdr><b
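The two snapshots can be compared by differencing their counters. Below is a minimal Python sketch using trimmed, well-formed excerpts of the replies above (the originals are cut off after the `ofs` section). It assumes `tod` is the snapshot time and `tos` the server start time, both in Unix seconds; the helper names `counters` and `rates` are hypothetical. Note that the `tod` values are 5 seconds apart, not 10.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpts of the two "query 1 a" replies; values copied from above.
SNAP1 = """<statistics tod="1309993779" tos="1307994744">
<stats id="link"><num>460</num><tot>177318</tot></stats>
<stats id="xrootd"><num>177300</num><ops><open>20412434</open><misc>323027</misc></ops></stats>
</statistics>"""

SNAP2 = """<statistics tod="1309993784" tos="1307994744">
<stats id="link"><num>461</num><tot>177319</tot></stats>
<stats id="xrootd"><num>177301</num><ops><open>20412773</open><misc>323030</misc></ops></stats>
</statistics>"""


def counters(xml_text):
    """Pull the timestamps and a few counters out of one summary record."""
    root = ET.fromstring(xml_text)
    return {
        "tod": int(root.get("tod")),
        "tos": int(root.get("tos")),
        "open": int(root.find("stats[@id='xrootd']/ops/open").text),
        "links": int(root.find("stats[@id='link']/num").text),
    }


def rates(a, b):
    """Per-second rates between snapshot a and the later snapshot b."""
    dt = b["tod"] - a["tod"]
    return {
        "interval_s": dt,
        "opens_per_s": (b["open"] - a["open"]) / dt,
        "avg_opens_per_s_since_start": b["open"] / (b["tod"] - b["tos"]),
    }


if __name__ == "__main__":
    print(rates(counters(SNAP1), counters(SNAP2)))
```

For these samples the `open` counter grows by 339 over 5 seconds, about 68 per second, against a lifetime average of roughly 10 per second since the server started.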