
Wilko is posting a bug report about the truncation. There shouldn't be any.

Andy

-----Original Message----- 
From: Brian Bockelman
Sent: Wednesday, July 06, 2011 4:45 PM
To: Wilko Kroeger
Cc: Andrew Hanushevsky ; xrootd-dev
Subject: Re: Hitting thread limits?

Good catch Wilko!

root://xrootd.unl.edu:1094//>query 1 s
<statistics tod="1309995851" ver="v20110517-c443225" 
src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" 
pid="17133"><stats 
id="sched"><jobs>20657108</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>49</idle><tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>

root://xrootd.unl.edu:1094//>query 1 s
<statistics tod="1309995854" ver="v20110517-c443225" 
src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" 
pid="17133"><stats 
id="sched"><jobs>20657306</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>48</idle><tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>

root://xrootd.unl.edu:1094//>query 1 s
<statistics tod="1309995857" ver="v20110517-c443225" 
src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" 
pid="17133"><stats 
id="sched"><jobs>20657530</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>58</idle><tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>
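The three "query 1 s" replies above can be turned into rates with a few lines of Python. This is a minimal sketch that assumes "jobs" is a cumulative counter (the values only ever increase) and that "tod" is the sample time in epoch seconds. It parses only the "sched" block, which is complete in these replies.

```python
import xml.etree.ElementTree as ET

# The three "query 1 s" replies quoted above, rejoined onto single strings.
SNAPSHOTS = [
    '<statistics tod="1309995851" ver="v20110517-c443225" src="xrootd.unl.edu:1094" '
    'tos="1307994744" pgm="xrootd" ins="MAIN" pid="17133"><stats id="sched">'
    '<jobs>20657108</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>49</idle>'
    '<tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>',
    '<statistics tod="1309995854" ver="v20110517-c443225" src="xrootd.unl.edu:1094" '
    'tos="1307994744" pgm="xrootd" ins="MAIN" pid="17133"><stats id="sched">'
    '<jobs>20657306</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>48</idle>'
    '<tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>',
    '<statistics tod="1309995857" ver="v20110517-c443225" src="xrootd.unl.edu:1094" '
    'tos="1307994744" pgm="xrootd" ins="MAIN" pid="17133"><stats id="sched">'
    '<jobs>20657530</jobs><inq>0</inq><maxinq>71</maxinq><threads>69</threads><idle>58</idle>'
    '<tcr>8708</tcr><tde>8639</tde><tlimr>0</tlimr></stats></statistics>',
]


def sched_sample(reply):
    """Return (tod, jobs, threads, idle) from one 'query 1 s' reply."""
    root = ET.fromstring(reply)
    sched = root.find("stats[@id='sched']")
    return (int(root.get("tod")), int(sched.findtext("jobs")),
            int(sched.findtext("threads")), int(sched.findtext("idle")))


def job_rates(samples):
    """Jobs per second between consecutive samples (assumes <jobs> is cumulative)."""
    return [(j1 - j0) / (t1 - t0)
            for (t0, j0, _, _), (t1, j1, _, _) in zip(samples, samples[1:])]


samples = [sched_sample(s) for s in SNAPSHOTS]
rates = job_rates(samples)
for (tod, jobs, threads, idle), rate in zip(samples[1:], rates):
    print(f"tod={tod} threads={threads} idle={idle} {rate:.1f} jobs/s")
```

On these samples the scheduler handled roughly 66 and 75 jobs per second, with the thread count flat at 69 and 48 to 58 of them idle. If "tlimr" counts how often the thread limit was hit (an assumption), its staying at 0 fits a server that had already recovered.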

Is it possible to get a few more bytes for "1 a"?
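Until the display limit changes, a script can at least tell when a reply has been cut off. Per Wilko's note in the quoted thread, xrd shows only the first 1024 bytes. A cut-off reply is then no longer well-formed XML because the root element is never closed. This is a minimal sketch; the helper names and the synthetic reply are hypothetical.

```python
import xml.etree.ElementTree as ET

# Per Wilko's note, xrd displays only the first 1024 bytes of a reply.
XRD_DISPLAY_LIMIT = 1024


def is_truncated(reply: str) -> bool:
    """True if the reply is not well-formed XML (for example, cut off mid-tag)."""
    try:
        ET.fromstring(reply)
    except ET.ParseError:
        return True
    return False


def as_displayed(reply: str, limit: int = XRD_DISPLAY_LIMIT) -> str:
    """Simulate the display cut-off: keep only the first `limit` bytes."""
    return reply.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


# A synthetic reply padded past 1024 bytes with made-up <stats> elements.
body = "".join(f'<stats id="s{i}"><n>{i}</n></stats>' for i in range(60))
full = f'<statistics tod="0">{body}</statistics>'
shown = as_displayed(full)
print(len(full.encode("utf-8")), is_truncated(full), is_truncated(shown))
```

The well-formedness test only detects that something is missing. It cannot recover the lost "sched" block, which is why a larger limit is still needed for "query 1 a".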

Brian

On Jul 6, 2011, at 6:42 PM, Wilko Kroeger wrote:

>
> Hello Andy
>
> xrd shows only the first 1024 bytes. Running xrd with "query 1 s" should 
> work.
>
> Cheers,
>  Wilko
>
>
> On Wed, 6 Jul 2011, Andrew Hanushevsky wrote:
>
>> Hi Brian,
>>
>> That's weird, what happened to the 'id="sched"' part of the statistics?
>>
>> Andy
>>
>> -----Original Message----- From: Brian Bockelman
>> Sent: Wednesday, July 06, 2011 4:17 PM
>> To: Andrew Hanushevsky
>> Cc: xrootd-dev
>> Subject: Re: Hitting thread limits?
>>
>>
>> On Jul 6, 2011, at 6:05 PM, Andrew Hanushevsky wrote:
>>
>>> Hi Brian,
>>> OK, so at the moment there really isn't anything you can do. The message 
>>> looks more ominous than need be. It just means that there is now an 
>>> internal queue of requests building up. So, things aren't as responsive 
>>> as they could be. The message gets repeated every 4K tries of getting a 
>>> new thread. Would be interesting to see how often the message goes out.
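The "repeated every 4K tries" behaviour means the number of messages gives a rough bound on how many thread-creation attempts failed. The sketch below is a hypothetical illustration, not xrootd's code. It assumes the first failure is reported and then every 4096th one after it, so a single message means between 1 and 4096 failed attempts.

```python
class ThrottledWarning:
    """Report the 1st failure and then every `every`-th one after it.

    Hypothetical illustration of the "every 4K tries" behaviour; not xrootd code.
    """

    def __init__(self, every: int = 4096):
        self.every = every
        self.failures = 0
        self.messages = []

    def thread_create_failed(self) -> None:
        self.failures += 1
        # Failures 1, every+1, 2*every+1, ... each produce a message.
        if (self.failures - 1) % self.every == 0:
            self.messages.append(
                f"thread limit reached; {self.failures} failed attempts so far")

    def failure_bounds(self):
        """Range of total failures consistent with the number of messages seen."""
        n = len(self.messages)
        if n == 0:
            return (0, 0)
        return ((n - 1) * self.every + 1, n * self.every)


w = ThrottledWarning()
for _ in range(10_000):
    w.thread_create_failed()
print(len(w.messages), w.failure_bounds())
```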
>>
>> Just once: it set off the Nagios alert as tests stopped working, and we 
>> took corrective action.
>>
>>> The longer-term solution is to run more than one global redirector and 
>>> set them up in load-balancing mode.
>>
>> Noted.  Will put on the TODO list.
>>
>>> An even longer-term solution is to not run each select in a separate 
>>> thread but simply have a fixed pool of threads that execute that code. 
>>> This is the first time I've seen you overrun the redirector, which means 
>>> you should be getting thousands of requests per second.
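The difference between a thread per select and a fixed pool can be sketched in a few lines of Python. This is a generic illustration, not xrootd's scheduler. It assumes each unit of work is a plain callable. A fixed number of workers drain a shared queue, so a burst of requests makes the queue longer instead of creating more threads.

```python
import queue
import threading


def run_with_fixed_pool(tasks, num_workers=4):
    """Run callables on a fixed set of worker threads; return results in task order."""
    work = queue.Queue()
    results = [None] * len(tasks)

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: shut this worker down
                return
            i, task = item
            results[i] = task()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # A burst of requests only lengthens the queue; the thread count stays fixed.
    for i, task in enumerate(tasks):
        work.put((i, task))
    for _ in workers:
        work.put(None)  # sentinels are queued after all tasks, so every task runs
    for w in workers:
        w.join()
    return results


squares = run_with_fixed_pool([lambda i=i: i * i for i in range(100)])
idents = run_with_fixed_pool([threading.get_ident] * 50, num_workers=3)
print(squares[:5], len(set(idents)))
```

The trade-off is latency under load. When every worker is busy, new work waits in the queue rather than failing to get a thread.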
>>
>> Yeah, that sounds about right: from "lsof", there were 500 or so clients. 
>> Each client will make multiple queries at startup.
>>
>>> Can you send me two summary statistics each separated by about 10 
>>> seconds? Use the xrd command to connect to the redirector xrootd and 
>>> issue "query 1 a".
>>
>> We fixed the underlying problem; I bet the statistics below are quite 
>> boring.
>>
>> Brian
>>
>> <statistics tod="1309993779" ver="v20110517-c443225" 
>> src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" 
>> pid="17133"><stats 
>> id="info"><host>xrootd.unl.edu</host><port>1094</port><name>MAIN</name></stats><stats 
>> id="buff"><reqs>177314</reqs><mem>1901568</mem><buffs>1685</buffs><adj>0</adj></stats><stats 
>> id="link"><num>460</num><maxn>0</maxn><tot>177318</tot><in>2968025505</in><out>3941314254</out><ctime>592626354</ctime><tmo>158307</tmo><stall>0</stall><sfps>1957955834</sfps></stats><stats 
>> id="poll"><att>459</att><en>20341649</en><ev>20341295</ev><int>0</int></stats><stats 
>> id="proc"><usr><s>246</s><u>95380</u></usr><sys><s>456</s><u>928556</u></sys></stats><stats 
>> id="xrootd"><num>177300</num><ops><open>20412434</open><rf>2</rf><rd>0</rd><pr>0</pr><wr>0</wr><sync>0</sync><getf>0</getf><putf>0</putf><misc>323027</misc></ops><aio><num>0</num><max>0</max><rej>0</rej></aio></stats><stats 
>> id="ofs"><role>meta 
>> manager</role><opr>0</opr><opw>0</opw><opp>0</opp><ups>0</ups><han>0</han><rdr>229863</rdr><bx
>>
>> root://xrootd.unl.edu:1094//>query 1 a
>> <statistics tod="1309993784" ver="v20110517-c443225" 
>> src="xrootd.unl.edu:1094" tos="1307994744" pgm="xrootd" ins="MAIN" 
>> pid="17133"><stats 
>> id="info"><host>xrootd.unl.edu</host><port>1094</port><name>MAIN</name></stats><stats 
>> id="buff"><reqs>177315</reqs><mem>1901568</mem><buffs>1685</buffs><adj>0</adj></stats><stats 
>> id="link"><num>461</num><maxn>0</maxn><tot>177319</tot><in>2968025505</in><out>3941314254</out><ctime>592626354</ctime><tmo>158307</tmo><stall>0</stall><sfps>1957955834</sfps></stats><stats 
>> id="poll"><att>460</att><en>20341988</en><ev>20341634</ev><int>0</int></stats><stats 
>> id="proc"><usr><s>246</s><u>111381</u></usr><sys><s>456</s><u>936556</u></sys></stats><stats 
>> id="xrootd"><num>177301</num><ops><open>20412773</open><rf>2</rf><rd>0</rd><pr>0</pr><wr>0</wr><sync>0</sync><getf>0</getf><putf>0</putf><misc>323030</misc></ops><aio><num>0</num><max>0</max><rej>0</rej></aio></stats><stats 
>> id="ofs"><role>meta 
>> manager</role><opr>0</opr><opw>0</opw><opp>0</opp><ups>0</ups><han>0</han><rdr>229864</rdr><b
>>
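Even truncated, the two "query 1 a" replies above can be compared, provided the counters are pulled out with a regular expression instead of an XML parser. This sketch assumes "open", "tot" and "rdr" are cumulative counters. Their meanings (open requests, total connections, redirects) are assumptions, not taken from the thread. The "tod" stamps show the replies were 5 seconds apart rather than the 10 requested.

```python
import re

# Excerpts of the two "query 1 a" replies quoted above ("..." marks omitted text).
# Both end mid-tag where xrd cut them off, so an XML parser would reject them.
REPLY_1 = ('<statistics tod="1309993779" ver="v20110517-c443225" ... '
           '<stats id="link"><num>460</num><maxn>0</maxn><tot>177318</tot> ... '
           '<stats id="xrootd"><num>177300</num><ops><open>20412434</open> ... '
           '<stats id="ofs"> ... <rdr>229863</rdr><bx')
REPLY_2 = ('<statistics tod="1309993784" ver="v20110517-c443225" ... '
           '<stats id="link"><num>461</num><maxn>0</maxn><tot>177319</tot> ... '
           '<stats id="xrootd"><num>177301</num><ops><open>20412773</open> ... '
           '<stats id="ofs"> ... <rdr>229864</rdr><b')


def tod(reply: str) -> int:
    """Time-of-day stamp (epoch seconds) from the <statistics> header."""
    return int(re.search(r'tod="(\d+)"', reply).group(1))


def counter(reply: str, tag: str):
    """First <tag>N</tag> value in a possibly truncated reply, or None if absent."""
    m = re.search(rf"<{tag}>(\d+)</{tag}>", reply)
    return int(m.group(1)) if m else None


def per_second(before: str, after: str, tags):
    """Per-second change of each counter present in both snapshots."""
    dt = tod(after) - tod(before)
    out = {}
    for tag in tags:
        a, b = counter(before, tag), counter(after, tag)
        if a is not None and b is not None:
            out[tag] = (b - a) / dt
    return out


rates = per_second(REPLY_1, REPLY_2, ["open", "tot", "rdr"])
print(tod(REPLY_2) - tod(REPLY_1), rates)
```

Under those assumptions the redirector was handling about 68 opens per second at that point. That is far below the thousands per second Andy suspected during the overload, in line with Brian's expectation that these numbers would be boring.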