	Agreed, and you are right (it was a bad recommendation, as we
found out recently).

On 2011/04/13 17:00, Andrew Hanushevsky wrote:
> With all due respect, we strongly urge people to *not* do preemptive
> server restarts. They usually do more harm than good. If there is a leak
> of some sort, we will fix it as quickly as possible. Thus far, in our
> testing here at SLAC, we have run servers for as long as 6 months with no
> detectable leaks.
> 
> Andy
> 
> -----Original Message----- From: Jerome LAURET
> Sent: Wednesday, April 13, 2011 1:42 PM
> To: xrootd-l
> Subject: Re: leaking file handles
> 
> 
> STAR has been applying preemptive reboots for years for that
> reason (and because of other small leaks we have observed here and
> there).  This approach had been nearly flawless and brought greater
> stability until recently (not sure why in 2011 and not before; we
> have been using Xrootd in production since 2007), when we
> found, with help from Andy, that it caused another problem, reported
> in http://savannah.cern.ch/bugs/?80880 (so beware and take care of a
> similar patch).
> 
> Hope this helps,
> 
> 
> On 2011/04/13 16:06, Aaron van Meerten wrote:
>> Hi xrootd list,
>>
>> I'm running xrootd 3.0.3-pre8 from
>> http://newman.ultralight.org/repos/xrootd/x86_64/ at MWT2.
>>
>> I've recently begun setting up xrootd on our Tier3 cluster, and I've
>> noticed an interesting problem.  Each file I transfer into the xrootd
>> data servers ends up with an open FD by the xrootd process.  This
>> means that eventually even with a ulimit of 32768, we are still
>> running out of available file handles for our process.  It seems to
>> me that every file handle that's ever been opened/copied is staying
>> open even after the xrdcp that's transferring the data has finished
>> and exited successfully.  lsof on the process confirms that the
>> xrootd process is in fact holding open this many files.  This happens
>> on both of the data servers that I'm running, although it doesn't
>> seem to be an issue on my redirector (which isn't running a Server-Side
>> Inventory, so it might still happen there if I were running in that
>> mode).
>>
>>
>> My solution so far is to simply restart the xrootd process, which
>> then restarts the timer on this problem.  However, that's clearly not
>> an ideal solution for production.
>>
>> I'm curious if anyone else has experienced this, or if there's a good
>> way to avoid it?
>>
>> Cheers,
>>
>> -Aaron van Meerten MidWest Tier2
> 
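
For anyone who wants to repeat the lsof check from the original report without lsof, here is a minimal sketch. It assumes a Linux host where /proc/<pid>/fd is readable for the xrootd process. The function names and the proc_root parameter are made up for illustration and are not part of xrootd.

```python
import os
from collections import Counter


def list_open_fds(pid, proc_root="/proc"):
    """Return {fd_number: target} for a process, read from <proc_root>/<pid>/fd.

    proc_root is a hypothetical parameter, present only so the function
    can be pointed at a different directory tree.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (OSError, ValueError):
            # The descriptor was closed between listdir() and readlink(),
            # or the entry name was not a number; skip it.
            continue
    return fds


def summarize_by_directory(fds):
    """Count descriptors that point at filesystem paths, grouped by parent directory."""
    counts = Counter()
    for target in fds.values():
        # Sockets, pipes and the like appear as e.g. "socket:[12345]", not as paths.
        if target.startswith("/"):
            counts[os.path.dirname(target)] += 1
    return counts
```

Calling list_open_fds() with the PID of the xrootd process and passing the result to summarize_by_directory() gives a per-directory count. If the count under the data directory keeps growing after each xrdcp has exited, that matches the behaviour described above.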

-- 
             ,,,,,
            ( o o )
         --m---U---m--
             Jerome