Hi Andy,

We are using Linux (2.6.18-164.2.1.el5).  I will look at rolling out a 
newer version of Xrootd and increasing the FD limit.
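
For checking what limit a process actually runs with, a minimal Python sketch follows. It is only an illustration: the helper names fd_limits and raise_soft_limit are hypothetical, and the target value is an assumption taken from the 8K to 32K range suggested in this thread. It affects only the process that runs it and its children, and raising the hard limit itself still requires root.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft FD limit toward target, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot exceed its hard limit.
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return fd_limits()


if __name__ == "__main__":
    print("before:", fd_limits())
    # 32768 is illustrative, following the "32K would be best" advice.
    print("after: ", raise_soft_limit(32768))
```

If the hard limit reported here is still 1024, the soft limit cannot go any higher from inside the process, so the change has to be made wherever the service's limits are configured before it starts.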

In the current configuration the data being written into the system is 
sent to only two dataservers.  The second data server (bigger disks, 
more memory, more cores) is not having the same problem.  Any suspicions 
on why only one data server is getting crushed?
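
One way to compare the two data servers would be to count sockets per TCP state on each. The sketch below is an assumption-laden illustration, not a tested diagnostic: the function names are hypothetical, it reads the Linux /proc/net/tcp table (IPv4 only; /proc/net/tcp6 holds the IPv6 sockets), and it counts sockets system-wide rather than only those owned by xrootd.

```python
from collections import Counter

# Linux kernel state code for CLOSE_WAIT as shown in /proc/net/tcp.
CLOSE_WAIT = "08"


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state code in text formatted like /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header; the state is the fourth column.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3:
            counts[fields[3]] += 1
    return counts


def close_wait_count(path="/proc/net/tcp"):
    """Return the number of sockets currently in CLOSE_WAIT."""
    with open(path) as table:
        return count_tcp_states(table.read())[CLOSE_WAIT]


if __name__ == "__main__":
    print("CLOSE_WAIT sockets:", close_wait_count())
```

Running this on both servers at intervals would show whether the CLOSE_WAIT count climbs toward the FD limit on one machine only.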

Patrick


Andrew Hanushevsky wrote:
> Yes, I would recommend upgrading to 20100315-1007 as it fixes a couple 
> of issues in this area that allowed sockets for closed connections 
> to remain open far longer than they should. The issue was very 
> pronounced in Solaris, not as much in Linux (which OS are you using?). 
> In any case, *please* increase the FD hard limit to at least 8-16K (32K 
> would be best).
> 
> Andy
> 
> ----- Original Message ----- From: "Tanya Levshina" <[log in to unmask]>
> To: "'Andrew Hanushevsky'" <[log in to unmask]>; "'Patrick McGuigan'" 
> <[log in to unmask]>; <[log in to unmask]>
> Cc: <[log in to unmask]>
> Sent: Monday, April 12, 2010 3:11 PM
> Subject: RE: Overloaded Xrootd dataserver?
> 
> 
>> Hi,
>>
>> We should add these recommendations to the OSG Release Documentation.
>> Patrick, if increasing the number of open files does not help and 
>> the "CLOSE_WAIT" problem has been solved in the xrootd 20100315-1007 
>> release, you can probably upgrade xrootd from the ITB cache.
>>
>> Thanks,
>> Tanya
>>
>>
>> -----Original Message-----
>> From: [log in to unmask]
>> [mailto:[log in to unmask]] On Behalf Of Andrew
>> Hanushevsky
>> Sent: Monday, April 12, 2010 4:58 PM
>> To: Patrick McGuigan; [log in to unmask]
>> Cc: [log in to unmask]
>> Subject: Re: Overloaded Xrootd dataserver?
>>
>> Hi Patrick,
>>
>> Please tell me the release you are running. We did put in a CLOSE_WAIT 
>> fix recently. That aside, we always recommend setting the FD limit as 
>> high as practical for your OS (at least 8K and preferably 16K to 32K). 
>> 1K is not recommended and will likely lead to problems regardless of 
>> any extant bugs.
>>
>> Andy
>>
>> ----- Original Message ----- From: "Patrick McGuigan" <[log in to unmask]>
>> To: <[log in to unmask]>
>> Cc: <[log in to unmask]>
>> Sent: Monday, April 12, 2010 2:08 PM
>> Subject: Overloaded Xrootd dataserver?
>>
>>
>>> Hi,
>>>
>>> I am having an issue with one of our data servers and it may be getting
>>> overloaded with requests from clients.
>>>
>>> The symptoms are that the load on the SRM machine will get very large
>>> because threads there are talking through XrootdFS for various 
>>> connections to the dataserver.  Various activities related to Xrootd 
>>> will fail (SRM gets hung, gridftp servers won't send data).
>>>
>>> When logged into the dataserver and running strace on the xrootd 
>>> service I see that it has a problem in accept() because of too many 
>>> open files.
>>>
>>> If I do a netstat I see that xrootd is holding a large number of sockets
>>> in a CLOSE_WAIT state.
>>>
>>> I am trying to understand if the problems that I am seeing are 
>>> because the limits (1024 open FD's) given to xrootd are too small or 
>>> if the problem with xrootd is that the service is too overloaded and 
>>> this is causing xrootd to hang on to too many sockets.
>>>
>>> Regards,
>>>
>>> Patrick
>>>
>>
>>
>