If you're going to enable keepalive in the client, you might want to consider manually tuning the keepalive timeouts down from the default of 2 hours. I recently lowered it to around 5 minutes in HTCondor because 2 hours was "too late" to detect the disconnect and recover the jobs.
Linux has a per-socket option for this (TCP_KEEPIDLE); Mac OS X offers the same control under a different name (TCP_KEEPALIVE). I'm not sure about Solaris. Again, we've not seen any kernel scalability issues from doing this.
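A minimal sketch of that per-socket tuning, written against Python's standard `socket` module rather than the xrootd or HTCondor code. The function name `enable_keepalive` and the 300 s / 60 s / 5-probe values are illustrative assumptions, not settings taken from either project:

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 300,
                     interval_s: int = 60, probes: int = 5) -> None:
    """Enable TCP keepalive on `sock` and shorten the idle time before probing.

    Hypothetical helper for illustration; the default values are assumptions.
    """
    # Turn keepalive on; without this the TCP_KEEP* values below are not used.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Idle time before the first probe: TCP_KEEPIDLE on Linux,
    # TCP_KEEPALIVE on Mac OS X (exposed by Python 3.10 and later).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)

    # Gap between unanswered probes, and how many to send before the
    # kernel declares the peer dead. Skipped where the platform lacks them.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these values a silently dropped peer is noticed after roughly 300 + 5 × 60 = 600 seconds. The Linux system-wide defaults (net.ipv4.tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9) work out to about 7200 + 9 × 75 = 7875 seconds, which is the "2 hours" discussed here.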
Brian
On Aug 28, 2014, at 5:09 AM, Lukasz Janyst <[log in to unmask]> wrote:
> Hi Brian,
>
> For the server side, it's Andy's call.
>
> We have seen silent disconnection problems with ALICE sites in the past; this is why I set up keepalive for sockets in the old client. I will do the same for the new one.
>
> Cheers,
> Lukasz
>
> On 08/25/2014 02:52 PM, Brian Bockelman wrote:
>> Hi Lukasz, all,
>>
>> Can we enable keepalive by default? I don't look forward to the task of asking every site for a configuration change.
>>
>> At least on Linux, we have observed that the kernel can handle tens of thousands of sockets with keepalive enabled; it doesn't appear to be a scalability issue. There don't appear to be any built-in protocol features we could use on the server side (although this doesn't appear to be needed on the client side).
>>
>> Brian
>>
>> On Aug 25, 2014, at 2:08 AM, Lukasz Janyst <[log in to unmask]> wrote:
>>
>>> On 08/22/2014 06:59 PM, Matevz Tadel wrote:
>>>>> Does the Xrootd server at least enable TCP keepalive? That'll close
>>>>> out dead connections after 2 hours.
>>>>
>>>> I don't think so... I easily see things hanging for up to 24 hours (at
>>>> which point the collector gives up on the session). Can this timeout be
>>>> set at socket creation time?
>>>
>>> Typically, this is handled by the TCP stack, but routers and firewalls along the way often interfere. To enable OS keepalive for xrootd sockets, you need to ask for it explicitly: http://xrootd.org/doc/prod/xrd_config.htm#_Toc310725344
>>>
>>> Cheers,
>>> Lukasz
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>