I guess I don't get what keepalives would solve on the client side,
other than somewhat faster recovery in the rare case that a server goes
away. That is a lot of work for handling a <10% problem. The bigger
problem is clients going away and the server not being told that this has
happened. This is particularly bad when the client is a virtual machine,
as some hypervisors handle this correctly and some do not. Firewalls and
NAT boxes make this even more problematic.

I see the point of enabling keepalive by default. However, as a practical
measure, this is actually a big change, as one would need to implement
a way to turn it off (the current implementation simply allows you to
turn it on), let alone allow a keepalive time specification.

Additionally, I am not at all convinced that, at scale, it would actually
solve the problem. Brian, are you always running with keepalive on, and
does it actually solve all of your vaporizing client issues?

Andy

P.S. I agree that the keepalive mechanism in TCP won't cause a scalability
issue. The particular issue is with proxies and NAT boxes that can't
track all of the connections in real time. In this case you may get a
false indication that the client is dead. As I said, in the xroot world
that shouldn't matter, as the client would simply reconnect.

On Thu, 28 Aug 2014, Brian Bockelman wrote:

> If you're going to enable keepalive in the client -
>
> You might want to think about manually tuning the keepalive timeouts down from the defaults (2 hours).  I recently adjusted it down to around 5 minutes in HTCondor because 2 hours was "too late" to detect the disconnect to recover the jobs.
>
> There's a socket option to do this in Linux (which travels under a different name in Mac OS X... not sure about Solaris).  Again, we've not seen any kernel scalability issues from doing this.
>
> Brian
>
> On Aug 28, 2014, at 5:09 AM, Lukasz Janyst <[log in to unmask]> wrote:
>
>> Hi Brian,
>>
>>    for the server-side, it is Andy's call.
>>
>>    We have seen silent disconnection problems with ALICE sites in the past, this is why I set up the keepalive functionality for sockets in the old client. I will do the same for the new one as well.
>>
>> Cheers,
>>   Lukasz
>>
>> On 08/25/2014 02:52 PM, Brian Bockelman wrote:
>>> Hi Lukasz, all,
>>>
>>> Can we enable keepalive by default?  I don't look forward to the task of asking every site for a configuration change.
>>>
>>> At least on the Linux platform, we have observed that the kernel is able to handle tens of thousands of sockets with keepalive enabled; it doesn't appear to be a scalability issue.  There don't appear to be any built-in protocol features we could use on the server side (although this doesn't appear to be needed on the client side).
>>>
>>> Brian
>>>
>>> On Aug 25, 2014, at 2:08 AM, Lukasz Janyst <[log in to unmask]> wrote:
>>>
>>>> On 08/22/2014 06:59 PM, Matevz Tadel wrote:
>>>>>> Does the Xrootd server at least enable TCP keepalive?  That'll close
>>>>>> out dead connections after 2 hours.
>>>>>
>>>>> I don't think so ... I see things hanging up to 24 hours easily (when
>>>>> collector decides to give up on the session). Can this timeout be set at
>>>>> socket creation time?
>>>>
>>>>   Typically, this is handled by the TCP stack, but the routers/firewalls on the way often mess things up. To enable the OS keepalive for xrootd sockets you need to ask for it: http://xrootd.org/doc/prod/xrd_config.htm#_Toc310725344
>>>>
>>>> Cheers,
>>>>   Lukasz
>>>>
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>>>