Hi Lukasz,

The client is always free to send a keep-alive to the server (i.e., a 
kXR_ping request). The timeout for that can be set very short since the 
response is generated at the top-most layer in the server, so turn-around 
should be in the microsecond range + RTT. That will tell you if the 
connection is alive. The server can send a ping to the client as well. But 
neither of these actions will solve many router issues, which is why you 
need reasonable timeouts. For instance, a server-side ping will do little 
if the router has already broken the connection (this is why server-side 
pings are not all that useful). Yes, the server-side connection *might* 
get closed, but the client will never know that.

So, to better address the problem, I would proceed as follows:
a) Detect that a request has not been serviced within the normal 
time-out window.
b) Issue a ping with a much shorter timeout.
c) If you don't get a response at that point, tear down the connection and 
try again.
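
A minimal sketch of steps a) through c), in Python for brevity. The 
FakeConnection class, its ping() and reconnect() methods, and the 0.5 
second ping timeout are all hypothetical stand-ins used for illustration; 
they are not the actual XRootD client API.

```python
class FakeConnection:
    """Hypothetical stand-in for a client connection (not the XRootD API)."""

    def __init__(self, alive=True):
        self.alive = alive
        self.reconnects = 0

    def ping(self, timeout):
        # A real client would send kXR_ping here and wait at most
        # `timeout` seconds for the reply.
        return self.alive

    def reconnect(self):
        # Tear down the old socket and open a new one.
        self.reconnects += 1
        self.alive = True


def handle_request_timeout(conn, ping_timeout=0.5):
    """Call this once a request has exceeded the normal timeout window (a).

    Returns "wait" if the ping was answered (the link is alive and the
    server is merely slow), or "retry" if the connection was rebuilt
    and the request should be reissued.
    """
    # (b) Probe the link with a much shorter timeout.
    if conn.ping(timeout=ping_timeout):
        return "wait"
    # (c) No answer: tear down the connection and try again.
    conn.reconnect()
    return "retry"
```

The caller decides what "wait" means in practice, for example extending 
the request deadline once before giving up.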

For optimization purposes, I would keep track of the time between 
reconnections. That should provide a window (eventually) of when you must 
send a ping to keep the dumb router from dropping the connection.
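
One possible way to track that window, sketched in Python. The 
KeepAliveEstimator name, the use of the shortest observed gap, and the 0.5 
safety factor are assumptions made for illustration, not something the 
client currently does.

```python
class KeepAliveEstimator:
    """Estimates how often to ping, from the gaps between reconnections."""

    def __init__(self, safety_factor=0.5):
        # Ping well before the shortest gap seen so far (assumed margin).
        self.safety_factor = safety_factor
        self.last_reconnect = None
        self.shortest_gap = None

    def record_reconnect(self, now):
        """Record a reconnection at time `now` (seconds, any monotonic clock)."""
        if self.last_reconnect is not None:
            gap = now - self.last_reconnect
            if self.shortest_gap is None or gap < self.shortest_gap:
                self.shortest_gap = gap
        self.last_reconnect = now

    def ping_interval(self):
        """Suggested seconds between pings, or None if there is no data yet."""
        if self.shortest_gap is None:
            return None
        return self.shortest_gap * self.safety_factor
```

In use, the client would call record_reconnect(time.monotonic()) each time 
it rebuilds the connection and schedule a ping every ping_interval() 
seconds once that returns a value.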

Andy

-----Original Message----- 
From: Lukasz Janyst
Sent: Thursday, March 03, 2011 3:32 AM
To: Fabrizio Furano
Cc: Lukasz Janyst ; [log in to unmask] ; Gerardo Ganis ; Brian Bockelman 
; Andrew Hanushevsky ; [log in to unmask] ; [log in to unmask] ; Dirk 
Duellmann ; [log in to unmask] ; [log in to unmask]
Subject: Re: [sr #119348] Root reports an error while unzipping the buckets 
fetched via xroot

Hi Fabrizio,

   thanks for the info. As I say later in my comment, this would
indeed help, in the sense that the client would get the response
eventually, instead of hanging or crashing, but wouldn't eliminate the
real problem: how to handle misbehaving networks.

Cheers,
   Lukasz

2011/3/3 Fabrizio Furano <[log in to unmask]>:
> Hi Lukasz,
>
>  something is fishy here:
>
>> Of course, on every request timeout I could assume that the connection is
>> just broken even though the socket is in a valid state and reconnect, no
>> problem about that.
>
>  ... in the sense that this is supposed to be the default, normal behavior.
> On every request timeout the connection must be considered broken, and
> completely wiped out. If it does not do it, then this is the issue to
> consider first, imo.
>
>  Fabrizio
>
>