Follow-up Comment #10, sr #119348 (project xrootd):

There are two issues here:

* ROOT doesn't handle the error code properly. That needs to be fixed, no question about it.
* The router breaks the connection in such a way that the client has no clue that anything is wrong. The socket is in a valid state, the client writes the request data to the socket, the operation succeeds (the write syscall returns success), and the data sits in the OS TCP send queue because it is never ACKed. When the request timeout expires, the client simply writes the request to the socket one more time; the write succeeds again, and again the request data is stuck in the send queue.

Of course, on every request timeout I could assume that the connection is broken, even though the socket is in a valid state, and reconnect. That is no problem in itself. But consider the implications for clients that require long-lived connections, bearing in mind that the default request timeout is on the order of five minutes: every request sent some seconds after the previous one (long enough for the router to drop the connection) would take five minutes to complete. Yes, I could make this timeout shorter, but that would mean reconnecting (and hence reauthenticating) every couple of seconds if a particular use case demanded such an access pattern. I don't believe that is an acceptable solution either.

The problem can really be solved only by sending probes over the wire to check that the connection is alive. On Linux you can tune the TCP stack to do this for you transparently, in a way that fits the needs of each use case, but other operating systems are clearly inferior in this respect. So I think the real question is: do we want to support other operating systems as well as we support Linux, or not?
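
To make the point about letting the TCP stack send the probes more concrete, here is a minimal sketch of per-socket keepalive tuning. It is written in Python for brevity and is not xrootd client code. The helper name enable_keepalive and the default values (30 s idle, 10 s interval, 3 probes) are assumptions chosen for illustration only. The socket options themselves are the standard ones, and the hasattr guard reflects the fact that not every platform exposes all of them.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper).

    Returns a dict of the options that were actually applied, so the caller
    can see which knobs this platform exposes.
    """
    applied = {}

    # SO_KEEPALIVE is portable, but on its own it uses system-wide defaults
    # (on Linux the first probe is sent only after two hours of idleness).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied["SO_KEEPALIVE"] = 1

    # Per-socket tuning. These constants are available on Linux; the socket
    # module omits the ones a platform does not provide, hence the guard.
    per_socket = {
        "TCP_KEEPIDLE": idle_s,       # idle seconds before the first probe
        "TCP_KEEPINTVL": interval_s,  # seconds between unanswered probes
        "TCP_KEEPCNT": probes,        # unanswered probes before giving up
        # Milliseconds that sent data may stay unacknowledged before the
        # kernel drops the connection (Linux only).
        "TCP_USER_TIMEOUT": (idle_s + interval_s * probes) * 1000,
    }
    for name, value in per_socket.items():
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
            applied[name] = value
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With settings like these, a connection that has been silently dropped is reported as an error on the socket after roughly idle_s + interval_s * probes seconds, instead of being noticed only when the request timeout expires. Inspecting the returned dict shows which of the per-socket options a given operating system actually offers.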