Follow-up Comment #9, sr #119348 (project xrootd):

We know the problem, but the proposed solutions seem more like bypasses than real solutions. Doing timeouts on the server side is a very costly solution. It seems better done in the client, as the client is logically closer to the problem than the server. Also, the protocol spec clearly says that the client must be able to handle a closed/failed connection at *any* time. It's up to the client to recover gracefully. Having a ping mechanism does little (though the protocol does indeed have a ping mechanism that we never implemented, simply because we didn't see a need for more chatter).

Yes, I agree that at times there are badly behaving NATs that drop connections in such a way that they never get immediately closed on the server side. The protocol largely deals with that by having the client kill those connections when it reconnects (I agree that this requires a client reconnect, but that's what keepalive is there for: to put the final nail in the coffin).

After reading all the comments, I'm no longer sure what is being solved here. If the client sees that there is no response, it's supposed to retry the operation (newer clients do this in dropped-connection cases). If the client returns an error to the root layer, then it's up to root to accept that return code, not to merely ignore it (which some versions do).

So, the bottom line is that this should be properly solved. A band-aid solution is rarely a solution at all and, from experience, will merely solve someone's problem at someone else's expense.

Andy

Reply to this item at: <http://savannah.cern.ch/support/?119348>
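
For illustration only, the client-side recovery described in the comment (retry the operation on a fresh connection when the old one is dropped or goes silent, and pass the error up if that still fails) might be sketched as below. This is not the xrootd client code or API: `call_with_reconnect`, `connect`, `operation`, `max_attempts` and `backoff_s` are hypothetical names chosen for the sketch, and the retry count and backoff are arbitrary assumptions.

```python
import socket
import time


def call_with_reconnect(connect, operation, max_attempts=3, backoff_s=0.0):
    """Run operation(conn); on a dropped connection or timeout, reconnect and retry.

    connect   -- hypothetical callable that returns a fresh connection object
    operation -- hypothetical callable that uses the connection and returns a result
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        conn = None
        try:
            conn = connect()          # always start from a fresh connection
            return operation(conn)
        except (ConnectionError, TimeoutError, socket.timeout) as exc:
            # The connection failed or the peer never answered: remember the
            # error, wait a little, and try again on a new connection.
            last_error = exc
            time.sleep(backoff_s * attempt)
        finally:
            if conn is not None:
                conn.close()          # never reuse a possibly dead connection
    # All attempts failed: surface the error so the caller has to deal with it.
    raise last_error
```

The final `raise` is deliberate: after the retries are exhausted the error reaches the calling layer, which is then responsible for acting on it rather than ignoring it.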