LISTSERV 16.5 - XROOTD-DEV Archives

Follow-up Comment #9, sr #119348 (project xrootd):

We know the problem, but the proposed solutions seem more like workarounds
than real solutions. Doing timeouts on the server side is very costly.
It seems better done in the client, as the client is logically closer to the
problem than the server.

Also, the protocol spec clearly says that the client must be able to handle a
closed/failed connection at *any* time. It's up to the client to gracefully
recover. Having a ping mechanism does little (though the protocol does indeed
have a ping mechanism that we never implemented simply because we didn't see a
need for more chatter). 

Yes, I agree that at times there are badly behaving NATs that drop
connections in such a way that they are never promptly closed on the
server side. The protocol largely deals with that by having the client kill
those connections when it reconnects (I agree that it requires a client
reconnect -- but that's what keepalive is there for: to put the final nail
in the coffin).
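
For illustration only, and assuming the keepalive referred to above is
ordinary TCP-level keepalive, a minimal Python sketch of turning it on for a
socket might look like the following. The function name enable_keepalive and
the timing values are hypothetical and not taken from xrootd; the TCP_KEEP*
options are platform-specific, so they are only set where available.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive so a silently dropped peer is eventually detected.

    The idle/interval/count values are illustrative assumptions only.
    """
    # Portable part: ask the kernel to probe idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Tuning knobs exist only on some platforms (e.g. Linux), so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With settings like these the kernel, rather than application-level timers,
notices a connection that a NAT has silently dropped and closes it.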

After reading all the comments, I'm no longer sure what is being solved here.
If the client sees that there is no response it's supposed to retry the
operation (newer clients do this in dropped connection cases). If the client
returns an error to the root layer then it's up to root to accept that return
code, not to merely ignore it (which some versions do).
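
The client-side behaviour described above (retry on a dropped connection,
and hand a real error to the caller once retries run out) can be sketched
roughly as follows. This is not the xrootd client code; call_with_reconnect,
connect, and operation are hypothetical names, and the retry count and delay
are arbitrary assumptions.

```python
import time


def call_with_reconnect(connect, operation, max_attempts=3, delay=0.0):
    """Run operation(conn), reconnecting and retrying if the connection drops.

    connect:   hypothetical callable returning a fresh connection object
    operation: hypothetical callable taking that connection
    """
    last_error = None
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                # A failed or closed connection is replaced, never reused.
                conn = connect()
            return operation(conn)
        except (ConnectionError, TimeoutError) as exc:
            # The connection may fail at any time; drop it and try again.
            last_error = exc
            conn = None
            if attempt + 1 < max_attempts:
                time.sleep(delay)
    # Out of retries: surface the failure so the caller cannot miss it.
    raise ConnectionError(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

The final raise is the point of the sketch: the layer above has to handle
that error rather than ignore it.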

So, the bottom line is that this should be properly solved. A band-aid
solution is rarely a solution at all and, from experience, will merely solve
someone's problem at the expense of someone else.

Andy

    _______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/support/?119348>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/