On Aug 22, 2014, at 10:45 AM, Matevz Tadel wrote:

> Hi everybody,
> 
> As things scale up there are more and more cases where users' jobs get killed with -9 (SIGKILL) by the batch system (or by the user, upon realizing they did something stupid).
> 
> Servers know nothing about this, as xrootd never checks the sockets to see if there's anybody still at the other end. Consequently, monitoring thinks the file is still open ... the inactivity cutoff I have in the XrdMon collector is 1 day! Whatever happens, the close time is wildly off.
> 
> At the moment about 80% of the open files on the collector are in this state ... close to 10,000 coming just from EOS at FNAL. Grrr, etc.
> 
> Does it make sense to add a configuration option to make servers perform aliveness checks on connected clients?
> 
> I know, client applications should be shut down properly ...
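To make the timing problem concrete, here is a minimal C++ sketch of an inactivity cutoff like the one described for the collector. It is hypothetical: `OpenFileTable`, `on_open`, `on_activity` and `sweep` are illustrative names, not XrdMon's actual code, and it assumes the collector only learns about a file through timestamped monitoring records. Because no close record ever arrives for a killed job, the only close time the sweep can assign is the moment it notices the silence, which is at least one cutoff period after the last real activity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a monitoring collector's open-file table.
struct OpenFile {
    int64_t open_time;      // seconds, from the server's open record
    int64_t last_activity;  // seconds, time of the most recent record for this file
};

struct ClosedFile {
    std::string id;
    int64_t close_time;     // when the collector *decided* the file was closed
};

class OpenFileTable {
public:
    explicit OpenFileTable(int64_t cutoff_s) : cutoff_(cutoff_s) {}

    void on_open(const std::string& id, int64_t t) { files_[id] = {t, t}; }

    void on_activity(const std::string& id, int64_t t) {
        auto it = files_.find(id);
        if (it != files_.end()) it->second.last_activity = t;
    }

    // Expire files silent for longer than the cutoff. No close record ever
    // arrived, so the only "close time" available is the sweep time itself.
    std::vector<ClosedFile> sweep(int64_t now) {
        std::vector<ClosedFile> expired;
        for (auto it = files_.begin(); it != files_.end();) {
            if (now - it->second.last_activity > cutoff_) {
                expired.push_back({it->first, now});
                it = files_.erase(it);
            } else {
                ++it;
            }
        }
        return expired;
    }

    std::size_t open_count() const { return files_.size(); }

private:
    int64_t cutoff_;
    std::unordered_map<std::string, OpenFile> files_;
};
```

With a 1-day cutoff (86400 s), a file last seen at t = 600 s is only reported closed by a sweep running after t = 87000 s, and that sweep time is what ends up recorded as the close time.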

Hi Matevz,

I'm not sure this explanation completely makes sense.  "kill -9" of a client just kills the process; the OS still closes all of the process's file descriptors, so the TCP socket is closed at the network level by the kernel.  So, there should no longer be a valid TCP socket at the server.
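One way to check this is to let the kernel demonstrate it. The C++ sketch below (POSIX/Linux, over loopback TCP; the function name `recv_after_client_sigkill` is made up for illustration) forks a client that connects and then hangs, kills it with SIGKILL, and reports what `recv()` returns on the server's accepted socket. A return of 0 is an orderly end-of-stream, i.e. the kernel sent a FIN on behalf of the dead process. This only covers the case where nothing between the two ends interferes with the FIN.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Server-side view of a client killed with SIGKILL ("kill -9").
// Returns what recv() reports on the accepted socket afterwards:
//    0 -> orderly shutdown (FIN): the kernel closed the dead client's socket
//   -1 -> error (e.g. receive timeout or ECONNRESET)
//   -2 -> the setup itself failed
ssize_t recv_after_client_sigkill() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return -2;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    timeval tv{5, 0};   // never block for more than 5 s in accept()/recv()
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) < 0 ||
        setsockopt(listener, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(listener);
        return -2;
    }
    pid_t child = fork();
    if (child < 0) { close(listener); return -2; }
    if (child == 0) {
        // Client process: connect, then hang forever without closing anything.
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0 || connect(c, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) _exit(1);
        for (;;) pause();
    }
    int conn = accept(listener, nullptr, nullptr);
    kill(child, SIGKILL);        // the batch system's "-9"
    waitpid(child, nullptr, 0);  // reap the client; its descriptors are closed by now
    ssize_t n = -2;
    if (conn >= 0) {
        setsockopt(conn, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
        char buf[1];
        n = recv(conn, buf, sizeof(buf), 0);  // 0 == peer closed (EOF)
        close(conn);
    }
    close(listener);
    return n;
}
```

If the client had unread data pending when it died, the kernel would send an RST instead, and `recv()` would fail with ECONNRESET; either way the server is told the connection is gone.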

However - if there's a stateful firewall in the way (which doesn't apply to FNAL FWIW), it may do all sorts of screwy things to the connection state.

Does the Xrootd server at least enable TCP keepalive?  With the Linux defaults, that'll close out dead connections after roughly 2 hours.
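For reference, this is roughly what enabling keepalive looks like on a server's accepted socket. The helper names `enable_keepalive` and `keepalive_detection_seconds` are hypothetical, not part of the Xrootd API, and the per-socket `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are Linux-specific; elsewhere this sketch only sets `SO_KEEPALIVE` and the system-wide defaults apply. The second function is just the worst-case arithmetic: idle time before the first probe plus one interval for each unanswered probe.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: turn on TCP keepalive for one client socket.
// Returns true if every setsockopt() call succeeded.
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) return false;
#ifdef __linux__
    // Seconds of idleness before the first probe.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0) return false;
    // Seconds between unanswered probes.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0) return false;
    // Unanswered probes before the kernel drops the connection.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0) return false;
#else
    (void)idle_s; (void)interval_s; (void)probes;  // system-wide defaults apply
#endif
    return true;
}

// Worst-case seconds before a silent peer is declared dead.
int keepalive_detection_seconds(int idle_s, int interval_s, int probes) {
    return idle_s + interval_s * probes;
}
```

With the Linux defaults (7200 s idle, 75 s interval, 9 probes) this gives 7875 s, a little over 2 hours; calling `enable_keepalive(fd, 300, 30, 4)` on each accepted client socket would cut it to 420 s. Which values suit a given deployment is a judgment call.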

Brian

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1