Hi Andy; I'm doing many writing tests with xrootd (and my java client) and I realized there's a strange behaviour of the server that I do not fully understand: suppose that I do a brutal ethernet cable unplug during a data recv at the client side (this can simulate a serious kernel crash in which the TCP stack "disappears"). What I see is that the xrootd server doesn't realize the client disconnection... Indeed, the client didn't disconnect at all. But as we know the architecture defines a fault tolerance even for socket read/write timeouts generated by serious cataclysm like this. And my client does exactly that closing the physical connection that timed out and creating a new one. When I plug back the ethernet cable in the computer it seems that xrootd doesn't detect, for a while, that the old physical connection is actually closed (it seems that the TCP closure handshake do not occur anymore...), while the new physical connection succesfully connects to xrootd. Then when the client tries to re-open the file in "UPDATE" mode it receives a "kXR_FileLocked" error. It is right and expected to me, because the missing "disconnection" notification to the server didn't trigger any closure of the remote file. In fact after a "normal" client crash xrootd detects the client disconnection and logs it with the well-known message 041216 17:29:11 20099 XrdLink: anon.0:11@nbgrid01 disconnected after 0:00:02 (link read error) 041216 17:29:11 20099 anon.0:11@nbgrid01 XrootdFile: closing r /data/dorigoa/testfile.tar 041216 17:29:11 20099 anon.0:11@nbgrid01 XrootdFile: closing w /data/dorigoa/testfile_copy.tar.1103214553901 In the case described above a message like this message isn't fired out, and files remain open. Then I did think that I could resolve this by sending an explicit close command (kXR_close); but xrootd refuses to execute the command saying: "close does not refer to an open file" and I'm sure that command is trying to close the right filehandle (I made many cross-check with the client and server log files). Please read the log in the following: 041216 16:04:10 19906 anon.0:11@nbgrid01 XrootdProtocol: 0001 fh=1 write 2000000@2000000 041216 16:04:10 19906 anon.0:11@nbgrid01 XrootdResponse: 0001 sending OK 041216 16:04:15 19906 anon.0:11@nbgrid01 XrootdProtocol: 0001 request timeout; read 0 of 24 bytes 041216 16:04:15 19906 anon.0:11@nbgrid01 XrdPoll: sending poller 0 enable for link 11 041216 16:04:15 19906 XrdPoll: Poller 0 enabled fd 11 entry 1 now at 2 041216 16:04:32 19906 XrdInet: Accepted connection from nbgrid01 041216 16:04:32 19906 XrdSched: running 4@nbgrid01 inq=0 041216 16:04:32 19906 XrdProtocol: matched protocol daemon 041216 16:04:32 19906 ?:14@nbgrid01 XrdPoll: FD 14 attached to poller 1; num=1 041216 16:04:32 19906 ?:14@nbgrid01 XrootdProtocol: 0000 req=3007 dlen=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdResponse: 0000 sending OK 041216 16:04:32 19906 XrootdXeq: User logged in as anon.0:14@nbgrid01 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0000 req=3003 dlen=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdResponse: 0000 sending err 3004: close does not refer to an open file 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0000 req=3010 dlen=26 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0000 open r /data/dorigoa/testfile.tar fh=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdResponse: 0000 sending 4 data bytes; rc=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0000 req=3013 dlen=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0000 fh=0 read 0@0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdResponse: 0000 sending OK 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0001 req=3003 dlen=0 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdResponse: 0001 sending err 3004: close does not refer to an open file 041216 16:04:32 19906 anon.0:14@nbgrid01 XrootdProtocol: 0001 req=3010 dlen=45 041216 16:04:32 19906 XrootdXeq: Output file /data/dorigoa/testfile_copy.tar.1103209452144 is already opened by 1 writer; open denied. As you can see there's a time gap between 16:04:15 and 16:04:32; in fact at 16:04:15 I unplugged the ethernet cable and plugged back in at 16:04:32. Then you can see xrootd accepts a new connection but do not detect any disconnection; and the file /data/dorigoa/testfile_copy.tar.1103209452144 remains open. Now I think this is not a bug in the code of course, it is something related to the architecture and I would like to hear some comment from you... [ I could do a workaround by remembering old physical connections that timed out and retry to close them before starting any other communication but after a new physical tcp channel succesfully conntected to xrootd (i.e. after I'm sure the ethernet link is UP); but it sounds too much tricky to me... ] thanks and sorry for the long email... ;) Alvise