Hi Andy,

It seems that opening with the FORCE flag on is the only, and the best,
solution. The 'keepalive' parameter for xrd.network does not solve the
problem.

thanks

Alvise

Andrew Hanushevsky wrote:

>Hi Alvise,
>
>>suppose that I do a brutal ethernet cable unplug during a data recv at
>>the client side (this can simulate a serious kernel crash in which the
>>TCP stack "disappears").
>>What I see is that the xrootd server doesn't realize the client
>>disconnection... Indeed, the client didn't disconnect at all.
>>
>This varies by operating system. However, in general, the server side rarely
>recognizes that a connection just "dropped" unless there is actual activity
>on the connection. You can try forcing this by specifying the "keepalive"
>option in the "xrd.network" directive. However, it may still take a couple
>of hours before the connection is actually closed (again, determined by the
>implementation in the kernel).
>
>>But as we know, the architecture defines a fault tolerance even for
>>socket read/write timeouts generated by a serious cataclysm like this.
>>And my client does exactly that, closing the physical connection that
>>timed out and creating a new one. When I plug the ethernet cable back
>>into the computer it seems that xrootd doesn't detect, for a while, that
>>the old physical connection is actually closed (it seems that the TCP
>>closure handshake does not occur anymore...), while the new physical
>>connection successfully connects to xrootd.
>>
>Quite correct, that's part of the socket specification. The only way the old
>connection will be automatically closed is if you managed to use the *same*
>source port number. The TCP specification clearly states that the
>destination side must close the "old" connection when this happens. The
>circumstances are pretty rare in practice.
>
>>Then when the client tries to re-open the file in "UPDATE" mode it
>>receives a "kXR_FileLocked" error.
>>It is right and expected to me,
>>
>Yes, this is why there is a "force" option on the open to tell xrootd to
>ignore the lock.
>
>>Then I did think that I could resolve this by sending an explicit close
>>command (kXR_close); but xrootd refuses to execute the command saying:
>>"close does not refer to an open file" and I'm sure that command is
>>trying to close the right filehandle (I made many cross-checks with the
>>client and server log files). Please read the log in the following:
>>
>Doesn't matter. xrootd assigns file handles by socket number. So, one socket
>can't "steal" a file handle from another socket.
>
>>Now I think this is not a bug in the code of course, it is something
>>related to the architecture and I would like to hear some comment from
>>you...
>>
>It's architecture, alright; but the architecture is determined by the TCP
>specification and the socket implementation by the kernel. There is really
>very little I can do about that. One could devise special circumstances
>where you could manually check if the connection closed, but right now, there
>is no such check. We never put in a reverse "ping" in the protocol. Perhaps we
>should, to avoid these kinds of bizarre end conditions.
>
>>[ I could do a workaround by remembering old physical connections that
>>timed out and retrying to close them before starting any other
>>communication but after a new physical tcp channel has successfully
>>connected to xrootd (i.e. after I'm sure the ethernet link is UP); but
>>it sounds too tricky to me... ]
>>
>Actually, sounds quite impossible in most circumstances. However, you should
>*always* close the previous connection that timed out. You can do that at
>the time the timeout occurs. Not that it would change a lot because the
>kernel still won't be able to send the "synclose" request to the server. But
>it's cleaner that way from the client's side.
>
>Again I think you should try specifying "keepalive" on the xrd.network
>directive.
>
>Andy
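
For readers wondering why "keepalive" can still take a couple of hours to notice a dead peer: below is a minimal Python sketch of kernel-level TCP keepalive on an ordinary socket. It is not xrootd code, and whether the xrd.network "keepalive" option maps onto exactly these socket options is an assumption. The helper names (enable_keepalive, worst_case_detection_s) and the timer values are hypothetical and chosen for illustration only. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux-specific, so the sketch skips any that the platform does not expose.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and, where the platform allows, shorten its timers.

    Hypothetical helper for illustration; not part of xrootd.
    """
    # Portable part: ask the kernel to probe an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timer overrides (Linux names); skipped where unavailable.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def worst_case_detection_s(idle_s, interval_s, probes):
    """Rough upper bound on the time needed to declare a silent peer dead."""
    return idle_s + interval_s * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("keepalive enabled:", enable_keepalive(s))
    s.close()
    # Common Linux defaults: 7200 s idle, 75 s interval, 9 probes.
    print("default bound (s):", worst_case_detection_s(7200, 75, 9))
    print("tuned bound (s):", worst_case_detection_s(60, 10, 5))
```

With the usual Linux defaults the bound works out to a little over two hours, which is consistent with the "couple of hours" mentioned in the thread. Shorter timers narrow the window but do not remove it, so the "force" option on open remains the immediate way around a stale lock.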