
Actually, @ljanyst, maybe you can provide some insight into these crashes. Together with @simonmichal we came up with a pretty solid explanation of what happens and were also able to reproduce it.

Summary: The problem arises when we enter the retry logic for either a read or a write operation. As you can see at this point in the code, https://github.com/xrootd/xrootd/blob/master/src/XrdCl/XrdClFileStateHandler.cc#L1464, when the open is retried the OpenHandler object is built ad hoc and, notably, does not take any user handler. This is understandable, as there is no longer a user handler for an open operation: the original open succeeded some time ago, and now you are probably being called from a StatefulHandler dealing with a failed read or write.

Therefore, this OpenHandler no longer has a semaphore object, which is normally contained in the user handler; in the default case that is the SyncResponseHandler, i.e. https://github.com/xrootd/xrootd/blob/master/src/XrdCl/XrdClFile.cc#L109. This means that if the open recovery takes longer than the StreamTimeout, the call in progress (read or write) returns with an error and the XrdClFile object is deleted, but the new OpenHandler remains in the queue. When the response finally arrives from the server, the OpenHandler tries to use the XrdClFileStateHandler that was just deleted, leading to a segfault. In the normal scenario the XrdClFile, and implicitly the XrdClFileStateHandler, cannot be deleted until the OpenHandler is called, because of the semaphore contained in the SyncResponseHandler: https://github.com/xrootd/xrootd/blob/master/src/XrdCl/XrdClMessageUtils.hh#L88
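
To make the lifetime issue concrete, here is a minimal, self-contained sketch of the sequence described above. It is not XrdCl code: FileState, RecoveryOpenHandler and SimulateLateResponse are hypothetical stand-ins invented for illustration. The handler holds a std::weak_ptr instead of a raw pointer, which is only an assumption made so that the "owner already deleted" case can be observed safely rather than crashing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the file state object (not the real XrdCl class).
struct FileState {
  std::string status = "recovering";
  void OnOpenRecovered() { status = "open"; }
};

// Hypothetical stand-in for the ad hoc recovery open handler. It keeps a
// weak reference so it can tell whether the state object is still alive.
class RecoveryOpenHandler {
 public:
  explicit RecoveryOpenHandler(std::weak_ptr<FileState> state)
      : state_(std::move(state)) {}

  // Called when the (possibly late) server response arrives.
  // Returns true if the response was delivered, false if the owner is gone.
  bool HandleResponse() {
    if (auto state = state_.lock()) {
      state->OnOpenRecovered();
      return true;
    }
    // Owner already destroyed. With a raw pointer, this is the point
    // where the use-after-free (and the segfault) would happen.
    return false;
  }

 private:
  std::weak_ptr<FileState> state_;
};

// Simulates: recovery open queued, file optionally deleted after a
// timeout, then the late response is dispatched to the handler.
bool SimulateLateResponse(bool fileDeletedBeforeResponse) {
  auto state = std::make_shared<FileState>();
  RecoveryOpenHandler handler(state);  // handler "sits in the queue"
  assert(state.use_count() == 1);      // the handler does not own the state
  if (fileDeletedBeforeResponse) {
    state.reset();                     // file object deleted after the timeout
  }
  return handler.HandleResponse();
}
```

In this sketch SimulateLateResponse(true) corresponds to the crash scenario: the handler outlives the object it refers to. Detecting that with a weak reference is a different technique from the semaphore in SyncResponseHandler, which instead prevents the deletion until the handler has been called.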

So, given this complex scenario and that we have clearly identified the problem, the question is: how could the OpenHandler for the recovery be created so that the XrdClFileStateHandler waits for it to be called, just as happens when the user calls XrdClFile::Open?

