I've been able to reproduce this about 10 times (it occurs at a rate of roughly 3 jobs in 1,000). Unfortunately, it takes 24 hours to reproduce, I can't capture a core file (the jobs run on the grid), and I have to load the system heavily enough to cause failures. So I don't have much to give you right now besides the linked logs.

A few common characteristics of all the failures:
1) The callback for file X triggers at the redirector when it gets the response to kXR_open for file Y.
2) The kXR_open response is labelled as an invalid response (sometimes the response itself, sometimes the corresponding StatInfo). The segfault occurs in the XRootDMsgHandler's destructor.
3) The callback occurs well after the timeout of the XrdCl::File::Open call has expired.
4) The original XrdCl::File::Open encounters a socket error (for example, connection reset by peer). The MsgHandler's callback fires, but the user's callback does not (I think it retries the same call at the load balancer).

