Hi Elvin,

The handler pointer used in lines 420 - 447 is retrieved from the incoming queue using InQueue::GetHandlerForMessage. This method actually removes the handler from the InQueue in a mutex-protected section. Stream::Tick calls InQueue::ReportTimeout, which searches through the InQueue, also in a mutex-protected section. So, if the handler pointer is held by the Stream object, it is not present in the queue; therefore, any timeout event caused by Stream::Tick will not be reported to the handler currently being processed by the Stream object in the polling thread.

I have another idea, though, that I am currently exploring. CCing xrootd-dev for future (Michal's) reference in case I don't manage to fix it before leaving.

Cheers,
   Lukasz

On 03/12/2015 12:19 AM, Elvin Alin Sindrilaru wrote:
> Hi Lukasz, Andy,
>
> Since part of my "apprenticeship" was also to look over the client code, I thought that this might be a good starting point. Although I am not even halfway through understanding what happens here, I noticed something that might be helpful in this situation.
>
> In XrdClStream.cc around line 420 the method pTransport->MessageReceived is called, and inside this function there is a check to see whether the initial request for this incoming message has timed out; if so, the handler is discarded.
>
> Now imagine that the check passes, but after this call the initial request actually expires and the XrdClStream::Tick method runs - therefore the handler is called on the expired request and it is also destroyed (by a different thread - the one running the ticks).
>
> Now in the XrdClStream::OnIncoming method the thread can re-add the handler to the pIncomingQueue even if it was previously destroyed as a consequence of the timed-out request. Therefore, a crash will happen when the handler is called the second time - for the legitimate message that we just processed.
> There is no lock to protect the "region" between lines 420 - 447 in XrdClStream.cc, and the Tick thread can expire the current request in the meantime.
>
> Hope it makes sense ...
>
> Cheers,
> Elvin
>
> ________________________________________
> From: Andrew Hanushevsky
> Sent: 11 March 2015 21:34
> To: Lukasz Janyst
> Cc: Elvin Alin Sindrilaru
> Subject: Curious Client Issue
>
> Hi Lukasz,
>
> I have seen several instances of client crashes closely preceded by an
> operation timeout (be it an operation or a socket or whatever -- it's always
> a timeout). The core file appears to indicate that an internal XrdCl
> response handler was subsequently invoked on a deleted XrdClFile object.
> I suspect this is because XrdClFile::open() returned a timeout error and so
> the object was deleted. However, there apparently is still something out
> there that wants to process that object even after the timeout occurs. Have
> you seen this kind of thing before? Do you have any ideas of where to look
> for a smoking gun?
>
> Andy
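
For future readers of this thread, the argument in the reply at the top (a handler that has been taken out of the queue cannot be reached by a later timeout scan) can be illustrated with a small, self-contained C++ sketch. This is a toy model and not the XrdCl code: ToyInQueue, Handler, TakeHandlerForMessage and the demo function are hypothetical names, and the assumption that the timeout scan also removes expired handlers from the queue is mine. The demo runs the two calls one after the other on a single thread instead of using real polling and tick threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <list>
#include <mutex>

// Hypothetical stand-in for an incoming-message handler (not the XrdCl type).
struct Handler {
  std::uint16_t streamId;  // identifies the request this handler waits for
  std::time_t   expires;   // absolute expiration time of that request
  bool          timedOut;  // set when a timeout is reported to the handler
};

// Toy model of the incoming queue discussed in the thread.
class ToyInQueue {
 public:
  void AddHandler(Handler* h) {
    std::lock_guard<std::mutex> lock(mutex_);
    handlers_.push_back(h);
  }

  // Finds the handler for a message and removes it from the queue within
  // the same critical section, so the caller becomes its only holder.
  Handler* TakeHandlerForMessage(std::uint16_t streamId) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto it = handlers_.begin(); it != handlers_.end(); ++it) {
      if ((*it)->streamId == streamId) {
        Handler* h = *it;
        handlers_.erase(it);
        return h;
      }
    }
    return nullptr;
  }

  // Reports a timeout to every expired handler still in the queue and
  // removes it (assumption). Returns the number of handlers notified.
  std::size_t ReportTimeout(std::time_t now) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t reported = 0;
    for (auto it = handlers_.begin(); it != handlers_.end();) {
      if ((*it)->expires <= now) {
        (*it)->timedOut = true;
        it = handlers_.erase(it);
        ++reported;
      } else {
        ++it;
      }
    }
    return reported;
  }

 private:
  std::mutex          mutex_;
  std::list<Handler*> handlers_;
};

// Sequences the two steps: the "polling thread" takes handler 1 first,
// then the "tick" scans the queue after both requests have expired.
bool TakenHandlerIsSkippedByTimeout() {
  ToyInQueue queue;
  Handler taken{1, 100, false};
  Handler waiting{2, 100, false};
  queue.AddHandler(&taken);
  queue.AddHandler(&waiting);

  Handler* h = queue.TakeHandlerForMessage(1);
  assert(h == &taken);
  std::size_t reported = queue.ReportTimeout(200);

  // Only the handler that was still queued sees the timeout.
  return !taken.timedOut && waiting.timedOut && reported == 1;
}
```

Because lookup and removal happen under one lock, there is no moment at which the handler is both held by the caller and visible to the timeout scan. The sketch says nothing about what happens if the handler is later put back into the queue, which is the part of the code path questioned in the quoted message.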